From tomerfiliba at gmail.com Tue Aug 1 19:27:58 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Tue, 1 Aug 2006 19:27:58 +0200 Subject: [Python-3000] gettype In-Reply-To: <44CC6A3E.8000003@v.loewis.de> References: <1d85506f0607061119w1c3cab60o6f762a8e3849e45c@mail.gmail.com> <44CC6A3E.8000003@v.loewis.de> Message-ID: <1d85506f0608011027v4402f905ge6bc18e25ef0aa9e@mail.gmail.com> that's surly anachronism :) o.__class__ is a little more typing and will surely scare newbies. moreover, type(x) and x.__class__ can return different things (you can fool __class__, but not type()). for my part, i'm fine with any form that makes a distinction between the metaclass "type" and the inquire-type "type". call it o.__class__, gettype() or typeof(), just don't mix that with the metaclass -tomer On 7/30/06, "Martin v. L?wis" wrote: > tomer filiba schrieb: > > so why not choose the "get%s()" notation? > > Why not o.__class__? > > Regards, > Martin > From talin at acm.org Wed Aug 2 04:29:51 2006 From: talin at acm.org (Talin) Date: Tue, 01 Aug 2006 19:29:51 -0700 Subject: [Python-3000] gettype In-Reply-To: <1d85506f0608011027v4402f905ge6bc18e25ef0aa9e@mail.gmail.com> References: <1d85506f0607061119w1c3cab60o6f762a8e3849e45c@mail.gmail.com> <44CC6A3E.8000003@v.loewis.de> <1d85506f0608011027v4402f905ge6bc18e25ef0aa9e@mail.gmail.com> Message-ID: <44D00E1F.2040209@acm.org> tomer filiba wrote: > that's surly anachronism :) > > o.__class__ is a little more typing and will surely scare newbies. > moreover, type(x) and x.__class__ can return different things > (you can fool __class__, but not type()). > > for my part, i'm fine with any form that makes a distinction between > the metaclass "type" and the inquire-type "type". > call it o.__class__, gettype() or typeof(), just don't mix that with > the metaclass From a code style perspective, I've always felt that the magical __underscore__ names should not be referred to ouside of the class implementing those names. 
The double underscores are an indication that this method or property is in most normal use cases referred to implicitly by use rather than explicitly by name; Thus str() invokes __str__ and so on. -- Talin From jack at psynchronous.com Wed Aug 2 05:14:37 2006 From: jack at psynchronous.com (Jack Diederich) Date: Tue, 1 Aug 2006 23:14:37 -0400 Subject: [Python-3000] gettype In-Reply-To: <44D00E1F.2040209@acm.org> References: <1d85506f0607061119w1c3cab60o6f762a8e3849e45c@mail.gmail.com> <44CC6A3E.8000003@v.loewis.de> <1d85506f0608011027v4402f905ge6bc18e25ef0aa9e@mail.gmail.com> <44D00E1F.2040209@acm.org> Message-ID: <20060802031437.GJ25353@performancedrivers.com> On Tue, Aug 01, 2006 at 07:29:51PM -0700, Talin wrote: > tomer filiba wrote: > > that's surly anachronism :) > > > > o.__class__ is a little more typing and will surely scare newbies. > > moreover, type(x) and x.__class__ can return different things > > (you can fool __class__, but not type()). > > > > for my part, i'm fine with any form that makes a distinction between > > the metaclass "type" and the inquire-type "type". > > call it o.__class__, gettype() or typeof(), just don't mix that with > > the metaclass > > From a code style perspective, I've always felt that the magical > __underscore__ names should not be referred to ouside of the class > implementing those names. The double underscores are an indication that > this method or property is in most normal use cases referred to > implicitly by use rather than explicitly by name; Thus str() invokes > __str__ and so on. The paired double underscores indicate that the function is special to the instance's class. C++ converts understand this just fine until you mention that classes are themselves instances at which point the grey matter takes a while to settle again [guilty]. After that reshuffling you are again assaulted because the stack stops. The class of a class is a type but the class of a class of a class is still a type. Turtles all the way down. 
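Both points — that __class__ can be fooled while type() cannot, and that the class of a class of a class is still type — can be checked in a few lines (Python 3 syntax; the Liar class is just an illustration):

```python
class Liar:
    # an instance can lie about its class by overriding __class__ ...
    @property
    def __class__(self):
        return int

x = Liar()
print(isinstance(x, int))   # True  -- isinstance() consults __class__
print(type(x) is Liar)      # True  -- ... but type() reads the real type slot
# "turtles all the way down": the class of a class is type,
# and the class of type is type itself
print(type(Liar) is type, type(type) is type)  # True True
```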
See the recent thread on python-checkins for some discussion on why "isinstance(ob, type(type))" isn't just legal -- it's backwards compatible! -Jack From ark-mlist at att.net Wed Aug 2 06:56:18 2006 From: ark-mlist at att.net (Andrew Koenig) Date: Wed, 2 Aug 2006 00:56:18 -0400 Subject: [Python-3000] gettype In-Reply-To: <44D00E1F.2040209@acm.org> Message-ID: <001001c6b5ef$fdd258f0$6402a8c0@arkdesktop> > From a code style perspective, I've always felt that the magical > __underscore__ names should not be referred to ouside of the class > implementing those names. The double underscores are an indication that > this method or property is in most normal use cases referred to > implicitly by use rather than explicitly by name; Thus str() invokes > __str__ and so on. Haven't we seen this argument somewhere before? :-) (needless to say, I'm in agreement with it in this context too) From ncoghlan at iinet.net.au Thu Aug 3 14:58:44 2006 From: ncoghlan at iinet.net.au (Nick Coghlan) Date: Thu, 03 Aug 2006 22:58:44 +1000 Subject: [Python-3000] Rounding in Py3k Message-ID: <44D1F304.4020700@iinet.net.au> Some musings inspired by the rounding discussion on python-dev. The Decimal module provides all of the rounding modes from the general decimal arithmetic specification [1]. Both Decimal rounding methods (quantize() and to_integral()) return Decimal instances - a subsequent explicit conversion to int() is needed if you want a real integer (just like the builtin round()). Normal floats, OTOH, only have easy access to truncate (through int()) and round-half-up (through round()). Additionally, the Decimal 'quantize' method signature is fine if you have decimal literals, but not so good for Python where you have to write "n.quantize(d('1e-2'))" to round to two decimal places. 
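For concreteness, the two Decimal rounding methods behave like this (`d` stands for the Decimal constructor, as in the quantize example above):

```python
from decimal import Decimal as d

n = d("3.14159")
print(n.quantize(d("1e-2")))   # Decimal('3.14') -- still a Decimal
print(n.to_integral())         # Decimal('3')    -- still a Decimal
print(int(n.to_integral()))    # 3 -- the explicit int() conversion
```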
The implicit Decimal->float conversion also allows Decimals to be rounded with the round() builtin, but that can lead to errors in rounding near the limits of floating point precision due to the use of an imprecise conversion in Decimal.__float__(): >>> n = (1 + d("5e-16")) >>> n Decimal("1.0000000000000005") >>> float(n.quantize(d('1e-15'))) 1.0 >>> round(n, 15) 1.0000000000000011 Would it be worthwhile to design a common rounding mechanism that can be used to cleanly round values to the built in floating point type, as well as being able to access the different rounding modes for decimal instances? For example, replace the builtin function round() with a non-instantiable class like the following: _TEN = decimal.Decimal(10) class round(object): @staticmethod def half_up(num, ndigits=0): if isinstance(num, decimal.Decimal): return float(num.quantize(_TEN**(-ndigits)), rounding = decimal.ROUND_HALF_UP) return float(num)._round_half_up() __call__ = half_up @staticmethod def down(num, ndigits=0): if isinstance(num, decimal.Decimal): return float(num.quantize(_TEN**(-ndigits)), rounding = decimal.ROUND_DOWN) return float(num)._round_down() # etc for the other 5 rounding modes Cheers, Nick. 
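The behavioral differences between the rounding modes are easiest to see on half-way values; a small demo using the decimal module directly:

```python
from decimal import (Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN,
                     ROUND_DOWN, ROUND_CEILING)

one = Decimal("1")  # quantize to integer precision
for n in (Decimal("2.5"), Decimal("-2.5")):
    print(n.quantize(one, rounding=ROUND_HALF_UP),
          n.quantize(one, rounding=ROUND_HALF_EVEN),
          n.quantize(one, rounding=ROUND_DOWN),
          n.quantize(one, rounding=ROUND_CEILING))
# 2.5  gives: 3 (half-up), 2 (half-even), 2 (down), 3 (ceiling)
# -2.5 gives: -3 (half-up), -2 (half-even), -2 (down), -2 (ceiling)
```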
[1] The 7 decimal rounding modes: round-down (truncate; round towards 0) round-half-up (school rounding) round-half-even (bankers' rounding) round-ceiling (round towards positive infinity) round-floor (round towards negative infinity) round-half-down (WTF rounding :) round-up (round away from zero) -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From greg.ewing at canterbury.ac.nz Fri Aug 4 03:51:19 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 04 Aug 2006 13:51:19 +1200 Subject: [Python-3000] Rounding in Py3k In-Reply-To: <44D1F304.4020700@iinet.net.au> References: <44D1F304.4020700@iinet.net.au> Message-ID: <44D2A817.8040303@canterbury.ac.nz> Nick Coghlan wrote: > The implicit Decimal->float conversion Hang on, I thought there weren't supposed to be any implicit conversions between Decimal and float. -- Greg From greg.ewing at canterbury.ac.nz Fri Aug 4 03:51:25 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 04 Aug 2006 13:51:25 +1200 Subject: [Python-3000] Rounding in Py3k In-Reply-To: <44D1F304.4020700@iinet.net.au> References: <44D1F304.4020700@iinet.net.au> Message-ID: <44D2A81D.2050204@canterbury.ac.nz> Nick Coghlan wrote: > Would it be worthwhile to design a common rounding mechanism that can be used > to cleanly round values to the built in floating point type, as well as being > able to access the different rounding modes for decimal instances? Sounds like a job for a new protocol, such as __round__(self, mode, places). 
-- Greg From rrr at ronadam.com Fri Aug 4 07:33:01 2006 From: rrr at ronadam.com (Ron Adam) Date: Fri, 04 Aug 2006 00:33:01 -0500 Subject: [Python-3000] Rounding in Py3k In-Reply-To: <44D2A81D.2050204@canterbury.ac.nz> References: <44D1F304.4020700@iinet.net.au> <44D2A81D.2050204@canterbury.ac.nz> Message-ID: Greg Ewing wrote: > Nick Coghlan wrote: > >> Would it be worthwhile to design a common rounding mechanism that can be used >> to cleanly round values to the built in floating point type, as well as being >> able to access the different rounding modes for decimal instances? > > Sounds like a job for a new protocol, such as __round__(self, mode, places). > > -- > Greg Yes I agree. And viewing this in the larger sense of how it works with all numeric types is better than just sticking a function into the math module I think. (Although that might end up the way to do it.) Nicks proposal adds a private method to each of the types for each mode, which I think clutters things up a bit, but his method does create a single interface to them which is nice. I'm still not sure why "__round__" should be preferred in place of "round" as a method name. There isn't an operator associated to rounding so wouldn't the method name not have underscores? I think rounding any type should return that same type. For example: def round(n, places, mode='half-down'): return n.round(places, mode) round(i, 2) -> integer, unchanged value round(i) -> integer, precision == 0 round(i, -2) -> integer round(f, 2) -> float round(f) -> float, precision == 0 round(f, -2) -> float round(d, 2) -> decimal round(d) -> decimal, precision == max (*) round(d, -2) -> decimal (*) The default decimal rounding behavior is not the same as the default builtin round behavior. Should one be changed to match the other? Calling the desired types method directly could be a good way to handle getting an integer when a float is given. It's explicit. 
int.round(f, 2) -> integer int.round(f) -> integer int.round(f -2) -> integer Or if you prefer... int.__round__(f) Having modes seems to me to be the best way not to clutter the namespace although sometimes that seems like it's not an issue, and sometimes it seems like it is. Here's the list of java rounding modes for comparison. It's nearly identical to the ones in Decimal. http://java.sun.com/j2se/1.5.0/docs/api/java/math/RoundingMode.html Cheers, Ron From greg.ewing at canterbury.ac.nz Fri Aug 4 11:24:26 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 04 Aug 2006 21:24:26 +1200 Subject: [Python-3000] Rounding in Py3k In-Reply-To: References: <44D1F304.4020700@iinet.net.au> <44D2A81D.2050204@canterbury.ac.nz> Message-ID: <44D3124A.6010300@canterbury.ac.nz> Ron Adam wrote: > I'm still not sure why "__round__" should be preferred in place of > "round" as a method name. There isn't an operator associated to > rounding so wouldn't the method name not have underscores? I was thinking there would be functions such as round(), trunc(), etc. that use __round__ to do their work. That's why I called it a protocol and not just a method. -- Greg From rrr at ronadam.com Fri Aug 4 12:46:42 2006 From: rrr at ronadam.com (Ron Adam) Date: Fri, 04 Aug 2006 05:46:42 -0500 Subject: [Python-3000] Rounding in Py3k In-Reply-To: <44D3124A.6010300@canterbury.ac.nz> References: <44D1F304.4020700@iinet.net.au> <44D2A81D.2050204@canterbury.ac.nz> <44D3124A.6010300@canterbury.ac.nz> Message-ID: Greg Ewing wrote: > Ron Adam wrote: > >> I'm still not sure why "__round__" should be preferred in place of >> "round" as a method name. There isn't an operator associated to >> rounding so wouldn't the method name not have underscores? > > I was thinking there would be functions such as round(), > trunc(), etc. that use __round__ to do their work. That's > why I called it a protocol and not just a method. > > -- > Greg I understood your point. 
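A minimal sketch of such a protocol — the round_ function, the mode strings, and the Money class are all assumptions for illustration, not a proposed API:

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

_MODES = {"half-up": ROUND_HALF_UP, "half-even": ROUND_HALF_EVEN}

def round_(obj, ndigits=0, mode="half-even"):
    # the builtin would dispatch to the type's __round__,
    # the same way str() invokes __str__
    return type(obj).__round__(obj, ndigits, mode)

class Money:
    def __init__(self, amount):
        self.amount = Decimal(amount)
    def __round__(self, ndigits=0, mode="half-even"):
        quantum = Decimal(10) ** -ndigits
        return Money(self.amount.quantize(quantum, rounding=_MODES[mode]))

m = round_(Money("2.675"), 2, "half-up")
print(m.amount)   # Decimal('2.68') -- exact; no binary-float surprise
```

Note that rounding a Money returns a Money, consistent with the "rounding any type should return that same type" rule above.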
:-) If you look at the methods in int, long, and float, there are no methods that do not have double underscores. While there are many that don't in unicode and string. There also are many methods in Decimal that do not use the double underscore naming convention. I am just curious why not in general for the builtin numeric types. The style guide says... > - __double_leading_and_trailing_underscore__: "magic" objects or > attributes that live in user-controlled namespaces. E.g. __init__, > __import__ or __file__. Never invent such names; only use them > as documented. So would __round__ interact with the interpreter in some "magic" way? I take "magic" to mean the interpreter calls the method directly at times without having python coded instructions to do so. Such as when we create an object from a class and __init__ gets called by the interpreter directly. The same goes for methods like __add__ and __repr__, etc... But that doesn't explain why int, long, and float, don't have other non-magic methods. I'm not attempting taking sides for or against either way, I just want to understand the reasons as it seems like by knowing that, the correct way to do it would be clear, instead of trying to wag the dog by the tail if you know what I mean. Cheers, Ron From tomerfiliba at gmail.com Fri Aug 4 17:36:40 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Fri, 4 Aug 2006 17:36:40 +0200 Subject: [Python-3000] improved threading in py3k Message-ID: <1d85506f0608040836g1ccd894ck6b4a7b0607e7cd36@mail.gmail.com> python's threading model seems too weak imo. i'm not talking about the GIL and the fact threads run one at a time -- i'm talking about the incompleteness of the API of thread module. once a thread is created, there is no way to kill it *externally*. which is a pity, since the thread must be "willing" to die, for example: def threadfunc(): while i_am_alive: .... 
i_am_alive = True thread.start_new_thread(threadfunc) i_am_alive = False but of course you can't trust all threads work this way. moreover, if the thread calls an internal function that blocks but doesn't check i_am_alive, it will never exit. not to mention messing around with globals, etc. the proposed solution is introducing thread.kill, for example: >>> import time >>> import thread >>> thread.start_new_thread(time.sleep, (10,)) 476 >>> thread.kill(476) thread.kill() would raise the ThreadExit exception at the context of the given thread, which, unless caught, causes the thread to exit silently. if it is the last thread of the process, ThreadExit is equivalent to SystemExit. another issue is sys.exit()/SystemExit -- suppose a thread wants to cause the interpreter to exit. calling sys.exit in any thread but the main one will simply kill the *calling* thread. the only way around it is calling os.abort or os._exit(*)... but these functions do not perform cleanups. i would suggest raising SystemExit at the context of any thread, when the exception is not caught, will re-raise the exception at the context of the main thread, where it can be re-caught or the interpreter would exit. and of course, once the functionality of the thread module is extended, the threading module must be extended to support it as well. - - - - (*) about os._exit -- how about introducing os.exit, which would serve as the "nicer" version of os._exit? os.exit would kill the process in the same way SystemExit kills it (performing cleanups and all). in fact, the interpreter would just call os.exit() when catching SystemExit. it would also allow you to ensure the interpreter is killed, as SystemExit can be caught by external code against your will. 
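The i_am_alive flag can at least be made race-free with tools that already exist; a cooperative version using threading.Event, which still has exactly the weakness described: the thread must be "willing" to die, and a blocking call is never interrupted:

```python
import threading
import time

stop = threading.Event()

def worker():
    # the thread must poll the flag to be killable
    while not stop.is_set():
        time.sleep(0.01)  # a long blocking C call here would never be interrupted

t = threading.Thread(target=worker)
t.start()
stop.set()       # ask the thread to exit...
t.join(5.0)      # ...and wait for it to comply
print(t.is_alive())  # False
```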
-tomer From jcarlson at uci.edu Fri Aug 4 20:17:49 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 04 Aug 2006 11:17:49 -0700 Subject: [Python-3000] improved threading in py3k In-Reply-To: <1d85506f0608040836g1ccd894ck6b4a7b0607e7cd36@mail.gmail.com> References: <1d85506f0608040836g1ccd894ck6b4a7b0607e7cd36@mail.gmail.com> Message-ID: <20060804105349.E6C3.JCARLSON@uci.edu> "tomer filiba" wrote: > python's threading model seems too weak imo. i'm not talking about > the GIL and the fact threads run one at a time -- i'm talking about the > incompleteness of the API of thread module. I could have sworn that it could be implemented as a debugging trace function [1], but my tests [2] seem to imply that non-mainthread code doesn't actually have the trace function called. - Josiah [1] >>> import sys >>> import threading >>> >>> kill_these = {} >>> >>> def killthread(thread): ... kill_these[thread] = None ... >>> def trace(*args): ... del args ... if threading.currentThread() in kill_these: ... #pick some exception unlikely/impossible to catch ... raise MemoryError ... return trace ... >>> sys.settrace(trace) >>> def waster(): ... while 1: ... a = 1 ... b = 2 ... c = 3 ... >>> x = threading.Thread(target=waster) >>> x.start() >>> killthread(x) >>> kill_these {: None} >>> x in kill_these True >>> x in threading.enumerate() True >>> threading.enumerate() [, <_MainThread(MainThread, started)>] >>> [2] >>> import threading >>> import sys >>> seen = {} >>> def trace(*args): ... x = threading.currentThread() ... if x not in seen: ... print x ... seen[x] = None ... return trace ... >>> sys.settrace(trace) >>> def waster(): <_MainThread(MainThread, started)> ... while 1: ... a = 1 ... b = 2 ... c = 3 ... >>> x = threading.Thread(target=waster) >>> x.start() >>> This is in Python 2.4.3 on Windows. > - - - - > > (*) about os._exit -- how about introducing os.exit, which would serve > as the "nicer" version of os._exit? 
os.exit would kill the process in > the same way SystemExit kills it (performing cleanups and all). > in fact, the interpreter would just call os.exit() when catching SystemExit. Already exists as sys.exit() - Josiah From tomerfiliba at gmail.com Fri Aug 4 20:55:55 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Fri, 4 Aug 2006 20:55:55 +0200 Subject: [Python-3000] improved threading in py3k In-Reply-To: <20060804105349.E6C3.JCARLSON@uci.edu> References: <1d85506f0608040836g1ccd894ck6b4a7b0607e7cd36@mail.gmail.com> <20060804105349.E6C3.JCARLSON@uci.edu> Message-ID: <1d85506f0608041155rbf7b38egbae39f521a6f8a2a@mail.gmail.com> > [...] it could be implemented as a debugging trace function even if it could be, *why*? you can't really suggest that from now on, every multithreaded app must run in trace mode, right? it's a performance penalty for no good reason -- it's a question of API. just as the API lets you *create* threads, it should allow you to *kill* them, once you decide so. your code shouldn't rely on the "cooperativeness" of other functions (i.e., the thread does blocking IO using some external library, but you wish to stop it after some timeout, etc.). all i was talking about was adding a new function to the thread module, as well as a new builtin exception to completement it. it's no such a big change that you should work extra hours in inventing creative workarounds for. - - - - you said: > Already exists as sys.exit() but i said: >> it would also allow you to ensure the interpreter is killed, as SystemExit >> can be caught by external code against your will. please take the time to read my post before you reply. here is what i mean by "against your will": >>> import sys >>> >>> try: ... sys.exit() ... except: ... print "fooled you" ... fooled you >>> if my library raises SystemExit, but the user is not aware of that, he/she can block it [un]intentionally, causing undefined behavior in my library. 
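The "fooled you" session reproduces outside the interactive prompt as well; sys.exit() is nothing more than `raise SystemExit`, so any enclosing handler can swallow it:

```python
import sys

def shutdown():
    sys.exit(3)       # equivalent to: raise SystemExit(3)

caught = None
try:
    shutdown()
except SystemExit as e:   # a bare "except:" swallows it just the same
    caught = e.code
print("fooled you, exit code was", caught)   # the process keeps running
```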
os.exit() would really just perform cleanup and exit (not by the means of exceptions)... just like os._exit(), but not as crude. -tomer On 8/4/06, Josiah Carlson wrote: > > "tomer filiba" wrote: > > python's threading model seems too weak imo. i'm not talking about > > the GIL and the fact threads run one at a time -- i'm talking about the > > incompleteness of the API of thread module. > > I could have sworn that it could be implemented as a debugging trace > function [1], but my tests [2] seem to imply that non-mainthread code > doesn't actually have the trace function called. > > - Josiah > > [1] > > >>> import sys > >>> import threading > >>> > >>> kill_these = {} > >>> > >>> def killthread(thread): > ... kill_these[thread] = None > ... > >>> def trace(*args): > ... del args > ... if threading.currentThread() in kill_these: > ... #pick some exception unlikely/impossible to catch > ... raise MemoryError > ... return trace > ... > >>> sys.settrace(trace) > >>> def waster(): > ... while 1: > ... a = 1 > ... b = 2 > ... c = 3 > ... > >>> x = threading.Thread(target=waster) > >>> x.start() > >>> killthread(x) > >>> kill_these > {: None} > >>> x in kill_these > True > >>> x in threading.enumerate() > True > >>> threading.enumerate() > [, <_MainThread(MainThread, started)>] > >>> > > > [2] > >>> import threading > >>> import sys > >>> seen = {} > >>> def trace(*args): > ... x = threading.currentThread() > ... if x not in seen: > ... print x > ... seen[x] = None > ... return trace > ... > >>> sys.settrace(trace) > >>> def waster(): > <_MainThread(MainThread, started)> > ... while 1: > ... a = 1 > ... b = 2 > ... c = 3 > ... > >>> x = threading.Thread(target=waster) > >>> x.start() > >>> > > This is in Python 2.4.3 on Windows. > > > - - - - > > > > (*) about os._exit -- how about introducing os.exit, which would serve > > as the "nicer" version of os._exit? os.exit would kill the process in > > the same way SystemExit kills it (performing cleanups and all). 
> > in fact, the interpreter would just call os.exit() when catching SystemExit. > > Already exists as sys.exit() > > - Josiah > > From jcarlson at uci.edu Fri Aug 4 21:29:09 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 04 Aug 2006 12:29:09 -0700 Subject: [Python-3000] improved threading in py3k In-Reply-To: <1d85506f0608041155rbf7b38egbae39f521a6f8a2a@mail.gmail.com> References: <20060804105349.E6C3.JCARLSON@uci.edu> <1d85506f0608041155rbf7b38egbae39f521a6f8a2a@mail.gmail.com> Message-ID: <20060804121614.E6D4.JCARLSON@uci.edu> "tomer filiba" wrote: > > > [...] it could be implemented as a debugging trace function > > even if it could be, *why*? you can't really suggest that from now on, > every multithreaded app must run in trace mode, right? it's a performance > penalty for no good reason -- it's a question of API. You can remove the performance penalty by resetting the trace function to None. > just as the API lets you *create* threads, it should allow you to *kill* them, > once you decide so. your code shouldn't rely on the "cooperativeness" of > other functions (i.e., the thread does blocking IO using some external > library, but you wish to stop it after some timeout, etc.). According to recent unrelated research with regards to the Win32 API, most thread killing methods (if not all?) leaves the thread state broken in such a way that the only way to fix it is to close down the process. Then again, I could be misremembering, the Win32 API is huge. > all i was talking about was adding a new function to the thread module, > as well as a new builtin exception to completement it. it's no such a big > change that you should work extra hours in inventing creative workarounds > for. It took me 5 minutes to generate that possible solution and a test for it. 
I wasn't saying that the functionality was generally undesireable, just that I believed it should be possible in pure Python today (rather than waiting for Py3k as is the implication by your posting in the Py3k mailing list), and showing why it couldn't be done today. It also brings up the implied question as to whether non-mainthreads should actually execute trace functions. > you said: > > Already exists as sys.exit() > > but i said: > >> it would also allow you to ensure the interpreter is killed, as SystemExit > >> can be caught by external code against your will. > > please take the time to read my post before you reply. > here is what i mean by "against your will": I wasn't aware that sys.exit() raised SystemExit, as I tend to not use bare excepts or sys.exit() in my code (I prefer os._exit(), because when I want to quit, cleanup is the least of my worries). You could have said "sys.exit() raises SystemExit" and I would have understood my mistake. I'm curious as to what I have done to deserve the rudeness of your reply. - Josiah From tomerfiliba at gmail.com Fri Aug 4 22:21:54 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Fri, 4 Aug 2006 22:21:54 +0200 Subject: [Python-3000] improved threading in py3k In-Reply-To: <20060804121614.E6D4.JCARLSON@uci.edu> References: <20060804105349.E6C3.JCARLSON@uci.edu> <1d85506f0608041155rbf7b38egbae39f521a6f8a2a@mail.gmail.com> <20060804121614.E6D4.JCARLSON@uci.edu> Message-ID: <1d85506f0608041321h5a3b1d76gfae5bca45c37ff7e@mail.gmail.com> > I'm curious as to what I have done to deserve the rudeness of your reply. well, i'm kinda pissed off by rockets flying over my house, svn giving me a hard life, and what not. but what you have done was dismissing my post on shaky grounds. if all you meant was adding this support for the 2.x branch as a *workaround*, i truly apologize. > According to recent unrelated research with regards to the Win32 API, > most thread killing methods (if not all?) 
leaves the thread state broken > in such a way that the only way to fix it is to close down the process. > Then again, I could be misremembering, the Win32 API is huge. that may be so, but my suggestion wasn't *killing* the thread directly - i'm sure one can use win32api to forcefully kill threads. my idea, which is loosely based on dotNET (perhaps also applicable in java), was raising a ThreadExit exception in the context of the given thread. that way, the exception propagates up normally, and will eventually cause the thread's main function to exit silently, unless caught (just as it works today). the issue here is raising the exception in *another* thread (externally); this could only be done from a builtin-function (AFAIK); the rest of the mechanisms are already in place. - - - sorry for bursting out. -tomer On 8/4/06, Josiah Carlson wrote: > > "tomer filiba" wrote: > > > > > [...] it could be implemented as a debugging trace function > > > > even if it could be, *why*? you can't really suggest that from now on, > > every multithreaded app must run in trace mode, right? it's a performance > > penalty for no good reason -- it's a question of API. > > You can remove the performance penalty by resetting the trace function > to None. > > > > just as the API lets you *create* threads, it should allow you to *kill* them, > > once you decide so. your code shouldn't rely on the "cooperativeness" of > > other functions (i.e., the thread does blocking IO using some external > > library, but you wish to stop it after some timeout, etc.). > > According to recent unrelated research with regards to the Win32 API, > most thread killing methods (if not all?) leaves the thread state broken > in such a way that the only way to fix it is to close down the process. > Then again, I could be misremembering, the Win32 API is huge. > > > > all i was talking about was adding a new function to the thread module, > > as well as a new builtin exception to completement it. 
it's no such a big > > change that you should work extra hours in inventing creative workarounds > > for. > > It took me 5 minutes to generate that possible solution and a test for > it. I wasn't saying that the functionality was generally undesireable, > just that I believed it should be possible in pure Python today (rather > than waiting for Py3k as is the implication by your posting in the Py3k > mailing list), and showing why it couldn't be done today. It also > brings up the implied question as to whether non-mainthreads should > actually execute trace functions. > > > > you said: > > > Already exists as sys.exit() > > > > but i said: > > >> it would also allow you to ensure the interpreter is killed, as SystemExit > > >> can be caught by external code against your will. > > > > please take the time to read my post before you reply. > > here is what i mean by "against your will": > > I wasn't aware that sys.exit() raised SystemExit, as I tend to not use > bare excepts or sys.exit() in my code (I prefer os._exit(), because when > I want to quit, cleanup is the least of my worries). You could have > said "sys.exit() raises SystemExit" and I would have understood my > mistake. > > > I'm curious as to what I have done to deserve the rudeness of your reply. > - Josiah > > From jcarlson at uci.edu Fri Aug 4 23:02:28 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 04 Aug 2006 14:02:28 -0700 Subject: [Python-3000] improved threading in py3k In-Reply-To: <1d85506f0608041321h5a3b1d76gfae5bca45c37ff7e@mail.gmail.com> References: <20060804121614.E6D4.JCARLSON@uci.edu> <1d85506f0608041321h5a3b1d76gfae5bca45c37ff7e@mail.gmail.com> Message-ID: <20060804134148.E6D7.JCARLSON@uci.edu> "tomer filiba" wrote: > > > I'm curious as to what I have done to deserve the rudeness of your reply. > well, i'm kinda pissed off by rockets flying over my house, svn giving me a > hard life, and what not. but what you have done was dismissing my post on > shaky grounds. Ick. 
I can understand how you are frustrated. > > According to recent unrelated research with regards to the Win32 API, > > most thread killing methods (if not all?) leaves the thread state broken > > in such a way that the only way to fix it is to close down the process. > > Then again, I could be misremembering, the Win32 API is huge. > > that may be so, but my suggestion wasn't *killing* the thread directly - > i'm sure one can use win32api to forcefully kill threads. > my idea, which is loosely based on dotNET (perhaps also applicable in java), > was raising a ThreadExit exception in the context of the given thread. > that way, the exception propagates up normally, and will eventually cause > the thread's main function to exit silently, unless caught (just as it works > today). > > the issue here is raising the exception in *another* thread (externally); > this could only be done from a builtin-function (AFAIK); the rest of the > mechanisms are already in place. One of the use-cases you specified was that C calls could perhaps be aborted (an artificial timeout). Does there exist a mechanism that is able to abort the execution of C code from another C thread without killing the process? If so, then given that the C could be aborted at literally any point of execution, how could any cleanup be done? - Josiah From qrczak at knm.org.pl Fri Aug 4 23:42:07 2006 From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk) Date: Fri, 04 Aug 2006 23:42:07 +0200 Subject: [Python-3000] improved threading in py3k In-Reply-To: <1d85506f0608040836g1ccd894ck6b4a7b0607e7cd36@mail.gmail.com> (tomer filiba's message of "Fri, 4 Aug 2006 17:36:40 +0200") References: <1d85506f0608040836g1ccd894ck6b4a7b0607e7cd36@mail.gmail.com> Message-ID: <87r6zwp49c.fsf@qrnik.zagroda> "tomer filiba" writes: > once a thread is created, there is no way to kill it *externally*. 
> which is a pity, since the thread must be "willing" to die, Doing that unconditionally is impractical: the thread has no way to protect itself from being killed at moments it has invariants of shared data temporarily violated. I agree that it should not require continuous checking for a thread-local "ask to terminate" flag spread into all potentially long-running loops, i.e. it requires a language mechanism. But it must be temporarily blockable and catchable. Here is how I think the design should look like: http://www.cs.ioc.ee/tfp-icfp-gpce05/tfp-proc/06num.pdf This is the same issue as with other asynchronous exceptions like ^C. What has happened to Freund's & Mitchell's "Safe Asynchronous Exceptions For Python" ? My design is an extension of that. -- __("< Marcin Kowalczyk \__/ qrczak at knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/ From jcarlson at uci.edu Sat Aug 5 00:16:33 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 04 Aug 2006 15:16:33 -0700 Subject: [Python-3000] improved threading in py3k In-Reply-To: <87r6zwp49c.fsf@qrnik.zagroda> References: <1d85506f0608040836g1ccd894ck6b4a7b0607e7cd36@mail.gmail.com> <87r6zwp49c.fsf@qrnik.zagroda> Message-ID: <20060804150338.E6DA.JCARLSON@uci.edu> "Marcin 'Qrczak' Kowalczyk" wrote: > "tomer filiba" writes: > > > once a thread is created, there is no way to kill it *externally*. > > which is a pity, since the thread must be "willing" to die, > > Doing that unconditionally is impractical: the thread has no way > to protect itself from being killed at moments it has invariants of > shared data temporarily violated. > > I agree that it should not require continuous checking for a > thread-local "ask to terminate" flag spread into all potentially > long-running loops, i.e. it requires a language mechanism. But it > must be temporarily blockable and catchable. 
> > Here is how I think the design should look: > http://www.cs.ioc.ee/tfp-icfp-gpce05/tfp-proc/06num.pdf I did not read all of that paper, but it seems to rely on the (un)masking of signals in threads, as well as the sending of signals to 'kill' a thread. One problem is that Windows doesn't really allow the sending/receiving of any non-process-killing signals, so it would be a platform-specific feature. If you want a sample implementation of that kind of thing, SAGE (http://modular.math.washington.edu/sage/) performs signal masking/unmasking to stop the execution of underlying computation threads. - Josiah From qrczak at knm.org.pl Sat Aug 5 12:29:59 2006 From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk) Date: Sat, 05 Aug 2006 12:29:59 +0200 Subject: [Python-3000] improved threading in py3k In-Reply-To: <20060804150338.E6DA.JCARLSON@uci.edu> (Josiah Carlson's message of "Fri, 04 Aug 2006 15:16:33 -0700") References: <1d85506f0608040836g1ccd894ck6b4a7b0607e7cd36@mail.gmail.com> <87r6zwp49c.fsf@qrnik.zagroda> <20060804150338.E6DA.JCARLSON@uci.edu> Message-ID: <87ac6j326w.fsf@qrnik.zagroda> Josiah Carlson writes: > I did not read all of that paper, but it seems to rely on the > (un)masking of signals in threads, as well as the sending of signals > to 'kill' a thread. They are not OS signals: it's entirely a matter of the language's runtime system. (But Unix signals can be nicely exposed as these signals for the programmer.)
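To make the deferred-delivery semantics above concrete, here is a minimal single-threaded Python sketch; the names (InterruptState, ThreadInterrupt, masked) are invented for illustration and are not a real or proposed API:

```python
import contextlib

class ThreadInterrupt(Exception):
    """Stand-in for an asynchronous 'please die' exception."""

class InterruptState:
    """Sketch of per-thread interrupt state with temporary masking.
    Single-threaded here for clarity; a real runtime would deliver
    the pending exception at safe points in the target thread."""

    def __init__(self):
        self.pending = None
        self.mask_depth = 0

    def request(self, exc):
        # Conceptually called from *another* thread: record the
        # exception to be delivered to this thread.
        self.pending = exc

    def poll(self):
        # Called at safe points: deliver only when unmasked.
        if self.pending is not None and self.mask_depth == 0:
            exc, self.pending = self.pending, None
            raise exc

    @contextlib.contextmanager
    def masked(self):
        # Protects a critical section whose invariants are
        # temporarily broken; delivery is deferred until exit.
        self.mask_depth += 1
        try:
            yield
        finally:
            self.mask_depth -= 1
            self.poll()
```

Exiting the masked block is the "safe point": an interrupt requested while the mask is held is raised only once the critical section's invariants are restored.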
-- __("< Marcin Kowalczyk \__/ qrczak at knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/ From robinbryce at gmail.com Mon Aug 7 17:11:22 2006 From: robinbryce at gmail.com (Robin Bryce) Date: Mon, 7 Aug 2006 16:11:22 +0100 Subject: [Python-3000] improved threading in py3k In-Reply-To: <87ac6j326w.fsf@qrnik.zagroda> References: <1d85506f0608040836g1ccd894ck6b4a7b0607e7cd36@mail.gmail.com> <87r6zwp49c.fsf@qrnik.zagroda> <20060804150338.E6DA.JCARLSON@uci.edu> <87ac6j326w.fsf@qrnik.zagroda> Message-ID: On 05/08/06, Marcin 'Qrczak' Kowalczyk wrote: > Josiah Carlson writes: > > > I did not read all of that paper, but it seems to rely on the > > (un)masking of signals in threads, as well as the sending of signals > > to 'kill' a thread. > > They are not OS signals: it's entirely a matter of the language's > runtime system. > Have you come across the Pi-Calculus? Every time I see this topic come up (GIL, threads, concurrency) it seems to founder on the fact[1] that this cannot be solved without language support. This is not unique to python[2]. The thing that caught my attention with the Pi-Calculus is that it does not draw artificial lines between os processes, threads, functional program units or data parameters and it starts out by demonstrating very clearly why language equivalence (deterministic automata a == DAb) does not prevent *very* annoying behavioural differences. A result of the work (as far as I understood it) is that all can be treated as equivalent and strong formal tools are given for both modeling the interactions and proving things like behavioral equivalence. The book[4] references work done to show this is viable in interpreted/objecty languages as well as functional ones. Coming back a little way towards planet earth, I remember the last time this sort of thing came up someone half-heartedly suggested "active objects with messaging"[3] and things died off.
Python has always struck me as a language for pragmatists, rather than a place to play about with esoteric academic curiosities. Maybe someone on this list can pick something useful to py3k out of Pi-calculus? quoting: http://www.python.org/dev/summary/2005-09-16_2005-09-30.html#concurrency-in-python Guido threw down the gauntlet: rather than the endless discussion about this topic, someone should come up with a GIL-free Python (not necessarily CPython) and demonstrate its worth. [1] err, ok I can't locate the paper that shows this but I *swear* someone far better qualified than me has written one to this effect. [2] http://www.decadentplace.org.uk/pipermail/cpp-threads/2005-October/000715.html [3] http://www.python.org/dev/summary/2005-09-16_2005-09-30.html#concurrency-in-python also, http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/365292 [4] http://www.amazon.com/gp/product/0521658691/ref=si3_rdr_err_product/002-5641420-6196034?ie=UTF8 Cheers, Robin From talin at acm.org Tue Aug 8 18:49:08 2006 From: talin at acm.org (Talin) Date: Tue, 08 Aug 2006 09:49:08 -0700 Subject: [Python-3000] Set literals - another try Message-ID: <44D8C084.8090503@acm.org> Part 1: The concrete proposal part. I noticed that a lot of folks seemed to like the idea of making the empty set resemble the Greek letter Phi, using a combination of parentheses and the vertical bar or forward slash character. So let's expand on this: slice Phi in half and say that (| and |) are delimiters for a set literal, as follows:

(|)      # Empty set

(|a|)    # Set with 1 item

(|a,b|)  # Set with 2 items

The advantage of this proposal is that it maintains visual consistency between the 0, 1, and N element cases. Part 2: The idle speculation part, not to be considered as an actual proposal. I've often said that "whenever a programmer has the urge to invent a new programming language, they should lie down on the couch until the feeling passes".
One of the reasons for this is that many times, a programmer's motivation in creating a new language is not that they actually need a new language, but rather as a means of *criticising* an existing language. Inventing their own language gives them the opportunity to show how they would have done it. I think that kind of criticism can be valid, and that languages invented for this purpose can be useful, as long as you don't actually sit down and try to implement the thing. As a thought experiment, I decided to apply this idea to the Python set literal case - i.e. if we were going to do a massive "do over" of Python, how would we approach the problem of set literals? The syntax that comes to mind is something like this:

a = b|c

Where the vertical bar character means "forms a set with". Larger sets could be made using the same syntax:

a = b|c|d|e

You can also wrap parens around the set if you want:

a = (b|c)

Like tuples, a set with a single member still requires at least one delimiter:

a = (b|)

And for the empty set, we're back to phi again:

a = (|)

However, the parens aren't generally required - the rules are pretty much the same as for tuples and the comma operator. Thus, passing sets as arguments:

index = s.find_first_of( 'a'|'b'|'c'|'d' )

Of course, by doing this, we're re-assigning the meaning of the '|' operator from 'bitwise or' to 'set construction'. This only makes sense if you assume that either (a) set construction is more common than bitwise-or operations or (b) you provide some reasonable alternative way to express bitwise-or operations. Let's assume that we create some reasonable replacement and move on. Another thing to note is that the set construction operator resembles in some ways the "alternative" operator of BNF notation. In the previous example, 'find_first_of' looks for the first of the given alternatives. Since dictionaries are similar to sets, we can represent a dictionary as a set of keys and associated values.
Dictionary literals already use the ':' operator to indicate a key - we can continue that with:

a = ('Monday':1 | 'Tuesday':2 | 'Wednesday':3)

Unlike the current language, however, you can omit the parens:

a = 'Monday':1 | 'Tuesday':2 | 'Wednesday':3

(This creates a syntax ambiguity with colon, but let's move on :) One of the fun things about this line of speculation is watching how such a tiny change ripples outward, affecting the entire language definition. In this case, the change to set construction has much farther-reaching effects than what I have described here, assuming that you take each effect to its logical conclusion. I find it an enjoyable mental exercise :) -- Talin From talin at acm.org Tue Aug 8 18:52:36 2006 From: talin at acm.org (Talin) Date: Tue, 08 Aug 2006 09:52:36 -0700 Subject: [Python-3000] Range literals Message-ID: <44D8C154.9020406@acm.org> I've seen some languages that use a double-dot (..) to mean a range of items. This could be syntactic sugar for range(), like so:

for x in 1..10:
    ...

-- Talin From jcarlson at uci.edu Tue Aug 8 19:36:40 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Tue, 08 Aug 2006 10:36:40 -0700 Subject: [Python-3000] Set literals - another try In-Reply-To: <44D8C084.8090503@acm.org> References: <44D8C084.8090503@acm.org> Message-ID: <20060808100536.E706.JCARLSON@uci.edu> Talin wrote: > > Part 1: The concrete proposal part. > > I noticed that a lot of folks seemed to like the idea of making the > empty set resemble the Greek letter Phi, using a combination of > parentheses and the vertical bar or forward slash character. > > So let's expand on this: slice Phi in half and say that (| and |) are > delimiters for a set literal, as follows: > > (|) # Empty set > > (|a|) # Set with 1 item > > (|a,b|) # Set with 2 items > > The advantage of this proposal is that it maintains visual consistency > between the 0, 1, and N element cases. That's quite a bit of punctuation to define a set literal.
In fact, for 1+ element sets, it's only 1 character shy of the set() punctuation, while also being more difficult to type on at least US keyboards. And if I remember my set math correctly, phi wasn't the character generally used, it was usually a zero with a diagonal cross through it, making (/) a better empty set literal. But from there, the notation devolves into a place I don't want to go. > Part 2: The idle speculation part, not to be considered as an actual > proposal. > > I've often said that "whenever a programmer has the urge to invent a new > programming language, they should lie down on the couch until the > feeling passes". Presumably you again don't remember the source of this quote, but it is still applicable. > As a thought experiment, I decided to apply this idea to the Python set > literal case - i.e. if we were going to do a massive "do over" of > Python, how would we approach the problem of set literals? > > The syntax that comes to mind is something like this: > > a = b|c The pipe character/bitwise or operator doesn't say to me "make a set". Knowing what I do about set math, the only literal that really makes sense to me is:

{a,b,c,...}

With the empty set being:

{/}

Interestingly enough, the non-empty set case has already been proposed, and if I remember correctly, was generally liked, except for the slight ambiguity with regard to dictionary literals. I personally don't see much of a use for set literals, considering that there is a non-ambiguous spelling of it currently; set(...), whose only cost above and beyond that of a set literal is a global name lookup. It is 'different' from some other first-class objects (tuple, list, dictionary, string, unicode, ...), but other first-class objects also require such spelling: bool, enumerate, iter, len, property, reduce.
Each of which may be used sufficiently often to make sense as having a syntax for their operations, though perhaps only len has an obvious syntax of |obj| -> len(obj); then again, |obj| could also mean abs(obj), but presumably objects would only ever have __len__ or __abs__ and not both. I digress. -.5 for a set literal syntax at all, -1 for offering your particular set literal variant, -2 for your change propagating to dictionaries and beyond. - Josiah From jcarlson at uci.edu Tue Aug 8 19:44:17 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Tue, 08 Aug 2006 10:44:17 -0700 Subject: [Python-3000] Range literals In-Reply-To: <44D8C154.9020406@acm.org> References: <44D8C154.9020406@acm.org> Message-ID: <20060808104049.E709.JCARLSON@uci.edu> Talin wrote: > > I've seen some languages that use a double-dot (..) to mean a range of > items. This could be syntactic sugar for range(), like so: > > > for x in 1..10: > ... In the pronouncement on PEP 284: http://www.python.org/dev/peps/pep-0284/ Guido did not buy the premise that the range() format needed fixing, "The whole point (15 years ago) of range() was to *avoid* needing syntax to specify a loop over numbers. I think it's worked out well and there's nothing that needs to be fixed (except range() needs to become an iterator, which it will in Python 3.0)." Unless Guido has decided that range/xrange are the wrong way to do things, I don't think there is much discussion here. - Josiah From tomerfiliba at gmail.com Tue Aug 8 20:22:24 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Tue, 8 Aug 2006 20:22:24 +0200 Subject: [Python-3000] threading, part 2 Message-ID: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com> let me bring this anew, as the previous discussion has gone quite off track. i know there are many theories/paradigms concerning parallel execution, some require language level constructs, others being external, and let's not ever start talking about the GIL.
(on a side note, if i may add my opinion on the subject matter, stackless python has the best approach to concurrency -- don't lock, yield!) my previous suggestion asked for a means to raise exceptions in the context of *other* threads. all it calls for is a new builtin function that would raise a given exception in the context of a given thread. there are some points to address: * native calls -- well, calling builtin functions can't be interrupted that way, and it is problematic, but not directly related to this proposal. that's a problem of machine code. * breaking the thread's state -- that's not really an issue. i'm not talking about *forcefully* killing the thread, without cleanup. after all, exceptions can occur anywhere in the code, and at any time... your code should always be aware of that, with no regard to being thread-safe. for example:

def f(a, b):
    return a + b

an innocent function, but now suppose i pass two huge strings... bad input can cause MemoryError, although unlikely. you can't take care of *everything*, you must learn to live with the occasional unexpected exception. so it may seem brutal to suggest a mechanism that raises exceptions at arbitrary points in your code-flow, but: * cleanup will be performed (objects will be reclaimed) * you can handle it anywhere in the call chain (just as any other exception) * most of the time, i'd use that to *kill* threads (the ThreadExit exception), so i don't expect the thread to recover. it should just die silently. sounds better now? -tomer -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060808/ee3aa259/attachment.htm From qrczak at knm.org.pl Tue Aug 8 21:05:24 2006 From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk) Date: Tue, 08 Aug 2006 21:05:24 +0200 Subject: [Python-3000] threading, part 2 In-Reply-To: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com> (tomer filiba's message of "Tue, 8 Aug 2006 20:22:24 +0200") References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com> Message-ID: <877j1jf3pn.fsf@qrnik.zagroda> "tomer filiba" writes: > after all, exceptions can occur anywhere in the code, and at any time... It's impossible to write safe code when exceptions can occur at any time, except when you already happen to have the needed atomic primitives available. Let's say we have a mutable doubly linked list (the list has first and last pointers, each node has next and prev pointers). Please show how to append a first node if exceptions can occur at any time. Not adding the element at all if an asynchronous exception is coming is acceptable, but corrupting the list structure is not. -- __("< Marcin Kowalczyk \__/ qrczak at knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/ From pje at telecommunity.com Tue Aug 8 21:30:30 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 08 Aug 2006 15:30:30 -0400 Subject: [Python-3000] Cross-interpreter FFI for Python 3000? In-Reply-To: Message-ID: <5.1.1.6.0.20060808151352.02604b30@sparrow.telecommunity.com> [Note: Discussion should move to the python-3000 list] At 11:28 AM 8/8/2006 -0700, Paul Prescod wrote: >I'll use up a little bit of my post-conference goodwill to push a >long-term obsession of mine...using a Python variant as the "standard" >extension/FFI model for Python (3000). I've heard variants of this idea >from many people I respect, some of whom are cc:ed. > >I want to gauge interest before doing any next steps. If it's >preemptively -1 then I won't bother.
Therefore I would like to poll the >assembled brains about the feasibility of using something like >RPython/Pyrex as an abstraction layer to be compiled to Py2.5 PyObjects, >Py3000 PyObjects, JNI, .NET, ... > >Rationale: > >Each Python implementation needs an FFI. Any Python without a C-oriented >FFI lacks compatibility with C modules like Numeric and PIL. For this >reason, PyPy re-invented something like Pyrex as RPython. Just FYI, but if I understand correctly, PyPy is now using the ctypes API for its FFI. Also, RPython is entirely unrelated to Pyrex. RPython is Python with restrictions on how it's used, and doesn't include an FFI of its own. I would suggest that PyPy's use of ctypes, coupled with the inclusion of ctypes in the Python 2.5 stdlib, means that ctypes could reasonably be considered a de facto standard for a C FFI in Python at this point. While I *like* Pyrex a lot and use it for most extension modules I write, it is currently heavily tied to the CPython API, lacks many Python features that even RPython allows, invents its own object model for C inheritance and imports, and has a lot of quirks due to being "not quite Python" in syntax or semantics. These characteristics are undesirable for a cross-interpreter FFI, IMO. A major advantage of using ctypes as the FFI, however, is that ctypes is a library, and thus does not require language or interpreter changes. This means, for example, that a third party could implement a ctypes clone for Jython or IronPython without burdening the core developers of those interpreters. > The two are >obviously not identical but I'm looking at the core idea of a language >that merges Python and C concepts to achieve a usable extension >mechanism. I overheard Jim musing about something similar for >IronPython. > >But most important: Python 3000 needs something like Pyrex. Python 3000 >and Python 2.6, 2.7, 2.8 may be arbitrarily different internally.
If the >goal is for it to be "just a bit" incompatible then Guido's design space >is quite constrained. If it is allowed to be massively incompatible then >extension authors will scream. The Python 2.x line will co-exist with >the Python 3000 line for a while, and both with co-exist with >IronPython, Jython, PyPy and others. It would probably be best if you catch up on the current work by the PyPy team in this area, since my understanding is that PyPy is now able to compile "RPython+ctypes" code to create CPython extensions in C. This suggests that it should be possible to backends for C# and Java, because (again, if I understand correctly) the ctypes handling is done at a relatively high level of the translation tool chain, such that the backend code generators don't need to know anything about ctypes. Hopefully Armin or somebody else will jump in on this point if I'm getting something wrong about all that. > * it would be simpler to write competitive Python interpreters to test >out different design ideas...one wouldn't have to worry that such an >interpreter would be inherently a toy because of the unavailability of >third-party software Note that this is also a goal of the PyPy project, and they have many such options now, such as "pure" GC and refcounted variants, even if you entirely ignore the part where backends can generate code for a variety of languages. From jimjjewett at gmail.com Tue Aug 8 21:31:37 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 8 Aug 2006 15:31:37 -0400 Subject: [Python-3000] threading, part 2 In-Reply-To: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com> References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com> Message-ID: On 8/8/06, tomer filiba wrote: > my previous suggestion asked for is a means to raise exceptions in the > context of *other* threads. ... > * breaking the thread's state -- that's not really an issue. i'm not talking > about *forcefully* killing the thread, without cleanup. 
This has the same inherent problem as Java's Thread.stop -- that data shared beyond the thread may be left in an inconsistent state because the cleanup wasn't done, perhaps because a lock was held. https://java.sun.com/j2se/1.4.2/docs/guide/misc/threadPrimitiveDeprecation.html > so it may seem brutal to suggest a mechanism that raises exceptions > at arbitrary points in your code-flow, but: If you're willing to forget about native code (and you suggested that you were), then you could just check[*] every N bytecodes, the way the interpreter already checks to decide whether it should switch threads. Whether the performance overhead is worthwhile is a different question. It might be better to just add an example thread to threading.py (or Queue) that does its processing in a loop, and checks its own stop variable every time through the loop. [*] What to do in case of a raise is a bit trickier, of course -- basically, replace the next bytecode with a RAISE_VARARGS bytecode, but that might violate some current try-except assumptions. -jJ From collinw at gmail.com Tue Aug 8 21:50:20 2006 From: collinw at gmail.com (Collin Winter) Date: Tue, 8 Aug 2006 15:50:20 -0400 Subject: [Python-3000] Set literals - another try In-Reply-To: <20060808100536.E706.JCARLSON@uci.edu> References: <44D8C084.8090503@acm.org> <20060808100536.E706.JCARLSON@uci.edu> Message-ID: <43aa6ff70608081250m5e2f9b6fm547fe0b0fd265a48@mail.gmail.com> On 8/8/06, Josiah Carlson wrote: > I personally don't see much of a use for set literals, considering that > there is a non-ambiguous spelling of it currently; set(...), whose only > cost above and beyond that of a set literal is a global name lookup. I thought one of the main arguments in favor of set literals is that a literal form would allow the compiler to perform optimisations that the set(...) spelling doesn't allow.
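One way to see the runtime cost of the set(...) spelling is to disassemble it on a modern CPython; this is only a sketch, since exact opcode names vary across versions:

```python
import dis

def f():
    # The set is rebuilt on every call -- a global name lookup for
    # 'set', a tuple, and a call -- where a true literal could be
    # folded to a constant by the compiler.
    return 1 in set((1, 2, 3))

# Collect the opcode names actually emitted for f.
ops = [ins.opname for ins in dis.get_instructions(f)]
print(ops)  # includes LOAD_GLOBAL for the name 'set'
```

The LOAD_GLOBAL is exactly the per-use cost Josiah mentions; a literal form would let the compiler skip the lookup and the call entirely.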
Collin Winter From jcarlson at uci.edu Tue Aug 8 22:21:39 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Tue, 08 Aug 2006 13:21:39 -0700 Subject: [Python-3000] Set literals - another try In-Reply-To: <43aa6ff70608081250m5e2f9b6fm547fe0b0fd265a48@mail.gmail.com> References: <20060808100536.E706.JCARLSON@uci.edu> <43aa6ff70608081250m5e2f9b6fm547fe0b0fd265a48@mail.gmail.com> Message-ID: <20060808131458.E70C.JCARLSON@uci.edu> "Collin Winter" wrote: > > On 8/8/06, Josiah Carlson wrote: > > I personally don't see much of a use for set literals, considering that > > there is a non-ambiguous spelling of it currently; set(...), whose only > > cost above and beyond that of a set literal is a global name lookup. > > I thought one of the main arguments in favor of set literals is that a > literal form would allow the compiler to perform optimisations that > the set(...) spelling doesn't allow. The optimization argument used to define language syntax seems a bit like the "tail wagging the dog" cliche. For immutable literals that are used a huge number of times (int, tuple, and other immutables), a literal syntax for compiler optimization makes sense. But for mutables (list, dict, etc.), literal syntax is more a convenience than an optimization, as the compiler hasn't historically created such objects once and copied them for re-use, but pushed values on the stack and called the relevant create-list bytecode. [1] - Josiah [1] Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information.

>>> import dis
>>> def foo():
...     return [1,2,3]
...
>>> def goo():
...     return (1,2,3)
...
>>> dis.dis(foo)
  2           0 LOAD_CONST               1 (1)
              3 LOAD_CONST               2 (2)
              6 LOAD_CONST               3 (3)
              9 BUILD_LIST               3
             12 RETURN_VALUE
>>> dis.dis(goo)
  2           0 LOAD_CONST               4 ((1, 2, 3))
              3 RETURN_VALUE
>>>

From tjreedy at udel.edu Tue Aug 8 23:12:25 2006 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 8 Aug 2006 17:12:25 -0400 Subject: [Python-3000] Cross-interpreter FFI for Python 3000? References: <5.1.1.6.0.20060808151352.02604b30@sparrow.telecommunity.com> Message-ID: For those as ignorant as I was, FFI does not here mean

  Friendly File Interface
  Fauna and Flora International
  Family Firm Institute
  Forsvarets forskningsinstitutt
  Film Finances, Inc.
  Financial Freedom Institute
  Focus on the Family Institute
  ...

(all but the first from Google) but Foreign Function Interface (from the PHP FFI package). > I would suggest that PyPy's use of ctypes, coupled with the inclusion of > ctypes in the Python 2.5 stdlib, means that ctypes could reasonably be > considered a de facto standard for a C FFI in Python at this point. Intriguing idea. I know that the Pygame folks, for example, are experimenting with rewrapping the SDL (Simple Directmedia Library, the core of Pygame) in ctypes. Terry Jan Reedy From guido at python.org Tue Aug 8 23:31:59 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 8 Aug 2006 14:31:59 -0700 Subject: [Python-3000] Cross-interpreter FFI for Python 3000? In-Reply-To: References: <5.1.1.6.0.20060808151352.02604b30@sparrow.telecommunity.com> Message-ID: On 8/8/06, Terry Reedy wrote: > > I would suggest that PyPy's use of ctypes, coupled with the inclusion of > > ctypes in the Python 2.5 stdlib, means that ctypes could reasonably be > > considered a de facto standard for a C FFI in Python at this point. > > Intriguing idea. I know that the Pygame folks, for example, are > experimenting with rewrapping the SDL (Simple Directmedia Library, the core > of Pygame) in ctypes. Isn't a problem with ctypes that such extensions can no longer guarantee "no segfaults"?
This pretty much completely rules them out for use in sandboxes such as what Brett Cannon is currently working on. With hand-written extensions at least you can audit them to decide whether they are safe enough. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From paul.prescod at xmetal.com Tue Aug 8 23:45:18 2006 From: paul.prescod at xmetal.com (Paul Prescod) Date: Tue, 8 Aug 2006 14:45:18 -0700 Subject: [Python-3000] Cross-interpreter FFI for Python 3000? Message-ID: >... > > Just FYI, but if I understand correctly, PyPy is now using > the ctypes API for its FFI. Also, RPython is entirely > unrelated to Pyrex. RPython is Python with restrictions on > how it's used, and doesn't include an FFI of its own. As you said elsewhere, PyPy can compile an RPython+rctypes program to a C file, just as Pyrex does. So I don't understand why you see them as "entirely unrelated". There are different syntaxes, but the goals are very similar. Pyrex uses optional type declarations (which are planned for Python 3000). RPython infers types from rctypes API calls (which will also be available in Python 3000). Perhaps it would be better if I dropped the reference to RPython and merely talked about the "extcompiler" tool, which is very parallel to the Pyrex compiler? You make some good points about Pyrex and ctypes. I'd rather explore the design space after I've heard whether this design direction has the potential to be fruitful. I infer that you think "yes". Paul Prescod From pje at telecommunity.com Wed Aug 9 00:40:15 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 08 Aug 2006 18:40:15 -0400 Subject: [Python-3000] Cross-interpreter FFI for Python 3000? In-Reply-To: Message-ID: <5.1.1.6.0.20060808180036.03b61bd8@sparrow.telecommunity.com> At 02:45 PM 8/8/2006 -0700, Paul Prescod wrote: >As you said elsewhere, PyPy can compile an RPython+rctypes program to a >C file, just as Pyrex does. So I don't understand why you see them as >"entirely unrelated".
Disclaimer again: I like and use Pyrex; I even built additional support for it into setuptools. Conversely, I've only used ctypes once and am not sure I care for its API. But as a practical matter, these preferences are irrelevant; I will end up learning to use ctypes and liking it, and so will everybody else, because ctypes' *dynamic* advantage will clean Pyrex's clock at the very moment that extcompiler is as easy to use as Pyrex is now. To summarize the differences, Pyrex is:

* A *Python-like* language, rather than Python
* Invents new inheritance/import facilities
* Imports various bits of syntax from C, including operators, pointers, etc.
* Inherently tied to the CPython API in its implementation
* Has its own system of "header" files for compile-time import/include
* Generates C code directly from Pyrex
* Cannot be executed by standard Python

PyPy's RPython+rctypes is:

* 100% Python, with certain dynamicity constraints
* Is not tied to any particular back end -- it can be translated to C, LLVM code, or even JavaScript if you like, as the type inference, annotation, and optimization machinery is backend-independent
* Code can be run in a normal Python interpreter if a ctypes library is available

The only relationship I see between the two are some overlap in use cases, and the letters "R", "P" and "Y" in the names. :) In particular, Pyrex cannot be used in the interpreter, and I can't see Guido allowing Pyrex's C syntax to infect Python-the-language, so this is likely to be a stable barrier keeping Pyrex from having this feature, unless Greg or somebody else decides to create a Pyrex interpreter, or perhaps an import hook to translate Pyrex source code to Python bytecode that invokes the ctypes API. :) (Note, by the way, that such an import hook/translator would be equally usable in PyPy, instantly making it possible to compile Pyrex to any backend supported by PyPy!
I suggest you let that idea sink in for a little bit, as it helps to illustrate why making ctypes the standard FFI is the One Obvious Way To Do It.) >There are different syntaxes, but the goals are very similar. Well, you could say that about Python and Ruby, to name just two. Syntax is important. :) But that's also entirely ignoring the wide range of practical issues alluded to above, and some more I'll dig into below. >Pyrex uses optional type declarations (which are planned >for Python 3000). RPython infers types from rctypes API calls (which >will also be available in Python 3000). They're available in Python 2.5, which means code can be written for them today. The dynamic usability of ctypes from interpreted Python means that Pyrex will become a historical footnote as soon as the RPython+rctypes->CPython translator is practically usable; i.e., when it can compete with Pyrex for code generation speed (and speed of generated code), installability, documentation, and user community. At that point, the advantage of being able to debug your C interface using the interpreter's ctypes library, and then to compile the code only if/when you need to, will be a killer advantage. IMO, it doesn't make sense to fight that now-inevitable future, either on behalf of Pyrex or some imagined "better" alternative; instead, we might as well hasten that future's arrival. We can always provide better syntax for ctypes at a later date, the way 'classmethod' and friends arrived in Python 2.2 but didn't get syntax until 2.4. If you can't wait that long, write that import hook to turn Pyrex source into Python bytecode. :) > Perhaps it would be better if I >dropped the reference to RPython and merely talked about the >"extcompiler" tool, which is very parallel to the Pyrex compiler? I'm at a bit of a loss as to how to explain how very not-useful that comparison is. I would suggest reading up on PyPy architecture and Pyrex architecture a bit.
From an end-user perspective you can compare them as things that take Python-looking stuff in and spit C code out, but the devil is definitely in the details. See also the lists I gave above. >You make some good points about Pyrex and ctypes. I'd rather explore the >design space after I've heard whether this design direction has the >potential to be fruitful. I infer that you think "yes". See http://dirtsimple.org/2005/10/children-of-lesser-python.html for what I think. :) In that article, I highlighted the absence of a standard Python FFI as being a stumbling block to the future evolution of the language, but noted that PyPy would likely end up with a solution. The subsequent emergence of ctypes as an FFI shared by CPython and PyPy has already solved this problem; it is merely a question of recognizing the fact. As of Python 2.5, anything else is going to have a serious uphill battle to fight -- even if it's something like Pyrex, that at least already *exists* and has at least *one* part-time maintainer. A brand-new FFI invented by committee and with nobody yet stepping up to implement or maintain it, really has no chance at all. (This is all IMO, of course, but I find it hard to imagine how anything else could succeed.) From 2006a at usenet.alexanderweb.de Wed Aug 9 01:08:50 2006 From: 2006a at usenet.alexanderweb.de (Alexander Schremmer) Date: Wed, 9 Aug 2006 01:08:50 +0200 Subject: [Python-3000] Cross-interpreter FFI for Python 3000? References: <5.1.1.6.0.20060808151352.02604b30@sparrow.telecommunity.com> Message-ID: <1ci9un3z1n806.dlg@usenet.alexanderweb.de> On Tue, 8 Aug 2006 14:31:59 -0700, Guido van Rossum wrote: > Isn't a problem with ctypes that such extensions can no longer > guarantee "no segfaults"? How would you guarantee the "no segfaults" policy for every other binding involved? In either case, auditing an extension written using ctypes or rctypes is potentially simpler than looking at Pyrex or C code.
(Think of memory management, ref counting etc.) > This pretty much completely rules them out for use in sandboxes such > as what Brett Cannon is currently working > on. Of course you will have severe problems if you allow somebody to do unprotected calls to dynamic libraries. But at least I am not sure if this is a problem of using CTypes ... it should be possible to e.g. flag the code using CTypes classes to be in a different security class than the user-sandboxed code. Building the barrier on the C level might be too restrictive in real-world applications. > With hand-written extensions at least you can audit them to decide > whether they are safe enough. Please elaborate on that point - why isn't a ctypes extension "hand-written"? Kind regards, Alexander From pedronis at strakt.com Wed Aug 9 01:18:06 2006 From: pedronis at strakt.com (Samuele Pedroni) Date: Wed, 09 Aug 2006 01:18:06 +0200 Subject: [Python-3000] Cross-interpreter FFI for Python 3000? In-Reply-To: References: <5.1.1.6.0.20060808151352.02604b30@sparrow.telecommunity.com> Message-ID: <44D91BAE.5040507@strakt.com> Guido van Rossum wrote: > On 8/8/06, Terry Reedy wrote: > >>>I would suggest that PyPy's use of ctypes, coupled with the inclusion of >>>ctypes in the Python 2.5 stdlib, means that ctypes could reasonably be >>>considered a defacto standard for a C FFI in Python at this point. >> >>Intriguing idea. I know that the Pygame folks, for example, are >>experimenting with rewrapping the SDL (Simple Directmedia Library, the core >>of Pygame) in ctypes. > > > Isn't a problem with ctypes that such extensions can no longer > guarantee "no segfaults"? This pretty much completely rules them out > for use in sandboxes such as what Brett Cannon is currently working > on. With hand-written extensions at least you can audit them to decide > whether they are safe enough.
In PyPy's rctypes approach, the extensions still get compiled to C code, and ctypes-like calls get resolved to normal C calls. Although at some point a ctypes module is going to be exposed by PyPy, in the rctypes approach such an exposed ctypes is not a requirement at all. Rctypes gives ctypes-like C gluing to RPython, a different level from normal application-level full Python. And indeed (although with rough edges and some missing features at the moment) the PyPy tool-chain can produce CPython extensions from such rpython+rctypes extension code. From ncoghlan at gmail.com Wed Aug 9 12:17:25 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 09 Aug 2006 20:17:25 +1000 Subject: [Python-3000] Set literals - another try In-Reply-To: <43aa6ff70608081250m5e2f9b6fm547fe0b0fd265a48@mail.gmail.com> References: <44D8C084.8090503@acm.org> <20060808100536.E706.JCARLSON@uci.edu> <43aa6ff70608081250m5e2f9b6fm547fe0b0fd265a48@mail.gmail.com> Message-ID: <44D9B635.9010200@gmail.com> Collin Winter wrote: > On 8/8/06, Josiah Carlson wrote: >> I personally don't see much of a use for set literals, considering that >> there is a non-ambiguous spelling of it currently; set(...), whose only >> cost above and beyond that of a set literal is a global name lookup. > I thought one of the main arguments in favor of set literals is that a > literal form would allow the compiler to perform optimisations that > the set(...) spelling doesn't allow. A different way to enable that would be to include a set of non-keyword names (a subset of the default builtin namespace) in the language definition that the compiler is explicitly permitted to treat as constants if they are not otherwise defined in the current lexical scope. Then constant-folding could turn "len('abcde')" into 5, and "str(3+2)" into '5' and "set((1, 2, 3))" into the corresponding set object.
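It is easy to confirm that today's compiler really does treat these names as ordinary globals and folds nothing (a present-day demonstration of the status quo, not part of the proposal):

```python
# Compile an expression that uses a builtin and inspect the code object:
# 'len' appears in co_names (a runtime name lookup), and the value 5 is
# nowhere in co_consts -- the call was not folded to a constant.
code = compile("len('abcde')", "<example>", "eval")

print('len' in code.co_names)  # True
print(5 in code.co_consts)     # False
```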
The only thing that would break is hacks like poking an alternate implementation of str or set or len into the global namespace from somewhere outside the module. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Wed Aug 9 12:45:08 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 09 Aug 2006 20:45:08 +1000 Subject: [Python-3000] threading, part 2 In-Reply-To: References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com> Message-ID: <44D9BCB4.5010404@gmail.com> Jim Jewett wrote: > On 8/8/06, tomer filiba wrote: >> my previous suggestion asked for is a means to raise exceptions in the >> context of *other* threads. > > ... > >> * breaking the thread's state -- that's not really an issue. i'm not talking >> about *forcefully* killing the thread, without cleanup. > > This has the same inherent problem as Java's Thread.stop -- that data > shared beyond the thread may be left in an inconsistent state because > the cleanup wasn't done, perhaps because a lock was held. > > https://java.sun.com/j2se/1.4.2/docs/guide/misc/threadPrimitiveDeprecation.html > >> so it's may seem brute to suggest a mechanism that raises exceptions >> at arbitrary points in your code-flow, but: > > If you're willing to forget about native code (and you suggested that > you were), then you could just check[*] every N bytecodes, the way the > interpreters already checks to decide whether it should switch > threads. Whether the performance overhead is worthwhile is a > different question. That check is already there: int PyThreadState_SetAsyncExc( long id, PyObject *exc) Asynchronously raise an exception in a thread. The id argument is the thread id of the target thread; exc is the exception object to be raised. This function does not steal any references to exc. 
To prevent naive misuse, you must write your own C extension to call this. Must be called with the GIL held. Returns the number of thread states modified; if it returns a number greater than one, you're in trouble, and you should call it again with exc set to NULL to revert the effect. This raises no exceptions. New in version 2.3. In Python 2.5, you can use ctypes to get at the whole C API from Python code, and calling thread.get_ident() in the run() method will allow you to find out the thread id of your thread (you'll need to save that value somewhere so other code can get at it). All Tober is really asking for is a method on threading.Thread objects that uses this existing API to set a builtin ThreadExit exception. The thread module would consider a thread finishing with ThreadExit to be non-exceptional, so you could easily do: th.terminate() # Raise ThreadExit in th's thread of control th.join() # Should finish up pretty quickly Proper resource cleanup would be reliant on correct use of try/finally or with statements, but that's the case regardless of whether or not asynchronous exceptions are allowed. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Wed Aug 9 12:57:12 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 09 Aug 2006 20:57:12 +1000 Subject: [Python-3000] Cross-interpreter FFI for Python 3000? In-Reply-To: <5.1.1.6.0.20060808180036.03b61bd8@sparrow.telecommunity.com> References: <5.1.1.6.0.20060808180036.03b61bd8@sparrow.telecommunity.com> Message-ID: <44D9BF88.6080705@gmail.com> Phillip J. Eby wrote: > (This is all IMO, of course, but I find it hard to imagine how anything > else could succeed.) 
Having just made the point in another thread that it is possible to use ctypes to access the CPython API functions like PyThreadState_SetAsyncExc that have been designated "extension module only", I'm one who agrees with you - adding ctypes to the standard library effectively adopted it as Python's foreign function interface. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From jimjjewett at gmail.com Wed Aug 9 18:36:24 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 9 Aug 2006 12:36:24 -0400 Subject: [Python-3000] Set literals - another try In-Reply-To: <44D9B635.9010200@gmail.com> References: <44D8C084.8090503@acm.org> <20060808100536.E706.JCARLSON@uci.edu> <43aa6ff70608081250m5e2f9b6fm547fe0b0fd265a48@mail.gmail.com> <44D9B635.9010200@gmail.com> Message-ID: On 8/9/06, Nick Coghlan wrote: > A different way to enable that would be to include a set of non-keyword names > (a subset of the default builtin namespace) in the language definition that > the compiler is explicitly permitted to treat as constants if they are not > otherwise defined in the current lexical scope. Realistically, I want my own functions and class definitions to be treated that way (inlinable) most of the time. I don't want to start marking them with "stable". > The only thing that would break is hacks like poking an alternate > implementation of str or set or len into the global namespace from somewhere > outside the module. So what we need is a module that either rejects changes (after it is sealed) or at least provides notification (so things can be recompiled). In theory, this could even go into python 2.x (though not as the default), though it is a bit difficult in practice. (By the time you can specify an alternative dict factory, it is too late.) 
-jJ From guido at python.org Wed Aug 9 20:36:32 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 9 Aug 2006 11:36:32 -0700 Subject: [Python-3000] Set literals - another try In-Reply-To: <44D8C084.8090503@acm.org> References: <44D8C084.8090503@acm.org> Message-ID: On 8/8/06, Talin wrote: > Part 1: The concrete proposal part. > > I noticed that a lot of folks seemed to like the idea of making the > empty set resemble the greek letter Phi, using a combination of > parentheses and the vertical bar or forward slash character. > > So lets expand on this: slice Phi in half and say that (| and |) are > delimiters for a set literal, as follows: > > (|) # Empty set > > (|a|) # Set with 1 item > > (|a,b|) # Set with 2 items > > The advantage of this proposal is that it maintains visual consistency > between the 0, 1, and N element cases. -1. This attempts to solve the lack of an empty set literal in the current best proposal, which is set(), {1}, {1, 2}, {1, 2, 3} etc. But it does so at the tremendous cost of inventing new unfamiliar brackets. > Part 2: The idle speculation part, not to be considered as a actual > proposal. [...] > The syntax that comes to mind is something like this: > > a = b|c This would be ambiguous since b|c also means set union. 
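The clash is trivial to demonstrate with today's set objects:

```python
b = set([1, 2, 3])
c = set([3, 4, 5])

# '|' is already the union operator for sets, so 'a = b|c' cannot
# double as set-literal syntax without ambiguity.
a = b | c
print(a == set([1, 2, 3, 4, 5]))  # True
```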
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Aug 9 20:43:50 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 9 Aug 2006 11:43:50 -0700 Subject: [Python-3000] Set literals - another try In-Reply-To: References: <44D8C084.8090503@acm.org> <20060808100536.E706.JCARLSON@uci.edu> <43aa6ff70608081250m5e2f9b6fm547fe0b0fd265a48@mail.gmail.com> <44D9B635.9010200@gmail.com> Message-ID: > On 8/9/06, Nick Coghlan wrote: > > A different way to enable that would be to include a set of non-keyword names > > (a subset of the default builtin namespace) in the language definition that > > the compiler is explicitly permitted to treat as constants if they are not > > otherwise defined in the current lexical scope. Right. This has been considered many times. I would love it if someone wrote up a PEP for this. On 8/9/06, Jim Jewett wrote: > Realistically, I want my own functions and class definitions to be > treated that way (inlinable) most of the time. I don't want to start > marking them with "stable". I'm not sure what you mean here. Inlining user code really isn't on the table; it's unrealistic to expect this to happen any time soon (especially since you're likely to want to inline things imported from other modules too, and methods, etc.). > > The only thing that would break is hacks like poking an alternate > > implementation of str or set or len into the global namespace from somewhere > > outside the module. The PEP should consider this use case and propose a solution. I'm fine with requiring a module to write len = len near the top to declare that it wants len patchable. OTOH for open I think the compiler should *not* inline this as it is fairly common to monkey-patch it. > So what we need is a module that either rejects changes (after it is > sealed) or at least provides notification (so things can be > recompiled).
In theory, this could even go into python 2.x (though > not as the default), though it is a bit difficult in practice. (By > the time you can specify an alternative dict factory, it is too late.) Recompilation upon notification seems way over the top; it's not like anything we currently do or are even considering. I'd much rather pick one of the following: (a) if the module doesn't have a global named 'len' and you add one (e.g. by "m.len = ...") the behavior is undefined (b) module objects actively reject attempts to inject new globals that would shadow built-ins in the list that Nick proposes. (BTW having such a list is a good idea. Requiring the compiler to know about *all* built-ins is not realistic since some frameworks patch the __builtin__ module.) PS. Nick, how's the book coming along? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Aug 9 20:45:34 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 9 Aug 2006 11:45:34 -0700 Subject: [Python-3000] Set literals - another try In-Reply-To: <43aa6ff70608081250m5e2f9b6fm547fe0b0fd265a48@mail.gmail.com> References: <44D8C084.8090503@acm.org> <20060808100536.E706.JCARLSON@uci.edu> <43aa6ff70608081250m5e2f9b6fm547fe0b0fd265a48@mail.gmail.com> Message-ID: On 8/8/06, Collin Winter wrote: > I thought one of the main arguments in favor of set literals is that a > literal form would allow the compiler to perform optimisations that > the set(...) spelling doesn't allow. Let me clear up this misunderstanding. Optimizations have nothing to do with it (they would be invalid anyway since sets are mutable). It's a matter of writing more readable code. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Aug 9 20:53:31 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 9 Aug 2006 11:53:31 -0700 Subject: [Python-3000] threading, part 2 In-Reply-To: <44D9BCB4.5010404@gmail.com> References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com> <44D9BCB4.5010404@gmail.com> Message-ID: On 8/9/06, Nick Coghlan wrote: > That check is already there: > > int PyThreadState_SetAsyncExc( long id, PyObject *exc) > Asynchronously raise an exception in a thread. The id argument is the > thread id of the target thread; exc is the exception object to be raised. This > function does not steal any references to exc. To prevent naive misuse, you > must write your own C extension to call this. Must be called with the GIL > held. Returns the number of thread states modified; if it returns a number > greater than one, you're in trouble, and you should call it again with exc set > to NULL to revert the effect. This raises no exceptions. New in version 2.3. Note that it is intentionally not directly accessible from Python -- but this can be revised. > In Python 2.5, you can use ctypes to get at the whole C API from Python code, > and calling thread.get_ident() in the run() method will allow you to find out > the thread id of your thread (you'll need to save that value somewhere so > other code can get at it). > > All Tober is really asking for is a method on threading.Thread objects that > uses this existing API to set a builtin ThreadExit exception. The thread > module would consider a thread finishing with ThreadExit to be > non-exceptional, so you could easily do: > > th.terminate() # Raise ThreadExit in th's thread of control > th.join() # Should finish up pretty quickly > > Proper resource cleanup would be reliant on correct use of try/finally or with > statements, but that's the case regardless of whether or not asynchronous > exceptions are allowed. I'm +0 on this. 
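For the record, Nick's recipe can be spelled from pure Python via ctypes along these lines (a rough, CPython-only sketch; ThreadExit, Worker and terminate() are invented names for illustration, not existing APIs):

```python
import ctypes
import threading
import time

class ThreadExit(Exception):
    """Hypothetical exception asking a thread to unwind itself."""

def async_raise(tid, exc_type):
    # PyThreadState_SetAsyncExc takes a thread id and an exception *class*,
    # and returns the number of thread states it modified.
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(tid), ctypes.py_object(exc_type))
    if res == 0:
        raise ValueError("invalid thread id")
    if res > 1:
        # Per the docs: revert and complain if more than one state changed.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_ulong(tid), None)
        raise SystemError("SetAsyncExc affected %d thread states" % res)

class Worker(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.exited_cleanly = False

    def run(self):
        try:
            while True:
                time.sleep(0.01)  # pretend to work; the exception is
                                  # delivered between bytecodes
        except ThreadExit:
            self.exited_cleanly = True  # try/finally cleanup would go here

    def terminate(self):
        async_raise(self.ident, ThreadExit)

th = Worker()
th.start()
time.sleep(0.05)
th.terminate()  # raise ThreadExit in th's thread of control
th.join(5)      # should finish up pretty quickly
print(th.exited_cleanly)  # True
```

Note the caveats from earlier in the thread still apply: the exception only lands when the target thread is executing bytecode, so a thread blocked in a long C call won't see it until the call returns.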
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.peters at gmail.com Wed Aug 9 21:48:58 2006 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 9 Aug 2006 15:48:58 -0400 Subject: [Python-3000] threading, part 2 In-Reply-To: References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com> <44D9BCB4.5010404@gmail.com> Message-ID: <1f7befae0608091248q7f328875x7c2d03723acbf8d2@mail.gmail.com> [Nick Coghlan] >> That check is already there: >> >> int PyThreadState_SetAsyncExc( long id, PyObject *exc) >> Asynchronously raise an exception in a thread. The id argument is the >> thread id of the target thread; exc is the exception object to be raised. This >> function does not steal any references to exc. To prevent naive misuse, you >> must write your own C extension to call this. Must be called with the GIL >> held. Returns the number of thread states modified; if it returns a number >> greater than one, you're in trouble, and you should call it again with exc set >> to NULL to revert the effect. This raises no exceptions. New in version 2.3. Guido, do you have any idea now what the "number greater than one" business is about? That would happen if and only if we found more than one thread state with the given thread id in the interpreter's list of thread states, but we're counting those with both the GIL and the global head_mutex lock held. My impression has been that it would be an internal logic error if we ever saw this count exceed 1. While I'm at it, I expect: Py_CLEAR(p->async_exc); Py_XINCREF(exc); p->async_exc = exc; is better written: Py_XINCREF(exc); Py_CLEAR(p->async_exc); p->async_exc = exc; for the same reason one should always incref B before decrefing A in A = B ... >> All Tober is really asking for is a method on threading.Thread objects that >> uses this existing API to set a builtin ThreadExit exception. 
The thread >> module would consider a thread finishing with ThreadExit to be >> non-exceptional, so you could easily do: >> >> th.terminate() # Raise ThreadExit in th's thread of control >> th.join() # Should finish up pretty quickly >> >> Proper resource cleanup would be reliant on correct use of try/finally or with >> statements, but that's the case regardless of whether or not asynchronous >> exceptions are allowed. [Guido] > I'm +0 on this. Me too, although it won't stay that simple, and I'm clear as mud on how implementations other than CPython could implement this. From guido at python.org Wed Aug 9 22:39:25 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 9 Aug 2006 13:39:25 -0700 Subject: [Python-3000] threading, part 2 In-Reply-To: <1f7befae0608091248q7f328875x7c2d03723acbf8d2@mail.gmail.com> References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com> <44D9BCB4.5010404@gmail.com> <1f7befae0608091248q7f328875x7c2d03723acbf8d2@mail.gmail.com> Message-ID: On 8/9/06, Tim Peters wrote: > [Nick Coghlan] > >> That check is already there: > >> > >> int PyThreadState_SetAsyncExc( long id, PyObject *exc) > >> Asynchronously raise an exception in a thread. The id argument is the > >> thread id of the target thread; exc is the exception object to be raised. This > >> function does not steal any references to exc. To prevent naive misuse, you > >> must write your own C extension to call this. Must be called with the GIL > >> held. Returns the number of thread states modified; if it returns a number > >> greater than one, you're in trouble, and you should call it again with exc set > >> to NULL to revert the effect. This raises no exceptions. New in version 2.3. > > Guido, do you have any idea now what the "number greater than one" > business is about? 
That would happen if and only if we found more > than one thread state with the given thread id in the interpreter's > list of thread states, but we're counting those with both the GIL and > the global head_mutex lock held. My impression has been that it would > be an internal logic error if we ever saw this count exceed 1. Right, I think that's it. I guess I was in a grumpy mood when I wrote this (and Just & Alex never ended up using it!). > While I'm at it, I expect: > > Py_CLEAR(p->async_exc); > Py_XINCREF(exc); > p->async_exc = exc; > > is better written: > > Py_XINCREF(exc); > Py_CLEAR(p->async_exc); > p->async_exc = exc; > > for the same reason one should always incref B before decrefing A in > > A = B > > ... The reason is that A and B might already be the same object, right? > >> All Tober is really asking for is a method on threading.Thread objects that > >> uses this existing API to set a builtin ThreadExit exception. The thread > >> module would consider a thread finishing with ThreadExit to be > >> non-exceptional, so you could easily do: > >> > >> th.terminate() # Raise ThreadExit in th's thread of control > >> th.join() # Should finish up pretty quickly > >> > >> Proper resource cleanup would be reliant on correct use of try/finally or with > >> statements, but that's the case regardless of whether or not asynchronous > >> exceptions are allowed. > > [Guido] > > I'm +0 on this. > > Me too, although it won't stay that simple, and I'm clear as mud on > how implementations other than CPython could implement this. Another good reason to keep it accessible from the C API only. Now I'm -0 on adding it. I suggest that if someone really wants this accessible from Python, they should research how Jython, IronPython, PyPy and Stackless could handle this, and report their research in a PEP.
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From qrczak at knm.org.pl Thu Aug 10 00:27:16 2006 From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk) Date: Thu, 10 Aug 2006 00:27:16 +0200 Subject: [Python-3000] threading, part 2 In-Reply-To: (Guido van Rossum's message of "Wed, 9 Aug 2006 13:39:25 -0700") References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com> <44D9BCB4.5010404@gmail.com> <1f7befae0608091248q7f328875x7c2d03723acbf8d2@mail.gmail.com> Message-ID: <871wrp8rzv.fsf@qrnik.zagroda> "Guido van Rossum" writes: >> for the same reason one should always incref B before decrefing A in >> >> A = B >> >> ... > > That reason that A and B might already be the same object, right? Or B might be a subobject of A, not referenced elsewhere. -- __("< Marcin Kowalczyk \__/ qrczak at knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/ From talin at acm.org Thu Aug 10 01:13:05 2006 From: talin at acm.org (Talin) Date: Wed, 09 Aug 2006 16:13:05 -0700 Subject: [Python-3000] Python/C++ question Message-ID: <44DA6C01.2040904@acm.org> A while back someone proposed switching to C++ as the implementation language for CPython, and the response was that this would make ABI compatibility too difficult, since the different C++ compilers don't have a common way to represent things like vtables and such. However, I was thinking - if you remove all of the ABI-breaking features of C++, such as virtual functions, name mangling, RTTI, exceptions, and so on, it's still a pretty nice language compared to C - you still have things like namespaces, constructors/destructors (especially nice for stack-local objects), overloadable type conversion, automatic upcasting/downcasting, references, plus you don't have to keep repeating the word 'struct' everywhere. Think how much cleaner the Python source would be if just one C++ feature - namespaces - could be used.
Imagine being able to put all of your enumeration values in their own namespace, instead of mixing them in with all the other global symbols. Think of the gazillions of cast operators you could get rid of if you could assign from PyString* to PyObject*, without having to explicitly cast between pointer types. My question is, however - would this even work? That is, if you wrapped all the source files in 'extern "C"', turned off the exception and RTTI compiler switches, suppressed the use of the C++ runtime libs and forbade use of the word 'virtual', would that effectively avoid the ABI compatibility issues? Would you be able to produce, on all supported platforms, a binary executable that was interoperable with ones produced by straight C? I actually have a personal motivation in asking this - it has been so many years since I've written in C, that I've actually *forgotten how*. Despite the fact that my very first C program, written in 1982, was a C compiler, today I find writing C programs a considerable challenge, because I don't remember exactly where the dividing line between C and C++ is - and I will either end up accidentally using a C++-specific language feature, or worse, I'll unconsciously avoid a valid C language feature because I don't remember whether it's C++ specific or not. (For example, I don't remember whether it's valid to define an enumeration within a struct, which is something that I do all the time in C++.)
-- Talin From guido at python.org Thu Aug 10 01:18:02 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 9 Aug 2006 16:18:02 -0700 Subject: [Python-3000] Python/C++ question In-Reply-To: <44DA6C01.2040904@acm.org> References: <44DA6C01.2040904@acm.org> Message-ID: On 8/9/06, Talin wrote: > A while back someone proposed switching to C++ as the implementation > language for CPython, and the response was that this would make ABI > compatibility too difficult, since the different C++ compilers don't > have a common way to represent things like vtables and such. > > However, I was thinking - if you remove all of the ABI-breaking features > of C++, such as virtual functions, name mangling, RTTI, exceptions, and > so on, its still a pretty nice language compared to C - you still have > things like namespaces, constructors/destructors (especially nice for > stack-local objects), overloadable type conversion, automatic > upcasting/downcasting, references, plus you don't have to keep repeating > the word 'struct' everywhere. > > Think how much cleaner the Python source would be if just one C++ > feature - namespaces - could be used. Imagine being able to put all of > your enumeration values in their own namespace, instead of mixing them > in with all the other global symbols. > > Think of the gazillions of cast operators you could get rid of if you > could assign from PyString* to PyObject*, without having to explicitly > cast between pointer types. > > My question is, however - would this even work? That is, if you wrapped > all the source files in 'extern "C"', turned off the exception and RTTI > compiler switches, suppressed the use of the C++ runtime libs and > forbade use of the word 'virtual', would that effectively avoid the ABI > compatibility issues? Would you be able to produce, on all supported > platforms, a binary executable that was interoperable with ones produced > by straight C? 
> > I actually have a personal motivation in asking this - it has been so > many years since I've written in C, that I've actually *forgotten how*. > Despite the fact that my very first C program, written in 1982, was a C > compiler, today I find writing C programs a considerable challenge, > because I don't remember exactly where the dividing line between C and > C++ is - and I will either end up accidentally using a C++-specific > language feature, or worse, I'll unconsciously avoid a valid C language > feature because I don't remember whether it's C++ specific or not. (For > example, I don't remember whether its valid to define an enumeration > within a struct, which is something that I do all the time in C++.) For the majority of Python developers it's probably the other way around. It's been 15 years since I wrote C++, and unlike C, that language has changed a lot since then... It would be a complete rewrite; I prefer doing a gradual transmogrification of the current codebase into Py3k rather than starting from scratch (read Joel Spolsky on why). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From collinw at gmail.com Thu Aug 10 02:32:19 2006 From: collinw at gmail.com (Collin Winter) Date: Wed, 9 Aug 2006 20:32:19 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations Message-ID: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com> After letting the discussions from the Spring stew in my head for a few months, here's my first draft of the proto-PEP for function annotations. This is intended to lay out in a single document the basic ideas for function annotations, to get community feedback on the fundamentals before proceeding to the nitty-gritty. As such, the implementation section isn't filled out; that's still in progress. Also, the list of references is incomplete. Both of these will be completed before the initial submission to the PEP editors. Without further ado... 
PEP: 3XXX Title: Function Annotations Version: $Revision: 43251 $ Last-Modified: $Date: 2006-03-23 09:28:55 -0500 (Thu, 23 Mar 2006) $ Author: Collin Winter Discussions-To: python-3000 at python.org Status: Draft Type: Standards Track Requires: 3XXX (Brett Cannon's __signature__ PEP) Content-Type: text/x-rst Created: 03-Aug-2006 Python-Version: 3.0 Post-History: Abstract ======== This PEP introduces a syntax for adding annotations to Python functions [#func-term#]_. In addition to annotations for function parameters, the syntax includes support for annotating a function's return value(s). In section one, I outline the "philosophy" and fundamentals needed to understand function annotations before launching into an in-depth discussion. In section two, the syntax for function annotations is presented, including a full explanation of the changes needed in Python's grammar. In section three, I discuss how user code will be able to access the annotation information. Section four describes a possible implementation of function annotations for Python 3.0. In section five, a C-language API for use by extension modules is discussed. Lastly, section six lists a number of ideas that were considered for inclusion but were ultimately rejected. Rationale ========= Because Python's 2.x series lacks a standard way of annotating a function's parameters and return values (e.g., with information about what type a function's return value should be), a variety of tools and libraries have appeared to fill this gap [#tail-examp#]_. Some utilise the decorators introduced in "PEP 318", while others parse a function's docstrings, looking for annotations there. This PEP aims to provide a single, standard way of specifying this information, reducing the confusion caused by the wide variation in mechanism and syntax that has existed until this point.
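As an illustration of the status quo the Rationale describes, the decorator-based tools generally look something like this (the ``annotate`` name here is invented for the example):

```python
def annotate(**kwds):
    # A PEP 318-style decorator: stash the annotations on the function
    # object itself.  Every tool invents its own variation on this theme,
    # which is precisely the fragmentation this PEP aims to end.
    def decorator(func):
        func.annotations = kwds
        return func
    return decorator

@annotate(source="something compilable", returns="a code object")
def compile_it(source):
    return compile(source, "<string>", "eval")

print(compile_it.annotations["returns"])  # a code object
```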
Fundamentals of Function Annotations ==================================== Before launching into a discussion of the precise ins and outs of Python 3.0's function annotations, let's first talk broadly about what annotations are and are not: 1. Function annotations, both for parameters and return values, are completely optional. 2. Function annotations are nothing more than a way of associating arbitrary Python expressions with various parts of a function at compile-time. Re-read that. Once more. By itself, Python does not attach any particular meaning or significance to annotations. Left to its own, Python simply takes these expressions and uses them as the values in some theoretical parameter-name-to-annotation-expression mapping. The only way that annotations take on meaning is when they are interpreted by third-party libraries. These third-party, annotation-interpreting libraries (TAILs, for short) can do anything they want with a function's annotations. For example, one library might use string-based annotations to provide improved help messages, like so: :: def compile(source: "something compilable", filename: "where the compilable thing comes from", mode: "is this a single statement or a suite?"): ... Another library might be used to provide typechecking for Python functions and methods. This library could use annotations to indicate the function's expected input and return types, possibly something like :: def sum(*vargs: Number) -> Number: ... where ``Number`` is some description of the protocol for numeric types. However, neither the strings in the first example nor the type information in the second example have any meaning on their own; meaning comes from third-party libraries alone. 3. Following from point 2, this PEP makes no attempt to introduce any kind of standard semantics, even for the built-in types. This work will be left to third-party libraries. 
There is no worry that these libraries will assign semantics at
random, or that a variety of libraries will appear, each with varying
semantics and interpretations of what, say, a tuple of strings means.
The difficulty inherent in writing annotation interpreting libraries
will keep their number low and their authorship in the hands of
people who, frankly, know what they're doing.

Syntax
======

Parameters
----------

Annotations for parameters take the form of optional expressions that
follow the parameter name.  This example indicates that parameters
'a' and 'c' should both be a ``Number``, while parameter 'b' should
be a ``Mapping``::

    def foo(a: Number, b: Mapping, c: Number = 5):
        ...

In pseudo-grammar, parameters now look like
``identifier [: expression] [= expression]``.  That is, type
annotations always precede a parameter's default value and both type
annotations and default values are optional.  Just like how equal
signs are used to indicate a default value, colons are used to mark
annotations.  All annotation expressions are evaluated at the time
the function is compiled.

Annotations for excess parameters (i.e., ``*vargs`` and ``**kwargs``)
are indicated similarly.  In the following function definition,
``*vargs`` is flagged as a list of ``Number``s, and ``**kwargs`` is
marked as a dict whose keys are strings and whose values are
``Sequence``s::

    def foo(*vargs: Number, **kwargs: Sequence):
        ...

Note that, depending on what annotation-interpreting library you're
using, the following might also be a valid spelling of the above::

    def foo(*vargs: [Number], **kwargs: {str: Sequence}):
        ...

Only the first, however, has the BDFL's blessing [#blessed-excess#]_
as the One Obvious Way.

Return Values
-------------

The examples thus far have omitted examples of how to annotate the
type of a function's return value.  This is done like so::

    def sum(*vargs: Number) -> Number:
        ...

The parameter list can now be followed by a literal ``->`` and a
Python expression.
Like the annotations for parameters, this expression will be
evaluated when the function is compiled.

The pseudo-grammar for function definition is now something like ::

    vargs     = '*' identifier [':' expression]
    kwargs    = '**' identifier [':' expression]
    parameter = identifier [':' expression] ['=' expression]
    funcdef   = 'def' identifier '(' [parameter ',']* [vargs ',']
                [kwargs] ')' ['->' expression] ':' suite

For a complete discussion of the changes to Python's grammar, see the
section `Grammar Changes`_.

Accessing Function Annotations
==============================

Once compiled, a function's annotations are available via the
function's ``__signature__`` attribute, introduced by PEP 3XXX.
Signature objects include an attribute just for annotations,
appropriately called ``annotations``.  This attribute is a
dictionary, mapping parameter names to an object representing the
evaluated annotation expression.

There is a special key in the ``annotations`` mapping, ``"return"``.
This key is present only if an annotation was supplied for the
function's return value.  For example, the following annotation::

    def foo(a: Number, b: 5 + 6, c: list) -> String:
        ...

would result in a ``__signature__.annotations`` mapping of ::

    {'a': Number,
     'b': 11,
     'c': list,
     'return': String}

The ``return`` key was chosen because it cannot conflict with the
name of a parameter; any attempt to use ``return`` as a parameter
name would result in a ``SyntaxError``.

Implementation
==============

XXX This is all very much TODO.

Beyond the obvious changes to Python's grammar, the eventual
implementation will probably involve a change to the MAKE_FUNCTION
opcode, though the details haven't been fully worked out yet.  I'm
still working on a sample implementation that works separately from
the ``__signature__`` mechanism.
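[Note that in the Python 3 that eventually shipped, this mapping ended
up living directly on the function object as ``__annotations__``,
rather than behind a ``__signature__`` object; the behaviour sketched
in the draft can be tried today like so, with ``Number`` and
``String`` replaced by stand-in types since the draft's protocol
objects are hypothetical:]

```python
# Stand-ins for the draft's hypothetical Number/String protocol objects.
Number = int
String = str

def foo(a: Number, b: 5 + 6, c: list) -> String:
    ...

# Each annotation expression is evaluated once, when the "def" executes,
# and stored keyed by parameter name; the return annotation uses the
# reserved key "return", which can never collide with a parameter name.
print(foo.__annotations__)
# {'a': <class 'int'>, 'b': 11, 'c': <class 'list'>, 'return': <class 'str'>}
```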
API for Annotations in C-language Extension Modules
===================================================

XXX TODO

This will probably involve macros around CPython API calls to set and
fetch the annotation expression for a given parameter.

Rejected Proposals
==================

+ The BDFL rejected the author's idea for a special syntax for adding
  annotations to generators as being "too ugly"
  [#reject-gen-syn#]_.

+ Though discussed early on ([#thread-gen#]_, [#thread-hof#]_),
  including special objects in the stdlib for annotating generator
  functions and higher-order functions was ultimately rejected as
  being more appropriate for third-party libraries: including them in
  the standard library raised too many thorny issues.

+ Despite considerable discussion about a standard type
  parameterisation syntax, it was decided that this should also be
  left to third-party libraries.  ([#thread-imm-list#]_,
  [#thread-mixing#]_, [#emphasis-tpls#]_)

Footnotes
=========

.. _[#func-term#] - Unless specifically stated, "function" is
   generally used as a synonym for "callable" throughout this
   document.

.. _[#tail-examp#] - The author's typecheck_ library makes use of
   decorators, while `Maxime Bourget's own typechecker`_ utilises
   parsed docstrings.

References
##########

.. _[#blessed-excess#] - http://mail.python.org/pipermail/python-3000/2006-May/002173.html
.. _[#reject-gen-syn#] - http://mail.python.org/pipermail/python-3000/2006-May/002103.html
.. _typecheck - http://oakwinter.com/code/typecheck/
.. _Maxime Bourget's own typechecker - http://maxrepo.info/taxonomy/term/3,6/all
.. _[#thread-gen#] - http://mail.python.org/pipermail/python-3000/2006-May/002091.html
.. _[#thread-hof#] - http://mail.python.org/pipermail/python-3000/2006-May/001972.html
.. _[#thread-imm-list#] - http://mail.python.org/pipermail/python-3000/2006-May/002105.html
.. _[#thread-mixing#] - http://mail.python.org/pipermail/python-3000/2006-May/002209.html
.. _[#emphasis-tpls#] - http://mail.python.org/pipermail/python-3000/2006-June/002438.html

From talin at acm.org  Thu Aug 10 02:51:02 2006
From: talin at acm.org (Talin)
Date: Wed, 09 Aug 2006 17:51:02 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
References: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
Message-ID: <44DA82F6.5030907@acm.org>

Collin Winter wrote:
> There is no worry that these libraries will assign semantics at
> random, or that a variety of libraries will appear, each with varying
> semantics and interpretations of what, say, a tuple of strings
> means. The difficulty inherent in writing annotation interpreting
> libraries will keep their number low and their authorship in the
> hands of people who, frankly, know what they're doing.

I find this assumption extremely dubious.

> In pseudo-grammar, parameters now look like
> ``identifier [: expression] [= expression]``. That is, type
> annotations always precede a parameter's default value and both type
> annotations and default values are optional. Just like how equal
> signs are used to indicate a default value, colons are used to mark
> annotations. All annotation expressions are evaluated at the time
> the function is compiled.

Only one annotation per parameter? What if I want to specify both a
docstring *and* a type constraint?
-- Talin

From collinw at gmail.com  Thu Aug 10 03:02:08 2006
From: collinw at gmail.com (Collin Winter)
Date: Wed, 9 Aug 2006 21:02:08 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <44DA82F6.5030907@acm.org>
References: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
	<44DA82F6.5030907@acm.org>
Message-ID: <43aa6ff70608091802sc2cd03bg9c43a237bcf13d8@mail.gmail.com>

On 8/9/06, Talin wrote:
> Collin Winter wrote:
> > There is no worry that these libraries will assign semantics at
> > random, or that a variety of libraries will appear, each with varying
> > semantics and interpretations of what, say, a tuple of strings
> > means. The difficulty inherent in writing annotation interpreting
> > libraries will keep their number low and their authorship in the
> > hands of people who, frankly, know what they're doing.
>
> I find this assumption extremely dubious.

Why? This is something Guido and I have discussed and agreed on.
What's your reasoning?

> > In pseudo-grammar, parameters now look like
> > ``identifier [: expression] [= expression]``. That is, type
> > annotations always precede a parameter's default value and both type
> > annotations and default values are optional. Just like how equal
> > signs are used to indicate a default value, colons are used to mark
> > annotations. All annotation expressions are evaluated at the time
> > the function is compiled.
>
> Only one annotation per parameter? What if I want to specify both a
> docstring *and* a type constraint?

If the grammar were something like
``identifier [: expression]* [= expression]`` instead, it would be
possible to add multiple annotations to parameters. But what of the
return value? Would you want to write

    def foo() -> Number -> "total number of frobnications":
        ...

I wouldn't.
The way to make this explicit, if you need it, would be something
like this:

    def bar(a: ("number of whatzits", Number)) -> ("frobnication count", Number):

then use a decorator to determine which annotation-interpreting
decorators are assigned which annotations, something like this,
perhaps:

    @chain(annotation_as_docstring, annotation_as_type)
    def bar(a: ("number of whatzits", Number)) -> ("frobnication count", Number):

Collin Winter

From tim.peters at gmail.com  Thu Aug 10 03:38:20 2006
From: tim.peters at gmail.com (Tim Peters)
Date: Wed, 9 Aug 2006 21:38:20 -0400
Subject: [Python-3000] threading, part 2
In-Reply-To:
References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com>
	<44D9BCB4.5010404@gmail.com>
	<1f7befae0608091248q7f328875x7c2d03723acbf8d2@mail.gmail.com>
Message-ID: <1f7befae0608091838u594de27ctb83dd0845ccaa0@mail.gmail.com>

[back and forth on PyThreadState_SetAsyncExc()]

[Tim]
>> Guido, do you have any idea now what the "number greater than one"
>> business is about?
>> ...
>> My impression has been that it would be an internal logic error if we
>> ever saw this count exceed 1.

[Guido]
> Right, I think that's it. I guess I was in a grumpy mood when I wrote
> this

I forgot that we talked about this close to two years ago:

    http://www.python.org/sf/1069160

As comments there say, it's still the case that it's clearly possible
to provoke this into deadlocking (but unlikely if you're not
deliberately trying to).

> (and Just & Alex never ended up using it!).

They spoke for themselves on this matter in that bug report ;-)

>> While I'm at it, I expect:
>>
>>     Py_CLEAR(p->async_exc);
>>     Py_XINCREF(exc);
>>     p->async_exc = exc;
>>
>> is better written:
>>
>>     Py_XINCREF(exc);
>>     Py_CLEAR(p->async_exc);
>>     p->async_exc = exc;
>>
>> for the same reason one should always incref B before decrefing A in
>>
>>     A = B
>>
>> ...

> That reason that A and B might already be the same object, right?
Right, or that B's only owned reference is on a chain only reachable
from A, and in either case A's incoming refcount is 1. The suggested
deadlock-avoiding rewrite in the patch comment addresses that too.

...

>>> I'm +0 on [exposing] this [from Python].

>> Me too, although it won't stay that simple, and I'm clear as mud on
>> how implementations other than CPython could implement this.

> Another good reason to keep it accessible from the C API only. Now I'm
> -0 on adding it. I suggest that if someone really wants this
> accessible from Python, they should research how Jython, IronPython,
> PyPy and Stackless could handle this, and report their research in a
> PEP.

As a full-blown language feature, I'm -1 unless that work is done
first. I'm still +0 on adding it to CPython if it's given a
leading-underscore name and docs to make clear that it's a
CPython-specific hack that may never work under any other
implementation.

From greg.ewing at canterbury.ac.nz  Thu Aug 10 04:47:48 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 10 Aug 2006 14:47:48 +1200
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
References: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
Message-ID: <44DA9E54.5020105@canterbury.ac.nz>

Collin Winter wrote:
> one library might use string-based annotations to provide
> improved help messages, like so:
>
>     def compile(source: "something compilable",
>                 filename: "where the compilable thing comes from",
>                 mode: "is this a single statement or a suite?"):
>
> Another library might be used to provide typechecking for Python
> functions and methods.
>
>     def sum(*vargs: Number) -> Number:
>         ...

And what are you supposed to do if you want to write a function that
has improved help messages *and* type checking?
> The difficulty inherent in writing annotation interpreting
> libraries will keep their number low and their authorship in the
> hands of people who, frankly, know what they're doing.

Even if there are only two of them, they can still conflict.

I think the idea of having totally undefined annotations is
fundamentally flawed.

-- Greg

From greg.ewing at canterbury.ac.nz  Thu Aug 10 04:49:55 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 10 Aug 2006 14:49:55 +1200
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <43aa6ff70608091802sc2cd03bg9c43a237bcf13d8@mail.gmail.com>
References: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
	<44DA82F6.5030907@acm.org>
	<43aa6ff70608091802sc2cd03bg9c43a237bcf13d8@mail.gmail.com>
Message-ID: <44DA9ED3.3040304@canterbury.ac.nz>

Collin Winter wrote:
> On 8/9/06, Talin wrote:
>
>> Collin Winter wrote:
>>
>>> The difficulty inherent in writing annotation interpreting
>>> libraries will keep their number low and their authorship in the
>>> hands of people who, frankly, know what they're doing.
>>
>> I find this assumption extremely dubious.
>
> Why? This is something Guido and I have discussed and agreed on.

It smells like something akin to security by obscurity to me.
-- Greg

From collinw at gmail.com  Thu Aug 10 04:58:55 2006
From: collinw at gmail.com (Collin Winter)
Date: Wed, 9 Aug 2006 22:58:55 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <44DA9E54.5020105@canterbury.ac.nz>
References: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
	<44DA9E54.5020105@canterbury.ac.nz>
Message-ID: <43aa6ff70608091958u2d00db76s48260853942bed32@mail.gmail.com>

On 8/9/06, Greg Ewing wrote:
> Collin Winter wrote:
> > one library might use string-based annotations to provide
> > improved help messages, like so:
> >
> >     def compile(source: "something compilable",
> >                 filename: "where the compilable thing comes from",
> >                 mode: "is this a single statement or a suite?"):
> >
> > Another library might be used to provide typechecking for Python
> > functions and methods.
> >
> >     def sum(*vargs: Number) -> Number:
> >         ...
>
> And what are you supposed to do if you want to write
> a function that has improved help messages *and*
> type checking?

I already answered this in my response to Talin. The next draft will
address this directly.

> > The difficulty inherent in writing annotation interpreting
> > libraries will keep their number low and their authorship in the
> > hands of people who, frankly, know what they're doing.
>
> Even if there are only two of them, they can still
> conflict.

No-one is arguing that there won't be conflicting ideas about how to
spell different annotation ideas; just look at the number of
interface/role/typeclass/whatever implementations. The idea is that
each developer can pick the notation/semantics that's most natural to
them.

I'll go even further: say one library offers a semantics you find
handy for task A, while another library's ideas about type
annotations are best suited for task B. Without a single standard,
you're free to mix and match these libraries to give you a
combination that allows you to best express the ideas you're going
for.
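[As a purely hypothetical sketch of that mix-and-match idea -- the
tuple convention and both helper functions below are invented for
illustration, and are nobody's actual library API: if libraries merely
agree that an annotation may be a tuple of per-library pieces, each
consumer can pull out the parts it understands and ignore the rest.]

```python
Number = int  # stand-in for the draft's hypothetical Number protocol

def bar(a: ("number of whatzits", Number)) -> ("frobnication count", Number):
    return a

def doc_annotations(func):
    # A help-message library keeps only the string parts...
    return {name: next(x for x in ann if isinstance(x, str))
            for name, ann in func.__annotations__.items()}

def type_annotations(func):
    # ...while a typechecking library keeps only the type parts.
    return {name: next(x for x in ann if isinstance(x, type))
            for name, ann in func.__annotations__.items()}

print(doc_annotations(bar)["a"])        # number of whatzits
print(type_annotations(bar)["return"])  # <class 'int'>
```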
Collin Winter

From guido at python.org  Thu Aug 10 06:14:03 2006
From: guido at python.org (Guido van Rossum)
Date: Wed, 9 Aug 2006 21:14:03 -0700
Subject: [Python-3000] threading, part 2
In-Reply-To: <1f7befae0608091838u594de27ctb83dd0845ccaa0@mail.gmail.com>
References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com>
	<44D9BCB4.5010404@gmail.com>
	<1f7befae0608091248q7f328875x7c2d03723acbf8d2@mail.gmail.com>
	<1f7befae0608091838u594de27ctb83dd0845ccaa0@mail.gmail.com>
Message-ID:

On 8/9/06, Tim Peters wrote:
> [back and forth on PyThreadState_SetAsyncExc()]
>
> [Tim]
> >> Guido, do you have any idea now what the "number greater than one"
> >> business is about?
> >> ...
> >> My impression has been that it would be an internal logic error if we
> >> ever saw this count exceed 1.
>
> [Guido]
> > Right, I think that's it. I guess I was in a grumpy mood when I wrote
> > this
>
> I forgot that we talked about this close to two years ago:
>
>     http://www.python.org/sf/1069160
>
> As comments there say, it's still the case that it's clearly possible
> to provoke this into deadlocking (but unlikely if you're not
> deliberately trying to).
>
> > (and Just & Alex never ended up using it!).
>
> They spoke for themselves on this matter in that bug report ;-)
>
> >> While I'm at it, I expect:
> >>
> >>     Py_CLEAR(p->async_exc);
> >>     Py_XINCREF(exc);
> >>     p->async_exc = exc;
> >>
> >> is better written:
> >>
> >>     Py_XINCREF(exc);
> >>     Py_CLEAR(p->async_exc);
> >>     p->async_exc = exc;
> >>
> >> for the same reason one should always incref B before decrefing A in
> >>
> >>     A = B
> >>
> >> ...
>
> > That reason that A and B might already be the same object, right?
>
> Right, or that B's only owned reference is on a chain only reachable
> from A, and in either case A's incoming refcount is 1. The suggested
> deadlock-avoiding rewrite in the patch comment addresses that too.

So why didn't we check that in?

> ...
>
> >>> I'm +0 on [exposing] this [from Python].
> >> Me too, although it won't stay that simple, and I'm clear as mud on
> >> how implementations other than CPython could implement this.
>
> > Another good reason to keep it accessible from the C API only. Now I'm
> > -0 on adding it. I suggest that if someone really wants this
> > accessible from Python, they should research how Jython, IronPython,
> > PyPy and Stackless could handle this, and report their research in a
> > PEP.
>
> As a full-blown language feature, I'm -1 unless that work is done
> first. I'm still +0 on adding it to CPython if it's given a
> leading-underscore name and docs to make clear that it's a
> CPython-specific hack that may never work under any other
> implementation.

Fine with me then. In 2.5? 2.6? Or py3k? (This is the py3k list.)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From paul at prescod.net  Thu Aug 10 07:19:03 2006
From: paul at prescod.net (Paul Prescod)
Date: Wed, 9 Aug 2006 22:19:03 -0700
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
Message-ID: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com>

Thanks to everyone who contributed. It seems that the emerging
consensus (bar a security question from Guido) is that ctypes is the
way forward for calling C code in Python 3000. I'd like to clarify
what this might mean:

1. Is ctypes and pure Python fast enough for most real-world
   extension modules like PyOpenGL, PyExpat, Tkinter, and socket
   programming? I know that experimentation is ongoing. Are any
   results in?

2. If not, will Python 3000's build or runtime system use some kind
   of optimization technique such as static compilation (e.g.
   extcompiler [1]) or JIT compilation to allow parts of its library
   (especially new parts) to be written using ctypes instead of C?

3. Presuming that the performance issue can be worked out one way or
   another, are there arguments in favour of interpreter-specific
   C-coded extensions other than those doing explicitly
   interpreter-specific stuff (e.g. tweaking the GC)?

4. Will the Python 3000 standard library start to migrate towards
   ctypes (for new extensions)?

Paul Prescod

[1] http://codespeak.net/pypy/dist/pypy/doc/extcompiler.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060809/3b2d1dd1/attachment.html

From krstic at solarsail.hcs.harvard.edu  Thu Aug 10 07:32:38 2006
From: krstic at solarsail.hcs.harvard.edu (Ivan Krstic)
Date: Thu, 10 Aug 2006 01:32:38 -0400
Subject: [Python-3000] threading, part 2
In-Reply-To:
References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com>
	<44D9BCB4.5010404@gmail.com>
	<1f7befae0608091248q7f328875x7c2d03723acbf8d2@mail.gmail.com>
	<1f7befae0608091838u594de27ctb83dd0845ccaa0@mail.gmail.com>
Message-ID: <44DAC4F6.3010002@solarsail.hcs.harvard.edu>

Guido van Rossum wrote:
> Fine with me then. In 2.5? 2.6? Or py3k? (This is the py3k list.)

FWIW, we'll ship 2.5 on the OLPC (laptop.org) machines, and it looks
like we'll need this. It'd be useful to have it directly in CPython,
so people running our software outside the laptops don't have to fuss
with an extension.

--
Ivan Krstic | GPG: 0x147C722D

From pje at telecommunity.com  Thu Aug 10 08:28:23 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu, 10 Aug 2006 02:28:23 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To:
Message-ID: <5.1.1.6.0.20060810021302.0262fcd0@sparrow.telecommunity.com>

At 14:47 8/10/2006 +1200, Greg Ewing wrote:
> And what are you supposed to do if you want to write
> a function that has improved help messages *and*
> type checking?

Create a type annotation object that wraps multiple objects -- or
better yet, use a list or tuple of annotations. (See below.)
> > The difficulty inherent in writing annotation interpreting
> > libraries will keep their number low and their authorship in the
> > hands of people who, frankly, know what they're doing.
>
> Even if there are only two of them, they can still
> conflict.
>
> I think the idea of having totally undefined
> annotations is fundamentally flawed.

No, your assumption is fundamentally flawed. ;-)  This is a trivial
application of overloaded functions.

In PEAK, there is a similar concept called "attribute metadata" that
can be applied to the attributes of a class. A single overloaded
function called "declareAttribute" is used to "declare" the metadata.

These metadata annotations can be anything you want. Certain PEAK
frameworks use them for security declarations. Others use them to
mark an attribute as providing a certain interface for child
components, to describe the attribute's syntax for parsing or
formatting, and so on.

There is no predefined semantics for these metadata objects -- none
whatsoever. Each framework that needs a new kind of metadata object
simply defines a class that holds whatever metadata is desired, and
adds a method to the "declareAttribute" function to handle objects of
that type. The added method can do anything: modify the class or
descriptor in some way, register something in a registry, or whatever
else you want it to do.

In addition, the declareAttribute function comes with predefined
methods for processing tuples and lists by iterating over them and
calling declareAttribute recursively. This makes it easy to combine
groups of metadata objects and reuse them.

So I see no problems with this concept that overloaded functions
don't trivially solve. Any operation you want to perform on function
annotations need only be implemented as an overloaded function, and
there is then no conflict to worry about.
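[PEAK's actual declareAttribute is not shown here, but the dispatch
pattern being described can be sketched with the stdlib's later
``functools.singledispatch`` -- a simpler cousin of the overloaded
functions meant above. All names below are invented for illustration.]

```python
from functools import singledispatch

@singledispatch
def declare_annotation(ann, registry):
    # Default method: record annotations no framework has claimed.
    registry.append(("unhandled", ann))

@declare_annotation.register(str)
def _(ann, registry):
    registry.append(("docstring", ann))

@declare_annotation.register(type)
def _(ann, registry):
    registry.append(("typecheck", ann))

# Tuples and lists recurse, so grouped annotations compose for free --
# mirroring the predefined tuple/list methods described above.
@declare_annotation.register(tuple)
@declare_annotation.register(list)
def _(ann, registry):
    for item in ann:
        declare_annotation(item, registry)

reg = []
declare_annotation(("number of whatzits", int), reg)
print(reg)  # [('docstring', 'number of whatzits'), ('typecheck', <class 'int'>)]
```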
For example, if you are writing a documentation tool that needs to
generate a short HTML string for an annotation, you just create an
overloaded function for that. Then somebody using the documentation
tool with arbitrary type annotation frameworks (e.g. their own) can
just add methods to the documentation tool's overloaded functions to
support that.

Indeed, many a time I've wished that epydoc was written using
overloaded functions, as it then would've been easy to extend it to
gracefully handle PEAK's more esoteric descriptors and metaclasses.

From behnel_ml at gkec.informatik.tu-darmstadt.de  Thu Aug 10 09:28:12 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu, 10 Aug 2006 09:28:12 +0200
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
References: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
Message-ID: <44DAE00C.400@gkec.informatik.tu-darmstadt.de>

Hi,

Collin Winter wrote:
> def compile(source: "something compilable",
>             filename: "where the compilable thing comes from",
>             mode: "is this a single statement or a suite?"):
>     ...
>
> def sum(*vargs: Number) -> Number:
>     ...

Admittedly, I'm not so much in the "Spring stew" discussion, but I'm
not a big fan of cluttering up my function signature with "make them
short to make them fit" comments.

What would be wrong in adding a standard decorator for this purpose?
Something like:

    @type_annotate("This is a filename passed as string", filename = str)
    @type_annotate(source = str)
    def compile(source, filename, mode):
        ...

or, more explicitly:

    @arg_docstring(filename = "This is a filename passed as string")
    @arg_type(filename = str)
    @arg_type(source = str)
    def compile(source, filename, mode):
        ...
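[Neither decorator exists in any library; as a rough, purely
illustrative sketch, such an ``arg_type`` could be built on plain
function attributes -- the ``arg_types`` attribute name is made up:]

```python
def arg_type(**types):
    def decorate(func):
        # Merge with any types declared by decorators applied earlier
        # (decorators run innermost-first, so later calls accumulate).
        merged = dict(getattr(func, "arg_types", {}))
        merged.update(types)
        func.arg_types = merged
        return func
    return decorate

@arg_type(filename=str)
@arg_type(source=str)
def compile_(source, filename, mode):
    ...

print(compile_.arg_types)  # {'source': <class 'str'>, 'filename': <class 'str'>}
```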
Stefan

From behnel_ml at gkec.informatik.tu-darmstadt.de  Thu Aug 10 09:31:24 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu, 10 Aug 2006 09:31:24 +0200
Subject: [Python-3000] Ctypes as cross-interpreter C calling interface
In-Reply-To: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com>
References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com>
Message-ID: <44DAE0CC.8040909@gkec.informatik.tu-darmstadt.de>

Paul Prescod wrote:
> 2. If not, will Python 3000's build or runtime system use some kind of
> optimization technique such as static compilation (e.g. extcompiler [1])
> or JIT compilation to allow parts of its library (especially new parts)
> to be written using ctypes instead of C?

What's the problem? Just take PyPy and brand it as Python 3000.

Stefan

From l.oluyede at gmail.com  Thu Aug 10 10:15:10 2006
From: l.oluyede at gmail.com (Lawrence Oluyede)
Date: Thu, 10 Aug 2006 10:15:10 +0200
Subject: [Python-3000] Changing behavior of sequence multiplication by
	negative integer
Message-ID: <9eebf5740608100115g1fa7a861rd0b9a84a7b64d4be@mail.gmail.com>

I've never seen bugs caused by operations such as:

    "foobar" * -1

and to be honest I've never seen code like that, because the
semantics is somewhat senseless to me, but I think the behavior of
evaluating "sequence * negative integer" should be changed from:

>>> "foobar" * -1
''
>>> ["foobar"] * -1
[]
>>> ("foobar") * -1
''

to something throwing an exception, like when you try to multiply the
sequence by a floating point number:

>>> "foobar" * 1.0
Traceback (most recent call last):
  File "", line 1, in ?
TypeError: can't multiply sequence by non-int

It's not a big deal to me but maybe this can be addressed in the
python3000 branch.

--
Lawrence
http://www.oluyede.org/blog

From ncoghlan at gmail.com  Thu Aug 10 13:19:55 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 10 Aug 2006 21:19:55 +1000
Subject: [Python-3000] threading, part 2
In-Reply-To: <44DAC4F6.3010002@solarsail.hcs.harvard.edu>
References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com>
	<44D9BCB4.5010404@gmail.com>
	<1f7befae0608091248q7f328875x7c2d03723acbf8d2@mail.gmail.com>
	<1f7befae0608091838u594de27ctb83dd0845ccaa0@mail.gmail.com>
	<44DAC4F6.3010002@solarsail.hcs.harvard.edu>
Message-ID: <44DB165B.2040901@gmail.com>

Ivan Krstic wrote:
> Guido van Rossum wrote:
>> Fine with me then. In 2.5? 2.6? Or py3k? (This is the py3k list.)
>
> FWIW, we'll ship 2.5 on the OLPC (laptop.org) machines, and it looks
> like we'll need this. It'd be useful to have it directly in CPython, so
> people running our software outside the laptops don't have to fuss with
> an extension.

Given the time frame, I think you might be stuck with using ctypes to
get at the functionality for Python 2.5.

Now that Guido & Tim have mentioned it, I also vaguely recall
portability to GIL-free implementations being one of the problems
with the idea back when the C API function was added, so exposing
this officially to Python code should probably wait until 2.6.

Peter Hansen worked out the necessary incantations to invoke it
through ctypes back in 2004 [1]. The difference now is that "import
ctypes" will work on a vanilla 2.5 installation.

Cheers,
Nick.
[1] http://groups.google.com/group/comp.lang.python/msg/d310502f7c7133a9

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Thu Aug 10 13:40:32 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 10 Aug 2006 21:40:32 +1000
Subject: [Python-3000] Changing behavior of sequence multiplication by
	negative integer
In-Reply-To: <9eebf5740608100115g1fa7a861rd0b9a84a7b64d4be@mail.gmail.com>
References: <9eebf5740608100115g1fa7a861rd0b9a84a7b64d4be@mail.gmail.com>
Message-ID: <44DB1B30.1030200@gmail.com>

Lawrence Oluyede wrote:
> I've never seen bugs determined by operations such as:
>
> "foobar" * -1
>
> and to be honest I've never seen code like that because the semantics
> is somewhat senseless to me but I think the behavior of the expression
> evaluation of "Sequence * negative integer" should be changed from:
>
>>>> "foobar" * -1
> ''
>>>> ["foobar"] * -1
> []
>>>> ("foobar") * -1
> ''
>
> to something throwing an exception like when you try to multiplicate
> the sequence by a floating point number:
>
>>>> "foobar" * 1.0
> Traceback (most recent call last):
>   File "", line 1, in ?
> TypeError: can't multiply sequence by non-int
>
> It's not a big deal to me but maybe this can be addressed in the
> python3000 branch

The "negative coerced to 0" behaviour is to make it easy to do things
like padding a sequence to a minimum length:

    seq = seq + pad * (min_length - len(seq))

Without the current behaviour, all such operations would need to be
rewritten as:

    seq = seq + pad * max((min_length - len(seq)), 0)

Gratuitous breakage that leads to a more verbose result gets a solid
-1 from me :)

Cheers,
Nick.
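[The clipping behaviour Nick relies on is easy to verify; the pad
recipe works precisely because a non-positive repeat count yields an
empty sequence. ``pad_to`` below is just an illustrative wrapper, not
anyone's proposed API:]

```python
assert "foobar" * -1 == ""   # negative counts clip to the empty sequence
assert ["foobar"] * -1 == []

def pad_to(seq, min_length, pad=" "):
    # When len(seq) >= min_length the multiplier is <= 0, so the pad
    # term contributes nothing -- no max() guard needed.
    return seq + pad * (min_length - len(seq))

print(repr(pad_to("abc", 5)))     # 'abc  '
print(repr(pad_to("abcdef", 5)))  # 'abcdef' -- already long enough
```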
--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org

From l.oluyede at gmail.com  Thu Aug 10 14:03:28 2006
From: l.oluyede at gmail.com (Lawrence Oluyede)
Date: Thu, 10 Aug 2006 14:03:28 +0200
Subject: [Python-3000] Changing behavior of sequence multiplication by
	negative integer
In-Reply-To: <44DB1B30.1030200@gmail.com>
References: <9eebf5740608100115g1fa7a861rd0b9a84a7b64d4be@mail.gmail.com>
	<44DB1B30.1030200@gmail.com>
Message-ID: <9eebf5740608100503l16238585yf1f2c38b1a4a4142@mail.gmail.com>

> The "negative coerced to 0" behaviour is to make it easy to do things like
> padding a sequence to a minimum length:
>
> seq = seq + pad * (min_length- len(seq))
>
> Without the current behaviour, all such operations would need to be rewritten as:
>
> seq = seq + pad * max((min_length- len(seq)), 0)
>
> Gratuitous breakage that leads to a more verbose result gets a solid -1 from me :)

That sounds like a -1 to me too. Thanks for the explanation. I was
sure there was one for that kind of behavior.

--
Lawrence
http://www.oluyede.org/blog

From behnel_ml at gkec.informatik.tu-darmstadt.de  Thu Aug 10 15:00:30 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu, 10 Aug 2006 15:00:30 +0200
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <44DAE00C.400@gkec.informatik.tu-darmstadt.de>
References: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
	<44DAE00C.400@gkec.informatik.tu-darmstadt.de>
Message-ID: <44DB2DEE.7020601@gkec.informatik.tu-darmstadt.de>

Stefan Behnel wrote:
> Collin Winter wrote:
>> def compile(source: "something compilable",
>>             filename: "where the compilable thing comes from",
>>             mode: "is this a single statement or a suite?"):
>>     ...
>>
>> def sum(*vargs: Number) -> Number:
>>     ...
> > Admittedly, I'm not so much in the "Spring stew" discussion, but I'm not a big > fan of cluttering up my function signature with "make them short to make them > fit" comments. > > What would be wrong in adding a standard decorator for this purpose? Something > like: > > @type_annotate("This is a filename passed as string", filename = str) > @type_annotate(source = str) > def compile(source, filename, mode): > ... > > or, more explicitly: > > @arg_docstring(filename = "This is a filename passed as string") > @arg_type(filename = str) > @arg_type(source = str) > def compile(source, filename, mode): > ... Ah, never mind, that only applies to docstrings. The type annotation would not be available to the compiler... So, it would be a good idea to split the two: docstrings and types. Where a decorator provides a readable (and extensible) solution for the first, type annotations should be part of the signature IMHO. Stefan From jimjjewett at gmail.com Thu Aug 10 16:13:14 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu, 10 Aug 2006 10:13:14 -0400 Subject: [Python-3000] Changing behavior of sequence multiplication by negative integer In-Reply-To: <44DB1B30.1030200@gmail.com> References: <9eebf5740608100115g1fa7a861rd0b9a84a7b64d4be@mail.gmail.com> <44DB1B30.1030200@gmail.com> Message-ID: Lawrence Oluyede wrote: > seq * -5 > and to be honest I've never seen code like that because the semantics > is somewhat senseless to me To be honest, I would almost expect the negative to mean "count from the end", so that it also reversed the sequence. It doesn't, but ... it does make for a hard-to-explain case. > ... evaluation of "Sequence * negative integer" should be changed from: > >>> "foobar" * -1 > '' > > ... to something throwing an exception like when you try to multiplicate > > the sequence by a floating point number: Agreed. 
On 8/10/06, Nick Coghlan wrote: > The "negative coerced to 0" behaviour is to make it easy to do things like > padding a sequence to a minimum length: > seq = seq + pad * (min_length- len(seq)) Typically, if I need to pad a sequence to a minimum length, I really need it to be a specific length. Having it already be too long is likely to cause problems later. So I really do prefer the explicit version. Also compare this to the recent decision that __index__ should *not* silently clip to a C long > Without the current behaviour, all such operations would need to be rewritten as: > seq = seq + pad * max((min_length- len(seq)), 0) I would write it as # Create a record-size pad outside the loop pad = " "*length ... seq = (seq+pad)[:length] -jJ From ncoghlan at gmail.com Thu Aug 10 16:33:27 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 11 Aug 2006 00:33:27 +1000 Subject: [Python-3000] Changing behavior of sequence multiplication by negative integer In-Reply-To: References: <9eebf5740608100115g1fa7a861rd0b9a84a7b64d4be@mail.gmail.com> <44DB1B30.1030200@gmail.com> Message-ID: <44DB43B7.2060608@gmail.com> Jim Jewett wrote: > I would write it as > > # Create a record-size pad outside the loop > pad = " "*length > ... > seq = (seq+pad)[:length] I'd generally do padding to a fixed length that way as well, but any code relying on the current 'clip to 0' behaviour would break if this changed. Without a really compelling reason to change it, it's hard to justify any breakage at all (even if there may be better ways of doing things). While I take your point about the comparison to __index__, the difference is that clipping sequence repetition to 0 has been the expected behaviour for many releases, whereas in the __index__ overflow case the expected behaviour was for the code to raise an exception. Cheers, Nick. 
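For the record, the two padding idioms discussed in this subthread agree for short input and differ only once the sequence is already over-long; a quick sketch (values are illustrative):

```python
# Comparing the two padding idioms from this thread.
length = 10
pad = " " * length          # record-size pad, built once outside the loop

def pad_clip(seq):
    # Relies on "negative coerced to 0": a no-op when seq is long enough.
    return seq + " " * (length - len(seq))

def pad_slice(seq):
    # Jim's version: always yields exactly `length` characters,
    # truncating over-long input (Slawomir's objection).
    return (seq + pad)[:length]

assert pad_clip("abc") == "abc" + " " * 7
assert pad_slice("abc") == "abc" + " " * 7
# The two differ only for over-long input:
assert pad_clip("abcdefghijkl") == "abcdefghijkl"   # left alone
assert pad_slice("abcdefghijkl") == "abcdefghij"    # clipped to `length`
```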
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From slawek at cs.lth.se Thu Aug 10 16:49:07 2006 From: slawek at cs.lth.se (Slawomir Nowaczyk) Date: Thu, 10 Aug 2006 16:49:07 +0200 Subject: [Python-3000] Changing behavior of sequence multiplication by negative integer In-Reply-To: References: <44DB1B30.1030200@gmail.com> Message-ID: <20060810164518.EF5E.SLAWEK@cs.lth.se> On Thu, 10 Aug 2006 10:13:14 -0400 Jim Jewett wrote: #> > seq = seq + pad * (min_length- len(seq)) #> #> Typically, if I need to pad a sequence to a minimum length, I really #> need it to be a specific length. Having it already be too long is #> likely to cause problems later. So I really do prefer the explicit #> version. Well, for whatever it is worth, if I pad the data to present it in a readable form I *most definitely* do not want values to become truncated just because they turn out to be bigger than I originally expected. An ugly result is worse than a nice result, but still better than a wrong result. -- Best wishes, Slawomir Nowaczyk ( Slawomir.Nowaczyk at cs.lth.se ) All I want is a warm bed, and a kind word, and unlimited power. From guido at python.org Thu Aug 10 19:50:24 2006 From: guido at python.org (Guido van Rossum) Date: Thu, 10 Aug 2006 10:50:24 -0700 Subject: [Python-3000] Ctypes as cross-interpreter C calling interface In-Reply-To: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com> References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com> Message-ID: I worry that this may be too ambitious to add to the already significant load for the Py3k project. You've seen my timeline -- alpha in early 07, final a year later. Don't get me wrong! 
I think that completely changing the FFI paradigm (as opposed to evolutionary changes to the existing C API, which py3k is doing) is a very worthy project, but I'd rather conceive it as something orthogonal to the py3k transition. It doesn't have to wait for py3k, nor should py3k have to wait for it. Tying too many projects together in terms of mutual dependencies is a great way to cause total paralysis. --Guido On 8/9/06, Paul Prescod wrote: > Thanks for everyone who contributed. It seems that the emerging consensus > (bar a security question from Guido) is that ctypes it the way forward for > calling C code in Python 3000. I'd like to clarify what this might mean: > > 1. Is ctypes and pure python fast enough for most real-world extension > modules like PyOpenGL, PyExpat, Tkinter, and socket programming? I know that > experimentation is ongoing. Are any results in? > > 2. If not, will Python 3000's build or runtime system use some kind of > optimization technique such as static compilation ( e.g. extcompiler[1]) or > JIT compilation to allow parts of its library (especially new parts) to be > written using ctypes instead of C? > > 3. Presuming that the performance issue can be worked out one way or > another, are there arguments in favour of interpreter-specific C-coded > extensions other than those doing explicitly interpreter-specific stuff ( > e.g. tweaking the GC). > > 4. Will the Python 3000 standard library start to migrate towards ctypes > (for new extensions)? 
> > Paul Prescod > > [1] > http://codespeak.net/pypy/dist/pypy/doc/extcompiler.html > > > > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Aug 10 20:05:51 2006 From: guido at python.org (Guido van Rossum) Date: Thu, 10 Aug 2006 11:05:51 -0700 Subject: [Python-3000] Range literals In-Reply-To: <20060808104049.E709.JCARLSON@uci.edu> References: <44D8C154.9020406@acm.org> <20060808104049.E709.JCARLSON@uci.edu> Message-ID: I haven't changed my mind. Do you really want to add atrocities such as having both .. and ... in the language where one includes the end point and the other excludes it? How would a casual user remember which is which? --Guido On 8/8/06, Josiah Carlson wrote: > > Talin wrote: > > > > I've seen some languages that use a double-dot (..) to mean a range of > > items. This could be syntactic sugar for range(), like so: > > > > > > for x in 1..10: > > ... > > In the pronouncement on PEP 284: http://www.python.org/dev/peps/pep-0284/ > > Guido did not buy the premise that the range() format needed fixing, > "The whole point (15 years ago) of range() was to *avoid* needing syntax > to specify a loop over numbers. I think it's worked out well and there's > nothing that needs to be fixed (except range() needs to become an > iterator, which it will in Python 3.0)." > > Unless Guido has decided that range/xrange are the wrong way to do > things, I don't think there is much discussion here. 
> > - Josiah > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Aug 10 20:40:46 2006 From: guido at python.org (Guido van Rossum) Date: Thu, 10 Aug 2006 11:40:46 -0700 Subject: [Python-3000] Rounding in Py3k In-Reply-To: References: <44D1F304.4020700@iinet.net.au> <44D2A81D.2050204@canterbury.ac.nz> <44D3124A.6010300@canterbury.ac.nz> Message-ID: On 8/4/06, Ron Adam wrote: > But that doesn't explain why int, long, and float, don't have other > non-magic methods. > > I'm not attempting taking sides for or against either way, I just want > to understand the reasons as it seems like by knowing that, the correct > way to do it would be clear, instead of trying to wag the dog by the > tail if you know what I mean. I'm probably the source of this convention. For numbers, I find foo(x) more readable than x.foo(), mostly because of the longstanding tradition in mathematics to write things like f(x) and sin(x). Originally I had extended the same convention to strings; but over time it became clear that there was a common set of operations on strings that were so fundamental that having to import a module to use them was a mistake, and there were too many to make them all built-ins. (I didn't insist on not using methods/attributes for complex, since I was already used to seeing z.re and z.im in Algol-68). I'm not convinced that there are enough common operations on the standard numbers to change my mind now. I'd rather see the built-in round() use a new protocol __round__() than switching to a round() method on various numbers; this should hopefully make it possible to use round() on Decimal instances. A question is what the API for __round__() should be. 
It seems Decimal uses a different API than round(). Can someone think about this more and propose a unified and backwards compatible solution? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From tomerfiliba at gmail.com Thu Aug 10 21:14:27 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Thu, 10 Aug 2006 21:14:27 +0200 Subject: [Python-3000] threading, part 2 Message-ID: <1d85506f0608101214g594d2dal282ab2ae60f29f11@mail.gmail.com> [Tim] > Me too, although it won't stay that simple, and I'm clear as mud on > how implementations other than CPython could implement this. [Guido] > Another good reason to keep it accessible from the C API only. Now I'm > -0 on adding it. I suggest that if someone really wants this > accessible from Python, they should research how Jython, IronPython, > PyPy and Stackless could handle this, and report their research in a > PEP. then how does interrupt_main work? is it implementation-agnostic? ----- >>> import thread >>> help(thread.interrupt_main) Help on built-in function interrupt_main in module thread: interrupt_main(...) interrupt_main() Raise a KeyboardInterrupt in the main thread. A subthread can use this function to interrupt the main thread. 
----- just let me raise arbitrary exceptions (don't limit it to KeyboardInterrupt) -tomer From tim.peters at gmail.com Thu Aug 10 23:40:59 2006 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 10 Aug 2006 17:40:59 -0400 Subject: [Python-3000] threading, part 2 In-Reply-To: References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com> <44D9BCB4.5010404@gmail.com> <1f7befae0608091248q7f328875x7c2d03723acbf8d2@mail.gmail.com> <1f7befae0608091838u594de27ctb83dd0845ccaa0@mail.gmail.com> Message-ID: <1f7befae0608101440i30590f4dv2740e584f801881c@mail.gmail.com> [back and forth on PyThreadState_SetAsyncExc(), and the 2-year old discussion in http://www.python.org/sf/1069160 ] [Tim] >> [still-current deadlock & refcount issues not fixed at the time] [Guido] > So why didn't we check that in? The shallow answer is that you closed the report without checking it in, so ask a mirror ;-) The real answer seems to be that nobody (including me) really cared about this function, since it's both unused and untested in the core, and there were no known uses from anyone's C extensions either. [on adding it to the language] >>>>> +0 >>>> Me too, although ... I'm clear as mud on how implementations other >>>> than CPython could implement this. >>> Now I'm -0 on adding it. I suggest that if someone really wants this >>> accessible from Python, they should research how Jython, IronPython, >>> PyPy and Stackless could handle this, and report their research in a >>> PEP. >> As a full-blown language feature, I'm -1 unless that work is done >> first. I'm still +0 on adding it to CPython if it's given a >> leading-underscore name and docs to make clear that it's a >> CPython-specific hack that may never work under any other >> implementation. > Fine with me then. In 2.5? 2.6? Or py3k? (This is the py3k list.) Since the 2.5 beta series is supposedly done with, I strongly doubt Anthony wants to see a new feature snuck into 2.5c1. Someone who wants it enough could target 2.6. 
I'm only +0, so I'd do that only if someone wants it enough to pay for it. For 2.5, I'll check in the anal correctness changes, add a ctypes-based test case, and reword the docs to stop warning about a return value > 1 (all those are just fixing what's going to be in 2.5 anyway). From guido at python.org Fri Aug 11 01:17:52 2006 From: guido at python.org (Guido van Rossum) Date: Thu, 10 Aug 2006 16:17:52 -0700 Subject: [Python-3000] Ctypes as cross-interpreter C calling interface In-Reply-To: <1cb725390608101319j19731f91vfc472d9113a03ccf@mail.gmail.com> References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com> <1cb725390608101319j19731f91vfc472d9113a03ccf@mail.gmail.com> Message-ID: (Adding python-3000 back to the CC: list.) On 8/10/06, Paul Prescod wrote: > The only reason to tie it to Py3K is because Py3K is breaking APIs anyhow. > It will be in the overlap period between Py3K and Py2x that the need for an > abstraction will be most acute. Otherwise extensions will probably end up > with a lot of #ifdef py3k #else etc. > > It isn't clear how ambitious or not this is until we drill in. For example, > if pure "ctypes" is sufficiently efficient for 90% of all extensions, then > moving in this direction for Py3K might require nothing more than a > declaration from you that new extensions should be written using ctypes > instead of the PyObject APIs unless there is a very good reason. After all, > people will take their cue from you as to what sort of coding convention is > appropriate for the standard library. Is this first step doable? Just a > declaration that (with a few exceptions) ctypes is preferable to C code for > new extensions? > > But if that's totally unreasonable because ctypes is seldom performant > enough then the project gets more ambitious because it would have to pull in > extcompiler... I don't know enough about ctypes, but assuming I have a reason to write an extension in C (e.g. 
Tkinter, which uses the Tcl/Tk API), how to I use ctypes to call things like PyDict_GetItem() or PyErr_SetString()? --Guido > On 8/10/06, Guido van Rossum wrote: > > I worry that this may be too ambitious to add to the already > > significant load for the Py3k project. You've seen my timeline -- > > alpha in early 07, final a year later. > > > > Don't get me wrong! I think that completely changing the FFI paradigm > > (as opposed to evolutionary changes to the existing C API, which py3k > > is doing) is a very worthy project, but I'd rather conceive it as > > something orthogonal to the py3k transition. It doesn't have to wait > > for py3k, nor should py3k have to wait for it. Tying too many projects > > together in terms of mutual dependencies is a great way to cause total > > paralysis. > > > > --Guido > > > > On 8/9/06, Paul Prescod wrote: > > > Thanks for everyone who contributed. It seems that the emerging > consensus > > > (bar a security question from Guido) is that ctypes it the way forward > for > > > calling C code in Python 3000. I'd like to clarify what this might mean: > > > > > > 1. Is ctypes and pure python fast enough for most real-world extension > > > modules like PyOpenGL, PyExpat, Tkinter, and socket programming? I know > that > > > experimentation is ongoing. Are any results in? > > > > > > 2. If not, will Python 3000's build or runtime system use some kind of > > > optimization technique such as static compilation ( e.g. extcompiler[1]) > or > > > JIT compilation to allow parts of its library (especially new parts) to > be > > > written using ctypes instead of C? > > > > > > 3. Presuming that the performance issue can be worked out one way or > > > another, are there arguments in favour of interpreter-specific C-coded > > > extensions other than those doing explicitly interpreter-specific stuff > ( > > > e.g. tweaking the GC). > > > > > > 4. Will the Python 3000 standard library start to migrate towards > ctypes > > > (for new extensions)? 
> > > > > > Paul Prescod > > > > > > [1] > > > > http://codespeak.net/pypy/dist/pypy/doc/extcompiler.html > > > > > > > > > > > > > > > _______________________________________________ > > > Python-3000 mailing list > > > Python-3000 at python.org > > > http://mail.python.org/mailman/listinfo/python-3000 > > > Unsubscribe: > > > > http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > > > > > > > > > > > > > -- > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Aug 11 01:21:02 2006 From: guido at python.org (Guido van Rossum) Date: Thu, 10 Aug 2006 16:21:02 -0700 Subject: [Python-3000] threading, part 2 In-Reply-To: <1d85506f0608101214g594d2dal282ab2ae60f29f11@mail.gmail.com> References: <1d85506f0608101214g594d2dal282ab2ae60f29f11@mail.gmail.com> Message-ID: On 8/10/06, tomer filiba wrote: > [Tim] > > Me too, although it won't stay that simple, and I'm clear as mud on > > how implementations other than CPython could implement this. > > [Guido] > > Another good reason to keep it accessible from the C API only. Now I'm > > -0 on adding it. I suggest that if someone really wants this > > accessible from Python, they should research how Jython, IronPython, > > PyPy and Stackless could handle this, and report their research in a > > PEP. > > then how does interrupt_main work? is it implementation-agnostic? I expect that Jython doesn't implement this; it doesn't handle ^C either AFAIK. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From paul at prescod.net Fri Aug 11 01:45:00 2006 From: paul at prescod.net (Paul Prescod) Date: Thu, 10 Aug 2006 16:45:00 -0700 Subject: [Python-3000] Ctypes as cross-interpreter C calling interface In-Reply-To: References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com> <1cb725390608101319j19731f91vfc472d9113a03ccf@mail.gmail.com> Message-ID: <1cb725390608101645g3a9db04dhcf76cfd03e3a15fc@mail.gmail.com> Sorry for the cc mistake. I don't know enough about ctypes, but assuming I have a reason to > write an extension in C (e.g. Tkinter, which uses the Tcl/Tk API), how > to I use ctypes to call things like PyDict_GetItem() or > PyErr_SetString()? There are two answers to your question. The simplest is that if you have a dict object called "foo" you just call 'foo["abc"]'. It's just Python. Same for the other one: you'd just call 'raise'. Ctypes is the opposite model of the standard extension stuff. You're writing in Python so Python stuff is straightforward (just Python) and C stuff is a bit weird. So if you had to populate a Python dictionary from a C struct then it is the reading from the C struct that takes a bit of doing. The writing the Python dictionary is straightforward. If there was a reason to call PyDict_GetItem directly (performance maybe???) then that's possible. You need to set up the function prototype (which you would probably do in a helper library) and then you just call PyDict_GetItem. CTypes would coerce the types. py_object is a native data type. So I think it ends up looking like from PythonConvenienceFunctions import PyDict_GetItem obj = {} key = "Guido" rc = PyDict_GetItem(obj, key) I'm sure an expert will correct me if I'm wrong... Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-3000/attachments/20060810/d4b50a3e/attachment.html From paul at prescod.net Fri Aug 11 01:57:59 2006 From: paul at prescod.net (Paul Prescod) Date: Thu, 10 Aug 2006 16:57:59 -0700 Subject: [Python-3000] Ctypes as cross-interpreter C calling interface In-Reply-To: <1cb725390608101645g3a9db04dhcf76cfd03e3a15fc@mail.gmail.com> References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com> <1cb725390608101319j19731f91vfc472d9113a03ccf@mail.gmail.com> <1cb725390608101645g3a9db04dhcf76cfd03e3a15fc@mail.gmail.com> Message-ID: <1cb725390608101657x447df09cm888228b31e424a87@mail.gmail.com> And if you're curious about how to use ctypes without all of the helper functions set up for you, then I guess it is easiest to poke around the documentation for code samples. >>> printf.argtypes = [c_char_p, c_char_p, c_int, c_double] >>> printf("String '%s', Int %d, Double %f\n", "Hi", 10, 2.2) String 'Hi', Int 10, Double 2.200000 37 >>> >>> from ctypes import c_int, WINFUNCTYPE, windll >>> from ctypes.wintypes import HWND, LPCSTR, UINT >>> prototype = WINFUNCTYPE(c_int, HWND, LPCSTR, LPCSTR, c_uint) >>> paramflags = (1, "hwnd", 0), (1, "text", "Hi"), (1, "caption", None), (1, "flags", 0) >>> MessageBox = prototype(("MessageBoxA", windll.user32), paramflags) It's ugly but in the typical case you would hide all of the declarations in a module (maybe an auto-generated module) and just focus on your logic: >>> MessageBox() >>> MessageBox(text="Spam, spam, spam") >>> MessageBox(flags=2, text="foo bar") >> Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-3000/attachments/20060810/07616e7c/attachment.htm From guido at python.org Fri Aug 11 02:56:47 2006 From: guido at python.org (Guido van Rossum) Date: Thu, 10 Aug 2006 17:56:47 -0700 Subject: [Python-3000] Ctypes as cross-interpreter C calling interface In-Reply-To: <1cb725390608101645g3a9db04dhcf76cfd03e3a15fc@mail.gmail.com> References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com> <1cb725390608101319j19731f91vfc472d9113a03ccf@mail.gmail.com> <1cb725390608101645g3a9db04dhcf76cfd03e3a15fc@mail.gmail.com> Message-ID: On 8/10/06, Paul Prescod wrote: > > I don't know enough about ctypes, but assuming I have a reason to > > write an extension in C (e.g. Tkinter, which uses the Tcl/Tk API), how > > to I use ctypes to call things like PyDict_GetItem() or > > PyErr_SetString()? > > There are two answers to your question. The simplest is that if you have a > dict object called "foo" you just call 'foo["abc"]'. It's just Python. Same > for the other one: you'd just call 'raise'. That doesn't make sense if you want to write your extension in C. Surely you don't propose to rewrite all of tkinter.c in Python? That would be insane. Or Numeric? That would kill performance. > Ctypes is the opposite model of the standard extension stuff. You're writing > in Python so Python stuff is straightforward (just Python) and C stuff is a > bit weird. So if you had to populate a Python dictionary from a C struct > then it is the reading from the C struct that takes a bit of doing. The > writing the Python dictionary is straightforward. > > If there was a reason to call PyDict_GetItem directly (performance maybe???) > then that's possible. You need to set up the function prototype (which you > would probably do in a helper library) and then you just call > PyDict_GetItem. CTypes would coerce the types. py_object is a native data > type. 
> > So I think it ends up looking like > > from PythonConvenienceFunctions import PyDict_GetItem > > obj = {} > key = "Guido" > > rc = PyDict_GetItem(obj, key) > > I'm sure an expert will correct me if I'm wrong... I guess I object against the idea that we have to write all extensions in Python using ctypes for all C calls. This is okay if there's relatively little interaction with C code. It's insane if you're doing serious C code. And what about C++ extensions? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Fri Aug 11 03:47:38 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 11 Aug 2006 13:47:38 +1200 Subject: [Python-3000] Ctypes as cross-interpreter C calling interface In-Reply-To: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com> References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com> Message-ID: <44DBE1BA.6000204@canterbury.ac.nz> Paul Prescod wrote: > It seems that the emerging > consensus (bar a security question from Guido) is that ctypes it the way > forward for calling C code in Python 3000. I'd like to clarify what this > might mean: What's the state of play concerning ctypes support on non-x86 platforms? Until ctypes is uniformly supported on all platforms, it can't be considered a complete replacement for C-coded extensions (whether handwritten or generated by something else). -- Greg From lcaamano at gmail.com Fri Aug 11 05:01:45 2006 From: lcaamano at gmail.com (Luis P Caamano) Date: Thu, 10 Aug 2006 23:01:45 -0400 Subject: [Python-3000] threading, part 2 Message-ID: Yes, I also wonder about how non-CPython implementations would handle this but I'd just like to say that this feature, making a thread raise a specific exception from another thread asynchronously is a very useful feature. We have a subsystem that schedules requests that are dispatched in a thread each. 
The only way to cancel one of those requests right now is via a cooperative checking method in which we explicitly make calls through out the code to see if the request has been canceled, and in such case, the check raises an exception that triggers clean up and cancellation. Problem is we have to spread check calls all over the place. All this would be a lot easier if we could do thread.terminate() as proposed, especially for new code. On 8/9/06, "Guido van Rossum" wrote: > On 8/9/06, Tim Peters wrote: > > [Nick Coghlan] > > >> That check is already there: > > >> > > >> int PyThreadState_SetAsyncExc( long id, PyObject *exc) > > >> Asynchronously raise an exception in a thread. The id argument is the > > >> thread id of the target thread; exc is the exception object to be raised. This > > >> function does not steal any references to exc. To prevent naive misuse, you > > >> must write your own C extension to call this. Must be called with the GIL > > >> held. Returns the number of thread states modified; if it returns a number > > >> greater than one, you're in trouble, and you should call it again with exc set > > >> to NULL to revert the effect. This raises no exceptions. New in version 2.3. > > > > Guido, do you have any idea now what the "number greater than one" > > business is about? That would happen if and only if we found more > > than one thread state with the given thread id in the interpreter's > > list of thread states, but we're counting those with both the GIL and > > the global head_mutex lock held. My impression has been that it would > > be an internal logic error if we ever saw this count exceed 1. > > Right, I think that's it. I guess I was in a grumpy mood when I wrote > this (and Just & Alex never ended up using it!). 
> > > While I'm at it, I expect: > > > > Py_CLEAR(p->async_exc); > > Py_XINCREF(exc); > > p->async_exc = exc; > > > > is better written: > > > > Py_XINCREF(exc); > > Py_CLEAR(p->async_exc); > > p->async_exc = exc; > > > > for the same reason one should always incref B before decrefing A in > > > > A = B > > > > ... > > That reason that A and B might already be the same object, right? > > > >> All Tober is really asking for is a method on threading.Thread objects that > > >> uses this existing API to set a builtin ThreadExit exception. The thread > > >> module would consider a thread finishing with ThreadExit to be > > >> non-exceptional, so you could easily do: > > >> > > >> th.terminate() # Raise ThreadExit in th's thread of control > > >> th.join() # Should finish up pretty quickly > > >> > > >> Proper resource cleanup would be reliant on correct use of try/finally or with > > >> statements, but that's the case regardless of whether or not asynchronous > > >> exceptions are allowed. > > > > [Guido] > > > I'm +0 on this. > > > > Me too, although it won't stay that simple, and I'm clear as mud on > > how implementations other than CPython could implement this. > > Another good reason to keep it accessible from the C API only. Now I'm > -0 on adding it. I suggest that if someone really wants this > accessible from Python, they should research how Jython, IronPython, > PyPy and Stackless could handle this, and report their research in a > PEP. 
> > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > > -- Luis P Caamano Atlanta, GA USA From greg.ewing at canterbury.ac.nz Fri Aug 11 05:48:48 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 11 Aug 2006 15:48:48 +1200 Subject: [Python-3000] Ctypes as cross-interpreter C calling interface In-Reply-To: <1cb725390608101645g3a9db04dhcf76cfd03e3a15fc@mail.gmail.com> References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com> <1cb725390608101319j19731f91vfc472d9113a03ccf@mail.gmail.com> <1cb725390608101645g3a9db04dhcf76cfd03e3a15fc@mail.gmail.com> Message-ID: <44DBFE20.7040900@canterbury.ac.nz> Another thought about ctypes: What if you want to pass a Python function into C as a callback? Does ctypes have a way of handling that? -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From talin at acm.org Fri Aug 11 15:10:31 2006 From: talin at acm.org (Talin) Date: Fri, 11 Aug 2006 06:10:31 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <43aa6ff70608091958u2d00db76s48260853942bed32@mail.gmail.com> References: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com> <44DA9E54.5020105@canterbury.ac.nz> <43aa6ff70608091958u2d00db76s48260853942bed32@mail.gmail.com> Message-ID: <44DC81C7.1070905@acm.org> Collin Winter wrote: > The idea is that each developer can pick the notation/semantics that's > most natural to them. I'll go even further: say one library offers a > semantics you find handy for task A, while another library's ideas > about type annotations are best suited for task B. Without a single > standard, you're free to mix and match these libraries to give you a > combination that allows you to best express the ideas you're going > for. Let me tell you a story. 
Once upon a time, there was a little standard called Midi (Musical Instrument Digital Interface). The Midi standard was small and lightweight, containing less than a dozen commands of 2-3 bytes each. However, they realized that they needed a way to allow hardware vendors to add their own custom message types, so they created a special message type called "System Exclusive Message" or SysEx for short. The idea is that you would send a 3-byte manufacturer ID, and then any subsequent bytes would be considered to be in a vendor-specific format. The MMA (Midi Manufacturers Association) did not provide any guidelines or suggestions as to what the format of those bytes should be - it would be completely up to the vendors to decide what the format of their system exclusive message would be. Since the Midi standard did not define a way to save and load the instrument's memory, vendors typically would use the SysEx message to allow a "bulk dump" of patch information - essentially it was a way to access the instrument's internal state of sounds, programs, sequences, and so on. This would have worked fine, except for the fact that the vendors and the MMA were not the only stakeholders. Just about this time (mid-80s) there began to rise a new type of music company: companies like Mark of the Unicorn, Steinberg Audio and Blue Ribbon Soundworks that created professional music software for personal computers. Some companies made sequencer programs that would allow you to enter musical scores on the computer screen and play them back through your Midi instrument. Other companies worked on a different type of product - a "Universal Librarian", essentially a computer program which would store all of your patches and sound programs for all your different instruments. In 1987 I created a program for the Amiga called Music-X, which was a combination of sequencer and Universal Librarian. 
In order to create the librarian module, I needed to get information about all of the various vendor-specific protocols

Interrupt - as I was typing this last sentence, I knocked over my glass of ice water onto my Powerbook G4, completely toasting the motherboard and damaging the display. 24 hours, and $2700 later, I have completed my "forced upgrade" and can now continue this posting. Lesson to be learned: Internet rants and prescription pain meds do not mix! Be warned!

...which was not that difficult, since most of the vendors would include an appendix in the back of the user's manual (generally written in very bad English) describing the SysEx protocol for that device. I was also able to get my hands on "The Big Midi Book of SysEx protocols", which was essentially the xerox of all of these various appendices, bound up in book form and sold commercially.

At the time there were approximately 150 registered vendor IDs, but my idea was that I wouldn't have to implement every protocol - I figured, since all I wanted to do was load and store the resulting information, I didn't really need to *interpret* the data, I just needed to store it. Of course, I would need to interpret any transport-layer instructions (commands, block headers, checksums and so on), since a lot of instruments sent their "data dumps" as multiple SysEx messages which would need to be stored together. But I figured, since I was only supporting two vendor-specific commands for each vendor - bulk dump and bulk load - how different can they all be? Sure, there were likely to be individual variations on how things were done, but I could solve that by creating a per-instrument "personality file" - essentially a set of parameters which would tweak the behavior of my transport module. So for example, one parameter would indicate the type of checksum algorithm to be used, the second would indicate the number of checksum bytes, and so on.
For instruments that I couldn't borrow to test, I would rely on my users to fill in the holes (Ah, the heady optimism of the early days of the computer revolution!) and I would then add the user-contributed parameters to each update of the product.

I think by now you can start to see where this all goes wrong. I started with a small set of 3 instruments, each from a different manufacturer. I analyzed their bulk data protocols, and came up with an abstract model that encompassed all of them as a superset. Then I added a 4th synth, only to discover that its bulk dump protocol was completely different from the previous three, and so my model had to be rebuilt from scratch. No problem, I thought, 3 is too small a sample size anyway. Then I added a 5th synth, and the same thing happened. And a 6th. And so on.

For example, every vendor I investigated used a *completely different* algorithm for computing checksums. Some used CRCs, some did simple addition, others used XOR - and some had odd ideas of *which* bytes should be checksummed. Some of the algorithms were really bad too.

Different vendors also used different byte encodings. Because Midi is designed to work in an environment where cables can be unplugged at any moment, and because all other Midi messages (other than SysEx) were at most 3 bytes long, the Midi standard required that only 7 bits of each byte could be used to carry data; the 8th bit was reserved for a "start of new message" flag. Different vendors adapted to this challenge with surprising creativity. Some would simply slice the whole dump into units of 7 bits each, crossing the normal byte boundaries. Some would only send 4 bits per Midi byte. Some did things like: For each 7 bytes of input data, send the bottom 7 bits of each input byte as the first 7 bytes, and then send an 8th byte containing the missing top-bits from the first seven.
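That last scheme (7 data bytes fanned out across 8 MIDI-safe bytes) can be sketched in a few lines of Python. This is a hypothetical reconstruction of the encoding Talin describes, not code from Music-X; the function names are invented:

```python
def pack_7in8(data):
    """Encode 8-bit data as MIDI-safe bytes (high bit always clear):
    each group of up to 7 input bytes is sent as the low 7 bits of
    each byte, followed by one byte holding the stripped top bits."""
    out = bytearray()
    for i in range(0, len(data), 7):
        chunk = data[i:i + 7]
        high_bits = 0
        for j, b in enumerate(chunk):
            out.append(b & 0x7F)                # low 7 bits are MIDI-safe
            high_bits |= ((b >> 7) & 1) << j    # collect the stripped top bit
        out.append(high_bits)                   # extra byte: the missing top bits
    return bytes(out)

def unpack_7in8(packed):
    """Invert pack_7in8: restore the top bit of each data byte."""
    out = bytearray()
    for i in range(0, len(packed), 8):
        group = packed[i:i + 8]
        body, high_bits = group[:-1], group[-1]
        for j, b in enumerate(body):
            out.append(b | (((high_bits >> j) & 1) << 7))
    return bytes(out)

data = bytes(range(250))
packed = pack_7in8(data)
assert max(packed) < 0x80            # every byte fits in a MIDI data byte
assert unpack_7in8(packed) == data   # round-trips exactly
```

And of course each vendor that chose this family of encodings did it slightly differently, which was exactly the problem.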
And then there were those clever manufacturers who simply decided to design their instruments so that no control parameter could have a magnitude greater than 127.

Another example of variation was in timing. Roland machines (of certain models) were notorious for rejecting messages if they were sent too fast - you had to wait at least 20 ms from the time you received a message to the time you sent the response. Others would "time out" if you waited too long. There were half-duplex and full-duplex, stateless and stateful protocols, and I could go on.

The point is that there was no way for me to come up with some sort of algorithmic way to describe all of these protocols - the only way to do it was in code, with a separate implementation for each and every protocol. Nowadays, I'd simply embed Python into the program and make each personality file a Python script, but I didn't have that option back then. I toyed around with the idea of inventing a custom scripting language specifically for representing dump protocols, but the idea was infeasible at the time.

So, if you have had the patience to read through this long-winded anecdote and are wondering how in the hell this relates to Collin's question, I can sum it up in a very short motto (and potential QOTW): "Never question the creative power of an infinite number of monkeys."

Or to put it another way: If you create a tool, and you assume that tool will only be used in certain specific ways, but you fail to enforce that limitation, then your assumption will be dead wrong. The idea that there will only be a few type annotation providers who will all nicely cooperate with one another is just as naive as I was in the SysEx debacle. I'll have more focused things to say about this later, but I need to rest. (Had to get that out before all the rant energy dissipated.)
-- Talin From krstic at solarsail.hcs.harvard.edu Fri Aug 11 08:44:56 2006 From: krstic at solarsail.hcs.harvard.edu (Ivan Krstic) Date: Fri, 11 Aug 2006 02:44:56 -0400 Subject: [Python-3000] threading, part 2 In-Reply-To: <44DB165B.2040901@gmail.com> References: <1d85506f0608081122r30f89973paf450514b00dcc92@mail.gmail.com> <44D9BCB4.5010404@gmail.com> <1f7befae0608091248q7f328875x7c2d03723acbf8d2@mail.gmail.com> <1f7befae0608091838u594de27ctb83dd0845ccaa0@mail.gmail.com> <44DAC4F6.3010002@solarsail.hcs.harvard.edu> <44DB165B.2040901@gmail.com> Message-ID: <44DC2768.7060009@solarsail.hcs.harvard.edu> Nick Coghlan wrote: > Given the time frame, I think you might be stuck with using ctypes to > get at the functionality for Python 2.5. That's probably no worse a way to do it than calling an underscored CPython function; I keep forgetting we're getting out-of-the-box ctypes goodness in 2.5. -- Ivan Krstic | GPG: 0x147C722D From theller at python.net Fri Aug 11 08:58:51 2006 From: theller at python.net (Thomas Heller) Date: Fri, 11 Aug 2006 08:58:51 +0200 Subject: [Python-3000] Ctypes as cross-interpreter C calling interface In-Reply-To: <44DBFE20.7040900@canterbury.ac.nz> References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com> <1cb725390608101319j19731f91vfc472d9113a03ccf@mail.gmail.com> <1cb725390608101645g3a9db04dhcf76cfd03e3a15fc@mail.gmail.com> <44DBFE20.7040900@canterbury.ac.nz> Message-ID: Greg Ewing schrieb: > Another thought about ctypes: What if you want to pass > a Python function into C as a callback? Does ctypes > have a way of handling that? > Sure. 
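The pattern Greg asks about - wrapping a plain Python function in a CFUNCTYPE and handing it to C - looks roughly like this. A hedged sketch using libc's qsort, assuming a Unix-like system (library name resolution varies by platform):

```python
import ctypes
import ctypes.util

# Load the C library; this lookup is platform-dependent.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.qsort.restype = None  # qsort returns void

# qsort's comparison callback type: int (*)(const void *, const void *)
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))

def py_cmp(a, b):
    # A plain Python function, called back from C while qsort runs.
    return a[0] - b[0]

values = (ctypes.c_int * 5)(5, 1, 7, 33, 99)
libc.qsort(values, len(values), ctypes.sizeof(ctypes.c_int), CMPFUNC(py_cmp))
print(list(values))  # -> [1, 5, 7, 33, 99]
```

Note that the CFUNCTYPE object must stay alive for as long as C might call it; here the temporary lives through the qsort call, which is enough.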
The tutorial has an example that calls qsort with a Python comparison function: http://docs.python.org/dev/lib/ctypes-callback-functions.html Thomas From theller at python.net Fri Aug 11 09:10:01 2006 From: theller at python.net (Thomas Heller) Date: Fri, 11 Aug 2006 09:10:01 +0200 Subject: [Python-3000] Ctypes as cross-interpreter C calling interface In-Reply-To: <44DBE1BA.6000204@canterbury.ac.nz> References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com> <44DBE1BA.6000204@canterbury.ac.nz> Message-ID: Greg Ewing schrieb: > Paul Prescod wrote: >> It seems that the emerging >> consensus (bar a security question from Guido) is that ctypes is the way >> forward for calling C code in Python 3000. I'd like to clarify what this >> might mean: > > What's the state of play concerning ctypes support > on non-x86 platforms? Pretty good, I would say. Look, for example, at the buildbots. Major architectures that are currently *not* supported:

- Linux/BSD/arm (because the libffi/arm doesn't support closures, although ctypes on WindowsCE/arm works)
- Windows/AMD64 (This is probably currently not a major platform. Sometimes I'm working on a port for this)
- I know that there are some problems on solaris, although the solaris10/sparc buildbot does not report problems.

> Until ctypes is uniformly supported on all platforms, > it can't be considered a complete replacement for > C-coded extensions (whether handwritten or generated > by something else). > > -- > Greg Thomas From tomerfiliba at gmail.com Fri Aug 11 09:33:00 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Fri, 11 Aug 2006 09:33:00 +0200 Subject: [Python-3000] threading, part 2 In-Reply-To: References: <1d85506f0608101214g594d2dal282ab2ae60f29f11@mail.gmail.com> Message-ID: <1d85506f0608110033k2eac1f9h10908ddbef5db8c3@mail.gmail.com> [Guido] > I expect that Jython doesn't implement this; it doesn't handle ^C either AFAIK. threads are at most platform agnostic (old unices, embedded systems, etc.
are not likely to have thread support) so keeping this in mind, and having interrupt_main part of the standard thread API, which as you say, may not be implementation agnostic, why is thread.raise_exc(id, excobj) a bad API? and as i recall, dotNET's Thread.AbortThread or whatever it's called works that way (raising an exception in the other thread), so IronPython for once, should be happy with it. by the way, is the GIL part of the python standard? i.e., does IronPython implement it, although it shouldn't be necessary in dotNET? -tomer From slawomir.nowaczyk.847 at student.lu.se Fri Aug 11 12:48:32 2006 From: slawomir.nowaczyk.847 at student.lu.se (Slawomir Nowaczyk) Date: Fri, 11 Aug 2006 12:48:32 +0200 Subject: [Python-3000] threading, part 2 In-Reply-To: References: Message-ID: <20060811102346.EFC4.SLAWOMIR.NOWACZYK.847@student.lu.se> On Thu, 10 Aug 2006 23:01:45 -0400 Luis P Caamano wrote: #> Yes, I also wonder about how non-CPython implementations would handle #> this but I'd just like to say that this feature, making a thread raise #> a specific exception from another thread asynchronously is a very #> useful feature. #> #> We have a subsystem that schedules requests that are dispatched in a #> thread each. The only way to cancel one of those requests right now #> is via a cooperative checking method in which we explicitly make calls #> through out the code to see if the request has been canceled, and in #> such case, the check raises an exception that triggers clean up and #> cancellation. #> #> Problem is we have to spread check calls all over the place. All this #> would be a lot easier if we could do thread.terminate() as proposed, #> especially for new code. "All over the place"? Literally? In other words, how likely is it that your code would still be correct if you had this check after *every* single statement? Or even more often -- every N bytecodes? 
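The cooperative scheme Luis describes - explicit cancellation checkpoints consulting a shared flag - is usually built on something like threading.Event. A minimal sketch (names are hypothetical, not his code):

```python
import threading
import time

class Cancelled(Exception):
    """Raised at a checkpoint once cancellation has been requested."""

def request_handler(cancel):
    # The checks that have to be sprinkled throughout the request code.
    def checkpoint():
        if cancel.is_set():
            raise Cancelled
    try:
        for step in range(10000):
            checkpoint()        # explicit, cooperative cancellation point
            time.sleep(0.01)    # stand-in for a slice of real work
    except Cancelled:
        pass                    # run clean-up here, then exit the thread

cancel = threading.Event()
t = threading.Thread(target=request_handler, args=(cancel,))
t.start()
time.sleep(0.05)
cancel.set()         # ask the request to cancel itself
t.join(timeout=5)
print(t.is_alive())  # -> False
```

The thread only dies at a checkpoint, which is exactly the safety property - and exactly the burden - being debated here.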
I believe that if asynchronous exception raising ever gets officially approved, there absolutely *needs* to be a way to block it for a piece of code that should execute atomically. It is (more or less) OK to have an unofficial way to terminate the thread, with "use on your own risk", because there are situations where it is useful and (in a cooperative environment) reasonably safe thing to do. But it should not be done lightly and never when the code is not specifically expecting it. -- Best wishes, Slawomir Nowaczyk ( Slawomir.Nowaczyk at cs.lth.se ) Live in the past and future only. From pje at telecommunity.com Fri Aug 11 17:32:55 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri, 11 Aug 2006 11:32:55 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: Message-ID: <5.1.1.6.0.20060811112118.023af398@sparrow.telecommunity.com> At 06:10 AM 8/11/2006 -0700, Talin wrote: >Or to put it another way: If you create a tool, and you assume that tool >will only be used in certain specific ways, but you fail to enforce that >limitation, then your assumption will be dead wrong. The idea that there >will only be a few type annotation providers who will all nicely >cooperate with one another is just as naive as I was in the SysEx debacle. Are you saying that function annotations are a bad idea because we won't be able to pickle them? If not, your entire argument seems specious. Actually, even if that *is* your argument, it's specious, since all that's needed to support pickling is to support pickling. All that's needed to support printing is to support printing (via __str__), and so on. Thus, by a similar process of analogy, all that's needed to support any operation is to have an extensible mechanism by which the operation is defined, so that the operation can be extended to include new types -- i.e., an overloadable function, like pickle.dump. 
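The "overloadable function" mechanism Phillip points at later grew a stdlib spelling, functools.singledispatch (added in Python 3.4, long after this thread). A sketch with two hypothetical annotation types from unrelated frameworks, each given a meaning for one operation without knowing about the other:

```python
from functools import singledispatch

# Two hypothetical annotation types from independent frameworks.
class TypeCheck:
    def __init__(self, typ):
        self.typ = typ

class Doc:
    def __init__(self, text):
        self.text = text

@singledispatch
def describe(annotation):
    # Default for annotation objects nobody has registered a meaning for.
    return repr(annotation)

@describe.register(TypeCheck)
def _(annotation):
    return "must be an instance of %s" % annotation.typ.__name__

@describe.register(Doc)
def _(annotation):
    return annotation.text

print(describe(TypeCheck(int)))       # -> must be an instance of int
print(describe(Doc("a line count")))  # -> a line count
```

Each framework registers its own case; combining them requires no coordination, which is Phillip's point.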
Conversely, using your analogy, one could say that the iteration protocol is a bad idea because lots of people might then have to implement their own __iter__ methods. We should thus only have a fixed set of sequence types! In short, your argument is based on a false analogy and is nonsensical when moved out of the realm of on-the-wire protocols and into the realm of a programming language. From jcarlson at uci.edu Fri Aug 11 17:45:54 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 11 Aug 2006 08:45:54 -0700 Subject: [Python-3000] threading, part 2 In-Reply-To: <20060811102346.EFC4.SLAWOMIR.NOWACZYK.847@student.lu.se> References: <20060811102346.EFC4.SLAWOMIR.NOWACZYK.847@student.lu.se> Message-ID: <20060811082620.192E.JCARLSON@uci.edu> Slawomir Nowaczyk wrote: > I believe that if asynchronous exception raising ever gets officially > approved, there absolutely *needs* to be a way to block it for a piece > of code that should execute atomically. There is already a way of making Python source execution atomic with respect to other Python code [1]. > But it should not be done lightly and never when the code is not > specifically expecting it. If you don't want random exceptions being raised in your threads, then don't use this method that is capable of raising exceptions somewhat randomly. - Josiah [1] Remove the two sys.setcheckinterval calls to verify this works. "proper" use should probably use try/finally wrapping.

>>> import sys
>>> import threading
>>> import time
>>>
>>> x = 0
>>>
>>>
>>> def thr(n):
...     global x
...     while not x:
...         time.sleep(.01)
...     for i in xrange(n):
...         sys.setcheckinterval(sys.maxint)
...         _x = x + 1
...         x, _x = _x, x
...         sys.setcheckinterval(100)
...
>>>
>>> for i in xrange(10):
...     threading.Thread(target=thr, args=(1000000,)).start()
...
>>> x += 1
>>> while threading.activeCount() > 1:
...     time.sleep(.1)
...
>>> print x
10000001
>>>

From jason.orendorff at gmail.com Fri Aug 11 17:47:39 2006 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Fri, 11 Aug 2006 11:47:39 -0400 Subject: [Python-3000] threading, part 2 In-Reply-To: <1d85506f0608110033k2eac1f9h10908ddbef5db8c3@mail.gmail.com> References: <1d85506f0608101214g594d2dal282ab2ae60f29f11@mail.gmail.com> <1d85506f0608110033k2eac1f9h10908ddbef5db8c3@mail.gmail.com> Message-ID: On 8/11/06, tomer filiba wrote: > why is thread.raise_exc(id, excobj) a bad API? It breaks seemingly innocent code in subtle ways. Worse, the breakage will always be a race condition, so it'll be especially hard to reproduce and debug.

    class Foo:
        ...
        def close(self):
            self.f.close()
            self.closed = True

Any code that uses the "closed" attribute obviously depends on it being properly set, right? This close() method gets this right. It sets "closed" if and only if the self.f.close() call succeeds. There are circumstances where this will fail: MemoryError, KeyboardInterrupt, a broken trace function, a broken __setattr__(), del __builtins__.True... but all are extreme cases. I think thread.raise_exc() should be considered extreme too. Otherwise, its existence must be considered to degrade the reliability of the above code. I'm not saying "don't add this". Maybe it's useful, particularly as a fallback mechanism for killing a runaway thread. But it should be documented as an extreme measure. -j From jcarlson at uci.edu Fri Aug 11 18:04:54 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 11 Aug 2006 09:04:54 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060811112118.023af398@sparrow.telecommunity.com> References: <5.1.1.6.0.20060811112118.023af398@sparrow.telecommunity.com> Message-ID: <20060811084623.1931.JCARLSON@uci.edu> "Phillip J.
Eby" wrote: > > At 06:10 AM 8/11/2006 -0700, Talin wrote: > >Or to put it another way: If you create a tool, and you assume that tool > >will only be used in certain specific ways, but you fail to enforce that > >limitation, then your assumption will be dead wrong. The idea that there > >will only be a few type annotation providers who will all nicely > >cooperate with one another is just as naive as I was in the SysEx debacle. > > Are you saying that function annotations are a bad idea because we won't be > able to pickle them? That is not what I got out of the message at all. > If not, your entire argument seems specious. Actually, even if that *is* > your argument, it's specious, since all that's needed to support pickling > is to support pickling. All that's needed to support printing is to > support printing (via __str__), and so on. I think you misunderstood Talin. While it was a pain for him to work his way through implementing all of the loading/etc. protocols, I believe his point was that if we allow any and all arbitrary metadata to be placed on arguments to and from functions, then invariably there will be multiple methods of doing as much. That isn't a problem unto itself, but when there ends up being multiple metadata formats, with multiple interpretations of them, and a user decides that they want to combine the functionality of two metadata formats, they may be stuck due to incompatibilities, etc. I think that it can be fixed by defining a standard mechanism for 'metadata chaining', one involving tuples and/or dictionaries. Say, for example, we have the following function definition:

    def foo(argn:meta=dflt):
        ...

Since meta can take on the value of a Python expression (executed during compile-time), a tuple-based chaining would work like so:

    @chainmetadatatuple(meta_fcn1, meta_fcn2)
    def foo(argn:(meta1, meta2)=dflt):
        ...
And a dictionary-based chaining would work like so:

    @chainmetadatadict(m1=meta_fcn1, m2=meta_fcn2)
    def foo(argn:{'m1': meta1, 'm2': meta2}=dflt):
        ...

The reason to include the dict-based option is to allow for annotations to be optional. This method may or may not be good. But, if we don't define a standard method for metadata to be combined from multiple protocols, etc., then we could end up with incompatibilities. However, if we do define a standard chaining mechanism, then it can be used, and presumably we shouldn't run into problems relating to incompatible annotation, etc. - Josiah From jason.orendorff at gmail.com Fri Aug 11 18:04:09 2006 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Fri, 11 Aug 2006 12:04:09 -0400 Subject: [Python-3000] threading, part 2 In-Reply-To: <20060811082620.192E.JCARLSON@uci.edu> References: <20060811102346.EFC4.SLAWOMIR.NOWACZYK.847@student.lu.se> <20060811082620.192E.JCARLSON@uci.edu> Message-ID: On 8/11/06, Josiah Carlson wrote: > Slawomir Nowaczyk wrote: > > But it should not be done lightly and never when the code is not > > specifically expecting it. > > If you don't want random exceptions being raised in your threads, then > don't use this method that is capable of raising exceptions somewhat > randomly. I agree. The only question is how dire the warnings should be. I'll answer that question with another question: Are we going to make the standard library robust against asynchronous exceptions? For example, class Thread has an attribute __stopped that is set using code similar to the example code I posted. An exception at just the wrong time would kill the thread while leaving __stopped == False. Maybe that particular case is worth fixing, but to find and fix them all? Better to put strong warnings on this one method: may cause unpredictable brokenness.
-j From jcarlson at uci.edu Fri Aug 11 18:15:32 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 11 Aug 2006 09:15:32 -0700 Subject: [Python-3000] threading, part 2 In-Reply-To: References: <20060811082620.192E.JCARLSON@uci.edu> Message-ID: <20060811091309.1934.JCARLSON@uci.edu> "Jason Orendorff" wrote: > > On 8/11/06, Josiah Carlson wrote: > > Slawomir Nowaczyk wrote: > > > But it should not be done lightly and never when the code is not > > > specifically expecting it. > > > > If you don't want random exceptions being raised in your threads, then > > don't use this method that is capable of raising exceptions somewhat > > randomly. > > I agree. The only question is how dire the warnings should be. > > I'll answer that question with another question: Are we going to make > the standard library robust against asynchronous exceptions? For > example, class Thread has an attribute __stopped that is set using > code similar to the example code I posted. An exception at just the > wrong time would kill the thread while leaving __stopped == False. > > Maybe that particular case is worth fixing, but to find and fix them > all? Better to put strong warnings on this one method: may cause > unpredictable brokenness. Considering that it will not be accessible via standard Python, only through a few ctypes hoops, I believe that is a fairly ready indication that one should be wary of its use. I also think it would make sense to fix that particular instance (to not do so seems to be a bit foolish).
- Josiah From qrczak at knm.org.pl Fri Aug 11 19:00:07 2006 From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk) Date: Fri, 11 Aug 2006 19:00:07 +0200 Subject: [Python-3000] threading, part 2 In-Reply-To: <20060811082620.192E.JCARLSON@uci.edu> (Josiah Carlson's message of "Fri, 11 Aug 2006 08:45:54 -0700") References: <20060811102346.EFC4.SLAWOMIR.NOWACZYK.847@student.lu.se> <20060811082620.192E.JCARLSON@uci.edu> Message-ID: <87fyg32oo8.fsf@qrnik.zagroda> Josiah Carlson writes: > There is already a way of making Python source execution atomic with > respect to other Python code [1]. It's not realistic to expect sys.setcheckinterval be implementable on other runtimes. Also, it doesn't provide a way to unblock asynchronous exceptions until a particular blocking operation completes. > If you don't want random exceptions being raised in your threads, then > don't use this method that is capable of raising exceptions somewhat > randomly. It's like saying "if you don't want integer addition overflow, then don't do addition". I do want asynchronous exceptions, but not anywhere, only in selected regions (or excluding selected regions). This can be designed well. -- __("< Marcin Kowalczyk \__/ qrczak at knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/ From jcarlson at uci.edu Fri Aug 11 20:18:56 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 11 Aug 2006 11:18:56 -0700 Subject: [Python-3000] threading, part 2 In-Reply-To: <87fyg32oo8.fsf@qrnik.zagroda> References: <20060811082620.192E.JCARLSON@uci.edu> <87fyg32oo8.fsf@qrnik.zagroda> Message-ID: <20060811105742.193A.JCARLSON@uci.edu> "Marcin 'Qrczak' Kowalczyk" wrote: > > Josiah Carlson writes: > > > There is already a way of making Python source execution atomic with > > respect to other Python code [1]. > > It's not realistic to expect sys.setcheckinterval be implementable on > other runtimes. The 'raise an exception in an alternate thread' functionality is a CPython specific functionality. 
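For reference, the CPython-specific hook under discussion is PyThreadState_SetAsyncExc, reachable from pure Python through ctypes. A hedged sketch of the widely circulated recipe (the helper name async_raise is invented here; the c_ulong argument type matches modern CPython, where thread ids are unsigned longs):

```python
import ctypes
import threading
import time

def async_raise(thread_id, exc_type):
    """Ask CPython to raise exc_type asynchronously in another thread.
    The target only sees the exception the next time it runs Python
    bytecode, and C extension code can delay or swallow it -- hence
    all the caveats in this thread."""
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread_id), ctypes.py_object(exc_type))
    if res == 0:
        raise ValueError("invalid thread id")
    if res > 1:
        # More than one thread state was modified: undo and bail out.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_ulong(thread_id), None)
        raise SystemError("PyThreadState_SetAsyncExc failed")

started = threading.Event()
result = []

def worker():
    try:
        started.set()
        while True:
            time.sleep(0.01)
    except KeyboardInterrupt:
        result.append("interrupted")

t = threading.Thread(target=worker)
t.start()
started.wait()
time.sleep(0.05)
async_raise(t.ident, KeyboardInterrupt)
t.join(timeout=5)
print(result)  # -> ['interrupted']
```

Note that the exception lands at an arbitrary bytecode boundary inside the target thread, which is precisely the hazard Jason's Foo.close() example illustrates.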
If you believe that it could be implemented in all other runtimes, then you missed the discussion that stated that it would be impossible to implement in Jython. As such, because both are CPython specific features, I don't see a problem with using both if you are going to be using one of them. > Also, it doesn't provide a way to unblock asynchronous exceptions until > a particular blocking operation completes. I thought the point of this 'block asynchronous exceptions' business was to block asynchronous exceptions during a particular bit of code. Now you are saying that there needs to be a method of bypassing such blocking from other threads? > > If you don't want random exceptions being raised in your threads, then > > don't use this method that is capable of raising exceptions somewhat > > randomly. > > It's like saying "if you don't want integer addition overflow, then > don't do addition". No. Integer addition is a defined feature of the language. Raising exceptions in an alternate thread is a generally unsupported feature available to CPython, very likely not implementable in most other runtimes. It has previously been available via ctypes, but its previous non-use is a function of its lack of documentation, lack of cytpes shipping with base Python, etc. > I do want asynchronous exceptions, but not anywhere, only in selected > regions (or excluding selected regions). This can be designed well. Yes, it can be. You can add a lock to each thread (each thread gets its own lock). When a thread doesn't want to be interrupted, it .acquire()s its lock. When it is OK to interrupt it, it .release()s its lock. When you want to kill a thread, .acquire() its lock, and kill it. In effect, the above would be what is necessary to give you what you want. It can easily be defined as a set of 3 functions, whose implementation should be left out of the standard library. 
Including it in the standard library offers the illusion of support (in the 'this language feature is supported' sense) for raising an exception in an alternate thread, which is not the case (it is available, but not supported). - Josiah From qrczak at knm.org.pl Fri Aug 11 21:33:10 2006 From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk) Date: Fri, 11 Aug 2006 21:33:10 +0200 Subject: [Python-3000] threading, part 2 In-Reply-To: <20060811105742.193A.JCARLSON@uci.edu> (Josiah Carlson's message of "Fri, 11 Aug 2006 11:18:56 -0700") References: <20060811082620.192E.JCARLSON@uci.edu> <87fyg32oo8.fsf@qrnik.zagroda> <20060811105742.193A.JCARLSON@uci.edu> Message-ID: <87veozoyo9.fsf@qrnik.zagroda> Josiah Carlson writes: >> It's not realistic to expect sys.setcheckinterval be implementable on >> other runtimes. > > The 'raise an exception in an alternate thread' functionality is a > CPython specific functionality. If you believe that it could be > implemented in all other runtimes, then you missed the discussion that > stated that it would be impossible to implement in Jython. Indeed both are hard to implement on some runtimes. I believe there are runtimes where asynchronous exceptions are practical while blocking context switching is not (e.g. POSIX threads combined with Unix signals and C++ exceptions). In any case, blocking switching the context to any other thread is an overkill. It's hard to say how sys.setcheckinterval should behave on truly parallel runtimes, while the semantics of blockable asynchronous exceptions doesn't depend on threads being dispatched sequentially. >> Also, it doesn't provide a way to unblock asynchronous exceptions until >> a particular blocking operation completes. > > I thought the point of this 'block asynchronous exceptions' business > was to block asynchronous exceptions during a particular bit of code. > Now you are saying that there needs to be a method of bypassing such > blocking from other threads? 
No, I'm talking about specifying the blocking behavior by the thread to be interrupted. It makes sense to wait for e.g. accept() such that asynchronous exceptions are processed during the wait, but that they are atomically blocked as soon as a connection is accepted. Unfortunately it's yet another obstacle to some runtimes. Yet another issue is asynchronous "signals" which don't necessarily throw an exception but cause the computation to react and possibly continue (e.g. suspend a thread until it's resumed). > Yes, it can be. You can add a lock to each thread (each thread gets its > own lock). When a thread doesn't want to be interrupted, it .acquire()s > its lock. When it is OK to interrupt it, it .release()s its lock. When > you want to kill a thread, .acquire() its lock, and kill it. This works almost well. The thread sending an exception is unnecessarily blocked; this could be solved by starting another thread to send an exception. And it doesn't support the mentioned unblocking only while waiting. The problem is that there is no universally recognized convention: I can't expect third-party libraries to protect their sensitive regions by my mutex. Without an agreed convention they can't even if they want to. My design includes implicit blocking of asynchronous exception by certain language constructs, e.g. by taking *any* mutex. Most cases of taking a mutex also want to block asynchronous signals. I'm surprised that various runtimes that I would expect to be well designed provide mostly either unsafe or too restricted means of asynchronous interruption. http://java.sun.com/j2se/1.5.0/docs/guide/misc/threadPrimitiveDeprecation.html http://www.interact-sw.co.uk/iangblog/2004/11/12/cancellation -- __("< Marcin Kowalczyk \__/ qrczak at knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/ From pje at telecommunity.com Fri Aug 11 21:34:01 2006 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Fri, 11 Aug 2006 15:34:01 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <20060811084623.1931.JCARLSON@uci.edu> References: <5.1.1.6.0.20060811112118.023af398@sparrow.telecommunity.com> <5.1.1.6.0.20060811112118.023af398@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060811152032.023a8fc0@sparrow.telecommunity.com> At 09:04 AM 8/11/2006 -0700, Josiah Carlson wrote: >I think you misunderstood Talin. While it was a pain for him to work >his way through implementing all of the loading/etc. protocols, I >believe his point was that if we allow any and all arbitrary metadata to >be placed on arguments to and from functions, then invariably there will >be multiple methods of doing as much. That isn't a problem unto itself, >but when there ends up being multiple metadata formats, with multiple >interpretations of them, and a user decides that they want to combine >the functionality of two metadata formats, they may be stuck due to >incompatibilities, etc. I was giving him the benefit of the doubt by assuming he was bringing up a *new* objection that I hadn't already answered. This "incompatibility" argument has already been addressed; it is trivially solved by overloaded functions (e.g. pickle.dump(), str(), iter(), etc.). >This method may or may not be good. But, if we don't define a standard >method for metadata to be combined from multiple protocols, etc., then >we could end up with incompatibilities. Not if you use overloaded functions to define the operations you're going to perform. You and Talin are proposing a problem here that is not only hypothetical, it's non-existent. Remember, PEAK already does this kind of openly-extensible metadata for attributes, using a single-dispatch overloaded function (analogous to pickle.dump). If you want to show that it's really possible to create "incompatible" annotations, try creating some for attributes in PEAK.
But, you'll quickly find that the only "meaning" that metadata has is *operational*. That is, either some behavior is influenced by the metadata, or no behavior is. If no behavior is involved, then there can be no incompatibility. If there is behavior, there is an operation to be performed, and that operation can be based on the type of the metadata. Ergo, using an overloadable function for the operation to be performed allows a meaning to be defined for the specific combination of operation and type. Therefore, there is no problem - every piece of metadata may be assigned a meaning that is relevant for each operation that needs to be performed. Now, it is of course possible that two pieces of metadata may be contradictory, redundant, overlapping, etc. However, this has nothing to do with whether the semantics of metadata are predefined. Any sufficiently-useful annotation scheme will include these possibilities, and the operations to be performed are going to have to have some defined semantics for them. This is entirely independent of whether there is more than one metadata framework in existence. From jcarlson at uci.edu Fri Aug 11 22:12:15 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 11 Aug 2006 13:12:15 -0700 Subject: [Python-3000] threading, part 2 In-Reply-To: <87veozoyo9.fsf@qrnik.zagroda> References: <20060811105742.193A.JCARLSON@uci.edu> <87veozoyo9.fsf@qrnik.zagroda> Message-ID: <20060811125449.1940.JCARLSON@uci.edu> Threading is already difficult enough to do 'right' (see the dozens of threads discussing why this is really the case), and designing software that can survive the raising of an exception at any point makes threading even more difficult. I believe that you are attempting to design an interface to make this particular feature foolproof. I think that such is a mistake; killing a thread should be fraught with gotchas and should be documented as "may crash the runtime".
Offering users anything more is tantamount to encouraging its use, which is counter to the reasons why it is not available via a standard threading.function call: because it shouldn't be used at all, except by people who know what the heck they are doing. I believe that if a user cannot design and implement their own system to handle when a thread can be killed or not to their own satisfaction, then they have no business killing threads. - Josiah "Marcin 'Qrczak' Kowalczyk" wrote: > Josiah Carlson writes: > > >> It's not realistic to expect sys.setcheckinterval be implementable on > >> other runtimes. > > > > The 'raise an exception in an alternate thread' functionality is a > > CPython specific functionality. If you believe that it could be > > implemented in all other runtimes, then you missed the discussion that > > stated that it would be impossible to implement in Jython. > > Indeed both are hard to implement on some runtimes. > > I believe there are runtimes where asynchronous exceptions are > practical while blocking context switching is not (e.g. POSIX threads > combined with Unix signals and C++ exceptions). > > In any case, blocking switching the context to any other thread is an > overkill. It's hard to say how sys.setcheckinterval should behave on > truly parallel runtimes, while the semantics of blockable asynchronous > exceptions doesn't depend on threads being dispatched sequentially. > > >> Also, it doesn't provide a way to unblock asynchronous exceptions until > >> a particular blocking operation completes. > > > > I thought the point of this 'block asynchronous exceptions' business > > was to block asynchronous exceptions during a particular bit of code. > > Now you are saying that there needs to be a method of bypassing such > > blocking from other threads? > > No, I'm talking about specifying the blocking behavior by the thread > to be interrupted. It makes sense to wait for e.g. 
accept() such that > asynchronous exceptions are processed during the wait, but that they > are atomically blocked as soon as a connection is accepted. > > Unfortunately it's yet another obstacle to some runtimes. > > Yet another issue is asynchronous "signals" which don't necessarily > throw an exception but cause the computation to react and possibly > continue (e.g. suspend a thread until it's resumed). > > > Yes, it can be. You can add a lock to each thread (each thread gets its > > own lock). When a thread doesn't want to be interrupted, it .acquire()s > > its lock. When it is OK to interrupt it, it .release()s its lock. When > > you want to kill a thread, .acquire() its lock, and kill it. > > This works almost well. The thread sending an exception is unnecessarily > blocked; this could be solved by starting another thread to send an > exception. And it doesn't support the mentioned unblocking only while > waiting. > > The problem is that there is no universally recognized convention: > I can't expect third-party libraries to protect their sensitive > regions by my mutex. Without an agreed convention they can't even > if they want to. > > My design includes implicit blocking of asynchronous exception by > certain language constructs, e.g. by taking *any* mutex. Most cases > of taking a mutex also want to block asynchronous signals. > > I'm surprised that various runtimes that I would expect to be well > designed provide mostly either unsafe or too restricted means of > asynchronous interruption. 
> http://java.sun.com/j2se/1.5.0/docs/guide/misc/threadPrimitiveDeprecation.html > http://www.interact-sw.co.uk/iangblog/2004/11/12/cancellation > > -- > __("< Marcin Kowalczyk > \__/ qrczak at knm.org.pl > ^^ http://qrnik.knm.org.pl/~qrczak/ > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/jcarlson%40uci.edu From jcarlson at uci.edu Fri Aug 11 22:46:42 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 11 Aug 2006 13:46:42 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060811152032.023a8fc0@sparrow.telecommunity.com> References: <20060811084623.1931.JCARLSON@uci.edu> <5.1.1.6.0.20060811152032.023a8fc0@sparrow.telecommunity.com> Message-ID: <20060811131616.1943.JCARLSON@uci.edu> "Phillip J. Eby" wrote: > At 09:04 AM 8/11/2006 -0700, Josiah Carlson wrote: > >I think you misunderstood Talin. While it was a pain for him to work > >his way through implementing all of the loading/etc. protocols, I > >believe his point was that if we allow any and all arbitrary metadata to > >be placed on arguments to and from functions, then invariably there will > >be multiple methods of doing as much. That isn't a problem unto itself, > >but when there ends up being multiple metadata formats, with multiple > >interpretations of them, and a user decides that they want to combine > >the functionality of two metadata formats, they may be stuck due to > >incompatibilities, etc. > > I was giving him the benefit of the doubt by assuming he was bringing up a > *new* objection that I hadn't already answered. This "incompatibility" > argument has already been addressed; it is trivially solved by overloaded > functions (e.g. pickle.dump(), str(), iter(), etc.). 
In effect, you seem to be saying "when user X wants to add their own metadata with interpretation, they need to overload the previously existing metadata interpreter". However, as has already been stated, because there is no standard metadata interpreter, nor a standard method for chaining metadata, how is user X supposed to overload the previously existing metadata interpreter? Since you brought up pickle.dump(), str(), iter(), etc., I'll point out that str(), iter(), etc., call special methods on the defined object (__str__, __iter__, etc.), and while pickle can have picklers be registered, it also has a special method interface. Because all of the metadata defined is (according to the pre-PEP) attached to a single __signature__ attribute of the function, interpretation of the metadata isn't as easy as calling str(obj), as you claim. Let us say that I have two metadata interpreters. One that believes that the metadata is types and wants to verify type on function call. The other believes that the metadata is documentation. Both were written without regard to the other. Please describe to me (in code preferably) how I would be able to use both of them without having a defined metadata interpretation chaining semantic. > >This method may or may not be good. But, if we don't define a standard > >method for metadata to be combined from multiple protocols, etc., then > >we could end up with incompatibilities. > > Not if you use overloaded functions to define the operations you're going > to perform. You and Talin are proposing a problem here that is not only > hypothetical, it's non-existent. > > Remember, PEAK already does this kind of openly-extensible metadata for > attributes, using a single-dispatch overloaded function (analogous to > pickle.dump). If you want to show that it's really possible to create > "incompatible" annotations, try creating some for attributes in PEAK. 
Could you at least provide a link to where it is documented how to create metadata attributes in PEAK? My attempts to delve into PEAK documentation has thus far failed horribly. - Josiah From pje at telecommunity.com Fri Aug 11 23:11:00 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri, 11 Aug 2006 17:11:00 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <20060811131616.1943.JCARLSON@uci.edu> References: <5.1.1.6.0.20060811152032.023a8fc0@sparrow.telecommunity.com> <20060811084623.1931.JCARLSON@uci.edu> <5.1.1.6.0.20060811152032.023a8fc0@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060811165113.03cabe60@sparrow.telecommunity.com> At 01:46 PM 8/11/2006 -0700, Josiah Carlson wrote: >"Phillip J. Eby" wrote: > > At 09:04 AM 8/11/2006 -0700, Josiah Carlson wrote: > > >I think you misunderstood Talin. While it was a pain for him to work > > >his way through implementing all of the loading/etc. protocols, I > > >believe his point was that if we allow any and all arbitrary metadata to > > >be placed on arguments to and from functions, then invariably there will > > >be multiple methods of doing as much. That isn't a problem unto itself, > > >but when there ends up being multiple metadata formats, with multiple > > >interpretations of them, and a user decides that they want to combine > > >the functionality of two metadata formats, they may be stuck due to > > >incompatibilities, etc. > > > > I was giving him the benefit of the doubt by assuming he was bringing up a > > *new* objection that I hadn't already answered. This "incompatibility" > > argument has already been addressed; it is trivially solved by overloaded > > functions (e.g. pickle.dump(), str(), iter(), etc.). > >In effect, you seem to be saying "when user X wants to add their own >metadata with interpretation, they need to overload the previously >existing metadata interpreter". No, they need to overload whatever *operation* is being performed *on* the metadata. 
For example, if I am using a decorator that adds type checking to the function, then that decorator is an example of an operation that should be overloadable. More precisely, that decorator would probably have an operation that generates type checking code for an individual type annotation -- and *that* is the operation that would need overloading. The "generate_typecheck_code()" operation would be an overloadable function. Another possible operation: printing help for a function. You would need a "format_type_annotation()" overloadable operation, and so on. There is no *single* "metadata interpreter", in other words. There are just operations you perform on metadata. If multiple people define different variants of the same operation, let's say "generate_typecheck_code()" and "generate_code_for_typecheck()", and you have some code that defines methods for one overloadable function, but you have code that wants to call the other, you just write some methods for one that call the other, or make one be the default implementation for the other. There is no need for a *single* canonical operation *or* type. This is the whole point of generic functions, really. They eliminate the need for One Framework To Rule Them All, and tend to dissolve the "framework"ness right out of frameworks. What you end up with are extensible libraries instead of frameworks. >Since you brought up pickle.dump(), str(), iter(), etc., I'll point out >that str(), iter(), etc., call special methods on the defined object >(__str__, __iter__, etc.), and while pickle can have picklers be >registered, it also has a special method interface. Because all of the >metadata defined is (according to the pre-PEP) attached to a single >__signature__ attribute of the function, interpretation of the metadata >isn't as easy as calling str(obj), as you claim. Actually, with overloadable functions, it is, since overloadable functions can be extended by anybody, without needing to monkey with the classes. 
Note that if Guido had originally created Python with overloadable functions, it's rather unlikely that __special__ methods would have arisen. Instead, it's much more likely that there would be syntax sugar for easily defining overloads, like "defop str(self): ...". >Let us say that I have two metadata interpters. One that believes that >the metadata is types and wants to verify type on function call. The >other believes that the metadata is documentation. Both were written >without regards to the other. Please describe to me (in code preferably) >how I would be able to use both of them without having a defined >metadata interpretation chaining semantic. See explanation above. > > Remember, PEAK already does this kind of openly-extensible metadata for > > attributes, using a single-dispatch overloaded function (analagous to > > pickle.dump). If you want to show that it's really possible to create > > "incompatible" annotations, try creating some for attributes in PEAK. > >Could you at least provide a link to where it is documented how to >create metadata attributes in PEAK? My attempts to delve into PEAK >documentation has thus far failed horribly. Here's the tutorial for defining new metadata (among other things): http://svn.eby-sarna.com/PEAK/src/peak/binding/attributes.txt?view=markup The example defines a "Message()" metadata type whose sole purpose is to print a message when the attribute is declared. What's not really explained there is that all the 'addMethod' stuff is basically adding methods to an overloaded function. 
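[Editor's sketch: Eby's "operations, not interpreters" scheme can be illustrated with functools.singledispatch, a single-dispatch generic-function mechanism that landed in Python 3.4, long after this thread. The annotation classes and both operation names below are invented for illustration.]

```python
from functools import singledispatch

class Typed:            # hypothetical annotation: "argument must be of this type"
    def __init__(self, t): self.t = t

class Doc:              # hypothetical annotation: "documentation for this argument"
    def __init__(self, text): self.text = text

# Operation 1: check a value against an annotation.
@singledispatch
def check(ann, value):
    return True         # default: annotations we don't understand check nothing

@check.register(Typed)
def _(ann, value):
    return isinstance(value, ann.t)

# Operation 2: format help text for an annotation.
@singledispatch
def describe(ann):
    return ""           # default: annotations we don't understand say nothing

@describe.register(Doc)
def _(ann):
    return ann.text

# The two operations never need to know about each other: each one
# dispatches on the annotation's type and skips what it can't handle.
anns = [Typed(int), Doc("the number of retries")]
assert all(check(a, 3) for a in anns)
assert " ".join(describe(a) for a in anns).strip() == "the number of retries"
```

Because each operation dispatches on the annotation's type and ignores anything it doesn't recognize, independently written operations need no chaining convention between them.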
Anyway, PEAK uses this simple metadata declaration system to implement both security permission declarations: http://peak.telecommunity.com/DevCenter/SecurityRules#linking-actions-to-permissions and command-line options: http://peak.telecommunity.com/DevCenter/OptionsHowTo#declaring-options In PEAK's case, a single overloaded operation is invoked when the metadata is defined, and then that overloaded operation performs whatever actions are relevant for the metadata. For function metadata, however, it's sufficient to use distinct overloaded functions for distinct operations and not actually "do" anything unless it's needed. However, if we wanted things to be able to happen just by declaring metadata (without using any decorators or performing any other operations), then yes, the language would need some equivalent to PEAK's "declareAttribute()" overloaded function. However, my understanding of the proposal was that annotations were intended to be inert and purely informational *unless* processed by a decorator or some other mechanism. From talin at acm.org Sat Aug 12 00:16:11 2006 From: talin at acm.org (Talin) Date: Fri, 11 Aug 2006 15:16:11 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060811112118.023af398@sparrow.telecommunity.com> References: <5.1.1.6.0.20060811112118.023af398@sparrow.telecommunity.com> Message-ID: <44DD01AB.20809@acm.org> Phillip J. Eby wrote: > At 06:10 AM 8/11/2006 -0700, Talin wrote: >> Or to put it another way: If you create a tool, and you assume that tool >> will only be used in certain specific ways, but you fail to enforce that >> limitation, then your assumption will be dead wrong. The idea that there >> will only be a few type annotation providers who will all nicely >> cooperate with one another is just as naive as I was in the SysEx >> debacle. > > Are you saying that function annotations are a bad idea because we won't > be able to pickle them? Huh? 
What does pickling have to do with anything I said? -- Talin From talin at acm.org Sat Aug 12 00:39:56 2006 From: talin at acm.org (Talin) Date: Fri, 11 Aug 2006 15:39:56 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <20060811084623.1931.JCARLSON@uci.edu> References: <5.1.1.6.0.20060811112118.023af398@sparrow.telecommunity.com> <20060811084623.1931.JCARLSON@uci.edu> Message-ID: <44DD073C.7030305@acm.org> Josiah Carlson wrote: > "Phillip J. Eby" wrote: >> At 06:10 AM 8/11/2006 -0700, Talin wrote: >>> Or to put it another way: If you create a tool, and you assume that tool >>> will only be used in certain specific ways, but you fail to enforce that >>> limitation, then your assumption will be dead wrong. The idea that there >>> will only be a few type annotation providers who will all nicely >>> cooperate with one another is just as naive as I was in the SysEx debacle. >> Are you saying that function annotations are a bad idea because we won't be >> able to pickle them? > > That is not what I got out of the message at all. > >> If not, your entire argument seems specious. Actually, even if that *is* >> your argument, it's specious, since all that's needed to support pickling >> is to support pickling. All that's needed to support printing is to >> support printing (via __str__), and so on. > > I think you misunderstood Talin. While it was a pain for him to work > his way through implementing all of the loading/etc. protocols, I > believe his point was that if we allow any and all arbitrary metadata to > be placed on arguments to and from functions, then invariably there will > be multiple methods of doing as much. That isn't a problem unto itself, > but when there ends up being multiple metadata formats, with multiple > interpretations of them, and a user decides that they want to combine > the functionality of two metadata formats, they may be stuck due to > incompatibilities, etc. 
> > I think that it can be fixed by defining a standard mechanism for > 'metadata chaining', one involving tuples and/or dictionaries. > > Say, for example, we have the following function definition: > def foo(argn:meta=dflt): > ... > > Since meta can take on the value of a Python expression (executed during > compile-time), a tuple-based chaining would work like so: > > @chainmetadatatuple(meta_fcn1, meta_fcn2) > def foo(argn:(meta1, meta2)=dflt): > ... > > And a dictionary-based chaining would work like so: > @chainmetadatadict(m1=meta_fcn1, m2=meta_fcn2) > def foo(argn:{'m1': meta1, 'm2': meta2}=dflt): > ... > > The reason to include the dict-based option is to allow for annotations > to be optional. > > > This method may or may not be good. But, if we don't define a standard > method for metadata to be combined from multiple protocols, etc., then > we could end up with incompatibilities. However, if we do define a > standard chaining mechanism, then it can be used, and presumably > we shouldn't run into problems relating to incompatible annotation, etc. > > > - Josiah Josiah is essentially correct in his interpretation of my views. I really don't understand what Phillip is talking about here. Say I want to annotate a specific argument with two pieces of information, a type and a docstring. I have two metadata interpreters, one which uses the type information to restrict the kinds of arguments that can be passed in, and another which uses the docstring to enhance the generated documentation. Now, let's say that these two metadata interpreters were written by two people, who are not in communication with each other. Each one decides that they would like to "play nice" with other competing metadata. 
So Author A, who wrote the annotation decorator that looks for docstrings, decides that not only will he accept docstring annotations, but if the annotation is a tuple, then he will search that tuple for any docstrings, skipping over any annotations that he doesn't understand. (Although how he is supposed to manage that is unclear - since there could also be other annotations that are simple text strings as well.) Author B, who wrote the type-enforcement module, also wants to play nice with others, but since he doesn't know A, comes up with a different solution. His idea is to create a system in which annotations automatically chain each other - so that each annotation has a "next" attribute referring to the next annotation. So programmer C, who wants to incorporate both A and B's work into his program, has a dilemma - each has a sharing mechanism, but the sharing mechanisms are different and incompatible. So he is unable to apply both A-type and B-type metadata to any given signature. What happens next is that C complains to both A and B (and in the process introduces them to each other). A and B exchange emails, and reach the conclusion that B will modify his library to conform to the sharing mechanism of A. What this means is that A and B have created a de facto standard. Anyone who wants to interoperate with A and B has to write their interpreter to conform to the sharing mechanism defined by A and B. But it also means that anyone outside of the ABC clique will not know about A&B's sharing convention, which means that their metadata interpreter will not be able to interoperate with A&B-style metadata. So in essence, A&B have now "captured" the space of annotations - that is, anyone who conforms to the A&B protocol can combine their annotations together; Anyone outside that group is excluded from interoperating. Finally, let's say that A&B eventually become well-known enough that their sharing convention becomes the de facto standard. 
Any metadata that wants to interoperate with other metadata-interpretation libraries will have to follow the A&B convention. Any metadata library that chooses to use a different convention will be at a severe disadvantage, since they won't be able to be used together with other metadata interpreters. What this means is that, despite the statements that annotations have no defined format or meaning, the fact is that they now do: The de facto A&B sharing convention. The sharing convention tells metadata interpreters how to distinguish between metadata that they can interpret, and how to skip over other metadata. So in other words, since the original author of the annotation system failed to provide a convention for multiple annotations, they force the community to fill in the parts of the standard that they left out. -- Talin From seojiwon at gmail.com Sat Aug 12 01:20:20 2006 From: seojiwon at gmail.com (Jiwon Seo) Date: Fri, 11 Aug 2006 16:20:20 -0700 Subject: [Python-3000] PEP3102 Keyword-Only Arguments Message-ID: When we have keyword-only arguments, do we allow 'keyword dictionary' argument? If that's the case, where would we want to place keyword-only arguments? Are we going to allow any of the following? 1. def foo(a, b, *, key1=None, key2=None, **map) 2. def foo(a, b, *, **map, key1=None, key2=None) 3. def foo(a, b, *, **map) -Jiwon From collinw at gmail.com Sat Aug 12 01:49:32 2006 From: collinw at gmail.com (Collin Winter) Date: Fri, 11 Aug 2006 19:49:32 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <44DD073C.7030305@acm.org> References: <5.1.1.6.0.20060811112118.023af398@sparrow.telecommunity.com> <20060811084623.1931.JCARLSON@uci.edu> <44DD073C.7030305@acm.org> Message-ID: <43aa6ff70608111649g54e82dd6kef19862f0c281254@mail.gmail.com> I'll combine my replies to Josiah and Talin: On 8/11/06, Josiah Carlson wrote: > Let us say that I have two metadata interpreters. 
One that believes that > the metadata is types and wants to verify type on function call. The > other believes that the metadata is documentation. Both were written > without regards to the other. Please describe to me (in code preferably) > how I would be able to use both of them without having a defined > metadata interpretation chaining semantic. On 8/11/06, Talin wrote: > Say I want to annotate a specific argument with two pieces of > information, a type and a docstring. I have two metadata interpreters, > one which uses the type information to restrict the kinds of arguments > that can be passed in, and another which uses the docstring to enhance > the generated documentation. [snipped: the rise of a defacto annotation-sharing standard] > What this means is that, despite the statements that annotations have no > defined format or meaning, the fact is that they now do: The defacto A&B > sharing convention. The sharing convention tells metadata interpreters > how to distinguish between metadata that they can interpret, and how to > skip over other metadata. What Josiah is hinting at -- and what Talin describes more explicitly -- is the problem of how exactly "chaining" annotation interpreters will work. The case I've thought out the most completely is that of using decorators to analyse/utilise the annotations: 1) Each decorator should be written with the assumption that it is the only decorator that will be applied to a given function (with respect to annotations). 2) Chaining will be accomplished by maintaining this illusion for each decorator. 
For example, if our annotation-sharing convention is that annotations will be n-tuples (n == number of annotation-interpreting decorators), where t[i] is the annotation the i-th decorator should care about, the following chain() function will do the trick (a full demo script is attached): >>> def chain(*decorators): >>> assert len(decorators) >= 2 >>> >>> def decorate(function): >>> sig = function.__signature__ >>> original = sig.annotations >>> >>> for i, dec in enumerate(decorators): >>> fake = dict((p, original[p][i]) for p in original) >>> >>> function.__signature__.annotations = fake >>> function = dec(function) >>> >>> function.__signature__.annotations = original >>> return function >>> return decorate A similar function can be worked out for using dictionaries to specify multiple annotations. I'll update the PEP draft to include a section on guidelines for writing such decorators. Collin Winter -------------- next part -------------- A non-text attachment was scrubbed... Name: chaining_decorators.py Type: text/x-python-script Size: 1497 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20060811/065a0df9/attachment.bin From tomerfiliba at gmail.com Sat Aug 12 02:13:24 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Sat, 12 Aug 2006 02:13:24 +0200 Subject: [Python-3000] threading, part 2 Message-ID: <1d85506f0608111713m15cf2e67v8b94f06c928e9125@mail.gmail.com> i mailed this to several people separately, but then i thought it could benefit the entire group: http://sebulba.wikispaces.com/recipe+thread2 it's an implementation of the proposed "thread.raise_exc", through an extension to the threading.Thread class. you can test it for yourself; if it proves useful, it should be exposed as thread.raise_exc in the stdlib (instead of the ctypes hack)... and of course it should be reflected in threading.Thread as well. -tomer -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-3000/attachments/20060812/487eb7e6/attachment.htm From greg.ewing at canterbury.ac.nz Sat Aug 12 03:06:40 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 12 Aug 2006 13:06:40 +1200 Subject: [Python-3000] threading, part 2 In-Reply-To: <20060811102346.EFC4.SLAWOMIR.NOWACZYK.847@student.lu.se> References: <20060811102346.EFC4.SLAWOMIR.NOWACZYK.847@student.lu.se> Message-ID: <44DD29A0.4000902@canterbury.ac.nz> Slawomir Nowaczyk wrote: > But it should not be done lightly and never when the code is not > specifically expecting it. What if, together with a way of blocking asynchronous exceptions, threads started out by default with them blocked? Then a thread would have to explicitly consent to being interrupted. -- Greg From pje at telecommunity.com Sat Aug 12 03:32:49 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri, 11 Aug 2006 21:32:49 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: Message-ID: <5.1.1.6.0.20060811211801.02287420@sparrow.telecommunity.com> At 3:16 PM 8/12/2006 -0700, Talin wrote: >Phillip J. Eby wrote: > > At 06:10 AM 8/11/2006 -0700, Talin wrote: > >> Or to put it another way: If you create a tool, and you assume that tool > >> will only be used in certain specific ways, but you fail to enforce that > >> limitation, then your assumption will be dead wrong. The idea that there > >> will only be a few type annotation providers who will all nicely > >> cooperate with one another is just as naive as I was in the SysEx > >> debacle. > > > > Are you saying that function annotations are a bad idea because we won't > > be able to pickle them? > >Huh? What does pickling have to do with anything I said? I'll happily answer that question as soon as you explain what *function annotations* have to do with anything you said. Bonus points if you can explain what MIDI has to do with overloaded functions. 
:) To put it another way, the only reason I asked about pickling was to try to find *some* meaning in your post. If pickling doesn't relate, then your post has nothing to do with function annotations, because pickling is the most similar thing to the programming problem you actually described. However, if pickling *does* relate, then the mere existence of Python's ability to do pickling proves that the MIDI issue, transferred to the Python sphere, doesn't actually exist. Thus, either way, the MIDI problems you described are moot with respect to function annotations in Python. Is that clearer? (See also my replies to Greg and Josiah on this subject.) From lcaamano at gmail.com Sat Aug 12 03:51:25 2006 From: lcaamano at gmail.com (Luis P Caamano) Date: Fri, 11 Aug 2006 21:51:25 -0400 Subject: [Python-3000] threading, part 2 Message-ID: That's how I feel too Josiah. In some ways, it's the same as writing device drivers in a pre-emptable kernel. You can get interrupted and pre-empted by the hardware at any freaking time in any piece of code and your memory might go away so you better pin it and deal with the interrupts. Forget about that and you end up with a nice kernel panic. Still, we have all kinds of device drivers on SMP, pre-emptable kernels. It can be done. [ sarcastic mode on ] Yes, if it gets exposed to the language it should come with a big warning ... now, how condescending should that warning be? "You can't use this unless you're a good programmer!" or "You better know what you're doing" or how about "A guy once pulled out all his pubic hair trying to figure out what happened when he started using this feature!"? [ sarcastic mode off] It's a gun, here's a bullet, it's a tool, go get food but try not to shoot yourself. I'm also -0 on this, not that I think my opinion counts though. I'm -0 because Tomer pointed me to a nice recipe that uses ctypes to get to the C interface. I'm happy with that and we can start using it right now. 
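[Editor's sketch: the ctypes trick behind the recipe mentioned above can be illustrated roughly as follows. It leans on CPython's PyThreadState_SetAsyncExc, so it is CPython-only, and the exception is delivered only when the target thread next executes Python bytecode - a thread blocked inside a C call will not be interrupted. The helper name raise_exc mirrors the proposed spelling; the rest is illustrative.]

```python
import ctypes
import threading
import time

def raise_exc(thread, exc_type):
    """Schedule exc_type to be raised in `thread` (CPython only)."""
    tid = ctypes.c_ulong(thread.ident)
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        tid, ctypes.py_object(exc_type))
    if res == 0:
        raise ValueError("invalid thread id")
    if res > 1:
        # More than one thread state was affected: revoke and give up.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, None)
        raise SystemError("PyThreadState_SetAsyncExc failed")

interrupted = threading.Event()

def worker():
    try:
        while True:
            time.sleep(0.01)  # the exception arrives between bytecodes,
                              # i.e. after a sleep() call returns
    except KeyboardInterrupt:
        interrupted.set()

t = threading.Thread(target=worker)
t.daemon = True
t.start()
time.sleep(0.05)
raise_exc(t, KeyboardInterrupt)
t.join(timeout=5)
assert interrupted.is_set()
```

As the surrounding discussion stresses, this is exactly the kind of "may crash the runtime" tool that should be used only with a clear understanding of when the target thread is safe to interrupt.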
Perhaps that should be as high as it gets exposed so that it would be an automatic skill test? If you can find it, you probably know how to use it and the kind of problems you might run into. On 8/11/06, Josiah Carlson wrote: > > > I believe that if a user cannot design and implement their own system to > handle when a thread can be killed or not to their own satisfaction, > then they have no business killing threads. > > > - Josiah > -- Luis P Caamano Atlanta, GA USA From talin at acm.org Sat Aug 12 04:17:37 2006 From: talin at acm.org (Talin) Date: Fri, 11 Aug 2006 19:17:37 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060811211801.02287420@sparrow.telecommunity.com> References: <5.1.1.6.0.20060811211801.02287420@sparrow.telecommunity.com> Message-ID: <44DD3A41.10507@acm.org> Phillip J. Eby wrote: > At 3:16 PM 8/12/2006 -0700, Talin wrote: >> Phillip J. Eby wrote: >> > At 06:10 AM 8/11/2006 -0700, Talin wrote: >> >> Or to put it another way: If you create a tool, and you assume that >> tool >> >> will only be used in certain specific ways, but you fail to enforce >> that >> >> limitation, then your assumption will be dead wrong. The idea that >> there >> >> will only be a few type annotation providers who will all nicely >> >> cooperate with one another is just as naive as I was in the SysEx >> >> debacle. >> > >> > Are you saying that function annotations are a bad idea because we >> won't >> > be able to pickle them? >> >> Huh? What does pickling have to do with anything I said? > > I'll happily answer that question as soon as you explain what *function > annotations* have to do with anything you said. Bonus points if you can > explain what MIDI has to do with overloaded functions. :) All right. I realize that not everyone made the connection between my parable and the current debate, and I need to spell it out more explicitly. 
The parable is essentially about standards-writers who fail to do their job by underspecifying certain aspects of the standard, and leave the solution to individual implementers of the standard; And it's also how the implementers who try to fill in the missing pieces of the standard do so in a way that is unique and incompatible with what every other implementer is doing. The story also has to do with people who assume things about the behavior of other software developers - specifically, my assumption that other people, working in isolation from one another, would come up with the same or similar solutions to a given problem, vs. Collin's assumption that other creators of annotation interpreters would coordinate their efforts in a sensible way. What the annotation PEP and the SysEx have in common is that they are both dealing with an open-ended specification - one which allows any provider to extend the protocol in any way they wish, without any knowledge or coordination from any other provider. Both specs describe a 'container' for information, but deliberately avoid saying what's in the container. Both specs fail to provide any means for an external entity to discover the meaning of what the objects in the container are - instead, external entities must have a priori knowledge of the contained data. My criticism of Collin's PEP was that it hand-waved over some fairly major problems, and the logic behind the hand-wave was that, well, developers won't do that - there's only going to be a small number of such developers, and they will all deal with each other. I wanted to illustrate how disastrous such an assumption could be. Another lesson of the story has to do with the failure of the MMA committee to specify any guidelines or hints as to how their open-ended protocol should be used. 
If the MMA had simply put a paragraph in the original standard saying "You are free to create any protocol format you want, but here's an example of how a bulk dump protocol might work" (followed by a description of such), then what would have happened is that most of the instrument makers would simply have used the example as a starting point. This would have saved millions of man-hours of confusion and chaos over the last 20 years. Dozens of companies created Universal Librarian products, and all of them had to deal with the astounding diversity of protocols, which could have been avoided by one little non-binding paragraph in the standard. In other words, I criticize both the MMA's spec and Colin's for the sin of underspecification - that is, allowing critical decisions that *should* have been made by the standard writer to instead be made by the standard implementers, with the result that each implementer comes up with their own unique solution to a problem which should have been solved in the original standard doc. -- Talin From collinw at gmail.com Sat Aug 12 04:43:43 2006 From: collinw at gmail.com (Collin Winter) Date: Fri, 11 Aug 2006 22:43:43 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <44DD3A41.10507@acm.org> References: <5.1.1.6.0.20060811211801.02287420@sparrow.telecommunity.com> <44DD3A41.10507@acm.org> Message-ID: <43aa6ff70608111943o1fb05d1eq753157bc4fc53ccb@mail.gmail.com> On 8/11/06, Talin wrote: > The story also has to do with people who assume things about the > behavior of other software developers - specifically, my assumption that > other people, working in isolation from one another, would come up with > the same or similar solutions to a given problem, vs. Colin's assumption > that other creators of annotation interpreters would coordinate their > efforts in a sensible way. I make no assumptions that people writing annotation interpreters will coordinate their efforts. 
My assertion that "[t]here is no worry that these libraries will assign semantics at random, or that a variety of libraries will appear, each with varying semantics and interpretations of what, say, a tuple of strings means." is not based on coordination but rather the marketplace. If someone starts assigning semantics that aren't "pythonic", that don't fit in with how the majority of Python programmers think, no-one will use their library and it will die. The drive to write, release and maintain open-source software is predicated on a desire to have people use your product, to find it useful. To that end, I expect that the creators of annotation interpreters will take care to maximise the utility (and hence the audience) for their library. > What the annotation PEP and the SysEx have in common is that they are > both dealing with an open-ended specification - one which allows any > provider to extend the protocol in any way they wish, without any > knowledge or coordination from any other provider. In your long parable, you've ignored the key difference between the open-ended-ness of my PEP and that of SysEx: there are much greater environmental constraints on people writing interpreters for function annotations. The only constraints for developers using SysEx are "anything you can turn into bytes". > Another lesson of the story has to do with the failure of the MMA > committee to specify any guidelines or hints as to how their open-ended > protocol should be used. I agree that the PEP needs to include some guidance for those writing annotation interpreters (such as how to anticipate being used in conjunction with other interpreters), but I see no merit in setting in stone a list of officially endorsed uses for function annotations. 
> In other words, I criticize both the MMA's spec and Colin's for the sin > of underspecification - that is, allowing critical decisions that > *should* have been made by the standard writer to instead be made by the > standard implementers, with the result that each implementer comes up > with their own unique solution to a problem which should have been > solved in the original standard doc. Are you referring to the fact that the PEP doesn't dictate how lists, tuples, etc are to be interpreted, or still to the fact that I didn't include a paragraph talking about interpreter chaining? > -- Talin Collin Winter PS: My name has 2 L's in it. From pje at telecommunity.com Sat Aug 12 04:52:57 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri, 11 Aug 2006 22:52:57 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: Message-ID: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> At 03:39 PM 8/12/2006 -0700, Talin wrote: >Say I want to annotate a specific argument with two pieces of >information, a type and a docstring. I have two metadata interpreters, >one which uses the type information to restrict the kinds of arguments >that can be passed in, and another which uses the docstring to enhance >the generated documentation. > >Now, lets say that these two metadata interpreters were written by two >people, who are not in communication with each other. Each one decides >that they would like to "play nice" with other competing metadata. > >So Author A, who wrote the annotation decorator that looks for >docstrings, decides that not only will he accept docstring annotations, >but if the annotation is a tuple, then he will search that tuple for any >docstrings, skipping over any annotations that he doesn't understand. >(Although how he is supposed to manage that is unclear - since there >could also be other annotations that are simple text strings as well.) 
> >Author B, who wrote the type-enforcement module, also wants to play nice >with others, but since he doesn't know A, comes up with a different >solution. His idea is to create a system in which annotations >automatically chain each other - so that each annotation has a "next" >attribute referring to the next annotation. > >So programmer C, who wants to incorporate both A and B's work into his >program, has a dilemma - each has a sharing mechanism, but the sharing >mechanisms are different and incompatible. So he is unable to apply both >A-type and B-type metadata to any given signature. Not at all. A and B need only use overloadable functions, and the problem is trivially resolved by adding overloads. The author of C can add an overload to "A" that will handle objects with 'next' attributes, or add one to "B" that handles tuples, or both. I've not bothered to reply to the rest of your email, since it depends on assumptions that I've already shown to be invalid. From pje at telecommunity.com Sat Aug 12 05:01:38 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri, 11 Aug 2006 23:01:38 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: Message-ID: <5.1.1.6.0.20060811225402.0228c178@sparrow.telecommunity.com> At 07:49 PM 8/12/2006 -0400, "Collin Winter" wrote: >What Josiah is hinting at -- and what Talin describes more explicitly >-- is the problem of how exactly "chaining" annotation interpreters >will work. I'd prefer we not use the word "interpreters" to describe operations that use annotations. It carries a lot of excess baggage. >The case I've thought out the most completely is that of using >decorators to analyse/utilise the annotations: > >1) Each decorator should be written with the assumption that it is the >only decorator that will be applied to a given function (with respect >to annotations). > >2) Chaining will be accomplished by maintaining this illusion for each >decorator. 
For example, if our annotation-sharing convention is that >annotations will be n-tuples (n == number of annotation-interpreting >decorators), where t[i] is the annotation the i-th decorator should >care about, the following chain() function will do the trick (a full >demo script is attached): I don't see the point of this. A decorator should be responsible for manipulating the signature of its return value. Meanwhile, the semantics for combining annotations should be defined by an overloaded function like "combineAnnotations(a1,a2)" that returns a new annotation. There is no need to have a special chaining decorator. May I suggest that you try using Guido's Py3K overloaded function prototype? I expect you'll find that if you play around with it a bit, it will considerably simplify your view of what's required to do this. It truly isn't necessary to predefine what an annotation is, or even any structural constraints on how they will be combined, since the user is able to define for any given type how such things will be handled. From qrczak at knm.org.pl Sat Aug 12 06:06:53 2006 From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk) Date: Sat, 12 Aug 2006 06:06:53 +0200 Subject: [Python-3000] threading, part 2 In-Reply-To: <20060811125449.1940.JCARLSON@uci.edu> (Josiah Carlson's message of "Fri, 11 Aug 2006 13:12:15 -0700") References: <20060811105742.193A.JCARLSON@uci.edu> <87veozoyo9.fsf@qrnik.zagroda> <20060811125449.1940.JCARLSON@uci.edu> Message-ID: <877j1emwbm.fsf@qrnik.zagroda> Josiah Carlson writes: > Threading is already difficult enough to do 'right' (see the dozens > of threads discussing why this is really the case), and designing > software that can survive the raising of an exception at any point > makes threading even more difficult. That's why I'm proposing to provide ways to limit those "any points". > I believe that you are attempting to design an interface to make > this particular feature foolproof. No, I'm merely attempting to make it usable. 
> I think that such is a mistake; killing a thread should be fraught
> with gotchas and should be documented as "may crash the runtime".

You are proposing to make it unusable?

> Offering users anything more is tantamount to encouraging its use,
> which is counter to the reasons why it is not available via a
> standard threading.function call: because it shouldn't be used at
> all, except by people who know what the heck they are doing.

Indeed, you are proposing to make it unusable.

> I believe that if a user cannot design and implement their own
> system to handle when a thread can be killed or not to their own
> satisfaction, then they have no business killing threads.

I have already implemented it. In my own language, where I have full control over the runtime.

Some Haskell people made the first design a few years ago, and implemented it in the Glasgow Haskell Compiler.
http://citeseer.ist.psu.edu/415348.html

Some people saw that it was good, that the existing handling of KeyboardInterrupt in Python is unsafe, and they adapted the design for Python (without actually implementing it as far as I know).
http://www.cs.williams.edu/~freund/papers/02-lwl2.ps

I built on their experience, extended the design, and implemented it in my language Kogut, so I can play with it and see how it works in practice.
http://www.cs.ioc.ee/tfp-icfp-gpce05/tfp-proc/06num.pdf

I'm quite confident that something like this is the right design, even if some details could be changed. Now it would be nice if Python had usable asynchronous exceptions too.

If we are not brave enough, we can implement at least an equivalent of POSIX thread cancellation. It would be better than nothing, though not as useful, because the default mode allows interruption only at certain blocking primitives.
In this scenario Unix signals need a different policy so a pure computation not performing I/O nor thread synchronization can be interrupted; Unix signals usually cause the whole process to abort so data integrity was less of a concern. A language with GC and exceptions can do better, with a unified policy for thread cancellation and Unix signals and other asynchronous events. It can be done such that well-written libraries are safely interruptible even if exceptions may occur almost anywhere. Protection should be built into certain operations (e.g. try...finally extended with an "initially" clause, or taking a mutex), so that there is less work needed to make code safe to be interrupted; then quite often it's already safe. -- __("< Marcin Kowalczyk \__/ qrczak at knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/ From collinw at gmail.com Sat Aug 12 06:33:28 2006 From: collinw at gmail.com (Collin Winter) Date: Sat, 12 Aug 2006 00:33:28 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060811225402.0228c178@sparrow.telecommunity.com> References: <5.1.1.6.0.20060811225402.0228c178@sparrow.telecommunity.com> Message-ID: <43aa6ff70608112133w7eb2d0c6x287c021b108974b@mail.gmail.com> > I don't see the point of this. A decorator should be responsible for > manipulating the signature of its return value. Meanwhile, the semantics > for combining annotations should be defined by an overloaded function like > "combineAnnotations(a1,a2)" that returns a new annotation. There is no > need to have a special chaining decorator. > > May I suggest that you try using Guido's Py3K overloaded function > prototype? I expect you'll find that if you play around with it a bit, it > will considerably simplify your view of what's required to do this. 
It > truly isn't necessary to predefine what an annotation is, or even any > structural constraints on how they will be combined, since the user is able > to define for any given type how such things will be handled. I've looked at Guido's overloaded function prototype, and while I think I'm in the direction of understanding, I'm not quite there 100%. Could you illustrate (in code) what you've got in mind for how to apply overloaded functions to this problem space? Collin Winter From talin at acm.org Sat Aug 12 06:49:52 2006 From: talin at acm.org (Talin) Date: Fri, 11 Aug 2006 21:49:52 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> Message-ID: <44DD5DF0.40405@acm.org> Phillip J. Eby wrote: > Not at all. A and B need only use overloadable functions, and the > problem is trivially resolved by adding overloads. The author of C can > add an overload to "A" that will handle objects with 'next' attributes, > or add one to "B" that handles tuples, or both. I'm still not sure what you are talking about - what is being overloaded here? Let me give you a better example. Suppose I have a 'docstring' annotation and a 'getopt' annotation. The docstring annotation associates a string with each argument, which can be inspected by an external documentation scanner to produce documentation for that argument. Thus: def myfunc( x : "The x coordinate", y : "The y coordinate" ) ... The 'getopt' annotation is used in conjunction with the 'getopt' decorator, which converts from command-line arguments to python method arguments. The idea is that you have a class that is acting as a back end to a command-line shell. Each method in the class corresponds to a single command. The annotations allow you to associate specific flags or switches with particular arguments. 
So:

    class MyHandler( CommandLineHandler ):

        @getopt
        def list( self, infile:"i" = sys.stdin, outfile:"o" = sys.stdout ):
            ...

With the getopt handler in place, I can type the following shell command:

    list -i <infile> -o <outfile>

If either the -i or -o switch is omitted, then the corresponding argument is either stdin or stdout. Additionally, the getopt module can generate 'usage' information for the function in question:

    Usage: list [-i infile] [-o outfile]

Now, what happens if I want to use both docstrings and the getopt decorator on the same function? They both expect to see annotations that are strings! How do the doc extractor and the getopt decorator know which strings belong to them, and which strings they should ignore?

-- Talin

From slawomir.nowaczyk.847 at student.lu.se Sat Aug 12 08:22:17 2006
From: slawomir.nowaczyk.847 at student.lu.se (Slawomir Nowaczyk)
Date: Sat, 12 Aug 2006 08:22:17 +0200
Subject: [Python-3000] threading, part 2
In-Reply-To: 
References: 
Message-ID: <20060812082034.EFEC.SLAWOMIR.NOWACZYK.847@student.lu.se>

On Fri, 11 Aug 2006 21:51:25 -0400 Luis P Caamano wrote:

#> That's how I feel too Josiah. In some ways, it's the same as writing
#> device drivers in a pre-emptable kernel. You can get interrupted and
#> pre-empted by the hardware at any freaking time in any piece of code
#> and your memory might go away so you better pin it and deal with the
#> interrupts. Forget about that and you end up with a nice kernel
#> panic. Still, we have all kinds of device drivers on SMP,
#> pre-emptable kernels. It can be done.

Of course it can... but do we *really* want programming in Python3k to be comparable in difficulty to writing device drivers?

-- 
Best wishes, Slawomir Nowaczyk ( Slawomir.Nowaczyk at cs.lth.se )

Numeric stability is probably not all that important when you're guessing.
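One way to make the string-annotation clash Talin describes concrete - and a common way out of it - is to wrap each library's annotations in a marker type, so each tool picks out only the values it understands and skips the rest. A minimal sketch follows; the Doc and Opt marker classes and the collect() helper are hypothetical, not from any proposal in this thread, and it uses the plain __annotations__ mapping rather than the __signature__ object discussed here:

```python
class Doc:
    """Hypothetical marker for documentation annotations."""
    def __init__(self, text):
        self.text = text

class Opt:
    """Hypothetical marker for getopt-style switch annotations."""
    def __init__(self, switch):
        self.switch = switch

# Annotations become tuples of markers; each tool ignores markers it
# doesn't recognise, so the two libraries no longer collide on strings.
def list_cmd(infile: (Opt("i"), Doc("the input file")) = None,
             outfile: (Opt("o"), Doc("the output file")) = None):
    ...

def collect(func, marker):
    """Pick out only the annotations wrapped in a given marker type."""
    found = {}
    for name, note in func.__annotations__.items():
        notes = note if isinstance(note, tuple) else (note,)
        for item in notes:
            if isinstance(item, marker):
                found[name] = item
    return found

# Each library asks only for its own markers:
switches = {n: o.switch for n, o in collect(list_cmd, Opt).items()}
docs = {n: d.text for n, d in collect(list_cmd, Doc).items()}
```

With markers, the doc extractor and the getopt decorator can share one function without ever having to guess which bare string belongs to whom.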
From ncoghlan at gmail.com Sat Aug 12 08:58:47 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 12 Aug 2006 16:58:47 +1000 Subject: [Python-3000] threading, part 2 In-Reply-To: <20060812082034.EFEC.SLAWOMIR.NOWACZYK.847@student.lu.se> References: <20060812082034.EFEC.SLAWOMIR.NOWACZYK.847@student.lu.se> Message-ID: <44DD7C27.9000006@gmail.com> Slawomir Nowaczyk wrote: > On Fri, 11 Aug 2006 21:51:25 -0400 > Luis P Caamano wrote: > > #> That's how I feel too Josiah. In some ways, it's the same as writing > #> device drivers in a pre-emptable kernel. You can get interrupted and > #> pre-empted by the hardware at any freaking time in any piece of code > #> and your memory might go away so you better pin it and deal with the > #> interrupts. Forget about that and you end up with a nice kernel > #> panic. Still, we have all kinds of device drivers on SMP, > #> pre-emptable kernels. It can be done. > > Of course it can... but do we *really* want programming in Python3k to > be comparable in difficulty to writing device drivers? > No, but "programming in Py3k" and "trying to asynchronously terminate an active thread in Py3k without active cooperation from that thread" are not really the same thing. Making easy things easy and difficult things possible is a good goal - making difficult things appear to be deceptively easy is a good way to cause problems down the road :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Sat Aug 12 09:58:08 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 12 Aug 2006 17:58:08 +1000 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> Message-ID: <44DD8A10.1040808@gmail.com> Phillip J. 
Eby wrote:
> At 03:39 PM 8/12/2006 -0700, Talin wrote:
>> So programmer C, who wants to incorporate both A and B's work into his
>> program, has a dilemma - each has a sharing mechanism, but the sharing
>> mechanisms are different and incompatible. So he is unable to apply both
>> A-type and B-type metadata to any given signature.
>
> Not at all. A and B need only use overloadable functions,

Stop right there. "A and B need only use overloadable functions"? That sounds an awful lot like placing a constraint on the way annotation libraries are implemented in order to facilitate a single program using multiple annotation libraries - which is exactly what Talin is saying is needed!

Talin is saying "the annotation PEP needs to recommend a mechanism that allows a single program to use multiple annotation libraries". And you're saying "a good mechanism for allowing a program to use multiple annotation libraries is for every annotation library to expose an overloaded 'interpret_annotation' function that the application can hook in order to handle new annotation types".

I think you're right that overloaded functions are a possible solution to this problem, but that doesn't obviate the need for the PEP to address the question explicitly (and using overloaded functions for this strikes me as hitting a very small nail with a very large hammer).

With the function overloading solution, you would need to do three things in order to get two frameworks to cooperate:

1. Define your own Annotation type and register it with the frameworks you are using
2. Define a decorator to wrap the annotations in a function __signature__ into your custom annotation type
3. Apply your decorator to functions before the decorators for the annotation libraries are invoked

Overloading a standard type (like tuple) wouldn't work, as you might have two different modules, both using the same annotation library, that want it to interpret tuples in two different ways (e.g.
in module A, the library's info is at index 0, while in module B it is at index 1). So, for example:

    @library_A_type_processor
    @library_B_docstring_processor
    @handle_annotations
    def func(a: (int, "an int"),
             b: (str, "a string")) -> (str, "returns a string, too!"):
        # do something

    def handle_annotations(f):
        note_dict = f.__signature__.annotations
        for param, note in note_dict.items():
            note_dict[param] = MyAnnotation(note)
        return f

However, what we're really talking about here is a scenario where you're defining your *own* custom annotation processor: you want the first part of the tuple in the expression handled by the type processing library, and the second part handled by the docstring processing library. Which says to me that the right solution is for the annotation to be split up into its constituent parts before the libraries ever see it. This could be done as Collin suggests by tampering with __signature__.annotations before calling each decorator, but I think it is cleaner to do it by defining a particular signature for decorators that are intended to process annotations.
Specifically, such decorators should accept a separate dictionary to use in preference to the annotations on the function itself:

    def process_function_annotations(f, annotations=None):
        # Process the function f
        # If annotations is not None, use it;
        # otherwise, get the annotations from f.__signature__

Then our function declaration and decorator would look like:

    @handle_annotations
    def func(a: (int, "an int"),
             b: (str, "a string")) -> (str, "returns!"):
        # do something

    def handle_annotations(f):
        decorators = library_A_type_processor, library_B_docstring_processor
        note_dicts = {}, {}
        for param, note in f.__signature__.annotations.iteritems():
            for note_dict, subnote in zip(note_dicts, note):
                note_dict[param] = subnote
        for decorator, note_dict in zip(decorators, note_dicts):
            f = decorator(f, note_dict)
        return f

Writing a factory function to handle chaining of an arbitrary number of annotation interpreting libraries would be trivial, with the set of decorators provided as positional arguments if your notes are in a tuple, and as keyword arguments if the notes are in a dictionary.

Cheers, Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

From ncoghlan at gmail.com Sat Aug 12 10:13:44 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 12 Aug 2006 18:13:44 +1000
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
References: <43aa6ff70608091732o150a6674t4416f4b76d8bc40b@mail.gmail.com>
Message-ID: <44DD8DB8.3050102@gmail.com>

Collin Winter wrote:
> Return Values
> -------------
>
> The examples thus far have omitted examples of how to annotate the
> type of a function's return value. This is done like so:
>
> ::
>     def sum(*vargs: Number) -> Number:
>         ...
>
> The parameter list can now be followed by a literal ``->`` and
Like the annotations for parameters, this > expression will be evaluated when the function is compiled. I'd like to request that the annotation for the return type be *inside* the parentheses for the parameter list. Why, you ask? Because, as soon as the annotations are at all verbose, you're going to want to split the function definition up so that each parameter gets its own line. For the parameters, this works beautifully because parenthesis matching keeps the compiler from getting upset: def sum(seq: "the sequence of values to be added", init=0: "the initial value of the total"): # do it But now try to document the return type on its own line: def sum(seq: "the sequence of values to be added", init=0: "the initial value of the total") -> "the summation of the sequence": # do it Kaboom - SyntaxError on the second line because of the missing colon. However, if the return type annotation is *inside* the parentheses and separated by a comma, there's no problem: def sum(seq: "the sequence of values to be added", init=0: "the initial value of the total", -> "the summation of the sequence"): # do it Having to use a line continuation just to be able to annotate the return type on a separate line would be an annoyance. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From jcarlson at uci.edu Sat Aug 12 10:35:02 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Sat, 12 Aug 2006 01:35:02 -0700 Subject: [Python-3000] threading, part 2 In-Reply-To: References: Message-ID: <20060812012526.195B.JCARLSON@uci.edu> "Luis P Caamano" wrote: > It's a gun, here's a bullet, it's a tool, go get food but try not to > shoot yourself. > > I'm also -0 on this, not that I think my opinion counts though. I'm > -0 because Tomer pointed me to a nice recipe that uses ctypes to get > to the C interface. I'm happy with that and we can start using it > right now. 
Perhaps that should be as high as it gets expose so that > it would be an automatic skill test? If you can find it, you probably > know how to use it and the kind of problems you might run into. Remember that the meat of Tomer's recipe, the ctypes call, is the only thing that is going to be documented in Python 2.5 . The functionality of being able to kill threads with exceptions has existed since Python 2.3 (if I understood previous postings correctly), but has been generally undocumented. Because it is literally just a documentation change, and not actually additional functionality, means that it *can* go into Python 2.5 . All other feature additions are too late in the Beta cycle (Beta 3 is next week) to be added, unless someone manages to convince the release manager that it should be allowed (I would put money on it not going to happen). - Josiah > On 8/11/06, Josiah Carlson wrote: > > > > > > I believe that if a user cannot design and implement their own system to > > handle when a thread can be killed or not to their own satisfaction, > > then they have no business killing threads. 
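The kind of user-designed protocol Josiah is arguing for - a thread that honours kill requests only at checkpoints it declares safe - can be sketched in a few lines with a threading.Event. This is an illustrative sketch only; the StoppableWorker name and its stop()/checkpoint() methods are hypothetical, not from any proposal in this thread:

```python
import threading

class StoppableWorker(threading.Thread):
    """Hypothetical sketch: a thread that honours kill requests only
    at checkpoints it declares safe."""

    def __init__(self):
        super().__init__()
        self._stop_requested = threading.Event()
        self.steps = 0

    def stop(self):
        # Request termination; the thread dies at its next checkpoint.
        self._stop_requested.set()

    def checkpoint(self):
        # A declared-safe point: no locks held, no half-updated state.
        if self._stop_requested.is_set():
            raise SystemExit

    def run(self):
        try:
            while True:
                self.steps += 1      # one indivisible unit of work
                self.checkpoint()    # safe to die *between* units
        except SystemExit:
            pass

w = StoppableWorker()
w.start()
w.stop()
w.join(timeout=5)
```

The point of the pattern is that termination can only land between units of work, so invariants held inside a unit are never broken - which is exactly the discipline the "design your own system" position asks of the programmer.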
> > > > > - Josiah
> >
> -- 
> Luis P Caamano
> Atlanta, GA USA
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/jcarlson%40uci.edu

From jcarlson at uci.edu Sat Aug 12 11:07:29 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sat, 12 Aug 2006 02:07:29 -0700
Subject: [Python-3000] threading, part 2
In-Reply-To: <877j1emwbm.fsf@qrnik.zagroda>
References: <20060811125449.1940.JCARLSON@uci.edu> <877j1emwbm.fsf@qrnik.zagroda>
Message-ID: <20060812013530.195E.JCARLSON@uci.edu>

"Marcin 'Qrczak' Kowalczyk" wrote:
> Josiah Carlson writes:
> > Threading is already difficult enough to do 'right' (see the dozens
> > of threads discussing why this is really the case), and designing
> > software that can survive the raising of an exception at any point
> > makes threading even more difficult.
>
> That's why I'm proposing to provide ways to limit those "any points".
>
> > I believe that you are attempting to design an interface to make
> > this particular feature foolproof.
>
> No, I'm merely attempting to make it usable.
>
> You are proposing to make it unusable?
>
> Indeed, you are proposing to make it unusable.

Because you or anyone else can define a standard mechanism of handling these points where threads are allowed to be killed, and you can publish it on the internet via the Python cookbook, etc., having nothing in the standard library specifically supporting the operation isn't making anything unusable. I'm not proposing to make it unusable, merely that it should not be made any easier to use. See Nick Coghlan's comment with regards to '...easy things easy...'.

> > I believe that if a user cannot design and implement their own
> > system to handle when a thread can be killed or not to their own
> > satisfaction, then they have no business killing threads.
> > I have already implemented it. In my own language, where I have > full control over the runtime. I'm glad that you have managed to implement it in your programming language. But this discussion isn't about Kogut, Haskell, etc., this is about Python. Specifically what should and should not be available in the Python standard library. I've said it before, but apparently the following point is ignored, so I'll say it again. The 'kill thread' mechanism isn't available via some threading.kill_thread(thr) function because Guido and other core developers *of* Python do not want it to be generally acceptable for users to kill arbitrary threads. The introduction of methods of controlling where a thread could be killed into the standard library would be encouraging the 'kill thread' usage. It would be far safer (and much less work for the developers of Python) for users to just learn how to handle thread quitting using any of the standard methods of doing so (check the value of a variable, wait for a signal, etc.). Never mind that any feature is going to have to wait 18+ months before Python 2.6 comes out in order to get your proposed changes in. > Now it would be nice if Python had usable asynchronous exceptions too. Python has had usable asynchronous exceptions since Python 2.3 [1]. > If we are not brave enough, we can implement at least an equivalent > of POSIX thread cancellation. It would be better than nothing, though > not as useful, because the default mode allows interruption only at > certain blocking primitives. In this scenario Unix signals need a > different policy so a pure computation not performing I/O nor thread > synchronization can be interrupted; Unix signals usually cause the > whole process to abort so data integrity was less of a concern. > > A language with GC and exceptions can do better, with a unified policy > for thread cancellation and Unix signals and other asynchronous events. 
> It can be done such that well-written libraries are safely interruptible
> even if exceptions may occur almost anywhere. Protection should be
> built into certain operations (e.g. try...finally extended with an
> "initially" clause, or taking a mutex), so that there is less work
> needed to make code safe to be interrupted; then quite often it's
> already safe.

I don't have much of a comment with regards to attempted unification of signals, etc., as Windows signal handling is effectively useless (and my primary development platform tends to be Windows).

- Josiah

[1]
Python 2.3.5 (#62, Feb 8 2005, 16:23:02) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
>>> import threading
>>> import time
>>> def foo():
...     try:
...         while 1:
...             time.sleep(.01)
...     finally:
...         print "I quit!"
...
>>> x = threading.Thread(target=foo)
>>> x.start()
>>> for i,j in threading._active.items():
...     if j is x:
...         break
...
>>> ctypes.pythonapi.PyThreadState_SetAsyncExc(i, ctypes.py_object(Exception))
1
>>> I quit!
Exception in thread Thread-2:
Traceback (most recent call last):
  File "C:\python23\lib\threading.py", line 442, in __bootstrap
    self.run()
  File "C:\python23\lib\threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
  File "<stdin>", line 4, in foo
Exception

From tim.peters at gmail.com Sat Aug 12 12:29:07 2006
From: tim.peters at gmail.com (Tim Peters)
Date: Sat, 12 Aug 2006 06:29:07 -0400
Subject: [Python-3000] threading, part 2 --- + a bit of ctypes FFI worry
Message-ID: <1f7befae0608120329wc646164w25ca4875da4cc5c0@mail.gmail.com>

[Josiah Carlson]
> ...
> Python 2.3.5 (#62, Feb 8 2005, 16:23:02) [MSC v.1200 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import ctypes
> >>> import threading
> >>> import time
> >>> def foo():
> ...     try:
> ...         while 1:
> ...             time.sleep(.01)
> ...     finally:
> ...         print "I quit!"
> ...
> >>> x = threading.Thread(target=foo) > >>> x.start() > >>> for i,j in threading._active.items(): > ... if j is x: > ... break > ... > >>> ctypes.pythonapi.PyThreadState_SetAsyncExc(i, ctypes.py_object(Exception)) As I discovered to my chagrin when I added a similar test to the test suite a few days ago, that's got a subtle error on most 64-bit boxes. When the ctypes docs talk about passing and returning integers, they never explain what "integers" /means/, but it seems the docs implicitly have a 32-bit-only view of the world here. In reality "integer" seems to mean the native C `int` type. But a Python thread id is a native C `long` (== a Python short integer), and the code above fails in a baffling way on most 64-bit boxes: the call returns 0 instead; i.e. the thread id isn't found, and no exception gets set. So I believe that needs to be: ctypes.pythonapi.PyThreadState_SetAsyncExc( ctypes.c_long(i), ctypes.py_object(Exception)) to make it portable. It's unclear to me how to write portable ctypes code in the presence of a gazillion integer typedefs and #defines, such as for Py_ssize_t. That doesn't map to a fixed C integral type cross-platform, so what can you do? You're not required to answer that ;-) Thread ids may bite us someday too. Python casts the platform's notion of a thread id to C `long`, but there's no guarantee this won't lose information (or is even legal) on all platforms. We'd probably be safer casting to, e.g., Py_uintptr_t (some thread implementations return an index into a kernel or library thread-info table, but at least some in my lifetime returned a pointer to a thread-info struct, and that's definitely fatter than C `long` on some boxes). > 1 > >>> I quit!
> Exception in thread Thread-2:Traceback (most recent call last): > File "C:\python23\lib\threading.py", line 442, in __bootstrap > self.run() > File "C:\python23\lib\threading.py", line 422, in run > self.__target(*self.__args, **self.__kwargs) > File "", line 4, in foo > Exception It's really cool that you can do this from ctypes, eh? That's exactly the right level of abstraction for this attractive nuisance too ;-) From greg.ewing at canterbury.ac.nz Sat Aug 12 13:05:08 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 12 Aug 2006 23:05:08 +1200 Subject: [Python-3000] threading, part 2 --- + a bit of ctypes FFI worry In-Reply-To: <1f7befae0608120329wc646164w25ca4875da4cc5c0@mail.gmail.com> References: <1f7befae0608120329wc646164w25ca4875da4cc5c0@mail.gmail.com> Message-ID: <44DDB5E4.9010903@canterbury.ac.nz> Tim Peters wrote: > It's unclear to me how to write portable ctypes code in the presence > of a gazillion integer typedefs and #defines, such as for Py_ssize_t. A start would be to have constants in the ctypes module for Py_ssize_t and other such Python-defined API types. -- Greg From l.oluyede at gmail.com Sat Aug 12 13:11:47 2006 From: l.oluyede at gmail.com (Lawrence Oluyede) Date: Sat, 12 Aug 2006 13:11:47 +0200 Subject: [Python-3000] threading, part 2 --- + a bit of ctypes FFI worry In-Reply-To: <44DDB5E4.9010903@canterbury.ac.nz> References: <1f7befae0608120329wc646164w25ca4875da4cc5c0@mail.gmail.com> <44DDB5E4.9010903@canterbury.ac.nz> Message-ID: <9eebf5740608120411m40da5724r11700fdbe509914@mail.gmail.com> On 8/12/06, Greg Ewing wrote: > Tim Peters wrote: > > > It's unclear to me how to write portable ctypes code in the presence > > of a gazillion integer typedefs and #defines, such as for Py_ssize_t. > > A start would be to have constants in the ctypes module > for Py_ssize_t and other such Python-defined API types. rctypes and pypy tools are somewhat one step further than ctypes machinery. 
In rctypes you can easily do something like: size_t = ctypes_platform.SimpleType("size_t", c_ulong) In this way you have a platform-safe data type to use in your code. The second argument of SimpleType() is a hint for the tool. You can also use ConstantInteger() and DefinedConstantInteger() to get values of constants in header files like this: BUFSIZ = ctypes_platform.ConstantInteger("BUFSIZ") Maybe one day this can be ported to CPython ctypes from the RPython one. -- Lawrence http://www.oluyede.org/blog From aahz at pythoncraft.com Sat Aug 12 15:42:44 2006 From: aahz at pythoncraft.com (Aahz) Date: Sat, 12 Aug 2006 06:42:44 -0700 Subject: [Python-3000] Python 2.5 release schedule (was: threading, part 2) In-Reply-To: <20060812012526.195B.JCARLSON@uci.edu> References: <20060812012526.195B.JCARLSON@uci.edu> Message-ID: <20060812134244.GA29374@panix.com> [added python-dev to make sure everyone sees this] On Sat, Aug 12, 2006, Josiah Carlson wrote: > > All other feature additions are too late in the Beta cycle (Beta 3 is > next week) For some reason, this is the second time I've seen this claim. Beta 3 was released August 3 and next week is rc1. We are right now in complete feature lockdown; even documenting an existing API IMO requires approval from the Release Manager. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." --Brian W. Kernighan From aahz at pythoncraft.com Sat Aug 12 15:44:28 2006 From: aahz at pythoncraft.com (Aahz) Date: Sat, 12 Aug 2006 06:44:28 -0700 Subject: [Python-3000] threading, part 2 In-Reply-To: References: Message-ID: <20060812134428.GB29374@panix.com> On Fri, Aug 11, 2006, Luis P Caamano wrote: > > That's how I feel too Josiah. In some ways, it's the same as writing > device drivers in a pre-emptable kernel.
You can get interrupted and > pre-empted by the hardware at any freaking time in any piece of code > and your memory might go away so you better pin it and deal with the > interrupts. Forget about that and you end up with a nice kernel > panic. Still, we have all kinds of device drivers on SMP, > pre-emptable kernels. It can be done. But Python is not the language/platform to do it. (Yeah, someone else said that already, but I think it needs emphasis.) -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." --Brian W. Kernighan From pje at telecommunity.com Sat Aug 12 17:36:51 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat, 12 Aug 2006 11:36:51 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <44DD5DF0.40405@acm.org> References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com> At 09:49 PM 8/11/2006 -0700, Talin wrote: >Phillip J. Eby wrote: >>Not at all. A and B need only use overloadable functions, and the >>problem is trivially resolved by adding overloads. The author of C can >>add an overload to "A" that will handle objects with 'next' attributes, >>or add one to "B" that handles tuples, or both. > > >I'm still not sure what you are talking about - what is being overloaded here? > >Let me give you a better example. Suppose I have a 'docstring' annotation >and a 'getopt' annotation. The docstring annotation associates a string >with each argument, which can be inspected by an external documentation >scanner to produce documentation for that argument. > >Thus: > > def myfunc( x : "The x coordinate", y : "The y coordinate" ) > ... 
> >The 'getopt' annotation is used in conjunction with the 'getopt' >decorator, which converts from command-line arguments to python method >arguments. The idea is that you have a class that is acting as a back end >to a command-line shell. Each method in the class corresponds to a single >command. The annotations allow you to associate specific flags or switches >with particular arguments. So: > >class MyHandler( CommandLineHandler ): > > @getopt > def list( infile:"i" = sys.stdin, outfile:"o" = sys.stdout ): > ... > >With the getopt handler in place, I can type the following shell command: > > list -i -o > >If either the -i or -o switch is omitted, then the corresponding argument >is either stdin or stdout. > >Additionally, the getopt module can generate 'usage' information for the >function in question: > > Usage: list [-i infile] [-o outfile] > >Now, what happens if I want to use both docstrings and the getopt >decorator on the same function? They both expect to see annotations that >are strings! How do the doc extractor and the getopt decorator know >which strings belong to them, and which strings they should ignore? Each one defines an overloaded function that performs the operation. E.g. "getArgumentOption(annotation)" and "getArgumentDoc(annotation)". If somebody wants to use both decorators on the same function, they add methods to one or both of those functions to define how to handle their own type. For example, I could create a "documented option" class that has attributes for the docstring and option character, and register methods with both getArgumentOption and getArgumentDoc to extract the right attributes from it. From pje at telecommunity.com Sat Aug 12 18:12:26 2006 From: pje at telecommunity.com (Phillip J.
Eby) Date: Sat, 12 Aug 2006 12:12:26 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <44DD8A10.1040808@gmail.com> References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060812113701.02343408@sparrow.telecommunity.com> At 05:58 PM 8/12/2006 +1000, Nick Coghlan wrote: >Phillip J. Eby wrote: >>At 03:39 PM 8/12/2006 -0700, Talin wrote: >>>So programmer C, who wants to incorporate both A and B's work into his >>>program, has a dilemma - each has a sharing mechanism, but the sharing >>>mechanisms are different and incompatible. So he is unable to apply both >>>A-type and B-type metadata to any given signature. >>Not at all. A and B need only use overloadable functions, > >Stop right there. "A and B need only use overloadable functions"? That >sounds an awful lot like placing a constraint on the way annotation >libraries are implemented in order to facilitate a single program using >multiple annotation libraries - which is exactly what Talin is saying is >needed! You could perhaps look at it that way. However, I'm simply using overloadable functions as a trivial example of how easy this is to handle without specifying a single mechanism. There are numerous overloaded function implementations available, for example, including ad-hoc registry-based ones (like the ones used by pickle) and other mechanisms besides overloaded functions that do the same thing. PEP 246 adaptation, for example, as used by Twisted and Zope. My point is that: 1. trivial standard extension mechanisms (that are already in use in today's Python) allow libraries to offer compatibility between approaches, without choosing any blessed implementation or even approach to combination 2. there is no need to define a fixed semantic framework for annotations. Guidelines for combinability (e.g. 
a standard interpretation for tuples or lists) might be a good idea, but it isn't *necessary* to mandate a single interpretation. >(and using overloaded functions for this strikes me as hitting a very >small nail with a very large hammer). Remember: Python is built from the ground up on overloaded functions. len(), iter(), str(), repr(), hash(), int(), ... You name it in builtins or operator, it's pretty much an overloaded function. These functions differ from "full" overloaded functions in only these respects: 1. There is no framework to let you define new ones 2. They are single-dispatch only (except for the binary arithmetic operators, which have a crude double-dispatching protocol) 3. They do not allow third-party registration; classes must define __special__ methods to register implementations (Some other overloaded functions in Python, such as pickle.dump and copy.copy, *do* allow third-party registrations, but they have ad-hoc implementations rather than using a common base implementation.) So, saying that overloaded functions are a large hammer may or may not be meaningful, but it's certainly true that they are in *enormous* use in today's Python, even for very small nails like determining the length of an object. :) Indeed, the *default* way of doing almost anything in Python that involves multiple possible implementations is to define an overloaded function -- regardless of how small the nail might be. >However, what we're really talking about here is a scenario where you're >defining your *own* custom annotation processor: you want the first part >of the tuple in the expression handled by the type processing library, and >the second part handled by the docstring processing library. > >Which says to me that the right solution is for the annotation to be split >up into its constituent parts before the libraries ever see it. 
> >This could be done as Collin suggests by tampering with >__signature__.annotations before calling each decorator, but I think it is >cleaner to do it by defining a particular signature for decorators that >are intended to process annotations. Now you're embedding a particular implementation again. The way to do this that imposes the least constraints on users, is to just have an 'iter_annotations()' overloadable function, and let it iterate over lists and tuples, and yield anything else, e.g.: @iter_annotations.when(tuple) @iter_annotations.when(list) def iter_annotation_sequence(annotation): for a in annotation: for aa in iter_annotations(a): yield aa Now, if you have some custom annotation type that contains other annotations, you need only add a method to iter_annotations, and everything works. In contrast, your approach is too limiting because you're *creating a framework* that then everyone has to conform to. I want annotations to be framework-free. I don't even think that the stdlib needs to provide an iter_annotations function, because there's no reason not to just define a method similar to the above for the specific operations you're doing. In fact the general rule of overloadable functions is that the closer to the domain semantics the function is, the better. For example, a 'generateCodeFor(annotation)' overloaded function that can walk annotation sequences itself is a better idea than writing a non-overloaded function that uses iter_annotations() and then generates code for individual annotations, because it allows for better overloads. For example, if you have a type that contains something that would ordinarily be considered separate annotation objects, but which the code generator could combine in some way to produce more optimal code. Walking the annotations and then generating code would rob you of the opportunity to define an optimization overload in this case. 
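Phillip's `iter_annotations` sketch above assumes an overload mechanism like Guido's Py3K prototype. For readers without that prototype, here is a minimal self-contained stand-in; the `overloadable` class and its `when()` registration method are invented for illustration, not an actual library API:

```python
class overloadable:
    """Tiny single-dispatch 'overloaded function': a default
    implementation plus a per-type registry that third parties
    can extend without editing existing code."""
    def __init__(self, default):
        self.default = default
        self.registry = {}

    def when(self, *types):
        def register(func):
            for t in types:
                self.registry[t] = func
            return func
        return register

    def __call__(self, arg):
        # Exact-type lookup keeps the sketch short; a real
        # implementation would also walk the MRO for subclasses.
        return self.registry.get(type(arg), self.default)(arg)

@overloadable
def iter_annotations(annotation):
    # Default case: any unknown object is a single annotation.
    yield annotation

@iter_annotations.when(tuple, list)
def _iter_sequence(annotation):
    # Sequences are flattened recursively, as in the post above.
    for item in annotation:
        for leaf in iter_annotations(item):
            yield leaf
```

With this in place, `list(iter_annotations(["a", ("b", "c")]))` yields each leaf annotation in order, and a library that keeps its annotations in a custom container need only register one extra `when()` handler.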
And *that* is why I don't think the stdlib should impose any semantics on annotations -- semantic imposition doesn't *fix* incompatibility, it *creates* it. How? Because if somebody needs to do something that doesn't fit within the imposed semantics, they are forced to create their own, and they now must reinvent everything so it works with their own! This is the history of Python frameworks in a nutshell, and it's entirely avoidable. We should leave the semantics open, precisely so that it will force people to make their code *extensible*. As a side benefit, it provides a nice example of when and how to use overloaded functions effectively. From pje at telecommunity.com Sat Aug 12 18:39:15 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat, 12 Aug 2006 12:39:15 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <43aa6ff70608112133w7eb2d0c6x287c021b108974b@mail.gmail.com > References: <5.1.1.6.0.20060811225402.0228c178@sparrow.telecommunity.com> <5.1.1.6.0.20060811225402.0228c178@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060812121239.03c1da60@sparrow.telecommunity.com> At 12:33 AM 8/12/2006 -0400, Collin Winter wrote: >>I don't see the point of this. A decorator should be responsible for >>manipulating the signature of its return value. Meanwhile, the semantics >>for combining annotations should be defined by an overloaded function like >>"combineAnnotations(a1,a2)" that returns a new annotation. There is no >>need to have a special chaining decorator. >> >>May I suggest that you try using Guido's Py3K overloaded function >>prototype? I expect you'll find that if you play around with it a bit, it >>will considerably simplify your view of what's required to do this. It >>truly isn't necessary to predefine what an annotation is, or even any >>structural constraints on how they will be combined, since the user is able >>to define for any given type how such things will be handled. 
> >I've looked at Guido's overloaded function prototype, and while I >think I'm in the direction of understanding, I'm not quite there 100%. > >Could you illustrate (in code) what you've got in mind for how to >apply overloaded functions to this problem space? You just define an overloadable function for whatever operation you want to perform on annotations. Then you define methods that implement the operation for known types, and a default method that ignores unknown types. Then you're done. If somebody wants to do more than one thing with the annotations on their functions, then everything "just works", since there is only one annotation per argument (per the PEP), and each operation is ignoring types it doesn't understand. This leaves only one problem: the possibility of incompatible interpretations for a given type of annotation -- and it is easily solved by using some container or wrapper type, for which methods can be added to the respective operations. So, let's say I'm using two decorators that have a common (and incompatible) interpretation for type "str". I need only create a type that is unique to my program, and then define methods for the overloaded functions those decorators expose. QED: any incompatibility can be trivially solved by introducing a new type. However, the most likely source of conflict is the need to specify multiple, unrelated annotations for a given argument. So, it's likely that most operations will want to interpret a list of annotations as just that: a list of annotations. But there is no *requirement* that they do so. Someone writing a library of their own that has a special use for lists is under no obligation to adhere to that pattern. Remember: any conflict can be trivially solved by introducing a new type. If you'd like me to sketch this out in code, fine, but you define the specific example you'd like to see. To me, this all seems as obvious and straightforward as 2+2=4 implying that 4-2=2. 
And it doesn't even have anything specifically to do with overloaded functions! If you replace overloaded functions with functions that expect to call certain method names on the objects, *the exact same principles apply*. As long as each operation gets a unique method name, any conflict can be trivially solved by introducing a new type that implements both methods. The key here is that introspection and explicit dispatching are bad. Code like this: def decorate(func): ... if isinstance(annotation,str): # do something with string is wrong, wrong, *wrong*. It should simply be doing the equivalent of: annotation.doWhatIWant() Except in the overloaded function case, it's 'doWhatIWant(annotation)'. The latter spelling has the advantage that you don't have to be able to modify the 'str' class to add a 'doWhatIWant()' method. Is this clearer now? This is known, by the way, as the "tell, don't ask" pattern. In Python, we use the variant terms "duck typing" and "EAFP" (easier to ask forgiveness than permission), but "tell, don't ask" refers specifically to the idea that you should never dig around in an object's guts to perform an operation, and instead always delegate the operation to it. Of course, delegation is impossible in the case of a "third-party" object being used -- i.e., one that can't be modified to add the necessary method. Overloaded functions remove that restriction. (This, by the way, is why I think Python should ultimately add an overloading syntax -- so that we could ultimately replace things like 'def __str__(self)' with something like 'defop str(self)'. But that's not relevant to the immediate discussion.) 
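The "tell, don't ask" contrast Phillip draws can be sketched in a few lines; every class and method name below is invented for illustration:

```python
# "Ask" style -- the consumer introspects types, so it must be edited
# every time someone invents a new annotation type:
def describe_by_asking(annotation):
    if isinstance(annotation, str):
        return "doc: " + annotation
    raise TypeError("don't know this annotation type")

# "Tell, don't ask" style -- each annotation type carries the operation,
# so third parties extend behaviour by defining new types, never by
# editing the consumer:
class Doc:
    def __init__(self, text):
        self.text = text
    def describe(self):
        return "doc: " + self.text

class Opt:
    def __init__(self, flag):
        self.flag = flag
    def describe(self):
        return "option -" + self.flag

def describe_by_telling(annotation):
    return annotation.describe()   # pure delegation; no isinstance anywhere
```

The one thing duck typing cannot do is teach `str` a `describe()` method after the fact; that gap is exactly what the overloaded-function spelling `doWhatIWant(annotation)` closes.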
From paul at prescod.net Sat Aug 12 21:38:06 2006 From: paul at prescod.net (Paul Prescod) Date: Sat, 12 Aug 2006 12:38:06 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com> References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> <44DD5DF0.40405@acm.org> <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com> Message-ID: <1cb725390608121238v427fe287s303e2acdda97bab5@mail.gmail.com> Phillip. I'm having trouble following the discussion. I briefly caught up when Talin got very concrete with syntax and I would appreciate if you could offer some correspondingly remedial training. Talin's example is that metadata inventor A documents that his/her users should use this syntax for parameter docstrings: def myfunc( x : "The x coordinate", y : "The y coordinate" ) ... Then metadata inventor B documents that his/her users should use this syntax for getopt strings: class MyHandler( CommandLineHandler ): @getopt def list( infile:"i" = sys.stdin, outfile:"o" = sys.stdout ): Now the user is faced with the challenge of making these two work together in order to get the best of both worlds. What does the user type? The mechanism of overloading, function dispatching etc. is uninteresting to me until I understand what goes in the user's Python file. Syntax is important. Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060812/a022b238/attachment.htm From pje at telecommunity.com Sat Aug 12 23:10:17 2006 From: pje at telecommunity.com (Phillip J.
Eby) Date: Sat, 12 Aug 2006 17:10:17 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <1cb725390608121238v427fe287s303e2acdda97bab5@mail.gmail.co m> References: <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com> <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> <44DD5DF0.40405@acm.org> <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> At 12:38 PM 8/12/2006 -0700, Paul Prescod wrote: >Phillip. I'm having trouble following the discussion. I briefly caught up >when Talin got very concrete with syntax and I would appreciate if you >could offer some correspondingly remedial training. > >Talin's example is that metadata inventor A documents that his/her users >should use this syntax for parameter docstrings: > >def myfunc( x : "The x coordinate", y : "The y coordinate" ) > ... > >Then metadata inventor B documents this his/her users should use this >syntax for getopt strings: > >class MyHandler( CommandLineHandler ): > > @getopt > def list( infile:"i" = sys.stdin, outfile:"o" = sys.stdout ): > >Now the user is faced with the challenge of making these two work together >in order to get the best of both worlds. What does the user type? As long as both inventors used overloadable functions, the user can type almost *anything they want to*, as long as: 1. It's consistent, 2. It's unambiguous, and 3. They've defined the appropriate overloads. For example, they might use a 'docopt' class that allows both to be specified, or a pair of 'doc' and 'opt' objects in a list. >The mechanism of overloading, function dispatching etc. is uninteresting >to me until I understand what goes in the user's Python file. Syntax is >important. Indeed it is. Hence the importance of not forcing some particular semantics, so as to allow the user to use the types and semantics of their choosing. 
By the way, it should be understood that when I say "overloadable function", I simply mean some type-extensible dispatching mechanism. If you exclude built-in types from consideration, and simply have special attribute or method names, then duck typing works just as well. You can have decorators that use hasattr() and such to do their dirty work. It's only if you want to have sensible meaning for built-in types that there even begins to be an illusion that conflicts are an issue. However, the only built-in types likely to even be used in such a way are lists, dictionaries, tuples, and strings. If there's more than one way to interpret them, depending on the operation, their use is inherently ambiguous, and it's up to the person combining them to supply the differentiation. However, if you have: def myfunc( x : doc("The x coordinate"), y : doc("The y coordinate") ) There is no ambiguity. Likewise: def cat( infile:opt("i") = sys.stdin, outfile:opt("o") = sys.stdout ): is unambiguous. And the interpretation of: def cat(infile: [doc("input stream"), opt("i")] = sys.stdin, outfile: [doc("output stream"), opt("o")] = sys.stdout ): is likewise unambiguous, unless the creator of the documentation or option features has defined some other interpretation for a list than "recursively apply to contained items". In which case, you need only do something like: def cat(infile: docopt("input stream", "i") = sys.stdin, outfile: docopt("output stream", "o") = sys.stdout ): with an appropriate definition of methods for the 'docopt' type. Since many people seem to be unfamiliar with overloaded functions, I would just like to take this opportunity to remind you that the actual overload mechanism is irrelevant. If you gave 'doc' objects a 'printDocString()' method and 'opt' objects a 'setOptionName()' method, the exact same logic regarding extensibility applies. The 'docopt' type would simply implement both methods. This is normal, simple standard Python stuff; nothing at all fancy.
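The `doc`/`opt`/`docopt` arrangement described above can be spelled out concretely. A rough sketch, with method names chosen for illustration (`docstring()` and `optname()` standing in for the `printDocString`/`setOptionName` idea):

```python
class doc:
    def __init__(self, text):
        self.text = text
    def docstring(self):            # protocol consumed by the doc extractor
        return self.text

class opt:
    def __init__(self, flag):
        self.flag = flag
    def optname(self):              # protocol consumed by the getopt machinery
        return self.flag

class docopt(doc, opt):
    """Speaks both protocols, so one annotation serves both consumers."""
    def __init__(self, text, flag):
        doc.__init__(self, text)
        opt.__init__(self, flag)

# Each consumer simply ignores annotations that don't speak its protocol:
def extract_docs(annotations):
    return [a.docstring() for a in annotations if hasattr(a, "docstring")]

def extract_opts(annotations):
    return [a.optname() for a in annotations if hasattr(a, "optname")]
```

Given `[docopt("input stream", "i"), doc("output file")]`, the doc extractor sees both docstrings while the getopt consumer sees only the `"i"` flag; no discrimination machinery is required.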
The only thing that overloaded functions add to this is that they allow you to (in effect) add methods to existing types without monkeypatching. Thus, you can define overloads for built-in types, and types you didn't implement yourself. Even if overloaded functions didn't exist, it wouldn't be necessary to invent them just to allow arbitrary annotation semantics! It simply requires that operations that *use* annotations always follow the "tell, don't ask" pattern, whether it's done by duck typing, EAFP, or overloaded functions. From talin at acm.org Sat Aug 12 23:07:18 2006 From: talin at acm.org (Talin) Date: Sat, 12 Aug 2006 14:07:18 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <1cb725390608121238v427fe287s303e2acdda97bab5@mail.gmail.com> References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> <44DD5DF0.40405@acm.org> <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com> <1cb725390608121238v427fe287s303e2acdda97bab5@mail.gmail.com> Message-ID: <44DE4306.4070304@acm.org> Paul Prescod wrote: > Phillip. I'm having trouble following the discussion. I briefly caught up > when Talin got very concrete with syntax and I would appreciate if you > could > offer some correspondingly remedial training. > > Talin's example is that metadata inventor A documents that his/her users > should use this syntax for parameter docstrings: > > def myfunc( x : "The x coordinate", y : "The y coordinate" ) > ... One important point I want to mention. I deliberately did *not* show a decorator for this above example. The reason for this is that the docstring annotations are not intended for consumption by a decorator function - they are intended for consumption by an external program that extracts documentation. More specifically, this external doc extractor program would be part of a standard package of documentation tools, written by an entirely different author than the person actually writing 'myfunc'. 
This doc extractor knows nothing about decorators, and is unconcerned with their presence. So I'd like Phillip to incorporate that into his explanation of how that is all supposed to work. -- Talin From talin at acm.org Sun Aug 13 00:00:45 2006 From: talin at acm.org (Talin) Date: Sat, 12 Aug 2006 15:00:45 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> References: <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com> <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> <44DD5DF0.40405@acm.org> <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com> <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> Message-ID: <44DE4F8D.6050503@acm.org> Phillip J. Eby wrote: > At 12:38 PM 8/12/2006 -0700, Paul Prescod wrote: > However, if you have: > > def myfunc( x : doc("The x coordinate"), y : doc("The y coordinate") ) > > There is no ambiguity. Likewise: > > def cat( infile:opt("i") = sys.stdin, outfile:opt("o") = sys.stdout ): > > is unambiguous. And the interpetation of: > > def cat(infile: [doc("input stream"), opt("i")] = sys.stdin, > outfile: [doc("output stream"), opt("o")] = sys.stdout > ): By doing this, you've already introduced an implicit requirement for annotations: Rather than saying that annotations can be "any format you want", the actual restriction is "any format you want that is distinguishable from other formats." More specifically, the rule is that annotations intended for different consumers must be distinguishable from each other via rule. This is in direct contradiction with the statement in the PEP that says that annotations have no predefined syntax or semantics -- they are required to have, at minimum, semantics sufficient to allow rule-based discrimination. (BTW, I propose the term "Annotation Consumer" to mean a body of code that is intended to process annotations. 
You can have decorator-based consumers, as well as external consumers that are not part of the decorator stack and which inspect the function signature directly, without invoking the decorators.) Let's use the term 'discriminator' to indicate any means, using function overloading or whatever, of determining which consumers should process which annotations. Let's also define the term 'discriminator protocol' to mean any input specifications to the discriminator - so in the above example, 'doc()' and 'opt()' are part of the discriminator protocol. Now, you are trying very hard not to specify a standard discriminator protocol, but the fact is that if you don't do it, someone else will. Nobody wants to have to write their own discriminator for each application. And you can't mix discriminator protocols unless those protocols are a priori compatible. Thus, there is very strong pressure to create a single, standard discriminator, or at least a standard discriminator protocol. The pressure is based on the fact that most users would rather deal with a protocol that someone else has written rather than writing their own. And because mixing protocols has the potential for discrimination errors, a heterogeneous environment with multiple protocols will inevitably degenerate into one where a single protocol has a monopoly. So why don't you save us all the trouble and pain and just define the standard discrimination mechanism up front? As I have shown, it's going to happen anyway - it's inevitable - and delaying the decision simply means a lot of heartache for a lot of folk until the one true discriminator takes over. (Which is another thing that I was trying to illustrate with my SysEx story.) As a footnote, I'd like to make a philosophical point about designing protocols.
A 'protocol' (not in the technical sense, but in the sense of human relations) is simply an agreement to curtail the range of one's behavior to a restricted subset of what one is capable of, in order to facilitate cooperation between individuals. Language is a protocol - as I am typing this message, I implicitly agree to use words of English, rather than random made-up syllables, in order to facilitate understanding of my meaning. Now, the curious and paradoxical thing about protocols is that in order to give the most freedom, you have to take some freedom away. Taking away certain freedoms can give you *more* freedom, because it allows you to predict and rely on the behaviors of the other participants in the protocol, enabling you to accomplish things that you wouldn't be able to do otherwise. For a given situation, there will be some "sweet spot", some balance between openness and restriction, that will give the largest amount of "effective" freedom and capability to the participants. Here's an example: Cultures which have a strong mercantile ethic for fair dealing and enforcement of contracts tend to have vastly more efficient national economies. In countries where the mercantile ethic is poor, transaction costs are much higher - each individual has to spend effort vetting and enforcing each potential transaction, instead of being able to simply trust the other person. So by voluntarily restricting one's behavior to not unfairly take advantage of others and thus gain a temporary local advantage, one gains a huge advantage on the aggregate level. For this reason, I am skeptical of the benefit of completely open-ended protocols. The value of the protocol is in the agreement between individuals - if the individuals don't agree on much, then there's not much value to be had.
-- Talin From paul at prescod.net Sun Aug 13 02:05:56 2006 From: paul at prescod.net (Paul Prescod) Date: Sat, 12 Aug 2006 17:05:56 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> <44DD5DF0.40405@acm.org> <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com> <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> Message-ID: <1cb725390608121705s6e43b02fo28b4e83865c914ab@mail.gmail.com> It seems to me that there are two very reasonable positions being expressed. Is the following (non-normative) text a compromise? "In order for processors of function annotations to work interoperably, they must use a common interpretation of objects used as annotations on a particular function. For example, one might interpret string annotations as docstrings. Another might interpret them as path segments for a web framework. For this reason, function annotation processors SHOULD avoid assigning processor-specific meanings to types defined outside of the processor's framework. For example, a Django processor could process annotations of a type defined in a Zope package, but Zope's creators should be considered the authorities on the type's meaning for the same reasons that they would be considered authorities on the semantics of classes or methods in their packages. This implies that the interpretation of built-in types would be controlled by Python's developers and documented in Python's documentation. This is just a best practice. Nothing in the language can or should enforce this practice and there may be a few domains where there is a strong argument for violating it (e.g. an education environment where saving keystrokes may be more important than easing interoperability)."
"In Python 3000, semantics will be attached to the following types: basestring and its subtypes are to be used for documentation (though they are not necessarily the exclusive source of documentation about the type). List and its subtypes are to be used for attaching multiple independent annotations." (does chaining make sense in this context?) Paul Prescod From greg.ewing at canterbury.ac.nz Sun Aug 13 03:26:26 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 13 Aug 2006 13:26:26 +1200 Subject: [Python-3000] threading, part 2 --- + a bit of ctypes FFI worry In-Reply-To: <9eebf5740608120411m40da5724r11700fdbe509914@mail.gmail.com> References: <1f7befae0608120329wc646164w25ca4875da4cc5c0@mail.gmail.com> <44DDB5E4.9010903@canterbury.ac.nz> <9eebf5740608120411m40da5724r11700fdbe509914@mail.gmail.com> Message-ID: <44DE7FC2.4030501@canterbury.ac.nz> Lawrence Oluyede wrote: > rctypes and pypy tools are somewhat one step further than ctypes > machinery. In rctypes you can easily do something like: > > size_t = ctypes_platform.SimpleType("size_t", c_ulong) Does this work dynamically, or does it rely on C code being generated and the C compiler working out the details?
-- Greg From l.oluyede at gmail.com Sun Aug 13 03:42:44 2006 From: l.oluyede at gmail.com (Lawrence Oluyede) Date: Sun, 13 Aug 2006 03:42:44 +0200 Subject: [Python-3000] threading, part 2 --- + a bit of ctypes FFI worry In-Reply-To: <44DE7FC2.4030501@canterbury.ac.nz> References: <1f7befae0608120329wc646164w25ca4875da4cc5c0@mail.gmail.com> <44DDB5E4.9010903@canterbury.ac.nz> <9eebf5740608120411m40da5724r11700fdbe509914@mail.gmail.com> <44DE7FC2.4030501@canterbury.ac.nz> Message-ID: <9eebf5740608121842x4c1492baq9e049302905c2837@mail.gmail.com> > Does this work dynamically, or does it rely on > C code being generated and the C compiler working > out the details? It relies on C... that somewhat hinders the usefulness of the process. There's also the code generator option but we're again onto compilation stuff. -- Lawrence http://www.oluyede.org/blog From pje at telecommunity.com Sun Aug 13 04:21:47 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat, 12 Aug 2006 22:21:47 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <1cb725390608121705s6e43b02fo28b4e83865c914ab@mail.gmail.com> References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> <44DD5DF0.40405@acm.org> <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com> <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060812221550.0258ce68@sparrow.telecommunity.com> At 05:05 PM 8/12/2006 -0700, Paul Prescod wrote: >It seems to me that there are two very reasonable positions being >expressed. Is the following (non-normative) text a compromise? > >"In order for processors of function annotations to work interoperably, >they must use a common interpretation of objects used as annotations on a >particular function. For example, one might interpret string annotations >as docstrings. Another might interpret them as path segments for a web >framework.
For this reason, function annotation processors SHOULD avoid >assigning processor-specific meanings to types defined outside of the >processor's framework. For example, a Django processor could process >annotations of a type defined in a Zope package, but Zope's creators >should be considered the authorities on the type's meaning for the same >reasons that they would be considered authorities on the semantics of >classes or methods in their packages. This implies that the interpretation >of built-in types would be controlled by Python's developers and >documented in Python's documentation. This is just a best practice. >Nothing in the language can or should enforce this practice and there may >be a few domains where there is a strong argument for violating it (e.g. >an education environment where saving keystrokes may be more important >than easing interoperability)." I mostly like this; the main issue I see is that as long as we're recommending best practices, we should recommend using tell-don't-ask (via duck typing protocols, adaptation, or overloaded functions) so that annotation libraries can be enhanced and extended by other developers. >"In Python 3000, semantics will be attached to the following types: >basestring and its subtypes are to be used for documentation (though they >are not necessarily the exclusive source of documentation about the type). >List and its subtypes are to be used for attaching multiple independent >annotations." I'm not sure why we would use strings for documentation, but I'm not opposed since it eliminates the question of multiple interpretations for strings. >(does chaining make sense in this context?) I don't know if I know what you mean by "chaining". Good use of tell-don't-ask means that any interpretation of annotations nested in other annotations would be defined by the enclosing annotation (or in an overload for it). From pje at telecommunity.com Sun Aug 13 04:23:00 2006 From: pje at telecommunity.com (Phillip J.
Eby) Date: Sat, 12 Aug 2006 22:23:00 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: Message-ID: <5.1.1.6.0.20060812215907.0226e808@sparrow.telecommunity.com> At 03:00 PM 8/12/2006 -0700, Talin wrote: >Phillip J. Eby wrote: > > At 12:38 PM 8/12/2006 -0700, Paul Prescod wrote: > > > However, if you have: > > > > def myfunc( x : doc("The x coordinate"), y : doc("The y coordinate") ) > > > > There is no ambiguity. Likewise: > > > > def cat( infile:opt("i") = sys.stdin, outfile:opt("o") = sys.stdout ): > > > > is unambiguous. And the interpretation of: > > > > def cat(infile: [doc("input stream"), opt("i")] = sys.stdin, > > outfile: [doc("output stream"), opt("o")] = sys.stdout > > ): > >By doing this, you've already introduced an implicit requirement for >annotations: Rather than saying that annotations can be "any format you >want", the actual restriction is "any format you want that is >distinguishable from other formats." And your point is what? > More specifically, the rule is that >annotations intended for different consumers must be distinguishable >from each other via rule. This is in direct contradiction with the >statement in the PEP that says that annotations have no predefined >syntax or semantics -- they are required to have, at minimum, semantics >sufficient to allow rule-based discrimination. You've lost me here entirely. If we didn't want unambiguous semantics, we'd write programs in English, not Python. :) >(BTW, I propose the term "Annotation Consumer" to mean a body of code >that is intended to process annotations. You can have decorator-based >consumers, as well as external consumers that are not part of the >decorator stack and which inspect the function signature directly, >without invoking the decorators.) Um, okay. I'm not sure what benefit this new term adds over "operation that uses annotations", which is what I've been using, but whatever.
>Let's use the term 'discriminator' to indicate any means, using function >overloading or whatever, of determining which consumers should process >which annotations. Let's also define the term 'discriminator protocol' to > mean any input specifications to the discriminator - so in the above >example, 'doc()' and 'opt()' are part of the discriminator protocol. Um, what? Why are you adding all this complication to a simple idea? Duck typing is normal, simple, standard Python programming practice. We use objects with methods all the time, and check for the existence of attributes all the time. I don't understand why you insist on making that more complicated than it is. It's really simple. Annotations are objects. Objects can be inspected, or selected by type. You can do what you want to with them. How complex is that? (Meanwhile, I'm going to ignore all the red herrings about freedom and commerce and other rigamarole that has absolutely nothing to do with argument annotations.) Going forward, may I suggest you take a look at Java and C# argument annotations before continuing to pursue this spurious line of reasoning? I'm curious to see what your explanation will be for why these other languages don't have the problems that you claim will inevitably occur. Meanwhile, if library authors write bad code because they don't understand basic OO concepts like duck typing and "tell, don't ask", then their users will educate them when they complain about not being able to use multiple annotation types. Providing good examples and recommending best practices is one thing, but mandating a particular semantics is another. From exarkun at divmod.com Sun Aug 13 05:21:49 2006 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Sat, 12 Aug 2006 23:21:49 -0400 Subject: [Python-3000] [Python-Dev] What is the status of file.readinto?
In-Reply-To: Message-ID: <20060813032149.1717.1953938655.divmod.quotient.21274@ohm> On Sat, 12 Aug 2006 19:28:44 -0700, Guido van Rossum wrote: >On 8/12/06, "Martin v. Löwis" wrote: >> I can only guess why it may go away; my guess it will go away when >> the buffer interface is removed from Python (then it becomes >> unimplementable). > >In Py3k, the I/O APIs will be redesigned, especially the binary ones. >My current idea is to have read() on a binary file return a bytes >object. If readinto() continues to be necessary, please make sure the >Py3k list (python-3000 at python.org) knows about your use case. We >aren't quite writing up the I/O APIs in PEP-form, but when we do, that >would be the right time to speak up. > The ability to read into pre-allocated memory is fairly important for high-performance applications. This should be preserved somehow (and preferably given a real, supported API). Jean-Paul From ironfroggy at gmail.com Sun Aug 13 05:50:26 2006 From: ironfroggy at gmail.com (Calvin Spealman) Date: Sat, 12 Aug 2006 23:50:26 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <1cb725390608121705s6e43b02fo28b4e83865c914ab@mail.gmail.com> References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> <44DD5DF0.40405@acm.org> <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com> <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> <1cb725390608121705s6e43b02fo28b4e83865c914ab@mail.gmail.com> Message-ID: <76fd5acf0608122050v75aa6dbbs32bf05f85222fa7e@mail.gmail.com> I am getting very tired of gmail's ignoring of the mailing-list headers in context of replying! Anyway, here is what I accidentally sent as personal messages related to this thread. Replying to Talin's long story about MIDI devices: WOW I won't even pretend to reply with anything near a similar sized body of text. Condolences go out to you for the water and laptop, by the way. Anyways...
Although this is a humorous story (post it somewhere readily with some more fleshiness, maybe!) and I enjoyed reading it quite a bit, I saw where it was going very early on and disagreed immediately with the point I see you trying to get across. The thing is, the situations are too different to compare so bluntly. The era from which this story comes was a different world, which was far more brutal for any attempts at loose cooperation than we can do today, what with the internet and this being lots of open source software, not a hundred and fifty competing MIDI vendors who think compatibility would just make it easier to lose customers. The simplicity of the matter is that there won't be that many annotation libraries, and mixing them will be possible. When someone writes the good type annotation handling library, other people (even those writing other annotation libraries) will use it, until it reaches the point that it will get put into the standard library. And, let no one pretend that will not happen. De facto and even just mildly common libraries almost always get pushed into the standard library eventually, but having some time in the wild is good for evolution to take its course. And to what Paul said here: On 8/12/06, Paul Prescod wrote: > It seems to me that there are two very reasonable positions being expressed. > Is the following (non-normative) text a compromise? > > "In order for processors of function annotations to work interoperably, they > must use a common interpretation of objects used as annotations on a > particular function. For example, one might interpret string annotations as > docstrings. Another might interpret them as path segments for a web > framework. For this reason, function annotation processors SHOULD avoid > assigning processor-specific meanings to types defined outside of the > processor's framework.
For example, a Django processor could process > annotations of a type defined in a Zope package, but Zope's creators should > be considered the authorities on the type's meaning for the same reasons > that they would be considered authorities on the semantics of classes or > methods in their packages. This implies that the interpretation of built-in > types would be controlled by Python's developers and documented in Python's > documentation. This is just a best practice. Nothing in the language can or > should enforce this practice and there may be a few domains where there is a > strong argument for violating it (e.g. an education environment where > saving keystrokes may be more important than easing interoperability)." > > "In Python 3000, semantics will be attached to the following types: > basestring and its subtypes are to be used for documentation (though they > are not necessarily the exclusive source of documentation about the type). > List and its subtypes are to be used for attaching multiple independent > annotations." > > (does chaining make sense in this context?) > > Paul Prescod I've been looking for a good place to pipe in with the suggestion of defining that a dictionary as an annotation is taken as a mapping of annotation type names to the annotation itself, such as using {'doc': "The single character argument for the command line.", 'type': int} as an annotation for some parameter in a function. However, reading through all the posts I missed while recuperating from a long trip I just returned from, I think this coupled with taking _any iterable_ (not just list and subtypes) and the whole "your type, your annotation" guideline, is definitely sufficient for all uses.
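[Calvin's dictionary convention can be sketched in today's Python 3 annotation syntax. The 'doc' and 'opt' keys and the collect() helper below are invented for illustration; nothing here was specified by the draft PEP.]

```python
# A hypothetical dict-based annotation: each key names a kind of
# annotation, each value carries that annotation's payload.
def cat(infile: {'doc': 'input stream', 'opt': 'i'} = None):
    return infile

def collect(func, key):
    """Gather the entries for one annotation kind, ignoring all others."""
    return {name: ann[key]
            for name, ann in func.__annotations__.items()
            if isinstance(ann, dict) and key in ann}
```

A documentation consumer would call collect(cat, 'doc') and see only {'infile': 'input stream'}, while an option-parsing consumer calling collect(cat, 'opt') would see only {'infile': 'i'}; neither needs to know the other exists.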
From jimjjewett at gmail.com Sun Aug 13 05:56:15 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Sat, 12 Aug 2006 23:56:15 -0400 Subject: [Python-3000] PEP3102 Keyword-Only Arguments In-Reply-To: References: Message-ID: On 8/11/06, Jiwon Seo wrote: > When we have keyword-only arguments, do we allow 'keyword dictionary' > argument? If that's the case, where would we want to place > keyword-only arguments? > Are we going to allow any of followings? > 1. def foo(a, b, *, key1=None, key2=None, **map) Seems perfectly reasonable. I think the controversy was over whether or not to allow keyword-only without a default. > 2. def foo(a, b, *, **map, key1=None, key2=None) Seems backward, though I suppose we could adjust if we needed to. > 3. def foo(a, b, *, **map) What would the * even mean, since there aren't any named keywords to separate? -jJ From talin at acm.org Sun Aug 13 06:05:27 2006 From: talin at acm.org (Talin) Date: Sat, 12 Aug 2006 21:05:27 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060812215907.0226e808@sparrow.telecommunity.com> References: <5.1.1.6.0.20060812215907.0226e808@sparrow.telecommunity.com> Message-ID: <44DEA507.9040900@acm.org> Phillip J. Eby wrote: > At 03:00 PM 8/12/2006 -0700, Talin wrote: >> Phillip J. Eby wrote: >> > At 12:38 PM 8/12/2006 -0700, Paul Prescod wrote: >> >> > However, if you have: >> > >> > def myfunc( x : doc("The x coordinate"), y : doc("The y >> coordinate") ) >> > >> > There is no ambiguity. Likewise: >> > >> > def cat( infile:opt("i") = sys.stdin, outfile:opt("o") = >> sys.stdout ): >> > >> > is unambiguous. 
And the interpretation of: >> > >> > def cat(infile: [doc("input stream"), opt("i")] = sys.stdin, >> > outfile: [doc("output stream"), opt("o")] = sys.stdout >> > ): >> >> By doing this, you've already introduced an implicit requirement for >> annotations: Rather than saying that annotations can be "any format you >> want", the actual restriction is "any format you want that is >> distinguishable from other formats." > > And your point is what? > My point is that this statement in Collin's PEP is wrong: > There is no worry that these libraries will assign semantics at > random, or that a variety of libraries will appear, each with varying > semantics and interpretations of what, say, a tuple of strings > means. The difficulty inherent in writing annotation interpreting > libraries will keep their number low and their authorship in the > hands of people who, frankly, know what they're doing. The way I read this is "there is no need for annotations to be designed so as not to interfere with one another, nor does there need to be any mechanism defined in this PEP for resolving such interference". I and others have provided extensive use cases to show that unless care is taken, different annotations *will* step on each other's toes. >> More specifically, the rule is that >> annotations intended for different consumers must be distinguishable >> from each other via rule. This is in direct contradiction with the >> statement in the PEP that says that annotations have no predefined >> syntax or semantics -- they are required to have, at minimum, semantics >> sufficient to allow rule-based discrimination. > > You've lost me here entirely. If we didn't want unambiguous semantics, > we'd write programs in English, not Python. :) Again, look at the language of the PEP. >> (BTW, I propose the term "Annotation Consumer" to mean a body of code >> that is intended to process annotations.
You can have decorator-based >> consumers, as well as external consumers that are not part of the >> decorator stack and which inspect the function signature directly, >> without invoking the decorators.) > > Um, okay. I'm not sure what benefit this new term adds over "operation > that uses annotations", which is what I've been using, but whatever. > I'm just trying to get a handle on this stuff so that we can *talk* about it. >> Let's use the term 'discriminator' to indicate any means, using function >> overloading or whatever, of determining which consumers should process >> which annotations. Let's also define the term 'discriminator protocol' to >> mean any input specifications to the discriminator - so in the above >> example, 'doc()' and 'opt()' are part of the discriminator protocol. > > Um, what? Why are you adding all this complication to a simple idea? I'm not adding anything to the concept, I am trying to come up with a way to *talk* about the concept. So far the whole conversation has gotten very confused because we're dealing with some highly abstract stuff here. > Duck typing is normal, simple, standard Python programming practice. We > use objects with methods all the time, and check for the existence of > attributes all the time. > > I don't understand why you insist on making that more complicated than > it is. It's really simple. Annotations are objects. Objects can be > inspected, or selected by type. You can do what you want to with them. > > How complex is that? It gets complex when you have more than one inspector or selector. What we are arguing about is how much the various inspectors/selectors need to know about each other. And while the answer is hopefully "not much", I hope that I have shown that it cannot be "nothing at all". There have to be some ground rules for cooperation, or cooperation is impossible, that's basic logic.
> (Meanwhile, I'm going to ignore all the red herrings about freedom and > commerce and other rigamarole that has absolutely nothing to do with > argument annotations.) Don't think of it as red herrings. Think of it as, um, "highly non-linear train of thought". :) > Going forward, may I suggest you take a look at Java and C# argument > annotations before continuing to pursue this spurious line of > reasoning? I'm curious to see what your explanation will be for why > these other languages don't have the problems that you claim will > inevitably occur. Dude, you don't want to know how many man-years of C# programming I've done :) Let's take C# attributes as an example. C# Attributes have the following syntactical/semantic structure: 1) They must be derived from the base class "Attribute". (This by itself is not really significant.) 2) Attributes are distinguished by type, or in some cases by value. 3) The types do not overlap. 4) A given consumer of attributes can always distinguish attributes which are relevant to their purposes from attributes which are not, even against hypothetical future annotations which have not yet been established. As a user, when I add an attribute to a method, I know that (a) there is a known consumer of that attribute, (b) that it is impossible for an attribute which is not intended for that consumer to be confused for one that is. If I set [Browseable(false)] on a property, I know exactly how that attribute is going to be interpreted, and by what component. If someone comes along later and adds a new annotation called "SortOfBrowseable", which has many of the same attributes as Browseable, there will never be the possibility that their annotation and mine can get confused with each other. (As opposed to Python, where it's relatively easy to have classes that masquerade as one another.) The Annotation PEP, on the other hand, makes none of these guarantees, because it tries hard not to guarantee anything.
It doesn't specify the mechanism by which one annotation is distinguished from another; Unlike the C# attributes which are organized into a tree of types, the annotations have no organization and no categorization defined. Because there is no prohibition against category overlap, that means that the annotations that I write today might one day in the future match against a newly-created category, with results that I can't predict. I also want to point out that C# attributes are very different from Python decorators, so you can't use analogies between them. Decorators are active agents - that is, they hook into the process of defining a method. Because of this, decorators have the option of having all of their semantic meaning buried within the decorator itself. In essence, the rule by which decorators "play nice" with each other is already defined - each gets a shot at modifying the function object, and each receives the result of the previous decorator. C# attributes and function annotations, on the other hand, are purely passive - they have no knowledge of what they are attached to, and their only meaning is derived from external use. They themselves don't have to play nice with each other, but the interpreters / inspectors / consumers do. > Meanwhile, if library authors write bad code because they don't > understand basic OO concepts like duck typing and "tell, don't ask", > then their users will educate them when they complain about not being > able to use multiple annotation types. > > Providing good examples and recommending best practices is one thing, > but mandating a particular semantics is another. 
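[The C#-style discrimination Talin describes maps naturally onto a Python sketch. The Annotation base class and the annotations_of() helper below are invented for illustration; this is one possible pattern, not anything mandated by the draft PEP.]

```python
class Annotation:
    """Marker base class, playing the role C#'s Attribute plays."""

class doc(Annotation):
    def __init__(self, text):
        self.text = text

class opt(Annotation):
    def __init__(self, flag):
        self.flag = flag

def cat(infile: [doc("input stream"), opt("i")] = None):
    return infile

def annotations_of(func, kind):
    """Select only the annotations of the type one consumer cares about."""
    found = {}
    for name, ann in func.__annotations__.items():
        # Treat a list annotation as multiple independent annotations.
        items = ann if isinstance(ann, list) else [ann]
        matches = [a for a in items if isinstance(a, kind)]
        if matches:
            found[name] = matches
    return found
```

Because selection is by isinstance() against a known base class, the doc consumer and the opt consumer never see each other's annotations, and a future annotation class cannot accidentally match either query unless it deliberately subclasses one of them.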
From jcarlson at uci.edu Sun Aug 13 06:16:18 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Sat, 12 Aug 2006 21:16:18 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> References: <1cb725390608121238v427fe287s303e2acdda97bab5@mail.gmail.com> <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> Message-ID: <20060812205512.197A.JCARLSON@uci.edu> "Phillip J. Eby" wrote: > However, if you have: > > def myfunc( x : doc("The x coordinate"), y : doc("The y coordinate") ) > > There is no ambiguity. Likewise: > > def cat( infile:opt("i") = sys.stdin, outfile:opt("o") = sys.stdout ): > > is unambiguous. And the interpretation of: > > def cat(infile: [doc("input stream"), opt("i")] = sys.stdin, > outfile: [doc("output stream"), opt("o")] = sys.stdout > ): > > is likewise unambiguous, unless the creator of the documentation or option > features has defined some other interpretation for a list than "recursively > apply to contained items". In which case, you need only do something like: > > def cat(infile: docopt("input stream", "i") = sys.stdin, > outfile: docopt("output stream", "o") = sys.stdout > ): I now understand where you were coming from with regards to this being equivalent to pickle (at least pickle + copy_reg). I think that if you would have posted this particular sample a couple days ago, there wouldn't have been the discussion (argument?) about incompatible mechanisms for annotation processing. With that said, the above is a protocol. Just like __len__, __str__, copy_reg, __reduce__, __setstate__, etc., are protocols. It may not be fully specified (when annotations are to be processed, if at all, by whom, where the annotation registry is, etc.), but it is still a protocol. Do we need any more specification for the PEP and 2.6/3k? I don't know, maybe. You claim no, with the history of PEAK and other languages as proof that doing anything more is unnecessary.
And I can understand why you would resist any further specification: PEAK has been doing annotations for quite a while, and additional specifications could make transitioning to these annotations a pain in the ass for you and your users. I'm personally not convinced that no further specification is desired or necessary (provided we include a variant of the above example annotations), but I also cannot convince myself that specifying anything further would be flexible enough to not be a mistake. - Josiah From pje at telecommunity.com Sun Aug 13 07:05:13 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun, 13 Aug 2006 01:05:13 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <44DEA507.9040900@acm.org> References: <5.1.1.6.0.20060812215907.0226e808@sparrow.telecommunity.com> <5.1.1.6.0.20060812215907.0226e808@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060813005228.022737c8@sparrow.telecommunity.com> At 09:05 PM 8/12/2006 -0700, Talin wrote: >What we are arguing about is how much the various inspectors/selectors >need to know about each other. And while the answer is hopefully "not >much", I hope that I have shown that it cannot be "nothing at all". As I've previously stated, they need to know enough to ignore what they don't understand. And, to be useful, they should allow user extension via duck typing or overloading. > There have to be some ground rules for cooperation, or cooperation is > impossible, that's basic logic. See the ground rules provided above. >>Going forward, may I suggest you take a look at Java and C# argument >>annotations before continuing to pursue this spurious line of >>reasoning? I'm curious to see what your explanation will be for why >>these other languages don't have the problems that you claim will >>inevitably occur. > >Dude, you don't want to know how many man-years of C# programming I've done :) > >Let's take C# attributes as an example.
C# Attributes have the following >syntactical/semantic structure: > > 1) They must be derived from the base class "Attribute". (This by > itself is not really significant.) > 2) Attributes are distinguished by type, or in some cases by value. > 3) The types do not overlap. > 4) A given consumer of attributes can always distinguish attributes > which are relevant to their purposes from attributes which are not, even > against hypothetical future annotations which have not yet been established. I fail to see how this is different from what I've already said. >As a user, when I add an attribute to a method, I know that (a) there is a >known consumer of that attribute, (b) that it is impossible for an >attribute which is not intended for that consumer to be confused for one >that is. If I set [Browseable(false)] on a property, I know exactly how >that attribute is going to be interpreted, and by what component. If >someone comes along later and adds a new annotation called >"SortOfBrowseable", which has many of the same attributes as Browseable, >there will never be the possibility that their annotation and mine can get >confused with each other. Again, so far it sounds just like the existing proposal. > (As opposed to Python, where it's relatively easy to have classes that > masquerade as one another.) That's a feature, not a bug. :) >The Annotation PEP, on the other hand, makes none of these guarantees, >because it tries hard not to guarantee anything. It doesn't specify the >mechanism by which one annotation is distinguished from another; Unlike >the C# attributes which are organized into a tree of types, the >annotations have no organization and no categorization defined. Because >there is no prohibition against category overlap, that means that the >annotations that I write today might one day in the future match against a >newly-created category, with results that I can't predict.
Not if the annotation consumers simply use a tell-don't-ask pattern -- a pattern which I've repeatedly explained, and which can be trivially implemented with either duck typing or overloading. >I also want to point out that C# attributes are very different from Python >decorators, so you can't use analogies between them. That statement makes me think that the reason we're not communicating is that you are talking about something else than I am. I never compared Python decorators and C# attributes. In fact, I've rarely mentioned decorators at all and have tried as much as possible to push decorators *out* of the conversation, because they are irrelevant. Documentation tools, for example, are unlikely to use decorators. Metaclasses also aren't decorators, but both documentation tools and metaclasses are likely candidates for consuming annotation data. Thus, I prefer to talk about "operations using annotations" since decorators are only a kind of "delivery vector" for such annotation-consuming operations. >C# attributes and function annotations, on the other hand, are purely >passive - they have no knowledge of what they are attached to, and their >only meaning is derived from external use. They themselves don't have to >play nice with each other, but the interpreters / inspectors / consumers do. And precisely the same things are true of Python function annotations. I'm still lost as to why you think there's something different going on here. Python decorators simply provide a vector for immediate annotation processing -- one that is entirely orthogonal to the notion of annotations themselves. From pje at telecommunity.com Sun Aug 13 07:21:01 2006 From: pje at telecommunity.com (Phillip J. 
Eby)
Date: Sun, 13 Aug 2006 01:21:01 -0400
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <20060812205512.197A.JCARLSON@uci.edu>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<1cb725390608121238v427fe287s303e2acdda97bab5@mail.gmail.com>
	<5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20060813010634.0228ee30@sparrow.telecommunity.com>

At 09:16 PM 8/12/2006 -0700, Josiah Carlson wrote:
>"Phillip J. Eby" wrote:
> > However, if you have:
> >
> > def myfunc( x : doc("The x coordinate"), y : doc("The y coordinate") )
> >
> > There is no ambiguity. Likewise:
> >
> > def cat( infile:opt("i") = sys.stdin, outfile:opt("o") = sys.stdout ):
> >
> > is unambiguous. And the interpretation of:
> >
> > def cat(infile: [doc("input stream"), opt("i")] = sys.stdin,
> >         outfile: [doc("output stream"), opt("o")] = sys.stdout
> >     ):
> >
> > is likewise unambiguous, unless the creator of the documentation or option
> > features has defined some other interpretation for a list than "recursively
> > apply to contained items". In which case, you need only do something like:
> >
> > def cat(infile: docopt("input stream", "i") = sys.stdin,
> >         outfile: docopt("output stream", "o") = sys.stdout
> >     ):
>
>I now understand where you were coming from with regards to this being
>equivalent to pickle (at least pickle + copy_reg). I think that if you
>would have posted this particular sample a couple days ago, there
>wouldn't have been the discussion (argument?) about incompatible
>mechanisms for annotation processing.

Well, it just seemed to me that that was the One Obvious Way To Do It; more specifically, I couldn't conceive of any *other* way to do it!

>With that said, the above is a protocol. Just like __len__, __str__,
>copy_reg, __reduce__, __setstate__, etc., are protocols.
It may not be >fully specified (when annotations are to be processed, if at all, by >whom, where the annotation registry is, etc.), but it is still a >protocol. Actually, it's a family of *patterns* for creating protocols. It's not a protocol, incompletely specified or otherwise. Note that the actual implementation of the tell-don't-ask pattern can be via: 1. duck typing (i.e., prearranged method names) 2. adaptation 3. overloaded functions (any of several implementations) 4. ad hoc type-based registries So it isn't even a *meta*-protocol, just a pattern family. >Do we need any more specification for the PEP and 2.6/3k? I don't know, >maybe. You claim no, with the history of PEAK and other languages as >proof that doing anything more is unnecessary. And I can understand why >you would resist any further specification: PEAK has been doing >annotations for quite a while, and additional specifications could make >transitioning to these annotations a pain in the ass for you and your >users. Not really; PEAK's annotations are currently only on *attributes* and *classes*, not functions, arguments, or return values. I was merely using it as an example of how overloaded functions allow heterogeneous annotations to coexist without needing any prearranged common semantics. But I don't believe we know enough *today* to be able to safely define a rigid specification without ruling out possibly-valid uses. By making a less-rigid specification, we force annotation consumers to code defensively... which is really the right thing to do in a heterogeneous environment anyway. >I'm personally not convinced that no further specification is desired or >necessary (provided we include a variant of the above example >annotations), As I said, I'd prefer to see the tell-don't-ask pattern specifically cited and recommended, perhaps with examples. 
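A minimal duck-typing example of tell-don't-ask annotation handling; the method name `add_documentation` and both annotation classes are invented for illustration, not taken from the PEP:

```python
# Tell-don't-ask via duck typing: the consumer tells each annotation to
# contribute; it never inspects an annotation's internals or guesses at
# its meaning. All names here are invented for illustration.

class doc:
    def __init__(self, text):
        self.text = text

    def add_documentation(self, out):
        # The annotation itself decides what "documenting" means.
        out.append(self.text)

class opt:
    def __init__(self, letter):
        self.letter = letter
    # No add_documentation method: a documentation tool simply skips it.

def collect_docs(annotations):
    out = []
    for annotation in annotations:
        contribute = getattr(annotation, "add_documentation", None)
        if contribute is not None:
            contribute(out)
    return out

docs = collect_docs([doc("input stream"), opt("i")])   # -> ["input stream"]
```

A tool that consumes `opt` annotations would look for its own method name the same way, so the two frameworks coexist without knowing about each other.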
I'll note, however, that the only consequence of *not* following that pattern is that you create a non-extensible, non-interoperable framework -- of which Python has huge numbers already. This is not so damaging an outcome as to be worrisome, any more than we worry about people creating incompatible metaclasses today! >but I also cannot convince myself that specifying anything >further would be flexible enough to not be a mistake. Right - that's the bit I'm concerned about. Python also usually doesn't impose such policy constraints on mechanism. For example, function attributes can be or contain anything, and nobody has argued that there need to be prespecified combination semantics, despite the fact that multiple tools can be consumers of the attributes. From jimjjewett at gmail.com Sun Aug 13 07:29:52 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 13 Aug 2006 01:29:52 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <20060812205512.197A.JCARLSON@uci.edu> References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> <20060812205512.197A.JCARLSON@uci.edu> Message-ID: On 8/13/06, Josiah Carlson wrote: > "Phillip J. Eby" wrote: > > However, if you have: > > def myfunc( x : doc("The x coordinate"), y : doc("The y coordinate") ) > > There is no ambiguity. Sure there is. There will probably be several frameworks using the magic name "doc". This isn't a problem for the person writing myfunc, and therefore isn't a problem for immediate decorators. It is a problem for inspection code that wants to present information about arbitrary 3rd-party libraries. 
And once you get into multiple annotations, there will be some frameworks that say "the doc annotation is mine, I'll ignore the opt annotation" and others that say "oh, a dictionary of annotations, I need to do this with name doc and that with name opt" And of course, people won't really write doc("The x coordinate") unless they're already thinking of other uses for a string; they'll just write "The x coordinate" and someone later (perhaps from a different package) will have to untangle what they meant -- short expressions will end up being ambiguous almost from the start. Eventually, ways will be found to sort things out. But there will be less pain and backwards incompatibility if these issues are considered from the start. > Do we need any more specification for the PEP and 2.6/3k? I don't know, > maybe. You claim no, with the history of PEAK and other languages as > proof that doing anything more is unnecessary. The history of complaints about PEAK being hard to understand and inadequately documented suggests that a fair number of people would prefer additional guidance and handholding. If annotations could only be used safely by people who can understand PEAK, then offering syntactic sugar to everyone would be asking for trouble. -jJ From paul at prescod.net Sun Aug 13 08:00:36 2006 From: paul at prescod.net (Paul Prescod) Date: Sat, 12 Aug 2006 23:00:36 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <44DEA507.9040900@acm.org> References: <5.1.1.6.0.20060812215907.0226e808@sparrow.telecommunity.com> <44DEA507.9040900@acm.org> Message-ID: <1cb725390608122300q3b20db1apc707e537c36fd0ee@mail.gmail.com> I made a proposal that Phillip was mostly okay with. What do other participants in the thread think? Would it move towards resolving this thread? "In order for processors of function annotations to work interoperably, they must use a common interpretation of objects used as annotations on a particular function. 
For example, one might interpret string annotations as docstrings. Another might interpret them as path segments for a web framework. For this reason, function annotation processors SHOULD avoid assigning processor-specific meanings to types defined outside of the processor's framework. For example, a Django processor could process annotations of a type defined in a Zope package, but Zope's creators should be considered the authorities on the type's meaning for the same reasons that they would be considered authorities on the semantics of classes or methods in their packages. This implies that the interpretation of built-in types would be controlled by Python's developers and documented in Python's documentation. This is just a best practice. Nothing in the language can or should enforce this practice and there may be a few domains where there is a strong argument for violating it (e.g. an education environment where saving keystrokes may be more important than easing interoperability)."

"In Python 3000, semantics will be attached to the following types: basestring and its subtypes are to be used for documentation (though they are not necessarily the exclusive source of documentation about the type). List and its subtypes are to be used for attaching multiple independent annotations."

Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060812/d1576e7a/attachment.htm

From pje at telecommunity.com Sun Aug 13 08:06:50 2006
From: pje at telecommunity.com (Phillip J.
Eby) Date: Sun, 13 Aug 2006 02:06:50 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: References: <20060812205512.197A.JCARLSON@uci.edu> <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> <20060812205512.197A.JCARLSON@uci.edu> Message-ID: <5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com> At 01:29 AM 8/13/2006 -0400, Jim Jewett wrote: >On 8/13/06, Josiah Carlson wrote: > >>"Phillip J. Eby" wrote: >> > However, if you have: > >> > def myfunc( x : doc("The x coordinate"), y : doc("The y coordinate") ) > >> > There is no ambiguity. > >Sure there is. There will probably be several frameworks using the >magic name "doc". > >This isn't a problem for the person writing myfunc, and therefore >isn't a problem for immediate decorators. It is a problem for >inspection code that wants to present information about arbitrary >3rd-party libraries. By this argument, we shouldn't have metaclasses or function attributes, because they have the same "problem". However, it's only a problem if you insist on writing brain-damaged code. If you want interoperability here, you must write tell-don't-ask code. This is true for *any* use case where frameworks might share objects; there is absolutely *nothing* special about annotations in this regard! I'm really baffled by the controversy over this; is it really the case that so many people don't know what tell-don't-ask code is or why you want it? I guess maybe it's something that's only grasped by people who have experience writing code intended for interoperability. After you run into the issue a few times, you look for a solution, and end up with either duck typing, interfaces/adaptation, overloaded functions, or ad hoc registries. ALL of these solutions are *more* than adequate to handle a simple thing like argument annotations. That's why I keep describing this as a trivial thing: even *pickling* is more complicated than this is. This is no more complex than len() or iter() or filter()! 
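As one concrete instance of the "ad hoc type-based registry" option mentioned above, consumers can register a handler per annotation type, much as pickle dispatches per type via copy_reg; all names below are hypothetical:

```python
# Ad hoc type-based registry: each annotation type gets exactly one
# handler in this consumer, and unknown types are ignored rather than
# guessed at. All names here are invented for illustration.

doc_handlers = {}

def handles(annotation_type):
    """Register a handler function for one annotation type."""
    def register(fn):
        doc_handlers[annotation_type] = fn
        return fn
    return register

class doc:
    def __init__(self, text):
        self.text = text

@handles(doc)
def render_doc(annotation):
    return "doc: " + annotation.text

def process(annotation):
    handler = doc_handlers.get(type(annotation))
    if handler is None:
        return None        # unknown annotation type: ignore, don't guess
    return handler(annotation)
```

Because dispatch is keyed on the type object itself, another framework's annotations can never accidentally match this registry.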
However, it appears that mine is a minority opinion. Unfortunately, I'm at a bit of a communication disadvantage, because if somebody wants to believe something is complicated, there is nothing that anybody can do to change their mind. If you don't consider the possibility that it is way simpler than you think, you will never be able to see it. The other possibility, of course, is that all of you have some horrendously complex use case in mind that I just don't "get". But so far all the examples that anybody else has put forth have been practically whimsical in their triviality -- while I've been explaining how the same principles will even work for complex things like type-checking code generation, let alone the trivial examples. So I don't think that's it. And at least Paul and Josiah have shown that they "get" what I'm saying, so I don't think that the answer is simply that I'm crazy, either. [Meanwhile, I'm not going to respond to the rest of your message, since it contained some things that appeared to me to be a mixture of ad hominem attack and straw man argument. I hope that was not actually your intent.] From ironfroggy at gmail.com Sun Aug 13 08:07:19 2006 From: ironfroggy at gmail.com (Calvin Spealman) Date: Sun, 13 Aug 2006 02:07:19 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <1cb725390608122300q3b20db1apc707e537c36fd0ee@mail.gmail.com> References: <5.1.1.6.0.20060812215907.0226e808@sparrow.telecommunity.com> <44DEA507.9040900@acm.org> <1cb725390608122300q3b20db1apc707e537c36fd0ee@mail.gmail.com> Message-ID: <76fd5acf0608122307m11d3128ah3791ded3b3df2cd@mail.gmail.com> On 8/13/06, Paul Prescod wrote: > I made a proposal that Phillip was mostly okay with. What do other > participants in the thread think? Would it move towards resolving this > thread? > > "In order for processors of function annotations to work interoperably, they > must use a common interpretation of objects used as annotations on a > particular function. 
For example, one might interpret string annotations as > docstrings. Another might interpet them as path segments for a web > framework. For this reason, function annotation processors SHOULD avoid > assigning processor-specific meanings to types defined outside of the > processor's framework. For example, a Django processor could process > annotations of a type defined in a Zope package, but Zope's creators should > be considered the authorities on the type's meaning for the same reasons > that they would be considered authorities on the semantics of classes or > methods in their packages. This implies that the interpretation of built-in > types would be controlled by Python's developers and documented in Python's > documentation. This is just a best practice. Nothing in the language can or > should enforce this practice and there may be a few domains where there is a > strong argument for violating it ( e.g. an education environment where > saving keystrokes may be more important than easing interopability)." > > > "In Python 3000, semantics will be attached to the following types: > basestring and its subtypes are to be used for documentation (though they > are not necessarily the exclusive source of documentation about the type). > List and its subtypes are to be used for attaching multiple independent > annotations." > > Paul Prescod +1 This needs resolved, and willy-nilly use of built-in types or someone else's types.. doesn't seem like anyone could be supportive of that. 
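A sketch of the convention being endorsed here: bare strings are taken as documentation, lists group multiple independent annotations, and any other type is left to whichever framework defined it. The `extract_docs` consumer and the `opt` class are hypothetical, not code from the PEP:

```python
# Hypothetical consumer implementing the proposed built-in semantics:
# strings document, lists contain independent annotations, everything
# else belongs to some framework and is not interpreted here.

class opt:                      # stand-in for some framework's own type
    def __init__(self, letter):
        self.letter = letter

def extract_docs(annotation):
    if isinstance(annotation, str):
        return [annotation]             # a string documents the parameter
    if isinstance(annotation, list):    # a list holds several annotations
        found = []
        for item in annotation:
            found.extend(extract_docs(item))
        return found
    return []                           # someone else's type: don't guess

docs = extract_docs(["input stream", opt("i")])   # -> ["input stream"]
```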
From paul at prescod.net Sun Aug 13 08:39:32 2006 From: paul at prescod.net (Paul Prescod) Date: Sat, 12 Aug 2006 23:39:32 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060812221550.0258ce68@sparrow.telecommunity.com> References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> <44DD5DF0.40405@acm.org> <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com> <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> <5.1.1.6.0.20060812221550.0258ce68@sparrow.telecommunity.com> Message-ID: <1cb725390608122339m6087c604l85faeb89d6061524@mail.gmail.com> On 8/12/06, Phillip J. Eby wrote: > > > I mostly like this; the main issue I see is that as long as we're > recommending best practices, we should recommend using tell-don't-ask (via > duck typing protocols, adaptation, or overloaded functions) so that their > libraries can be enhanced and extended by other developers. Would you mind suggesting text for the PEP as an addendum to what I proposed? And an example of both bad and good practice? >"In Python 3000, semantics will be attached to the following types: > >basestring and its subtypes are to be used for documentation (though they > >are not necessarily the exclusive source of documentation about the > type). > >List and its subtypes are to be used for attaching multiple independent > >annotations." > > I'm not sure why we would use strings for documentation, but I'm not > opposed since it eliminates the question of multiple interpretations for > strings. I don't understand your point. Is there a better use for strings? Or a better type to associate with documentation? Or you just don't see a need for inline parameter documentation? The PEP itself used string docstrings as an example. >(does chaining make sense in this context?) > > I don't know if I know what you mean by "chaining". 
> Good use of tell-don't-ask means that any interpretation of annotations
> nested in other annotations would be defined by the enclosing annotation
> (or in an overload for it).

Yes, it's clear what nesting means. I'm not asking about nesting. The question was whether there should be any relationship implied by the fact that an annotation appears to the left or right of another annotation in a list of annotations.

def a(b: [doc('x'), type('y')]): pass

Is there any sense in which the annotation 'x' should be passed context information that would help it wrap or communicate with 'y'? The most likely answer is "no" but function decorators do chain so I just wanted to raise the issue in case anyone wanted to make the case that parameter and return code annotations should as well.

Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060812/31cb1cab/attachment.html

From paul at prescod.net Sun Aug 13 08:47:29 2006
From: paul at prescod.net (Paul Prescod)
Date: Sat, 12 Aug 2006 23:47:29 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <43aa6ff70608111649g54e82dd6kef19862f0c281254@mail.gmail.com>
References: <5.1.1.6.0.20060811112118.023af398@sparrow.telecommunity.com>
	<20060811084623.1931.JCARLSON@uci.edu> <44DD073C.7030305@acm.org>
	<43aa6ff70608111649g54e82dd6kef19862f0c281254@mail.gmail.com>
Message-ID: <1cb725390608122347q2527151fiadf1a8fc7bcd4af5@mail.gmail.com>

On 8/11/06, Collin Winter wrote:
> >>> def chain(*decorators):
> >>>     assert len(decorators) >= 2
> >>>
> >>>     def decorate(function):
> >>>         sig = function.__signature__
> >>>         original = sig.annotations
> >>>
> >>>         for i, dec in enumerate(decorators):
> >>>             fake = dict((p, original[p][i]) for p in original)
> >>>
> >>>             function.__signature__.annotations = fake
> >>>             function = dec(function)
> >>>
> >>>         function.__signature__.annotations = original
> >>>         return function
> >>>     return decorate

I must be confused. This is a function returning a function. Does that mean that the thing showing up in the __signature__ dictionary is a function? Or does the caller need to use two sets of parentheses to call the factory function and then the inner function?

Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060812/3263fd44/attachment.htm

From paul at prescod.net Sun Aug 13 09:02:05 2006
From: paul at prescod.net (Paul Prescod)
Date: Sun, 13 Aug 2006 00:02:05 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <76fd5acf0608122011w442afac8o6bfaa7f42ec9cbcd@mail.gmail.com>
References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<44DD5DF0.40405@acm.org>
	<5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<1cb725390608121705s6e43b02fo28b4e83865c914ab@mail.gmail.com>
	<76fd5acf0608122011w442afac8o6bfaa7f42ec9cbcd@mail.gmail.com>
Message-ID: <1cb725390608130002gbe3cb88j301b451386c51328@mail.gmail.com>

On 8/12/06, Calvin Spealman wrote:
> I've been looking for a good place to pipe in with the suggestion of
> defining that a dictionary as an annotation is taken as a mapping of
> annotation type names to the annotation itself, such as using {'doc':
> "The single character argument for the command line.", 'type': int} as
> an annotation for some parameter in a function.

I think we need to decide whether metadata type identifiers are just strings or whether they will typically be objects. I think that the arguments in favour of objects are strong.

> However, reading through all the posts I missed recuperating from a
> long trip I just returned from, I think this coupled with taking _any
> iterable_ (not just list and subtypes) and the whole "your type, your
> annotation" guideline, is definitely sufficient for all uses.
One reason not to treat any iterable as a list of decorators is that a string is an iterable. Maybe strings won't be the only annotation that people want to attach that happens to be iterable for unrelated reasons. A second reason that I restricted it to lists in particular is to encourage consistent syntax (rather than one person using a list, another a tuple, a third a generator, etc.). And overall it is just overgeneralization. YAGNI. Lists work fine.

def myProtocolChainer(*args):
    return list(doSomething(args))

It is easy to loosen the protocol in future versions if I turn out to be wrong.

Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060813/a1656667/attachment.html

From paul at prescod.net Sun Aug 13 09:42:06 2006
From: paul at prescod.net (Paul Prescod)
Date: Sun, 13 Aug 2006 00:42:06 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<44DD5DF0.40405@acm.org>
	<5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
Message-ID: <1cb725390608130042h50c7d7f9oc4068f30f2b04bbb@mail.gmail.com>

> And the interpretation of:
>
> def cat(infile: [doc("input stream"), opt("i")] = sys.stdin,
>         outfile: [doc("output stream"), opt("o")] = sys.stdout
>     ):
>
> is likewise unambiguous, unless the creator of the documentation or option
> features has defined some other interpretation for a list than "recursively
> apply to contained items".

The meaning is "unambiguous unless..." then it is ambiguous. So as per my previous proposal I think that you and I agree that we should disallow the stupid interpretation by encoding the obvious one in the PEP.
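The string-is-an-iterable point can be made concrete; both helper functions below are illustrative only, contrasting the proposed list-only rule with the looser any-iterable rule being argued against:

```python
# Why "list, not any iterable" matters: a string is itself iterable, so
# an any-iterable rule would shred a documentation string into characters.
# Both helpers are invented for illustration.

def multiple_annotations_list_rule(annotation):
    # The proposed rule: only a real list means "several annotations here".
    return annotation if isinstance(annotation, list) else [annotation]

def multiple_annotations_any_iterable_rule(annotation):
    # The looser rule: anything iterable is unpacked.
    try:
        return list(annotation)
    except TypeError:
        return [annotation]

safe = multiple_annotations_list_rule("input stream")
shredded = multiple_annotations_any_iterable_rule("input stream")
# 'safe' keeps the docstring whole; 'shredded' is twelve one-character
# "annotations".
```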
> In which case, you need only do something like:
>
> def cat(infile: docopt("input stream", "i") = sys.stdin,
>         outfile: docopt("output stream", "o") = sys.stdout
>     ):
>
> with an appropriate definition of methods for the 'docopt' type.

Given that there are an infinite number of tools in the universe that could be processing "doc" and "opt" annotations, how would the user KNOW that there is one out there with a stupid interpretation of lists? They might annotate thousands of classes before finding out that some hot tool that they were planning to use next year is incompatible. So let's please define a STANDARD way of attaching multiple annotations to a parameter. Lists seem like a no-brainer choice for that.

> Since many people seem to be unfamiliar with overloaded functions, I would
> just like to take this opportunity to remind you that the actual overload
> mechanism is irrelevant. If you gave 'doc' objects a 'printDocString()'
> method and 'opt' objects a 'setOptionName()' method, the exact same logic
> regarding extensibility applies. The 'docopt' type would simply implement
> both methods.
>
> This is normal, simple standard Python stuff; nothing at all fancy.

The context is a little bit different than standard duck typing. Let's say I define a function like this:

def car(b):
    "b is a list-like object"
    return b[0]

Then someone comes along and does something I never expected. They invent a type representing a list of bits in a bitfield. They pass it to my function and everything works trivially. But there's something important that happened. The programmer ASSERTED, by passing the bit list to the function 'car', that it is a list-like object. My code wouldn't have tried to treat it as a list if the user hadn't passed it as one explicitly.

Now look at it from the point of view of function annotations. As we said before, the annotations are inert. They are just attached.
There is some code like a type checker or documentation generator that comes along after the fact and scoops them up to do something with them. The user did not assert (at the language level!) that any particular annotation applies to any particular annotation processor. The annotation processor is just looking for stuff that it recognizes. But what if it thinks it recognizes something but does not? Consider this potential case:

BobsDocumentationGenerator.py:

    class BobsDocumentationGeneratorAnnotation:
        def __init__...

        def printDocument(self):
            print self.doc

        def sideEffect(self):
            deleteHardDrive()

    def BobsDocumentationGenerator(annotation):
        if hasattr(annotation, "printDocument"):
            annotation.printDocument()

SamsDocumentationGenerator.py:

    class SamsDocumentationGeneratorAnnotation:
        def __init__...

        def printDocument(self):
            return self.doc

        def sideEffect(self):
            email(self.doc, "python-dev at pytho...")

    def SamsDocumentationGenerator(annotation):
        if hasattr(annotation, "printDocument"):
            print annotation.printDocument()
            annotation.sideEffect()

These objects, _by accident_ have the same method signature but different side effects and return values. Nobody anywhere in the system made an incorrect assertion. They just happened to be unlucky in the naming of their methods. (unbelievably unlucky but you get the drift)

One simple way to make it unambiguous would be to do a test more like:

    if hasattr(annotation, SamsDocumentationGenerator.uniqueObject):
        ...

The association of the unique object with an annotator object would be an explicit assertion of compatibility.

Can we agree that the PEP should describe strategies that people should use to make their annotation recognition strategies unambiguous and failure-proof? I think that merely documenting appropriately defensive techniques might be enough to make Talin happy. Note that it isn't the processing code that needs to be defensive (in the sense of try/catch blocks).
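A runnable variant of the unique-object test sketched above, checking the identity of a marker value rather than a merely guessable method name; all names are invented for illustration:

```python
# Unique-marker recognition: the consumer matches only annotations that
# carry its own unforgeable token, so a look-alike class with the same
# method names can never be mistaken for one of its annotations.
# All names here are hypothetical.

_SAMS_MARKER = object()          # unique, unforgeable token

class SamsAnnotation:
    recognized_by = _SAMS_MARKER
    def __init__(self, text):
        self.doc = text
    def printDocument(self):
        return self.doc

class LookalikeAnnotation:
    """Same method names as SamsAnnotation, but no marker."""
    def __init__(self, text):
        self.doc = text
    def printDocument(self):
        return self.doc

def sams_generator(annotation):
    # Identity check: hasattr-style name guessing is not enough.
    if getattr(annotation, "recognized_by", None) is _SAMS_MARKER:
        return annotation.printDocument()
    return None                  # not ours: leave it alone
```

No amount of accidental name collision can produce a false match here, because `object()` identity cannot be reproduced by another module.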
It is the whole recognition strategy that the processing code uses. Whatever recognition strategy it uses must be unambiguous. It seems like it would hurt nobody to document this and suggest some unambiguous techniques. Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060813/8fd0e73f/attachment-0001.htm From jcarlson at uci.edu Sun Aug 13 09:59:06 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Sun, 13 Aug 2006 00:59:06 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060813010634.0228ee30@sparrow.telecommunity.com> References: <20060812205512.197A.JCARLSON@uci.edu> <5.1.1.6.0.20060813010634.0228ee30@sparrow.telecommunity.com> Message-ID: <20060812233132.197F.JCARLSON@uci.edu> "Phillip J. Eby" wrote: > At 09:16 PM 8/12/2006 -0700, Josiah Carlson wrote: > >"Phillip J. Eby" wrote: > > > However, if you have: > > > > > > def myfunc( x : doc("The x coordinate"), y : doc("The y coordinate") ) > > > > > > There is no ambiguity. Likewise: > > > > > > def cat( infile:opt("i") = sys.stdin, outfile:opt("o") = sys.stdout ): > > > > > > is unambiguous. And the interpetation of: > > > > > > def cat(infile: [doc("input stream"), opt("i")] = sys.stdin, > > > outfile: [doc("output stream"), opt("o")] = sys.stdout > > > ): > > > > > > is likewise unambiguous, unless the creator of the documentation or option > > > features has defined some other interpretation for a list than > > "recursively > > > apply to contained items". In which case, you need only do something like: > > > > > > def cat(infile: docopt("input stream", "i") = sys.stdin, > > > outfile: docopt("output stream", "o") = sys.stdout > > > ): > > > >I now understand where you were coming from with regards to this being > >equivalent to pickle (at least pickle + copy_reg). 
I think that if you > >would have posted this particular sample a couple days ago, there > >wouldn't have been the discussion (argument?) about incompatible > >mechanisms for annotation processing. > > Well, it just seemed to me that that was the One Obvious Way To Do It; more > specifically, I couldn't conceive of any *other* way to do it! Perhaps, but it was also obvious that very few people knew what the heck you were talking about (hence the "how" and "what do you mean" queries). Try to remember that while you may be old-hat at annotations, perhaps not everyone discussing them at the moment has your particular experience and assumptions. Also, when you hand-wave with "it's trivial", it's more than a little frustrating, because while it may be "trivial" to you, it's certainly not trivial to the asker (why would they be asking otherwise?) > >With that said, the above is a protocol. Just like __len__, __str__, > >copy_reg, __reduce__, __setstate__, etc., are protocols. It may not be > >fully specified (when annotations are to be processed, if at all, by > >whom, where the annotation registry is, etc.), but it is still a > >protocol. > > Actually, it's a family of *patterns* for creating protocols. It's not a > protocol, incompletely specified or otherwise. Note that the actual > implementation of the tell-don't-ask pattern can be via: Here's my take: Protocol in this context is a set of rules for the definition of the annotations and their interaction with the handler for the annotations. For what we seem to have agreed upon, the definition is via a base class or instance, and the annotation handling is left up to the user to define (via the four methods you offered, or even others). If you want to call it a 'pattern', 'protocol', 'meta-protocol', or whatever, they are all effectively the same thing in this context; a way of writing annotations that can later be seen as having a (hopefully unambiguous) meaning. 
> But I don't believe we know enough *today* to be able to safely define a > rigid specification without ruling out possibly-valid uses. By making a > less-rigid specification, we force annotation consumers to code > defensively... which is really the right thing to do in a heterogeneous > environment anyway. Right. I'm in no way suggesting that a 'rigid' specification be developed, and I'm generally on the fence about whether *any* specification should be done. But really, the more I think about it, the more I believe that *something* should be offered as a starting point. Whether it is in the Python cookbook, a 3rd party module or package, etc. As long as it includes a link from the standard Python documentation where annotations are discussed, I think that would be satisfactory. > >but I also cannot convince myself that specifying anything > >further would be flexible enough to not be a mistake. > > Right - that's the bit I'm concerned about. Python also usually doesn't > impose such policy constraints on mechanism. For example, function > attributes can be or contain anything, and nobody has argued that there > need to be prespecified combination semantics, despite the fact that > multiple tools can be consumers of the attributes. Ahh, but function decorators *do* have a specified combination semantic; specifically an order of application and chaining (the return from the first decorator will be passed to the second decorator, etc.). If we were to specify anything, I would suggest we define an order of annotation calling, which would also define a chaining order if applicable. Maybe it is completely obvious, but one should never underestimate what kinds of silly things users will do. You responded to Jim Jewett > [Meanwhile, I'm not going to respond to the rest of your message, since it > contained some things that appeared to me to be a mixture of ad hominem > attack and straw man argument. I hope that was not actually your intent.] 
As a point of reference, even after you linked the documentation about PEAK, I still had *no idea* what the heck you meant about PEAK annotations or their implications to function argument annotations. I like to believe that I'm not stupid, but maybe I'm wrong, or maybe the documentation could be better (this isn't an insult, I'm quite experienced at writing poor documentation)? - Josiah From paul at prescod.net Sun Aug 13 10:06:01 2006 From: paul at prescod.net (Paul Prescod) Date: Sun, 13 Aug 2006 01:06:01 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com> References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> <20060812205512.197A.JCARLSON@uci.edu> <5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com> Message-ID: <1cb725390608130106y3cf29002q6c63dd6ac1ce04d4@mail.gmail.com> Sorry to write so many emails, but I want to get in one last point tonight (I'm sure I'll regret posting late at night) Jim's email seems not to have gotten through to the whole list. There's a lot of that going aruond. On 8/12/06, Phillip J. Eby wrote: > > >Sure there is. There will probably be several frameworks using the > >magic name "doc". > > > >This isn't a problem for the person writing myfunc, and therefore > >isn't a problem for immediate decorators. It is a problem for > >inspection code that wants to present information about arbitrary > >3rd-party libraries. > > By this argument, we shouldn't have metaclasses or function attributes, > because they have the same "problem". I don't think Jim's issue is a real one (according to the snippet I see in your email) because doc is an object defined in one and only one place in Python. It has a unique id(). If two people use the name "doc" then they will be addressable as module1.doc() and module2.doc(). No problem. However, it's only a problem if you insist on writing brain-damaged > code. 
If you want interoperability here, you must write tell-don't-ask
> code. This is true for *any* use case where frameworks might share
> objects; there is absolutely *nothing* special about annotations in this
> regard!

There is something different about annotations compared to everything else in Python so far. Annotations are the first feature other than docstrings (which are proto-annotations) in core Python where third party tools are supposed to go trolling through your objects FINDING STUFF that they may decide is interesting or not to them.

When you attach a metaclass or a decorator, you INVOKE CODE that you have installed on your hard drive, and if it crashes then you load up your debugger and see what happened. When you attach an annotation, you are just adding information that code OUTSIDE OF YOUR CONTROL will poke around and interpret (the metadata processor, like a type checker or documentation generator).

What you do when you attach an annotation is make an assertion. You always want to be confident that you and the person writing the processor code have the same understanding of the assertion you are making. You do not want to attach a list because you are asserting that the list is a container for a bunch of other assertions about the contents of the list whereas the person writing the processing code thinks that you are asserting that the variable will be of TYPE list.

Now I'm sure that with all of your framework programming you've run into this many times and have many techniques for making these assertions unambiguous. All we need to do is document them so that people who are not as knowledgeable will not get themselves into trouble. It isn't sufficient to say: "Only smart people will use this stuff so we need not worry", which is what the original PEP said. Even if it is true, I don't understand why we would bother taking the risk when the alternative is so low-cost.
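The ambiguity described above can be made concrete with a small sketch; the `doc` marker class and both processors here are hypothetical, invented purely for illustration:

```python
class doc:
    """Hypothetical marker wrapping a documentation fragment."""
    def __init__(self, text):
        self.text = text

# One annotation, intended by its author as a *container* of
# independent assertions about the parameter.
annotation = [doc("input stream"), doc("line-buffered")]

def container_reading(ann):
    # Processor A: a list means "recursively apply to contained items".
    if isinstance(ann, list):
        return [item.text for item in ann if isinstance(item, doc)]
    return []

def type_reading(ann):
    # Processor B: any annotation is an assertion about the value's type.
    return "argument must be of type " + type(ann).__name__

# The same object supports two incompatible readings.
assert container_reading(annotation) == ["input stream", "line-buffered"]
assert type_reading(annotation) == "argument must be of type list"
```

Neither processor is wrong on its own terms; the collision is in the unstated convention about what a list annotation asserts.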
Define the behaviour for interpreting a few built-in types and define guidelines and best practices for other types.

> After you run into the issue a few times, you look for a solution, and end
> up with either duck typing, interfaces/adaptation, overloaded functions,
> or ad hoc registries. ALL of these solutions are *more* than adequate to
> handle a simple thing like argument annotations. That's why I keep
> describing this as a trivial thing: even *pickling* is more complicated
> than this is. This is no more complex than len() or iter() or filter()!

Pickling works because of the underscores and magic like "__safe_for_unpickling__". len() works because of __len__, etc. There are reasons there are underscores there. You understand them, I understand them, Talin understands them. That doesn't mean that they are self-evident. A lesser inventor might have used a method just called "safe_for_pickling", and some unlucky programmer at Bick's might have accidentally triggered unexpected aspects of the protocol while documenting the properties of cucumbers. These are not universally understood techniques. Let's just document them in the PEP.

> However, it appears that mine is a minority opinion. Unfortunately, I'm at
> a bit of a communication disadvantage, because if somebody wants to believe
> something is complicated, there is nothing that anybody can do to change
> their mind. If you don't consider the possibility that it is way simpler
> than you think, you will never be able to see it.

If it wasn't at least a bit complicated then there would be no underscores. The underscores are there to prevent SOMETHING bad from happening, right?

Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060813/f99f4ac2/attachment.htm From paul at prescod.net Sun Aug 13 10:17:26 2006 From: paul at prescod.net (Paul Prescod) Date: Sun, 13 Aug 2006 01:17:26 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <43aa6ff70608111649g54e82dd6kef19862f0c281254@mail.gmail.com> References: <5.1.1.6.0.20060811112118.023af398@sparrow.telecommunity.com> <20060811084623.1931.JCARLSON@uci.edu> <44DD073C.7030305@acm.org> <43aa6ff70608111649g54e82dd6kef19862f0c281254@mail.gmail.com> Message-ID: <1cb725390608130117p7f393441ld43f4f901728b316@mail.gmail.com> On 8/11/06, Collin Winter wrote: ... What Josiah is hinting at -- and what Talin describes more explicitly > -- is the problem of how exactly "chaining" annotation interpreters > will work. I don't think the question is really how to chain them. The question is how to avoid them stepping on top of each other accidentally. The case I've thought out the most completely is that of using > decorators to analyse/utilise the annotations: This is not as interesting a case as the following: annotation scheme 1 is invented by person 1 annotation scheme 2 is invented by person 2 person 3 must use them together on a single function persons 4 through 1000 write programs that hunt for annotation scheme 1 objects on functions in modules. persons 2000 through 4000 write programs that hunt for annotation scheme 2 objects. How can persons 4 through 4000 be confident when they see an annotation on an object that they are interpreting it as person 3 intended? How can they be confident that they are not accidentally processing an object (a list, a string, a file, a customer object, whatever) that was intended to be an assertion in annotation scheme 1 according to the rules of annotation scheme 2? Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-3000/attachments/20060813/7929f61c/attachment.html

From talin at acm.org  Sun Aug 13 10:18:18 2006
From: talin at acm.org (Talin)
Date: Sun, 13 Aug 2006 01:18:18 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <1cb725390608130042h50c7d7f9oc4068f30f2b04bbb@mail.gmail.com>
References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com>
	<44DD5DF0.40405@acm.org>
	<5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com>
	<5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<1cb725390608130042h50c7d7f9oc4068f30f2b04bbb@mail.gmail.com>
Message-ID: <44DEE04A.4090708@acm.org>

Paul Prescod wrote:
>> And the interpretation of:
>>
>>     def cat(infile: [doc("input stream"), opt("i")] = sys.stdin,
>>             outfile: [doc("output stream"), opt("o")] = sys.stdout
>>             ):
>>
>> is likewise unambiguous, unless the creator of the documentation or option
>> features has defined some other interpretation for a list than
>> "recursively apply to contained items".
>
> The meaning is "unambiguous unless..." then it is ambiguous. So as per my
> previous proposal I think that you and I agree that we should disallow the
> stupid interpretation by encoding the obvious one in the PEP.
>
> In which case, you need only do something like:
>>
>>     def cat(infile: docopt("input stream", "i") = sys.stdin,
>>             outfile: docopt("output stream", "o") = sys.stdout
>>             ):
>>
>> with an appropriate definition of methods for the 'docopt' type.
>
> Given that there are an infinite number of tools in the universe that could
> be processing "doc" and "opt" annotations, how would the user KNOW that
> there is one out there with a stupid interpretation of lists? They might
> annotate thousands of classes before finding out that some hot tool that
> they were planning to use next year is incompatible. So let's please define
> a STANDARD way of attaching multiple annotations to a parameter.
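One candidate for such a standard, sketched here with hypothetical `doc` and `opt` marker classes, is to normalize every annotation to a list before dispatch, so a bare marker and a one-element list are treated identically:

```python
class doc:
    """Hypothetical documentation marker."""
    def __init__(self, text):
        self.text = text

class opt:
    """Hypothetical command-line option marker."""
    def __init__(self, flag):
        self.flag = flag

def as_annotation_list(ann):
    # Convention: a bare annotation and a one-element list mean the
    # same thing, so processors never special-case either spelling.
    return ann if isinstance(ann, list) else [ann]

def collect_docs(ann):
    # A doc processor picks out only the markers it owns, ignoring
    # everything else (including other frameworks' markers).
    return [a.text for a in as_annotation_list(ann) if isinstance(a, doc)]

# Both spellings produce the same result under the convention.
assert collect_docs(doc("input stream")) == ["input stream"]
assert collect_docs([doc("input stream"), opt("i")]) == ["input stream"]
```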
> Lists seem like a no-brainer choice for that.
>
>> Since many people seem to be unfamiliar with overloaded functions, I would
>> just like to take this opportunity to remind you that the actual overload
>> mechanism is irrelevant. If you gave 'doc' objects a 'printDocString()'
>> method and 'opt' objects a 'setOptionName()' method, the exact same logic
>> regarding extensibility applies. The 'docopt' type would simply implement
>> both methods.
>>
>> This is normal, simple standard Python stuff; nothing at all fancy.
>
> The context is a little bit different from standard duck typing.
>
> Let's say I define a function like this:
>
>     def car(b):
>         "b is a list-like object"
>         return b[0]
>
> Then someone comes along and does something I never expected. They invent a
> type representing a list of bits in a bitfield. They pass it to my function
> and everything works trivially. But there's something important that
> happened. The programmer ASSERTED, by passing the bitfield list to the
> function 'car', that it is a list-like object. My code wouldn't have tried
> to treat it as a list if the user hadn't passed it as one explicitly.
>
> Now look at it from the point of view of function annotations. As we said
> before, the annotations are inert. They are just attached. There is some
> code like a type checker or documentation generator that comes along after
> the fact and scoops them up to do something with them. The user did not
> assert (at the language level!) that any particular annotation applies to
> any particular annotation processor. The annotation processor is just
> looking for stuff that it recognizes. But what if it thinks it recognizes
> something but does not?
>
> Consider this potential case:
>
> BobsDocumentationGenerator.py:
>
>     class BobsDocumentationGeneratorAnnotation:
>         def __init__...
>         def printDocument(self):
>             print self.doc
>         def sideEffect(self):
>             deleteHardDrive()
>
>     def BobsDocumentationGenerator(annotation):
>         if hasattr(annotation, "printDocument"):
>             annotation.printDocument()
>
> SamsDocumentationGenerator.py:
>
>     class SamsDocumentationGeneratorAnnotation:
>         def __init__...
>         def printDocument(self):
>             return self.doc
>         def sideEffect(self):
>             email(self.doc, "python-dev at pytho...")
>
>     def SamsDocumentationGenerator(annotation):
>         if hasattr(annotation, "printDocument"):
>             print annotation.printDocument()
>             annotation.sideEffect()
>
> These objects, _by accident_, have the same method signature but different
> side effects and return values. Nobody anywhere in the system made an
> incorrect assertion. They just happened to be unlucky in the naming of
> their methods. (Unbelievably unlucky, but you get the drift.)
>
> One simple way to make it unambiguous would be to do a test more like:
>
>     if hasattr(annotation, SamsDocumentationGenerator.uniqueObject): ...
>
> The association of the unique object with an annotator object would be an
> explicit assertion of compatibility.
>
> Can we agree that the PEP should describe strategies that people should use
> to make their annotation recognition strategies unambiguous and
> failure-proof?
>
> I think that merely documenting appropriately defensive techniques might be
> enough to make Talin happy. Note that it isn't the processing code that
> needs to be defensive (in the sense of try/catch blocks). It is the whole
> recognition strategy that the processing code uses. Whatever recognition
> strategy it uses must be unambiguous. It seems like it would hurt nobody to
> document this and suggest some unambiguous techniques.
This says pretty much what I was trying to say, only better :) I think I am going to chill out on this topic for a bit - it seems that there are folks who have a better understanding of the issue than I do, and mainly the only reason I was commenting on the PEP was because that was what was asked for. I don't really have a big stake in the whole annotation effort, there are other issues that I am really more interested in. -- Talin From paul at prescod.net Sun Aug 13 10:24:00 2006 From: paul at prescod.net (Paul Prescod) Date: Sun, 13 Aug 2006 01:24:00 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <20060812233132.197F.JCARLSON@uci.edu> References: <20060812205512.197A.JCARLSON@uci.edu> <5.1.1.6.0.20060813010634.0228ee30@sparrow.telecommunity.com> <20060812233132.197F.JCARLSON@uci.edu> Message-ID: <1cb725390608130124m2e3a3254v40058e23c2b6b737@mail.gmail.com> On 8/13/06, Josiah Carlson wrote: > > ... > If we were to specify anything, I would suggest we define an order of > annotation calling, which would also define a chaining order if > applicable. Maybe it is completely obvious, but one should never > underestimate what kinds of silly things users will do. > Annotations are not called. They are not like decorators. Decorators typically "wrap" a function. Annotations are just attached to it. A decorator must be a callable. An annotation could be just the number "5". Decorators build on each other, perhaps changing the function's behaviour. Annotations (should!) just accumulate and typically do not change the parameter's behaviour. The PEP does not say how you would define annotations that just accumulate but it seems common sense to me that it would be through a list syntax. I think that the PEP should just say that. Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-3000/attachments/20060813/b232372c/attachment.htm

From jcarlson at uci.edu  Sun Aug 13 10:53:23 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sun, 13 Aug 2006 01:53:23 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <1cb725390608130124m2e3a3254v40058e23c2b6b737@mail.gmail.com>
References: <20060812233132.197F.JCARLSON@uci.edu>
	<1cb725390608130124m2e3a3254v40058e23c2b6b737@mail.gmail.com>
Message-ID: <20060813013709.1982.JCARLSON@uci.edu>

"Paul Prescod" wrote:
> On 8/13/06, Josiah Carlson wrote:
> >
> > ...
> > If we were to specify anything, I would suggest we define an order of
> > annotation calling, which would also define a chaining order if
> > applicable. Maybe it is completely obvious, but one should never
> > underestimate what kinds of silly things users will do.
>
> Annotations are not called. They are not like decorators.

Right. What I meant (which perhaps wasn't what I said) was that we should define the order in which functions that operate on these annotations execute, regardless of the mechanism. Say, for example, I have the following function definition:

    def foo(arg1:[bar(1), baz(2)]):
        ...

However the (unspecified, user-defined) machinery that handles the annotation processing gets to foo(), if it knows how to handle the 'bar' and 'baz' annotations, a properly written annotation processor will handle the 'bar' annotation before the 'baz' annotation.

 - Josiah

From talin at acm.org  Sun Aug 13 13:07:44 2006
From: talin at acm.org (Talin)
Date: Sun, 13 Aug 2006 04:07:44 -0700
Subject: [Python-3000] Python/C++ question
In-Reply-To:
References: <44DA6C01.2040904@acm.org>
Message-ID: <44DF0800.4060204@acm.org>

Guido van Rossum wrote:
> On 8/9/06, Talin wrote:
> For the majority of Python developers it's probably the other way
> around. It's been 15 years since I wrote C++, and unlike C, that
> language has changed a lot since then...
> > It would be a complete rewrite; I prefer doing a gradual
> transmogrification of the current codebase into Py3k rather than
> starting from scratch (read Joel Spolsky on why).

BTW, should this be added to PEP 3099? (Although I do think that a gradual transition is certainly possible, I am not going to push for it.)

-- Talin

From talin at acm.org  Sun Aug 13 13:30:00 2006
From: talin at acm.org (Talin)
Date: Sun, 13 Aug 2006 04:30:00 -0700
Subject: [Python-3000] Bound and unbound methods
Message-ID: <44DF0D38.6070507@acm.org>

One of the items in PEP 3100 is getting rid of unbound methods. I want to explore a heretical notion, which is getting rid of bound methods as well.

Now, to be honest, I rather like bound methods. I like being able to capture a method call, store it in a variable, and call it later.

However, I also realize that requiring every access to a class variable to instantiate a new method object is expensive, to say the least.

Calling a callable would not require a bound method - the 'self' parameter would be just another argument. User-defined functions would then be no different from native built-in functions or other callables.

You would still need some way to explicitly bind a method if you wanted to store it in a variable, perhaps using something like the various wrappers in module 'functional'. It would be extra typing, but for me at least it's not something I do very often, and it would at least have the virtue that the intent of the code would be more visually obvious. (Also, I tend to find, in my code at least, that I more often use closures to accomplish the same thing, which are both clearer to read and more powerful.)

Now, one remaining problem to be solved is whether or not to pass 'self' as an argument to the resulting callable. I suppose that could be handled by inspecting the attributes of the callable and adding the extra 'self' argument at the last minute if it's not a static method.
I suspect such tests would be relatively fast, much less than the time needed to instantiate and initialize a new method object. Anyway, I just wanted to throw that out there. Feel free to -1 away... :) -- Talin From g.brandl at gmx.net Sun Aug 13 14:24:59 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 13 Aug 2006 14:24:59 +0200 Subject: [Python-3000] Python/C++ question In-Reply-To: <44DF0800.4060204@acm.org> References: <44DA6C01.2040904@acm.org> <44DF0800.4060204@acm.org> Message-ID: Talin wrote: > Guido van Rossum wrote: >> On 8/9/06, Talin wrote: >> For the majority of Python developers it's probably the other way >> around. It's been 15 years since I wrote C++, and unlike C, that >> language has changed a lot since then... >> >> It would be a complete rewrite; I prefer doing a gradual >> transmogrification of the current codebase into Py3k rather than >> starting from scratch (read Joel Spolsky on why). > > BTW, Should this be added to PEP 3099? Yes, why not. Georg From pje at telecommunity.com Sun Aug 13 19:28:42 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun, 13 Aug 2006 13:28:42 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <1cb725390608130106y3cf29002q6c63dd6ac1ce04d4@mail.gmail.co m> References: <5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com> <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> <20060812205512.197A.JCARLSON@uci.edu> <5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com> At 01:06 AM 8/13/2006 -0700, Paul Prescod wrote: >There is something different about annotations than everything else in >Python so far. Annotations are the first feature other than docstrings >(which are proto-annotations) in core Python where third party tools are >supposed to go trolling through your objects FINDING STUFF that they may >decide is interesting or not to them. 
You make it sound like we've never had documentation tools before, or web servers. Zope has been trolling through Python objects "finding stuff" since *1996*. It's not at all a coincidence that the first interface/adaptation systems for Python (AFAIK) were built for Zope. So some people in the Python community have had an entire *decade* of experience with this kind of thing. It's just a guess, but some of them might actually know a thing or two about the subject by now. ;-) >Now I'm sure that with all of your framework programming you've run into >this many times and have many techniques for making these assertions >unambiguous. All we need to do is document them so that people who are not >as knowledgable will not get themselves into trouble. Sure. Here are two nice articles that people can read to understand the basic ideas of "tell, don't ask". One by the "Pragmatic Programmers": http://www.pragmaticprogrammer.com/articles/jan_03_enbug.pdf And another by Allen Holub on the evils of getters and setters, that touches on the same principles: http://www.javaworld.com/javaworld/jw-09-2003/jw-0905-toolbox.html >It isn't sufficient to say: "Only smart people will use this stuff so we >need not worry" which is what the original PEP said. Even if it is true, I >don't understand why we would bother taking the risk when the alternative >is so low-cost. There are so many other pitfalls to writing extensible and interoperable code in Python, why focus so much effort on such an incredibly minor one? The truth is that hardly anybody cares about writing extensible or interoperable code except framework developers -- and they've already *got* solutions. Twisted or Zope developers would see this as a trivial use case for adaptation, and PEAK developers would use either adaptation or generic functions, and keep on moving with nary a speedbump. Nonetheless, I don't object to documenting best practices; I just don't want to mandate a *particular* solution -- with one exception. 
If Py3K is going to include overloaded functions, then that should be considered the One Obvious Way to work with annotations, since it's an "included battery" (and none of the existing interface/adaptation/overloading toolkits are likely to work as-is in Py3K without some porting effort). But if Py3K doesn't include overloading or adaptation, then the One Obvious Way will be "whatever a knowledgeable framework programmer wants to do." >Pickling works because of the underscores and magic like " >__safe_for_unpickling__". Len works because of __length__. etc. There are >reasons there are underscores there. You understand them, I understand >them, Talin understands them. That doesn't mean that they are >self-evident. A lesser inventor might have used a method just called >"safe_for_pickling" and some unlucky programmer at Bick's might have >accidentally triggered unexpected aspects of the protocol while >documenting the properties of cucumbers. Note that you're pointing out a problem that already exists today in Python, and has for some time. It's why the Zope folks use interfaces and adaptation, and why I use overloaded functions. The problem has nothing to do with annotations as such, so if you want to solve that problem, you should be pushing for overloaded functions in the stdlib, and using annotations as an example of why they're good to have. >Can we agree that the PEP should describe strategies that people should >use to make their annotation recognition strategies unambiguous and >failure-proof? Absolutely - and I recommended that we recommend "tell, don't ask" processing using one of the following techniques: 1. duck typing 2. adaptation 3. overloaded functions 4. type registries You seem to be arguing that duck typing is inadequate because it is name-based and names can conflict. I agree, which is why I believe #2-4 are better: they don't rely on mere name matching. 
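Of the four techniques listed above, a type registry is perhaps the quickest to sketch; every name below is hypothetical, not taken from any existing framework:

```python
# A processor keeps an explicit registry mapping annotation types to
# handlers; annotation types it has not registered are deliberately
# ignored rather than guessed at from method names.
registry = {}

def handles(ann_type):
    """Decorator registering a handler for one annotation type."""
    def register(fn):
        registry[ann_type] = fn
        return fn
    return register

class doc:
    """Hypothetical documentation marker."""
    def __init__(self, text):
        self.text = text

@handles(doc)
def handle_doc(ann):
    return "doc: " + ann.text

def process(ann):
    # Dispatch on the exact annotation type; unknown types are skipped,
    # so another framework's objects can never be misinterpreted.
    handler = registry.get(type(ann))
    return handler(ann) if handler is not None else None

assert process(doc("input stream")) == "doc: input stream"
assert process("some string another tool cares about") is None
```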
However, duck typing is still *adequate* as long as names are sufficiently descriptive or at least lengthy enough to prevent collision. Including a package-specific namespace prefix like "foo_printDocumentation" is sufficient best practice to avoid duck typing name collisions in virtually all cases.

I'm just baffled by all this focus on such a minor issue, when Python has far more pitfalls to interoperability than this. But I guess if you see this as the first time that objects might be implicitly used by something, I suppose it makes sense. But it's really not the first time, and these are well-understood problems among developers of major Python frameworks, especially Zope.

>I think that merely documenting appropriately defensive techniques might
>be enough to make Talin happy. Note that it isn't the processing code that
>needs to be defensive (in the sense of try/catch blocks). It is the whole
>recognition strategy that the processing code uses. Whatever recognition
>strategy it uses must be unambiguous. It seems like it would hurt nobody
>to document this and suggest some unambiguous techniques.

I already recommended that we do this, and have repeated my recommendation above for your convenience.

From steven.bethard at gmail.com  Sun Aug 13 19:29:20 2006
From: steven.bethard at gmail.com (Steven Bethard)
Date: Sun, 13 Aug 2006 11:29:20 -0600
Subject: [Python-3000] Bound and unbound methods
In-Reply-To: <44DF0D38.6070507@acm.org>
References: <44DF0D38.6070507@acm.org>
Message-ID:

On 8/13/06, Talin wrote:
> One of the items in PEP 3100 is getting rid of unbound methods. I want
> to explore a heretical notion, which is getting rid of bound methods as
> well.
I believe you're suggesting that the code that I just wrote moments ago would stop working::

    get_features = self._get_document_features
    return [get_features(i, document_graph, comparable_graphs)
            for i, document_graph in enumerate(document_graphs)]

The line ``get_features = ...`` expects the function stored to be bound to ``self``. I write code like this *all the time*, particularly when I have a long method name that needs to be used in a complex expression and I want to keep my lines within the suggested 79 characters.

If I understand the proposal right and my code above would be invalidated, I'm a strong -1 to this. It would break an enormous amount of my code.

STeVe

--
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From jcarlson at uci.edu  Sun Aug 13 19:58:33 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sun, 13 Aug 2006 10:58:33 -0700
Subject: [Python-3000] Bound and unbound methods
In-Reply-To: <44DF0D38.6070507@acm.org>
References: <44DF0D38.6070507@acm.org>
Message-ID: <20060813102036.1985.JCARLSON@uci.edu>

Talin wrote:
>
> One of the items in PEP 3100 is getting rid of unbound methods. I want
> to explore a heretical notion, which is getting rid of bound methods as
> well.
>
> Now, to be honest, I rather like bound methods. I like being able to
> capture a method call, store it in a variable, and call it later.
>
> However, I also realize that requiring every access to a class variable
> to instantiate a new method object is expensive, to say the least.

Well, it's up-front vs. at-access. For instances whose methods are generally used rarely, the up-front cost of instantiating every method is high in comparison (unless there are a relatively large number of method accesses), and technically infinite if applied to all objects. Why?

I have a class foo, I instantiate foo, now all of foo's methods get instantiated.
Ahh, but foo's methods are also instances of function. It doesn't really have any new methods on foo's methods, but they do have attributes that are instances, so we will need to instantiate all of the methods' attributes' methods, and recursively, to infinity. The non-creation of instantiated methods for objects is a lazy-evaluation technique to prevent infinite recursion, in general.

On the other hand, it may make sense to offer a metaclass and/or decorator that signals that a single method instance should be created for particular methods up-front, rather than at-access to those methods. But what kind of difference could we expect? 42%/28% improvement for class methods/object methods in 2.4 respectively, and 45%/26% improvement in 2.5 beta. This does not include actually calling the methods.

> Now, one remaining problem to be solved is whether or not to pass 'self'
> as an argument to the resulting callable. I suppose that could be
> handled by inspecting the attributes of the callable and adding the
> extra 'self' argument at the last minute if it's not a static method. I
> suspect such tests would be relatively fast, much less than the time
> needed to instantiate and initialize a new method object.

I think that a change that required calls of the form obj.instancemethod(obj, ...) is a non-starter.

I'm -1 for instantiating all methods (for the infinite cost reasons), and -1 for int, long, list, tuple, dict, float (method access is generally limited for these objects). I'm +0 for offering a suitable metaclass and/or decorator, but believe it would be better suited for the Python cookbook, as the performance improvement, when function calls are taken into consideration, is significantly smaller.

 - Josiah

[1] Timings for accessing instance methods

Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>>
>>> def test(n):
...     _time = time
...
...     class foo:
...
...         def bar(self):
...             pass
...     xr = xrange(n)
...     x = foo()
...     t = time.time()
...     for i in xr:
...         x.bar
...     print 'class method', time.time()-t
...
...     x.bar = x.bar
...     t = time.time()
...     for i in xr:
...         x.bar
...     print 'instantiated class method', time.time()-t
...
...     class foo(object):
...         def bar(self):
...             pass
...
...     x = foo()
...     t = time.time()
...     for i in xr:
...         x.bar
...     print 'object method', time.time()-t
...
...     x.bar = x.bar
...     t = time.time()
...     for i in xr:
...         x.bar
...     print 'instantiated object method', time.time()-t
...
...     class foo(object):
...         __slots__ = 'bar'
...         def __init__(self):
...             self.bar = self._bar
...         def _bar(self):
...             pass
...
...     x = foo()
...     t = time.time()
...     for i in xr:
...         x.bar
...     print 'instantiated object __slot__ method', time.time()-t
...
>>> test(5000000)
class method 1.96799993515
instantiated class method 1.14100003242
object method 1.71900010109
instantiated object method 1.23399996758
instantiated object __slot__ method 1.26600003242
>>>

Python 2.5b2 (r25b2:50512, Jul 11 2006, 10:16:14) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>>
>>> def test(n):
...     _time = time
...
...     class foo:
...         def bar(self):
...             pass
...     xr = xrange(n)
...     x = foo()
...     t = time.time()
...     for i in xr:
...         x.bar
...     print 'class method', time.time()-t
...
...     x.bar = x.bar
...     t = time.time()
...     for i in xr:
...         x.bar
...     print 'instantiated class method', time.time()-t
...
...     class foo(object):
...         def bar(self):
...             pass
...
...     x = foo()
...     t = time.time()
...     for i in xr:
...         x.bar
...     print 'object method', time.time()-t
...
...     x.bar = x.bar
...     t = time.time()
...     for i in xr:
...         x.bar
...     print 'instantiated object method', time.time()-t
...
...     class foo(object):
...         __slots__ = 'bar'
...         def __init__(self):
...             self.bar = self._bar
...         def _bar(self):
...             pass
...
...     x = foo()
...     t = time.time()
...     for i in xr:
...         x.bar
...
...     print 'instantiated object __slot__ method', time.time()-t
...
>>> test(5000000)
class method 1.98500013351
instantiated class method 1.09299993515
object method 1.67199993134
instantiated object method 1.23500013351
instantiated object __slot__ method 1.23399996758
>>>

From paul at prescod.net  Sun Aug 13 19:57:20 2006
From: paul at prescod.net (Paul Prescod)
Date: Sun, 13 Aug 2006 10:57:20 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<20060812205512.197A.JCARLSON@uci.edu>
	<5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>
	<5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com>
Message-ID: <1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com>

If we get past the meta-discussion, I don't really see any disagreement left. I'll grit my teeth and avoid commenting on the meta-discussion. ;)

My proposed text for the PEP is as follows:

"In order for processors of function annotations to work interoperably, they must use a common interpretation of objects used as annotations on a particular function. For example, one might interpret string annotations as docstrings. Another might interpret them as path segments for a web framework. For this reason, function annotation processors SHOULD avoid assigning processor-specific meanings to types defined outside of the processor's framework. For example, a Django processor could process annotations of a type defined in a Zope package, but Zope's creators should be considered the authorities on the type's meaning for the same reasons that they would be considered authorities on the semantics of classes or methods in their packages."

"This implies that the interpretation of built-in types would be controlled by Python's developers and documented in Python's documentation. This is just a best practice.
Nothing in the language can or should enforce this practice and there may be a few domains where there is a strong argument for violating it (e.g. an education environment where saving keystrokes may be more important than easing interoperability)." "In Python 3000, semantics will be attached to the following types: objects of type string (or subtype of string) are to be used for documentation (though they are not necessarily the exclusive source of documentation about the type). Objects of type list (or subtype of list) are to be used for attaching multiple independent annotations." "Developers who define new metadata frameworks SHOULD choose explicit and unambiguous mechanisms for associating objects with their frameworks. Furthermore, they SHOULD consider that some users may wish to extend their frameworks and should support that. For example, they could use Python 3000 overloaded functions, some form of registry, some kind of interface or some unambiguously recognizable method signature protocol (e.g. _pytypelib_type_check())." Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060813/9edf62a9/attachment.htm From talin at acm.org Sun Aug 13 22:08:10 2006 From: talin at acm.org (Talin) Date: Sun, 13 Aug 2006 13:08:10 -0700 Subject: [Python-3000] Bound and unbound methods In-Reply-To: <20060813102036.1985.JCARLSON@uci.edu> References: <44DF0D38.6070507@acm.org> <20060813102036.1985.JCARLSON@uci.edu> Message-ID: <44DF86AA.7050207@acm.org> Josiah Carlson wrote: > Talin wrote: >> One of the items in PEP 3100 is getting rid of unbound methods. I want >> to explore a heretical notion, which is getting rid of bound methods as >> well. >> >> Now, to be honest, I rather like bound methods. I like being able to >> capture a method call, store it in a variable, and call it later.
>> >> However, I also realize that requiring every access to a class variable >> to instantiate a new method object is expensive, to say the least. > > Well, it's up-front vs. at-access. For instances whose methods are > generally used rarely, the up-front cost of instantiating every method > is high in comparison (unless there are a relatively large number of > method accesses), and technically infinite if applied to all objects. > Why? > > I have a class foo, I instantiate foo, now all of foo's methods get > instantiated. Ahh, but foo's methods are also instances of function. It > doesn't really have any new methods on foo's methods, but they do have > attributes that are instances, so we will need to instantiate all of the > methods' attributes' methods, and recursively, to infinity. The > non-creation of instantiated methods for objects is a lazy-evaluation > technique to prevent infinite recursion, in general. > > On the other hand, it may make sense to offer a metaclass and/or > decorator that signals that a single method instance should be created > for particular methods up-front, rather than at-access to those methods. > But what kind of difference could we expect? 42%/28% improvement for > class methods/object methods in 2.4 respectively, and 45%/26% > improvement in 2.5 beta . This does not include actually calling the > methods. No, I wasn't proposing that methods be bound up front...read on. >> Now, one remaining problem to be solved is whether or not to pass 'self' >> as an argument to the resulting callable. I suppose that could be >> handled by inspecting the attributes of the callable and adding the >> extra 'self' argument at the last minute if its not a static method. I >> suspect such tests would be relatively fast, much less than the time >> needed to instantiate and initialize a new method object. > > I think that a change that required calls of the form > obj.instancemethod(obj, ...) are non-starters. 
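The at-access cost being debated here is easy to observe directly: every attribute access mints a fresh bound method object, and the `x.bar = x.bar` lines in the timing script above amount to caching a single bound method in the instance dict. A minimal sketch (in modern Python 3 syntax, not the thread's 2.x; the `Foo` class is invented for illustration):

```python
class Foo(object):
    def bar(self):
        return "bar"

x = Foo()

# Each attribute access builds a brand-new bound method object...
print(x.bar is x.bar)              # False: two accesses, two method objects

# ...but every one of them wraps the same underlying function.
print(x.bar.__func__ is Foo.bar)   # True

# Caching the bound method in the instance dict ("instantiating" it)
# turns later lookups into plain dict hits and makes identity stable.
x.bar = x.bar
print(x.bar is x.bar)              # True: both accesses return the cached object
print(x.bar())                     # still calls the same code: bar
```

This is exactly the trade-off in the timings: the cached form is faster per access, at the price of one permanently allocated method object per instance.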
Yes, that's a non-starter, but that's not what I was proposing either. I see that I left an important piece out of my proposal, which I'll need to explain. Right now, when you say: 'obj.instancemethod()', there are in fact two distinct operations going on. The first is the lookup of the attribute 'instancemethod', and the second is the invoking of the resulting callable. In order to get rid of the creation of method objects, the compiler would have to recognize these two operations and combine them into a single "call method" opcode - one which looks up the attribute, but leaves the original object pointer on the stack, and then invokes the resulting callable, along with the object pointer. So essentially the 'bind' operation is moved into the method invocation code - which eliminates the need to create a holding object to remember the binding information. Hmmmm....I wonder if it could be made to work in a backwards-compatible way. In other words, suppose the existing logic of creating a method object were left in place, however the 'obj.instancemethod()' pattern would bypass all of that. In other words, the compiler would note the combination of the attribute access and the call, and combine them into an opcode that skips the whole method creation step. (Maybe it already does this and I'm just being stupid.) > I'm -1 for instantiating all methods (for the infinite cost reasons), > and -1 for int, long, list, tuple, dict, float (method access is > generally limited for these objects). I'm +0 for offering a suitable > metaclass and/or decorator, but believe it would be better suited for > the Python cookbook, as performance improvements when function calls are > taken into consideration is significantly less. Thanks for the timing information by the way.
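The two distinct operations described above can be spelled out explicitly with the descriptor protocol; the proposed "call method" opcode would fuse exactly these steps and skip materializing the intermediate bound method. A sketch (the `Greeter` class and its names are invented for illustration):

```python
class Greeter(object):
    def hello(self, name):
        return "hello, " + name

g = Greeter()

# Step 1: look the attribute up on the class and bind it to the instance.
func = type(g).__dict__["hello"]     # the plain function object
bound = func.__get__(g, type(g))     # the bound method - the "bind" step

# Step 2: invoke the resulting callable.
result = bound("world")

# The fused form the compiler sees; a CALL_METHOD-style opcode could skip
# creating `bound` and call func(g, "world") directly.
assert result == g.hello("world") == func(g, "world")
print(result)
```

The assert shows why the fusion is safe for plain functions: binding then calling is equivalent to passing the instance as the first positional argument.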
From thomas at python.org Sun Aug 13 23:22:32 2006 From: thomas at python.org (Thomas Wouters) Date: Sun, 13 Aug 2006 23:22:32 +0200 Subject: [Python-3000] Bound and unbound methods In-Reply-To: <44DF86AA.7050207@acm.org> References: <44DF0D38.6070507@acm.org> <20060813102036.1985.JCARLSON@uci.edu> <44DF86AA.7050207@acm.org> Message-ID: <9e804ac0608131422w3bd95d57gb1c195e16dc1f9bd@mail.gmail.com> On 8/13/06, Talin wrote: > Hmmmm....I wonder if it could be made to work in a > backwards-compatible way. In other words, suppose the existing logic of > creating a method object were left in place, however the > 'obj.instancemethod()' pattern would bypass all of that. In other words, > the compiler would note the combination of the attribute access and the > call, and combine them into an opcode that skips the whole method > creation step. (Maybe it already does this and I'm just being stupid.) Been there, done that, bought the T-shirt (well, it was just a PyCon-1 T-shirt): http://sourceforge.net/tracker/index.php?func=detail&aid=709744&group_id=5470&atid=305470 Back then, the end result of that particular change was very tiny, and it wasn't even taking new-style classes into account (which would have made it more complex.) It may be worth re-trying anyway, especially for python-3000: no classic classes to worry about. And quite a lot has changed in the compiler and opcode dispatcher in the mean time. I am completely -1 on getting rid of bound methods, though. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060813/5d70b8e9/attachment.html From ark-mlist at att.net Mon Aug 14 00:47:31 2006 From: ark-mlist at att.net (Andrew Koenig) Date: Sun, 13 Aug 2006 18:47:31 -0400 Subject: [Python-3000] Bound and unbound methods In-Reply-To: <44DF0D38.6070507@acm.org> Message-ID: <000901c6bf2a$76781270$6402a8c0@arkdesktop> > However, I also realize that requiring every access to a class variable > to instantiate a new method object is expensive, to say the least. Why does every access to a class variable have to instantiate a new method object? From tomerfiliba at gmail.com Mon Aug 14 01:03:06 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Mon, 14 Aug 2006 01:03:06 +0200 Subject: [Python-3000] Bound and unbound methods Message-ID: <1d85506f0608131603u39be2727ie0b2f15db3dee69f@mail.gmail.com> [Josiah] > I'm -1 for instantiating all methods (for the infinite cost reasons), > and -1 for int, long, list, tuple, dict, float (method access is > generally limited for these objects). I'm +0 for offering a suitable > metaclass and/or decorator, but believe it would be better suited for > the Python cookbook, as performance improvements when function calls are > taken into consideration is significantly less. http://sebulba.wikispaces.com/receip+prebound i'm sorry, i just love descriptors too much. 
it kept me out of bed, until i wrote it down :) -tomer From ncoghlan at gmail.com Mon Aug 14 03:40:43 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 14 Aug 2006 11:40:43 +1000 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com> References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> <20060812205512.197A.JCARLSON@uci.edu> <5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com> <5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com> <1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com> Message-ID: <44DFD49B.4030308@gmail.com> Paul Prescod wrote: > If we get past the meta-discussion, I don't really see any disagreement > left. I'll grit my teeth and avoid commenting on the meta-discussion. ;) Ah, so I'm not the only one doing that then };> > My proposed text for the PEP is as follows: Generally +1, except for this bit: > "In Python 3000, semantics will be attached to the following types: > objects of type string (or subtype of string) are to be used for > documentation (though they are not necessarily the exclusive source of > documentation about the type). Objects of type list (or subtype of list) > are to be used for attaching multiple independent annotations." Interpretations of string & list subtypes should be up to whoever creates those subtypes - it's only the builtins themselves that python-dev should be the authority for. Cheers, Nick. 
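Paul's proposed semantics - strings as documentation, lists as containers of multiple independent annotations - can be sketched against the `__annotations__` mapping that Python 3 eventually grew. The `describe` processor below is hypothetical, and per the draft text it silently ignores annotation types it does not recognize:

```python
def describe(func):
    """Collect string annotations per parameter, ignoring the rest.

    Strings are treated as documentation; lists hold multiple independent
    annotations, each inspected in turn.  Any other annotation type is
    left for some other framework to interpret.
    """
    docs = {}
    for name, note in func.__annotations__.items():
        notes = note if isinstance(note, list) else [note]
        for item in notes:
            if isinstance(item, str):
                docs.setdefault(name, []).append(item)
    return docs

def tag(item: "an item to store", flags: ["bit flags", int] = 0):
    pass

print(describe(tag))
# {'item': ['an item to store'], 'flags': ['bit flags']}
```

Note how the `int` entry in the `flags` list is skipped without error - that is the "common interpretation" discipline the draft text asks processors to follow.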
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Mon Aug 14 04:27:57 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 14 Aug 2006 12:27:57 +1000 Subject: [Python-3000] Bound and unbound methods In-Reply-To: <44DF0D38.6070507@acm.org> References: <44DF0D38.6070507@acm.org> Message-ID: <44DFDFAD.3010408@gmail.com> Talin wrote: > Anyway, I just wanted to throw that out there. Feel free to -1 away... :) Based on the later discussion, I see two interesting possibilities: 1. A special CALL_METHOD opcode that the compiler emits when it spots the ".NAME(ARGS)" pattern. This could simply be an optimisation performed by the bytecode emitter when processing an AST Call node with an Attribute node as the "func" subnode (it would need to poke around inside the Attribute node, rather than generating the Attribute node's code normally, though). For functions, this opcode could bypass __get__ and invoke __call__ directly with the right arguments. Put the actual optimisation into PyObject_CallMethod and call that from the new opcode, and more than just the eval loop would benefit. This could also be done by the addition of a MethodCall AST node, and an AST->AST optimizing pass that took the Call+Attribute node and merged them into a single MethodCall node (The concrete parser can't look far enough ahead to figure out that a given attribute access is part of a method call). Option 1 is focused on the speedup Talin mentioned. Aside from the downside of additional complexity in the code generation phase, I don't see any real downside - __get__ will only be bypassed when the interpreter *knows* what the descriptor would do. 2. Rewrite the __get__ methods on functions, classmethod and staticmethod to cache the resulting method object in the class dictionary or instance dictionary. 
This would entail making method objects descriptors that returned a bound copy of themselves when retrieved through an instance. That way, for methods that are never called, the method objects are never created, but for methods that are used, the method object is created only once. Something would need to be done to make this work for objects without an instance dictionary - those could either continue to not cache their instance methods, they could have a lazily initialized __dict__ pointer that is instantiated the first time it is needed instead of no dict at all (yay, attributes on object() instances!), or else there could be an id() keyed cache internal to the interpreter. I personally would favour the option of making __dict__ available by default (i.e. put that behaviour in object), with no caching occurring if the object had no __dict__ attribute at all. Tuples and the numeric types could continue not to support attributes (as allocating space for an extra pointer would be a big size increase for them in their general usage pattern, and they don't generally have methods that are called from Python), while the other builtin types would acquire a usable __dict__ attribute (which may not be instantiated until the first time it is needed, although if instance methods get cached, it would be needed most of the time, so the extra complexity of lazy initialization may not be worth it). The interesting benefit of option 2 is that "assert list.append is list.append" would now succeed, as would "s = []; assert s.append is s.append". "assert [].index is [].index" would still fail though, as different instances would get their own bound methods. The downside of option 2 is that it is slightly more likely to break stuff due to the changes in semantics, and that it is a case of a genuine space-speed tradeoff - this approach *will* use more memory than the current approach, because bound method objects are always allocated permanently instead of being ephemeral things.
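Per-method, option 2 (and the "prebound" recipe tomer linked) can be emulated today with a non-data descriptor that binds once and then lets the instance dict shadow it. `cached_method` below is an illustrative sketch, not the recipe's actual code, and uses the much later `__set_name__` hook (Python 3.6+):

```python
class cached_method(object):
    """Descriptor: bind on first access, then serve from the instance dict."""

    def __init__(self, func):
        self.func = func

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self.func
        bound = self.func.__get__(obj, objtype)
        # The instance dict shadows a non-data descriptor, so later
        # lookups never re-enter __get__ at all.
        obj.__dict__[self.name] = bound
        return bound

class Counter(object):
    def __init__(self):
        self.n = 0

    @cached_method
    def bump(self):
        self.n += 1
        return self.n

c = Counter()
assert c.bump is c.bump               # identity is stable, unlike a plain method
assert c.bump() == 1 and c.bump() == 2
```

This gives exactly the lazy "created only once" behaviour described above, including the space cost: the bound method lives as long as the instance does.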
OTOH, if you did both option 1 and option 2, the caching would occur only if you retrieved a method without calling it immediately, and be bypassed most of the time. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From greg.ewing at canterbury.ac.nz Mon Aug 14 04:31:46 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 14 Aug 2006 14:31:46 +1200 Subject: [Python-3000] Bound and unbound methods In-Reply-To: <44DF86AA.7050207@acm.org> References: <44DF0D38.6070507@acm.org> <20060813102036.1985.JCARLSON@uci.edu> <44DF86AA.7050207@acm.org> Message-ID: <44DFE092.8030604@canterbury.ac.nz> Talin wrote: > the compiler would note the combination of the attribute access and the > call, and combine them into an opcode that skips the whole method > creation step. Something like that could probably be made to work. You'd want to be careful to do the optimisation only when the attribute in question is an ordinary attribute, not a property or other descriptor. I'm also -1 on eliminating bound methods entirely. I worked through that idea in considerable depth during my discussions with the author of Prothon, which was also to have been without any notion of bound methods. The consequences are further-reaching than you might think at first. The bottom line is that without bound methods, Python wouldn't really be Python any more. 
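Greg's caveat - only optimise ordinary attributes - matters because an attribute spelled `obj.render()` may be a property whose lookup runs arbitrary code before the call. A fused call opcode that assumed a plain method and skipped the descriptor machinery would break cases like this sketch (names invented for illustration):

```python
class Widget(object):
    def _render(self):
        return "rendered"

    @property
    def render(self):
        # The attribute itself is computed: lookup must run this code
        # and use its result as the callable.  A fused "call method"
        # opcode has to fall back to the generic path here.
        return self._render

w = Widget()
assert w.render() == "rendered"   # property lookup first, THEN the call
```

So the optimisation can only be a fast path guarded by a type check on what the lookup finds, never an unconditional rewrite.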
-- Greg From greg.ewing at canterbury.ac.nz Mon Aug 14 05:22:10 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 14 Aug 2006 15:22:10 +1200 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> References: <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com> <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> <44DD5DF0.40405@acm.org> <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com> <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> Message-ID: <44DFEC62.6000904@canterbury.ac.nz> Phillip J. Eby wrote: > Since many people seem to be unfamiliar with overloaded functions, I would > just like to take this opportunity to remind you that the actual overload > mechanism is irrelevant. I don't think it's the concept of overloadable functions that people are having trouble with here, but that you haven't clearly explained *how* they would be applied to solving this particular problem. You seem to think the answer to that is so obvious that it doesn't need mentioning, but we're not all up to the same mental speed as you on this. Perhaps you could provide a complete worked-out example for people to look at? -- Greg From greg.ewing at canterbury.ac.nz Mon Aug 14 05:22:22 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 14 Aug 2006 15:22:22 +1200 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> Message-ID: <44DFEC6E.8020603@canterbury.ac.nz> Phillip J. Eby wrote: > Not at all. A and B need only use overloadable functions, and the problem > is trivially resolved by adding overloads. The author of C can add an > overload to "A" that will handle objects with 'next' attributes, or add one > to "B" that handles tuples, or both. 
Phillip, you still haven't explained what to do if the code processing the annotations is in a separate program altogether, to which the user has no access in order to overload methods or perform other such modifications. -- Greg From pje at telecommunity.com Mon Aug 14 06:21:27 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 14 Aug 2006 00:21:27 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <44DFEC6E.8020603@canterbury.ac.nz> References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060814002014.02dbe9d0@sparrow.telecommunity.com> At 03:22 PM 8/14/2006 +1200, Greg Ewing wrote: >Phillip J. Eby wrote: > >>Not at all. A and B need only use overloadable functions, and the >>problem is trivially resolved by adding overloads. The author of C can >>add an overload to "A" that will handle objects with 'next' attributes, >>or add one to "B" that handles tuples, or both. > >Phillip, you still haven't explained what to do if >the code processing the annotations is in a separate >program altogether, to which the user has no access >in order to overload methods or perform other such >modifications. It can't be a "separate program altogether", since to get at the annotations, the program must import the module that contains them. Thus, the registration need only occur in some module imported by the module that uses the annotations. From pje at telecommunity.com Mon Aug 14 06:52:41 2006 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Mon, 14 Aug 2006 00:52:41 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <44DFEC62.6000904@canterbury.ac.nz> References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com> <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> <44DD5DF0.40405@acm.org> <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com> <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060814002138.02909ad0@sparrow.telecommunity.com> At 03:22 PM 8/14/2006 +1200, Greg Ewing wrote: >Phillip J. Eby wrote: >>Since many people seem to be unfamiliar with overloaded functions, I >>would just like to take this opportunity to remind you that the actual >>overload mechanism is irrelevant. > >I don't think it's the concept of overloadable functions >that people are having trouble with here, but that you >haven't clearly explained *how* they would be applied >to solving this particular problem. In the same way that plain old standard Python duck typing would be used. The only differences between overloaded functions and duck typing are that: 1. Overloaded functions can't accidentally collide, the way names chosen for duck typing can. 2. Third parties can declare overloaded methods without monkeypatching, but duck typing requires that you be the author of the object in question or that you be able to monkeypatch the type to add methods. 3. You can usually define some default behavior for an unrecognized type - as though you could add methods to the 'object' type. 4. Overloaded functions can dispatch on more than one type at the same time, or do other things, depending on their implementation. 
Aside from these extra features of overloaded functions, there isn't much difference between overloading and duck typing; it's merely the difference between: someOb.quack() and: quack(someOb) So, if you can imagine handling annotations using duck typing and hasattr(), then you can imagine doing it with overloaded functions. If you can't imagine using duck typing or hasattr() to process some annotations and ignore the ones you don't understand, then I don't really know how I would explain it. >You seem to think the answer to that is so obvious >that it doesn't need mentioning, but we're not all >up to the same mental speed as you on this. > >Perhaps you could provide a complete worked-out >example for people to look at? I did - the PEAK documentation links I gave previously included a doctest that walked through the definition of a 'Message()' attribute annotation that prints a message at class definition (or other metadata definition) time. The other two links showed examples of using attribute annotations for declaring security permissions and command-line options. Some people said they didn't "get" anything from those links, but I'm at somewhat of a loss to understand why. The examples there are very short and simple; in fact the complete Message implementation, including imports and overload declarations is only *6 lines long*. So, my only guess is that the people who looked at that skimmed right past it, looking for something more complicated! They probably then proceeded to the rest of the documentation and got bogged down in other aspects of the framework that aren't related to this discussion. Therefore, if anybody would like to provide an example of how *they* would write code for some function attribute scenario, I'll happily modify it to demonstrate tell-don't-ask with either duck typing, adaptation, overloading, or whatever you like. 
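The `someOb.quack()` versus `quack(someOb)` contrast described above later landed in the stdlib as `functools.singledispatch` (which did not exist in 2006); the sketch below uses it as a stand-in for the overload mechanism, showing a third party registering behaviour for a type it does not own, with a default for unrecognized annotation values:

```python
from functools import singledispatch

@singledispatch
def quack(ob):
    # Default behaviour for unrecognized types: report, don't fail.
    return "<unhandled: %s>" % type(ob).__name__

@quack.register(str)
def _(ob):
    # Strings interpreted as documentation, per the draft PEP text.
    return "doc:" + ob

# A third party can add support for a type it does not own,
# without monkeypatching that type.
@quack.register(list)
def _(ob):
    return [quack(item) for item in ob]

assert quack("hi") == "doc:hi"
assert quack(["a", 3]) == ["doc:a", "<unhandled: int>"]
```

This is the point about collisions and monkeypatching made concrete: the registrations live with the processor, not on the annotated types.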
But from a communication POV, it doesn't make sense to me to try and write an example, since it's going to come from *my* worldview (in which this is a trivial problem) and not the worldview of the people who don't understand it. It seems to me that the right way to proceed is to have somebody provide an example in *their* worldview, so that when I alter it they will have a reference point for what I'm talking about. (Notice that this seemed to work well for Josiah and Paul when I reworked Paul's example.) From theller at python.net Mon Aug 14 16:55:24 2006 From: theller at python.net (Thomas Heller) Date: Mon, 14 Aug 2006 16:55:24 +0200 Subject: [Python-3000] threading, part 2 --- + a bit of ctypes FFI worry In-Reply-To: <1f7befae0608120329wc646164w25ca4875da4cc5c0@mail.gmail.com> References: <1f7befae0608120329wc646164w25ca4875da4cc5c0@mail.gmail.com> Message-ID: Tim Peters schrieb:
> [Josiah Carlson]
>> ...
>> Python 2.3.5 (#62, Feb 8 2005, 16:23:02) [MSC v.1200 32 bit (Intel)] on win32
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> import ctypes
>> >>> import threading
>> >>> import time
>> >>> def foo():
>> ...     try:
>> ...         while 1:
>> ...             time.sleep(.01)
>> ...     finally:
>> ...         print "I quit!"
>> ...
>> >>> x = threading.Thread(target=foo)
>> >>> x.start()
>> >>> for i,j in threading._active.items():
>> ...     if j is x:
>> ...         break
>> ...
>> >>> ctypes.pythonapi.PyThreadState_SetAsyncExc(i, ctypes.py_object(Exception))
>
> As I discovered to my chagrin when I added a similar test to the test > suite a few days ago, that's got a subtle error on most 64-bit boxes. > When the ctypes docs talk about passing and returning integers, they > never explain what "integers" /means/, but it seems the docs > implicitly have a 32-bit-only view of the world here. In reality > "integer" seems to mean the native C `int` type. 'ctypes.c_int' and 'ctypes.c_long' correspond to the C 'int' and 'long' types.
If you think that the docs could be clearer, please suggest changes. > But a Python thread > id is a native C `long` (== a Python short integer), and the code > above fails in a baffling way on most 64-bit boxes: the call returns > 0 instead; i.e. the thread id isn't found, and no exception gets set.
> So I believe that needs to be:
>
> ctypes.pythonapi.PyThreadState_SetAsyncExc(
>     ctypes.c_long(i),
>     ctypes.py_object(Exception))
>
> to make it portable.
Right. A little bit more safety might be gained by setting the argtypes attribute of the PyThreadState_SetAsyncExc function in this way:

ctypes.pythonapi.PyThreadState_SetAsyncExc.argtypes = ctypes.c_long, ctypes.py_object

This way the wrapping of arguments is automatic. > It's unclear to me how to write portable ctypes code in the presence > of a gazillion integer typedefs and #defines, such as for Py_ssize_t. > That doesn't map to a fixed C integral type cross-platform, so what > can you do? You're not required to answer that ;-) This must probably be exported from the C code. Currently ctypes has the basic (integer) types c_byte, c_short, c_int, c_long, c_longlong, plus their unsigned variants. On 32-bit platforms, c_int is an alias to c_long. Sized ints are defined: c_int8, c_int16, c_int32, c_int64, (plus the unsigned variants again), also as aliases to the 10 basic integer types. It *should* be possible by some checks to find out about the size of Py_ssize_t at runtime (unless it is a configurable option)... > Thread ids may bite us someday too. Python casts the platform's > notion of a thread id to C `long`, but there's no guarantee this won't > lose information (or is even legal) on all platforms. We'd probably > be safer casting to, e.g., Py_uintptr_t (some thread implementations > return an index into a kernel or library thread-info table, but at > least some in my lifetime returned a pointer to a thread-info struct, > and that's definitely fatter than C `long` on some boxes).
>> 1
>> >>> I quit!
>> Exception in thread Thread-2:
>> Traceback (most recent call last):
>>   File "C:\python23\lib\threading.py", line 442, in __bootstrap
>>     self.run()
>>   File "C:\python23\lib\threading.py", line 422, in run
>>     self.__target(*self.__args, **self.__kwargs)
>>   File "", line 4, in foo
>> Exception
> > It's really cool that you can do this from ctypes, eh? That's exactly > the right level of abstraction for this attractive nuisance too ;-) ;-) Thomas From guido at python.org Mon Aug 14 17:31:31 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 14 Aug 2006 08:31:31 -0700 Subject: [Python-3000] Ctypes as cross-interpreter C calling interface In-Reply-To: References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com> Message-ID: After thinking about it some more, IMO for most purposes ctypes is really quite sub-optimal. I think it would make more sense to work on Parrot support for Python. Sure, in the short term ctypes is more practical than Parrot -- in its most recent incarnation, the latter doesn't even list Python as a supported language -- a regression from last year when Python support was among the best. But in the long term, Parrot (like .NET or Jython do in other contexts) offers cross-language interoperability, and perhaps even (like .NET and Jython) automatic generation of wrappers. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From exarkun at divmod.com Mon Aug 14 17:33:48 2006 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Mon, 14 Aug 2006 11:33:48 -0400 Subject: [Python-3000] Ctypes as cross-interpreter C calling interface In-Reply-To: Message-ID: <20060814153348.1717.1313126828.divmod.quotient.22734@ohm> On Mon, 14 Aug 2006 08:31:31 -0700, Guido van Rossum wrote: >After thinking about it some more, IMO for most purposes ctypes is >really quite sub-optimal. I think it would make more sense to work on >Parrot support for Python.
Sure, in the short term ctypes is more >practical than Parrot -- in its most recent incarnation, the latter >doesn't even list Python as a supported language -- a regression from >last year when Python support was among the best. But in the long >term, Parrot (like .NET or Jython do in other contexts) offers >cross-language interoperability, and perhaps even (like .NET and >Jython) automatic generation of wrappers. > This is a joke, right? Jean-Paul From guido at python.org Mon Aug 14 18:09:49 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 14 Aug 2006 09:09:49 -0700 Subject: [Python-3000] Ctypes as cross-interpreter C calling interface In-Reply-To: <20060814153348.1717.1313126828.divmod.quotient.22734@ohm> References: <20060814153348.1717.1313126828.divmod.quotient.22734@ohm> Message-ID: No. Why would it be a joke? Because it's a Perl thing? Because it doesn't acknowledge Python's obvious supremacy in the universe of languages? Because it admits that other projects sometimes have good ideas? Because it's a good idea to have to write separate wrappers around every useful library for each dynamic languague separately? Because Parrot isn't real? IMO it's pretty real already -- the 0.4.6 release supports Ruby, Javascript, Tcl, and a bunch more (possibly even Perl 6 :-). I wouldn't be surprised if Parrot reached maturity around the same time as Py3k. --Guido On 8/14/06, Jean-Paul Calderone wrote: > On Mon, 14 Aug 2006 08:31:31 -0700, Guido van Rossum wrote: > >After thinking about it some more, IMO for most purposes ctypes is > >really quite sub-optimal. I think it would make more sense to work on > >Parrot support for Python. Sure, in the short term ctypes is more > >practical than Parrot -- in its most recent incarnation, the latter > >doesn't even list Python as a supported language -- a regression from > >last year when Python support was among the best. 
But in the long > >term, Parrot (like .NET or Jython do in other contexts) offers > >cross-language interoperability, and perhaps even (like .NET and > >Jython) automatic generation of wrappers. > > > > This is a joke, right? > > Jean-Paul > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From exarkun at divmod.com Mon Aug 14 19:20:00 2006 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Mon, 14 Aug 2006 13:20:00 -0400 Subject: [Python-3000] Ctypes as cross-interpreter C calling interface In-Reply-To: Message-ID: <20060814172000.1717.863905740.divmod.quotient.22821@ohm> On Mon, 14 Aug 2006 09:09:49 -0700, Guido van Rossum wrote: >On 8/14/06, Jean-Paul Calderone wrote: >>On Mon, 14 Aug 2006 08:31:31 -0700, Guido van Rossum >>wrote: >> >After thinking about it some more, IMO for most purposes ctypes is >> >really quite sub-optimal. I think it would make more sense to work on >> >Parrot support for Python. Sure, in the short term ctypes is more >> >practical than Parrot -- in its most recent incarnation, the latter >> >doesn't even list Python as a supported language -- a regression from >> >last year when Python support was among the best. But in the long >> >term, Parrot (like .NET or Jython do in other contexts) offers >> >cross-language interoperability, and perhaps even (like .NET and >> >Jython) automatic generation of wrappers. >> > >> >>This is a joke, right? >> >No. Why would it be a joke? Because it's a Perl thing? Because it >doesn't acknowledge Python's obvious supremacy in the universe of >languages? Because it admits that other projects sometimes have good >ideas? Heh. Strawmen, all. I assure you, none of these objections ever entered my mind. >Because it's a good idea to have to write separate wrappers >around every useful library for each dynamic languague separately? If a project has done this successfully, I don't think I've seen it. Can you point out some examples where this has been accomplished in a useful form? 
The nearest thing I can think of is SWIG, which is basically a failure. This is not to say that it is not a noble goal, but I think it remains to be shown that Parrot is actually a solution here. >Because Parrot isn't real? IMO it's pretty real already -- the 0.4.6 >release supports Ruby, Javascript, Tcl, and a bunch more (possibly >even Perl 6 :-). I wouldn't be surprised if Parrot reached maturity >around the same time as Py3k. > Parrot has been around for quite a while now without accomplishing anything much of practical value. Does anyone _use_ it for Ruby, JavaScript, or Tcl? (I know no one uses it for Perl 6 ;) For five years of development by a pretty large community, that's not showing a lot. The reason I suspected a joke is that you seem to want to discard a fairly good existing widely used solution in favor of one that's just vapor right now. Granted Py3k is a ways off, but it's not /that/ far off. We're talking about a year or two here. Is Parrot going to be as solid in a year as ctypes already is? I doubt it. If you /really/ want to look outside of the Python community for solutions here, the lisp community has thought about this for a long time. Instead of looking at Parrot, you should look at the ffi provided by almost any lisp runtime. Jean-Paul From guido at python.org Mon Aug 14 19:38:25 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 14 Aug 2006 10:38:25 -0700 Subject: [Python-3000] PEP3102 Keyword-Only Arguments In-Reply-To: References: Message-ID: Not remembering the PEP in detail, I agree with Jim's resolution of all these. I guess the right rule is that all positional arguments come first (first the regular ones, then * or *args). Then come the keyword arguments, again, first the regular ones (name=value), then **kwds. I believe the PEP doesn't address the opposite use case: positional arguments that should *not* be specified as keyword arguments. For example, I might want to write def foo(a, b): ... 
but I don't want callers to be able to call it as foo(b=1, a=2) or even foo(a=2, b=1). A realistic example is the write() method of file objects. We really don't want people starting to say f.write(s="abc") because even if that works for the current file type you're using, it won't work if an instance of some other class implementing write() is substituted -- write() is always documented as an API taking a positional argument, so different "compatible" classes are likely to have different argument names. Currently this is enforced because the default file type is implemented in C and it doesn't have keyword arguments; but in Py3k it may well be implemented in Python and then we currently have no decent way to say "this should really be a positional argument". (There's an analogy to forcing keyword arguments using **, using *args for all arguments and parsing that explicitly -- but that's tedious for a fairly common use case.) Perhaps we can use ** without following identifier to signal this? It's not entirely analogous to * without following identifier, but at least somewhat similar. --Guido On 8/12/06, Jim Jewett wrote: > On 8/11/06, Jiwon Seo wrote: > > When we have keyword-only arguments, do we allow 'keyword dictionary' > > argument? If that's the case, where would we want to place > > keyword-only arguments? > > > Are we going to allow any of followings? > > > 1. def foo(a, b, *, key1=None, key2=None, **map) > > Seems perfectly reasonable. > > I think the controversy was over whether or not to allow keyword-only > without a default. > > > 2. def foo(a, b, *, **map, key1=None, key2=None) > > Seems backward, though I suppose we could adjust if we needed to. > > > 3. def foo(a, b, *, **map) > > What would the * even mean, since there aren't any named keywords to separate? 
> > -jJ > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From steven.bethard at gmail.com Mon Aug 14 19:49:36 2006 From: steven.bethard at gmail.com (Steven Bethard) Date: Mon, 14 Aug 2006 11:49:36 -0600 Subject: [Python-3000] PEP3102 Keyword-Only Arguments In-Reply-To: References: Message-ID: On 8/14/06, Guido van Rossum wrote: > I believe the PEP doesn't address the opposite use case: positional > arguments that should *not* be specified as keyword arguments. For > example, I might want to write > > def foo(a, b): ... > > but I don't want callers to be able to call it as foo(b=1, a=2) or > even foo(a=2, b=1). Another use case is when you want to accept the arguments of another callable, but you have your own positional arguments::

    >>> class Wrapper(object):
    ...     def __init__(self, func):
    ...         self.func = func
    ...     def __call__(self, *args, **kwargs):
    ...         print 'calling wrapped function'
    ...         return self.func(*args, **kwargs)
    ...
    >>> @Wrapper
    ... def func(self, other):
    ...     return self, other
    ...
    >>> func(other=1, self=2)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: __call__() got multiple values for keyword argument 'self'

It would be really nice in the example above to mark ``self`` in ``__call__`` as a positional only argument. > Perhaps we can use ** without following identifier to signal this? > It's not entirely analogous to * without following identifier, but at > least somewhat similar. I'm certainly not opposed to going this way, but I don't think it would solve the problem above since you still need to take keyword arguments. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity.
--- Bucky Katt, Get Fuzzy From guido at python.org Mon Aug 14 20:04:19 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 14 Aug 2006 11:04:19 -0700 Subject: [Python-3000] PEP3102 Keyword-Only Arguments In-Reply-To: References: Message-ID: On 8/14/06, Steven Bethard wrote: > On 8/14/06, Guido van Rossum wrote: > > I believe the PEP doesn't address the opposite use case: positional > > arguments that should *not* be specified as keyword arguments. For > > example, I might want to write > > > > def foo(a, b): ... > > > > but I don't want callers to be able to call it as foo(b=1, a=2) or > > even foo(a=2, b=1). > > Another use case is when you want to accept the arguments of another > callable, but you have your own positional arguments::
>
> >>> class Wrapper(object):
> ...     def __init__(self, func):
> ...         self.func = func
> ...     def __call__(self, *args, **kwargs):
> ...         print 'calling wrapped function'
> ...         return self.func(*args, **kwargs)
> ...
> >>> @Wrapper
> ... def func(self, other):
> ...     return self, other
> ...
> >>> func(other=1, self=2)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: __call__() got multiple values for keyword argument 'self'
>
> It would be really nice in the example above to mark ``self`` in > ``__call__`` as a positional only argument. But this is a rather unusual use case, isn't it? It's due to the bound methods machinery. Do you have other use cases? I would assume that normally such wrappers take their own control arguments in the form of keyword-only arguments (that are unlikely to conflict with arguments of the wrapped method). > > Perhaps we can use ** without following identifier to signal this? > > It's not entirely analogous to * without following identifier, but at > > least somewhat similar. > > I'm certainly not opposed to going this way, but I don't think it > would solve the problem above since you still need to take keyword > arguments. Can you elaborate?
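The *args workaround Guido mentions earlier in the thread -- "using *args for all arguments and parsing that explicitly" -- can be sketched as follows; the write()-like signature is purely illustrative, not taken from the PEP:

```python
def write(*args, **kwds):
    # Emulate a positional-only parameter: refuse keyword arguments
    # outright, then unpack the one expected positional argument by hand.
    if kwds:
        raise TypeError("write() takes no keyword arguments")
    if len(args) != 1:
        raise TypeError("write() takes exactly 1 argument (%d given)" % len(args))
    (s,) = args
    return len(s)
```

With this, write("abc") succeeds while write(s="abc") raises TypeError no matter what the parameter is called internally -- the guarantee wanted for file-like objects -- but, as Guido says, the emulation is tedious and hides the real signature.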
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Aug 14 20:08:56 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 14 Aug 2006 11:08:56 -0700 Subject: [Python-3000] Ctypes as cross-interpreter C calling interface In-Reply-To: <20060814172000.1717.863905740.divmod.quotient.22821@ohm> References: <20060814172000.1717.863905740.divmod.quotient.22821@ohm> Message-ID: On 8/14/06, Jean-Paul Calderone wrote: > On Mon, 14 Aug 2006 09:09:49 -0700, Guido van Rossum wrote: > >On 8/14/06, Jean-Paul Calderone wrote: > >>This is a joke, right? > >Because it's a good idea to have to write separate wrappers > >around every useful library for each dynamic language separately? > > If a project has done this successfully, I don't think I've seen it. Can > you point out some examples where this has been accomplished in a useful > form? The nearest thing I can think of is SWIG, which is basically a > failure. SWIG is not my favorite (mostly because I don't like C++ much) but it's used very effectively here at Google (for example); I wouldn't dream of calling it a failure. I also consider .NET's CLR a success, based on the testimony of Jim Hugunin (who must be Microsoft's most reluctant employee :). And I see the JVM as a successful case too -- Jython can link to anything written in Java or compiled to JVM bytecode, and so can other languages that use JVM introspection the same way as Jython (I hear there's a Ruby analogue). The major difference between all these examples and ctypes is that ctypes has no way of introspecting the wrapped library; you have to repeat everything you know about the API in your calls to ctypes (and as was just shown in another thread about 64-bit issues, that's not always easy). > This is not to say that it is not a noble goal, but I think it remains to > be shown that Parrot is actually a solution here. Parrot definitely has to show itself still.
But a year ago Sam Ruby reported on his efforts of making Python work on Parrot, and he sounded like it was a very feasible proposition. > Parrot has been around for quite a while now without accomplishing anything > much of practical value. Does anyone _use_ it for Ruby, JavaScript, or Tcl? > (I know no one uses it for Perl 6 ;) > > For five years of development by a pretty large community, that's not showing > a lot. The reason I suspected a joke is that you seem to want to discard a > fairly good existing widely used solution in favor of one that's just vapor > right now. Granted Py3k is a ways off, but it's not /that/ far off. We're > talking about a year or two here. Is Parrot going to be as solid in a year > as ctypes already is? I doubt it. That's not exactly the point I am making. I find Parrot's approach, assuming the project won't fail due to internal friction, much more long-term viable than ctypes. The big difference being (I hope) introspective generation of APIs rather than having to repeat the linkage information in each client language. > If you /really/ want to look outside of the Python community for solutions > here, the lisp community has thought about this for a long time. Instead of > looking at Parrot, you should look at the ffi provided by almost any lisp > runtime. This seems a mostly theoretical viewpoint to me. Can you point me to an example of a Python-like language that is successful in reusing a Lisp runtime? (And I don't consider Lisp or Scheme Python-like in this context.
;-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Aug 14 20:13:56 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 14 Aug 2006 11:13:56 -0700 Subject: [Python-3000] Python/C++ question In-Reply-To: References: <44DA6C01.2040904@acm.org> <44DF0800.4060204@acm.org> Message-ID: On 8/13/06, Georg Brandl wrote: > Talin wrote: > > Guido van Rossum wrote: > >> On 8/9/06, Talin wrote: > >> For the majority of Python developers it's probably the other way > >> around. It's been 15 years since I wrote C++, and unlike C, that > >> language has changed a lot since then... > >> > >> It would be a complete rewrite; I prefer doing a gradual > >> transmogrification of the current codebase into Py3k rather than > >> starting from scratch (read Joel Spolsky on why). > > > > BTW, Should this be added to PEP 3099? > > Yes, why not. Although perhaps it makes more sense to add something positive to PEP 3000, e.g.

Implementation Language
=======================

Python 3000 will be implemented in C, and the implementation will be derived as an evolution of the Python 2 code base. This reflects my views (which I share with Joel Spolsky) on the dangers of complete rewrites. Since Python 3000 as a language is a relatively mild improvement on Python 2, we can gain a lot by not attempting to reimplement the language from scratch. I am not against parallel from-scratch implementation efforts, but my own efforts will be directed at the language and implementation that I know best.
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Aug 14 20:17:14 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 14 Aug 2006 11:17:14 -0700 Subject: [Python-3000] threading, part 2 In-Reply-To: <1d85506f0608111713m15cf2e67v8b94f06c928e9125@mail.gmail.com> References: <1d85506f0608111713m15cf2e67v8b94f06c928e9125@mail.gmail.com> Message-ID: On 8/11/06, tomer filiba wrote: > i mailed this to several people separately, but then i thought it could > benefit the entire group: > > http://sebulba.wikispaces.com/recipe+thread2 > > it's an implementation of the proposed " thread.raise_exc", through an extension > to the threading.Thread class. you can test it for yourself; if it proves useful, > it should be exposed as thread.raise_exc in the stdlib (instead of the ctypes > hack)... and of course it should be reflected in threading.Thread as well. Cool. Question: what's the problem with raising exception instances? Especially in the light of my proposal to use

    raise SomeException(42)

in preference over (and perhaps exclusively instead of)

    raise SomeException, 42

in Py3k. The latter IMO is a relic from the days of string exceptions which are as numbered as they come. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From paul at prescod.net Mon Aug 14 20:40:04 2006 From: paul at prescod.net (Paul Prescod) Date: Mon, 14 Aug 2006 11:40:04 -0700 Subject: [Python-3000] Ctypes as cross-interpreter C calling interface In-Reply-To: References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com> Message-ID: <1cb725390608141140g480e0c66q6f1e74f32ad1e540@mail.gmail.com> I guess I don't see ctypes and Parrot solving the same problem at all. My idea with ctypes was the opposite of choosing a new runtime. It was to help various runtimes (PyPy, Jython, IronPython, CPython 2.5, CPython 3.0, Parrot, ...)
to compete on their own merits (primarily performance and interoperability) and not on the basis that they don't support some Python library whether it be "crypt" or "pyopengl". It would also be nice to move beyond the situation where everyone in the world must re-release their C modules (no matter how trivial) every time Python goes through a minor upgrade. Does Parrot solve these problems or exacerbate them? Also, Parrot seems like a bit of a random choice considering the fact that there are many candidates for a next-generation Python runtime: PyPy, IronPython/mono, etc. They have both come much further, much quicker, than Parrot. I'm a bit skeptical of the Parrot story after the Guile mess. It was supposed to be a multi-language dynamic runtime as well. But that's a digression. I don't think you're betting on any particular strategy, just saying that we should watch Parrot and see how it turns out. But anyhow, my original suggestion did not start with ctypes at all. From my point of view, the goal is to express Pythonic constructs in Python (whether using Ctypes, Pyrex, rctypes, or whatever) where possible rather than expressing Pythonic constructs in C (PyErr_SetString, PyDict_SetItem, etc.). Then each runtime can map the Pythonic constructs to their internal model and use their native FFI strategy (JNI, P/Invoke, libffi) to handle the C stuff. The actual details of the syntax do not matter to me (though they do matter!). I also do not care whether it uses a compiler strategy like Pyrex or a runtime model like ctypes, or a dual-mode strategy like PyPy/extcompiler. I accept the current limitations of this technique when it comes to (especially) C++, and therefore don't promote it as a panacea. Let me ask a question about our current status. If there were a requirement to do a simple wrapper library like "crypt" or "getpasswd"...is there any high level wrapping strategy that you would allow into the standard library? A ctypes-based module?
The C output of a Pyrex compiler? The output of SWIG? Or is hand-written C code the only thing you want for extensions in the Python library? Even if the answer is "hand-written C code" it might be nice to have an explicit statement so that people know in advance. I propose that if the developer can make the case that a ctypes-based library is more maintainable than the C code would be, and performance is acceptable for the problem domain, that the ctypes-based library be acceptable. Would you agree that that small step is reasonable? Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060814/04bca409/attachment.htm From jimjjewett at gmail.com Mon Aug 14 21:11:00 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 14 Aug 2006 15:11:00 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com> References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> <20060812205512.197A.JCARLSON@uci.edu> <5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com> <5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com> <1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com> Message-ID: On 8/13/06, Paul Prescod wrote: > My proposed text for the PEP is as follows: Mostly good. A few remaining comments... Should annotation objects with defined semantics have some standard way to indicate this? (By analogy, new exceptions *should* inherit from Exception; should annotation objects inherit from an Annotation class, at least as a mixin?) > "This implies that the interpretation of built-in types would be controlled > by Python's developers and documented in Python's documentation. It also implies that the interpretation of annotations made with a built-in type should be safe -- they shouldn't trigger any irreversible actions. 
> "In Python 3000, semantics will be attached to the following types: objects > of type string (or subtype of string) are to be used for documentation > (though they are not necessarily the exclusive source of documentation about > the type). Objects of type list (or subtype of list) are to be used for > attaching multiple independent annotations." Subtypes should be available for other frameworks. This implies that something other than lists should be used if the annotations are not independent. The obvious candidates are tuples and dicts, but this should be explicit (or explicitly not defined). The definition of a type as an annotation should probably be either defined or explicitly undefined. Earlier discussions talked about things like

    def f(a: int, b: (float | Decimal), c: [int, str, X]) -> str

This implied that a type object would represent the type of the argument (but would it be safe to call as an adapter?), that special syntactic support might be added for type unions, and that the "independent" part of the list specification should probably be repeated at least in an example. I'm not sure if these implications *should* be true, but they're obvious enough to some people (and not to others) that the decision should be explicit. -jJ From jcarlson at uci.edu Mon Aug 14 21:15:18 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Mon, 14 Aug 2006 12:15:18 -0700 Subject: [Python-3000] threading, part 2 In-Reply-To: References: <1d85506f0608111713m15cf2e67v8b94f06c928e9125@mail.gmail.com> Message-ID: <20060814121235.19A8.JCARLSON@uci.edu> "Guido van Rossum" wrote: > > On 8/11/06, tomer filiba wrote: > > i mailed this to several people separately, but then i thought it could > > benefit the entire group: > > > > http://sebulba.wikispaces.com/recipe+thread2 > > > > it's an implementation of the proposed " thread.raise_exc", through an extension > > to the threading.Thread class.
you can test it for yourself; if it proves useful, > > it should be exposed as thread.raise_exc in the stdlib (instead of the ctypes > > hack)... and of course it should be reflected in threading.Thread as well. > > Cool. Question: what's the problem with raising exception instances? > Especially in the light of my proposal to use > > raise SomeException(42) > > in preference over (and perhaps exclusively instead of) The problem is that it is not implemented in the underlying CPython API PyThreadState_SetAsyncExc function. - Josiah From g.brandl at gmx.net Mon Aug 14 21:12:50 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 14 Aug 2006 21:12:50 +0200 Subject: [Python-3000] threading, part 2 In-Reply-To: References: <1d85506f0608111713m15cf2e67v8b94f06c928e9125@mail.gmail.com> Message-ID: Guido van Rossum wrote: > On 8/11/06, tomer filiba wrote: >> i mailed this to several people separately, but then i thought it could >> benefit the entire group: >> >> http://sebulba.wikispaces.com/recipe+thread2 >> >> it's an implementation of the proposed " thread.raise_exc", through an extension >> to the threading.Thread class. you can test it for yourself; if it proves useful, >> it should be exposed as thread.raise_exc in the stdlib (instead of the ctypes >> hack)... and of course it should be reflected in threading.Thread as well. > > Cool. Question: what's the problem with raising exception instances? > Especially in the light of my proposal to use > > raise SomeException(42) > > in preference over (and perhaps exclusively instead of) > > raise SomeException, 42 > > in Py3k. The latter IMO is a relic from the days of string exceptions > which are as numbered as they come.
:-) I think this is the answer: http://mail.python.org/pipermail/python-dev/2006-August/068165.html Georg From g.brandl at gmx.net Mon Aug 14 21:13:50 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 14 Aug 2006 21:13:50 +0200 Subject: [Python-3000] Python/C++ question In-Reply-To: References: <44DA6C01.2040904@acm.org> <44DF0800.4060204@acm.org> Message-ID: Guido van Rossum wrote: > On 8/13/06, Georg Brandl wrote: >> Talin wrote: >> > Guido van Rossum wrote: >> >> On 8/9/06, Talin wrote: >> >> For the majority of Python developers it's probably the other way >> >> around. It's been 15 years since I wrote C++, and unlike C, that >> >> language has changed a lot since then... >> >> >> >> It would be a complete rewrite; I prefer doing a gradual >> >> transmogrification of the current codebase into Py3k rather than >> >> starting from scratch (read Joel Spolsky on why). >> > >> > BTW, Should this be added to PEP 3099? >> >> Yes, why not. > > Although perhaps it makes more sense to add something positive to PEP 3000, e.g. > > Implementation Language > ================== > > Python 3000 will be implemented in C, and the implementation will be > derived as an evolution of the Python 2 code base. This reflects my > views (which I share with Joel Spolsky) on the dangers of complete > rewrites. Since Python 3000 as a language is a relatively mild > improvement on Python 2, we can gain a lot by not attempting to > reimplement the language from scratch. I am not against parallel > from-scratch implementation efforts, but my own efforts will be > directed at the language and implementation that I know best. I had already added something to PEP 3099, but if you like that approach better, I'll add that to PEP 3000. 
Georg From tim.peters at gmail.com Mon Aug 14 21:15:30 2006 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 14 Aug 2006 15:15:30 -0400 Subject: [Python-3000] threading, part 2 In-Reply-To: References: <1d85506f0608111713m15cf2e67v8b94f06c928e9125@mail.gmail.com> Message-ID: <1f7befae0608141215y72e827cfo4f541b7e5fe927a8@mail.gmail.com> [tomer filiba] >> i mailed this to several people separately, but then i thought it could >> benefit the entire group: >> >> http://sebulba.wikispaces.com/recipe+thread2 >> >> it's an implementation of the proposed " thread.raise_exc", >> ... [Guido] > Cool. Question: what's the problem with raising exception instances? See http://mail.python.org/pipermail/python-dev/2006-August/068165.html Short course: in ceval.c,

    x = tstate->async_exc;
    ...
    PyErr_SetNone(x);

That is, with the current code it's only possible to set the exception type via PyThreadState_SetAsyncExc(); the exception value is forced to None/NULL. What was the intent ;-)? Example:

"""
from time import sleep
import ctypes, thread, sys, threading

setexc = ctypes.pythonapi.PyThreadState_SetAsyncExc
f_done = threading.Event()

def f():
    try:
        while 1:
            sleep(1)
    finally:
        f_done.set()

tid = thread.start_new_thread(f, ())
exc = ValueError("13")
setexc(ctypes.c_long(tid), ctypes.py_object(exc))
f_done.wait()
"""

Output:

Unhandled exception in thread started by
Traceback (most recent call last):
  File "setexc.py", line 12, in f
    f_done.set()
  File "C:\Code\python\lib\threading.py", line 351, in set
    self.__cond.release()
SystemError: 'finally' pops bad exception

Change `exc` to, e.g.,

    exc = ValueError

and then it's fine:

Unhandled exception in thread started by
Traceback (most recent call last):
  File "setexc.py", line 12, in f
    f_done.set()
  File "C:\Code\python\lib\threading.py", line 349, in set
    self.__cond.notifyAll()
  File "C:\Code\python\lib\threading.py", line 265, in notifyAll
    self.notify(len(self.__waiters))
  File "C:\Code\python\lib\threading.py", line 258, in notify
    waiter.release()
ValueError

From jimjjewett at gmail.com Mon Aug 14 21:26:18 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 14 Aug 2006 15:26:18 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com> References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> <20060812205512.197A.JCARLSON@uci.edu> <5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com> Message-ID: On 8/13/06, Phillip J. Eby wrote: > However, it's only a problem if you insist on writing brain-damaged > code. If you want interoperability here, you must write tell-don't-ask > code. ... is it really the case that > so many people don't know what tell-don't-ask code is or why you want > it? I guess maybe it's something that's only grasped by people who have > experience writing code intended for interoperability. > [Meanwhile, I'm not going to respond to the rest of your message, since it > contained some things that appeared to me to be a mixture of ad hominem > attack and straw man argument. I hope that was not actually your intent.] I did not intend to insult you. My point is simply that what is obvious to you -- and even what is obvious to almost anyone experienced enough to be reading this message -- won't be obvious to everyone first starting out. I want to be able to use a new programmer's first contribution. I absolutely don't want to tell them "Great, but you really should have used XYZ. We didn't really make that explicit because experienced folks tend to do it naturally."
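The "a list attaches multiple independent annotations" convention debated in this thread can be sketched with the annotation syntax the pre-PEP proposes, which runs on today's Python 3; notes_for is a hypothetical helper written for illustration, not part of any draft:

```python
def notes_for(func, param):
    # Convention under discussion: a list bundles independent annotations,
    # while any other object counts as a single annotation.
    ann = func.__annotations__.get(param)
    if ann is None:
        return []
    return list(ann) if isinstance(ann, list) else [ann]

def f(a: ["number of retries", int], b: str):
    pass
```

Here notes_for(f, "a") hands a processor two independent annotations (a documentation string and a type), while notes_for(f, "b") yields just one.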
-jJ From exarkun at divmod.com Mon Aug 14 21:34:25 2006 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Mon, 14 Aug 2006 15:34:25 -0400 Subject: [Python-3000] Ctypes as cross-interpreter C calling interface In-Reply-To: Message-ID: <20060814193425.1717.135462452.divmod.quotient.22922@ohm> On Mon, 14 Aug 2006 11:08:56 -0700, Guido van Rossum wrote: >On 8/14/06, Jean-Paul Calderone wrote: >>On Mon, 14 Aug 2006 09:09:49 -0700, Guido van Rossum >>wrote: >I also consider .NET's CLR a success, based on the testimony of Jim >Hugunin (who must be Microsoft's most reluctant employee :). > >And I see the JVM as a successful case too -- Jython can link to >anything written in Java or compiled to JVM bytecode, and so can other >languages that use JVM introspection the same way as Jython (I hear >there's Ruby analogue). These successes are necessarily limited in scope. Jython can use any Java library, and that's great, as far as it goes. Clearly, though, it isn't a complete solution. Relying on Parrot to have a rich library of wrapper modules seems ill advised. If it /already/ had a rich library, then maybe it would seem more reasonable. > >The major difference between all these examples and ctypes is that >ctypes has no way of introspecting the wrapped library; you have to >repeat everything you know about the API in your calls to ctypes (and >as was just shown in another thread about 64-bit issues, that's not >always easy). The codegenerator package which is closely related to ctypes is capable of this as well. PyPy has a complete ctypes-based OpenSSL wrapper which is automatically generated. >That's not exactly the point I am making. I find Parrot's approach, >assuming the project won't fail due to internal friction, much more >long-term viable than ctypes. The big difference being (I hope) >introspective generation of APIs rather than having to repeat the >linkage information in each client language. 
Given the existence of codegenerator, do you still find Parrot's approach more viable? It seems to me that it easily levels the playing field, and makes ctypes still more attractive than Parrot, since it side-steps the not insignificant internal political issues with the Parrot team. >This seems a mostly theoretical viewpoint to me. Can you point me to >an example of a Python-like language that is successful in reusing a >Lisp runtime? (And I don't consider Lisp or Scheme Python-like in this >context. ;-) PyPy has a Common Lisp backend. It's not the primary target, but it's not inconceivable that it could someday provide an ffi from a Common Lisp runtime to Python programs. There has also been work done on an IL backend for PyPy. This could be used to make any CLR library available to Python programs. Of course, with those two examples in hand, we see a fundamental drawback to the Parrot-style solution (of which these are both essentially examples). What if I want to use the CL FFI at the same time as a library exposed via .NET? I'm out of luck. Had the libraries I wanted both been wrapped with ctypes, I could have used them both from either runtime. In general, what are alternate runtimes like PyPy to do if Parrot becomes the de facto standard for extension modules? Link against Parrot? Suffer without those modules until someone does a custom binding for that runtime? 
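Guido's "repeat everything you know about the API" cost shows up even in a one-function ctypes binding -- a minimal sketch, assuming a Unix-like platform where ctypes can locate libc:

```python
import ctypes
import ctypes.util

# ctypes cannot introspect the C header, so strlen(3)'s prototype must be
# restated by hand; get it wrong (say, on a 64-bit platform) and the call
# silently misbehaves rather than failing to compile.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t
```

Every declaration here duplicates information already present in string.h, which is exactly the linkage repetition an introspective runtime would avoid.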
Jean-Paul From collinw at gmail.com Mon Aug 14 21:41:11 2006 From: collinw at gmail.com (Collin Winter) Date: Mon, 14 Aug 2006 15:41:11 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com> References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> <20060812205512.197A.JCARLSON@uci.edu> <5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com> <5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com> <1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com> Message-ID: <43aa6ff70608141241w43b7b694k77e63ba6766a1f55@mail.gmail.com> On 8/13/06, Paul Prescod wrote: > "In order for processors of function annotations to work interoperably, they > must use a common interpretation of objects used as annotations on a > particular function. For example, one might interpret string annotations as > docstrings. Another might interpret them as path segments for a web > framework. For this reason, function annotation processors SHOULD avoid > assigning processor-specific meanings to types defined outside of the > processor's framework. For example, a Django processor could process > annotations of a type defined in a Zope package, but Zope's creators should > be considered the authorities on the type's meaning for the same reasons > that they would be considered authorities on the semantics of classes or > methods in their packages." The way I read this, it forces (more or less) each annotation-consuming library to invent new ways to spell Python's built-in types. I read all this as saying that annotation processors should avoid using Python's lists, tuples and dicts in annotations (since whatever semantics the Python developers come up with will inevitably be incompatible with what some library writer needs/wants).
Each processor library will then define my_processor.List, my_processor.Tuple, my_processor.Dict, etc as alternate spellings for [x, y, z], (x, y, z), {x: y} and so on. > "This implies that the interpretation of built-in types would be controlled > by Python's developers and documented in Python's documentation. The inherent difficulty in defining a standard interpretation for these types is what motivated me to leave this up to the authors of annotation consumers. I don't mean "it was hard so I gave up"; I can easily come up with a standard, but it will probably be of limited or no utility to some section of the possible userbase. If you have an idea, though, feel free to propose something concrete. > "In Python 3000, semantics will be attached to the following types: objects > of type string (or subtype of string) are to be used for documentation > (though they are not necessarily the exclusive source of documentation about > the type). Objects of type list (or subtype of list) are to be used for > attaching multiple independent annotations." Does this mean all lists "are to be used for attaching multiple independent annotations", or just top-level lists (ie, "def foo(a: [x, y])" indicates two independent annotations)? What does "def foo(a: [x, [y, z]])" indicate? Collin Winter From paul at prescod.net Mon Aug 14 22:00:59 2006 From: paul at prescod.net (Paul Prescod) Date: Mon, 14 Aug 2006 13:00:59 -0700 Subject: [Python-3000] Ctypes as cross-interpreter C calling interface In-Reply-To: References: <20060814172000.1717.863905740.divmod.quotient.22821@ohm> Message-ID: <1cb725390608141300o7b6e6503x23e6c7b9cf31b92f@mail.gmail.com> On 8/14/06, Guido van Rossum wrote: > > > The major difference between all these examples and ctypes is that > ctypes has no way of introspecting the wrapped library; you have to > repeat everything you know about the API in your calls to ctypes (and > as was just shown in another thread about 64-bit issues, that's not > always easy). 
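To make the re-declaration point concrete, here is a minimal ctypes sketch (an illustration, not code from the thread): the shared library carries no signature information, so the caller must restate what is already known about the C function.

```python
import ctypes
import ctypes.util

# Load the standard C math library (name resolution is platform-specific;
# find_library abstracts that away on POSIX systems).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Nothing in the .so tells ctypes the signature of fabs() --
# we must re-declare by hand what we already know about the C API:
libm.fabs.argtypes = [ctypes.c_double]
libm.fabs.restype = ctypes.c_double

assert libm.fabs(-2.5) == 2.5
```

This duplication is exactly what the code-generation tools linked below try to automate.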
An excellent point and very clarifying (though I still don't totally understand the relationship with Parrot). What do you think about techniques like these:

* http://starship.python.net/crew/theller/ctypes/old/codegen.html
* http://lists.copyleft.no/pipermail/pyrex/2006-June/001885.html

I agree that this is an issue. But then on the other hand, given N methods and objects that you need wrapped, you will in general need to make N individual mapping statements no matter what technology you use. The question is how many lines of mapping are you doing? Ctypes currently requires you to re-declare what you know about the C library. Hand-written C libraries require you to go through other hoops. For example, looking at Pygame ctypes, consider the following method:

    def __copy__(self):
        return Rect(self.x, self.y, self.w, self.h)

That's the ctypes version. Here's the C version:

    /* for copy module */
    static PyObject* rect_copy(PyObject* oself, PyObject* args)
    {
        PyRectObject* self = (PyRectObject*)oself;
        return PyRect_New4(self->r.x, self->r.y, self->r.w, self->r.h);
    }

    static struct PyMethodDef rect_methods[] = {
        ...
        {"__copy__", (PyCFunction)rect_copy, 0, NULL},
        ...
    };

So there is some repetition there as well (casts, function name duplications, etc.).

Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060814/a281094f/attachment.htm

From paul at prescod.net  Mon Aug 14 22:20:54 2006
From: paul at prescod.net (Paul Prescod)
Date: Mon, 14 Aug 2006 13:20:54 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <43aa6ff70608141241w43b7b694k77e63ba6766a1f55@mail.gmail.com>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<20060812205512.197A.JCARLSON@uci.edu>
	<5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>
	<5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com>
	<1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com>
	<43aa6ff70608141241w43b7b694k77e63ba6766a1f55@mail.gmail.com>
Message-ID: <1cb725390608141320n11683af8q27a75309011a512c@mail.gmail.com>

On 8/14/06, Collin Winter wrote:
> The way I read this, it forces (more or less) each
> annotation-consuming library to invent new ways to spell Python's
> built-in types.

I think that this is related to your other question. What if an annotation-consuming library wanted to use Python's built-in types nested within their own top-level structures?

    def foo(a: xxx([x, y, z])): ...

I would say that the innermost list has its semantics (as metadata) defined by "xxx", not raw Python. That's the only reasonable thing.

> > "This implies that the interpretation of built-in types would be controlled
> > by Python's developers and documented in Python's documentation.
>
> The inherent difficulty in defining a standard interpretation for
> these types is what motivated me to leave this up to the authors of
> annotation consumers.

There are three issues: first, we need to RESERVE the types for standardization by Guido and crew. Second, we can decide to do the standardization at any point. Third, we absolutely need a standard for multiple independent annotations on a parameter. Using lists is a no-brainer. So let's do that.

If you have an idea, though, feel free to propose something concrete.
Yes, my proposal is here:

> > "In Python 3000, semantics will be attached to the following types: objects
> > of type string (or subtype of string) are to be used for documentation
> > (though they are not necessarily the exclusive source of documentation about
> > the type). Objects of type list (or subtype of list) are to be used for
> > attaching multiple independent annotations."
>
> Does this mean all lists "are to be used for attaching multiple
> independent annotations", or just top-level lists (ie, "def foo(a: [x,
> y])" indicates two independent annotations)? What does "def foo(a: [x,
> [y, z]])" indicate?

I meant only top-level lists. I hadn't thought through nesting.

    def foo(a: [x, y, [a, b, c]]): ...

This should probably be just handled recursively or disallowed. I don't feel strongly either way.

Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060814/ff1b6d61/attachment.html

From jimjjewett at gmail.com  Mon Aug 14 22:24:15 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Mon, 14 Aug 2006 16:24:15 -0400
Subject: [Python-3000] PEP3102 Keyword-Only Arguments; Signature
Message-ID: 

On 8/14/06, Steven Bethard wrote:
> On 8/14/06, Guido van Rossum wrote:
> > I believe the PEP doesn't address the opposite use case: positional
> > arguments that should *not* be specified as keyword arguments.
> ...
> It would be really nice in the example above to mark ``self`` in
> ``__call__`` as a positional only argument.

Would this have to be in the standard function prologue, or would it be acceptable to modify a function's Signature object?

As I see it, each argument can be any combination of the following:

    positional
    keyword
    named
    defaulted
    annotated

I can see some value in supporting all 32 possibilities, but doing it directly as part of the def syntax might get awkward. Most arguments are both positional and keyword.
The bare * will support keyword-only, and you're asking for positional-only. (An argument which is neither positional nor keyword doesn't make sense.)

Today (except in extension code), an argument that isn't named only appears courtesy of *args or **kwargs.

Today, named + keyword <==> defaulted

Today, arguments are not annotated.

Would it be acceptable if functions contained a (possibly implicit) Signature object, and the way to get the odd combinations were through modifying that? For example:

    def unnamedargs(func):
        for arg in func.Signature:
            arg.name = None
        return func
    ...
    @unnamedargs
    def write(self, s):

-jJ

From steven.bethard at gmail.com  Mon Aug 14 22:34:54 2006
From: steven.bethard at gmail.com (Steven Bethard)
Date: Mon, 14 Aug 2006 14:34:54 -0600
Subject: [Python-3000] PEP3102 Keyword-Only Arguments
In-Reply-To: 
References: 
Message-ID: 

On 8/14/06, Guido van Rossum wrote:
> On 8/14/06, Steven Bethard wrote:
> > On 8/14/06, Guido van Rossum wrote:
> > > I believe the PEP doesn't address the opposite use case: positional
> > > arguments that should *not* be specified as keyword arguments. For
> > > example, I might want to write
> > >
> > > def foo(a, b): ...
> > >
> > > but I don't want callers to be able to call it as foo(b=1, a=2) or
> > > even foo(a=2, b=1).
> >
> > Another use case is when you want to accept the arguments of another
> > callable, but you have your own positional arguments::
> >
> > >>> class Wrapper(object):
> > ...     def __init__(self, func):
> > ...         self.func = func
> > ...     def __call__(self, *args, **kwargs):
> > ...         print 'calling wrapped function'
> > ...         return self.func(*args, **kwargs)
> > ...
> > >>> @Wrapper
> > ... def func(self, other):
> > ...     return self, other
> > ...
> > >>> func(other=1, self=2)
> > Traceback (most recent call last):
> >   File "", line 1, in ?
> > TypeError: __call__() got multiple values for keyword argument 'self'
> >
> > It would be really nice in the example above to mark ``self`` in
> > ``__call__`` as a positional only argument.
>
> But this is a rather unusual use case isn't it? It's due to the bound
> methods machinery. Do you have other use cases?

Well, for example, unittest.TestCase.failUnlessRaises works this way. Here's the method signature::

    def failUnlessRaises(self, excClass, callableObj, *args, **kwargs):

Which means that if you write::

    self.failUnlessRaises(TypeError, my_func, callableObj=foo)

you'll get an error since there's a name clash between the callableObj taken by failUnlessRaises and the one taken by the my_func object. OTOH, I haven't run into this error because I don't use camelCase names. Perhaps the right answer is to always use camelCase on any arguments that you don't want to worry about conflicts, and then any PEP 8 compliant code will never have problems. ;-)

> > > Perhaps we can use ** without following identifier to signal this?
> > > It's not entirely analogous to * without following identifier, but at
> > > least somewhat similar.
> >
> > I'm certainly not opposed to going this way, but I don't think it
> > would solve the problem above since you still need to take keyword
> > arguments.
>
> Can you elaborate?

Well, taking the failUnlessRaises signature above, if you wanted to specify that ``self``, ``excClass`` and ``callableObj`` were positional only arguments, I believe you'd have to write::

    def failUnlessRaises(self, excClass, callableObj, *args, **):

I believe that means that you can't use failUnlessRaises to call a method that expects keyword arguments, e.g.::

    self.assertRaises(OptionError, parser.add_option, type='foo')

STeVe
--
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity.
--- Bucky Katt, Get Fuzzy From paul at prescod.net Mon Aug 14 22:51:10 2006 From: paul at prescod.net (Paul Prescod) Date: Mon, 14 Aug 2006 13:51:10 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> <20060812205512.197A.JCARLSON@uci.edu> <5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com> <5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com> <1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com> Message-ID: <1cb725390608141351n78099df6s6bf4359758d18b10@mail.gmail.com> On 8/14/06, Jim Jewett wrote: > > Should annotation objects with defined semantics have some standard > way to indicate this? (By analogy, new exceptions *should* inherit > from Exception; should annotation objects inherit from an Annotation > class, at least as a mixin?) All annotation objects have defined semantics (somewhere) or else they are useless. I don't see any benefit in making them inherit from anything in particular. Python has a very specific reason for requiring that in the exception case. I'd rather not complicate the design without a good reason. > "This implies that the interpretation of built-in types would be > controlled > > by Python's developers and documented in Python's documentation. > > It also implies that the interpretation of annotations made with a > built-in type should be safe -- they shouldn't trigger any > irreversible actions. I disagree and don't think you can come up with a clear definition of "irreversible" in any case. Is spitting out text to a stream "irreversible"? I'd rather not complicate stuff. > "In Python 3000, semantics will be attached to the following types: > objects > > of type string (or subtype of string) are to be used for documentation > > (though they are not necessarily the exclusive source of documentation > about > > the type). 
> > Objects of type list (or subtype of list) are to be used for
> > attaching multiple independent annotations."
>
> subtypes should be available for other frameworks.

I'd be happy to remove the whole subtype clause. I don't care much either way. But anyhow I (now) disagree that there is a problem as stated. If a framework wants to use a subtype of list they just need to wrap it in a top-level wrapper that makes the association.

    def foo(a: xxx(mylist_subtype(a, b, c))):

This is clear thanks to Collin Winter's recent post.

> This implies that something other than lists should be used if the
> annotations are not independent. The obvious candidates are tuples
> and dicts, but this should be explicit (or explicitly not defined).

The "dependence" between notations is totally up to the framework. To repeat the example:

    def foo(a: xxx(mylist_subtype(a, b, c))):

xxx might say that a is passed as a ".next" attribute to b which is passed as a ".next" attribute to "c". Or xxx might say that "a" is passed to "b"'s constructor which is passed to "c"'s constructor. Remember that "xxx" is executable so it could do whatever it wants. It should just document what it did so that various libraries know how to navigate the object structure it creates.

> The definition of a type as an annotation should probably be either
> defined or explicitly undefined. Earlier discussions talked about
> things like
>
> def f (a:int, b:(float | Decimal), c:[int, str, X]) -> str

I think that's a separate (large!) PEP. This PEP should disallow frameworks from inventing their own meaning for this syntax (requiring them to at least wrap). Then Guido and crew can dig into this issue on their own schedule.

Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060814/1629696c/attachment.html From collinw at gmail.com Mon Aug 14 23:03:56 2006 From: collinw at gmail.com (Collin Winter) Date: Mon, 14 Aug 2006 16:03:56 -0500 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <1cb725390608141320n11683af8q27a75309011a512c@mail.gmail.com> References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> <20060812205512.197A.JCARLSON@uci.edu> <5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com> <5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com> <1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com> <43aa6ff70608141241w43b7b694k77e63ba6766a1f55@mail.gmail.com> <1cb725390608141320n11683af8q27a75309011a512c@mail.gmail.com> Message-ID: <43aa6ff70608141403i36dfeefcn2cb1aa7f803b5579@mail.gmail.com> On 8/14/06, Paul Prescod wrote: > There are three issues: first, we need to RESERVE the types for > standardization by Guido and crew. You're just pushing the decision off to someone else. Regardless of who makes it, decisions involving the built-in types are going to make some group unhappy. This list saw several discussions related to standard interpretations for the built-in types back in May and June; here's a selection for you to catch up on: http://mail.python.org/pipermail/python-3000/2006-May/002134.html http://mail.python.org/pipermail/python-3000/2006-May/002216.html http://mail.python.org/pipermail/python-3000/2006-June/002438.html One particularly divisive issue is whether tuples should be treated as fixed- or arbitrary-length containers. Concretely, does "tuple(Number)" match only 1-tuples with a single Number element, or does it match all tuples that have only Number elements? Regardless of which you pick, somebody's going to be pissed. > Second, we can decide to do the standardization at any point. Um, "at any point"? 
You mean it's conceivable that this standardisation could come *after* Python ships with function annotations? Collin Winter From paul at prescod.net Mon Aug 14 23:18:07 2006 From: paul at prescod.net (Paul Prescod) Date: Mon, 14 Aug 2006 14:18:07 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <43aa6ff70608141403i36dfeefcn2cb1aa7f803b5579@mail.gmail.com> References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> <20060812205512.197A.JCARLSON@uci.edu> <5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com> <5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com> <1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com> <43aa6ff70608141241w43b7b694k77e63ba6766a1f55@mail.gmail.com> <1cb725390608141320n11683af8q27a75309011a512c@mail.gmail.com> <43aa6ff70608141403i36dfeefcn2cb1aa7f803b5579@mail.gmail.com> Message-ID: <1cb725390608141418y4c111070l73554a2a959e5d72@mail.gmail.com> On 8/14/06, Collin Winter wrote: > > On 8/14/06, Paul Prescod wrote: > > There are three issues: first, we need to RESERVE the types for > > standardization by Guido and crew. > > You're just pushing the decision off to someone else. Regardless of > who makes it, decisions involving the built-in types are going to make > some group unhappy. Yes, I know. I spent about a month of my life going through the same process back around 2003. > Second, we can decide to do the standardization at any point. > > Um, "at any point"? You mean it's conceivable that this > standardisation could come *after* Python ships with function > annotations? Sure. Why not? All I'm saying is that the "function annotations" PEP should not depend on the "function annotations for static type declarations" PEP. That was implicit in your original pre-PEP! If the "static type declarations PEP" misses the Python 3000 deadline then the function annotations feature is still valuable. 
The former could be used as a testbed for the latter:

    def myfunc(NumTuples: [typepackage1(tuple(Number)),
                           typepackage2("tuple(Number+)")]): ...

Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060814/2ac569eb/attachment.htm

From collinw at gmail.com  Mon Aug 14 23:23:48 2006
From: collinw at gmail.com (Collin Winter)
Date: Mon, 14 Aug 2006 16:23:48 -0500
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <1cb725390608141418y4c111070l73554a2a959e5d72@mail.gmail.com>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<20060812205512.197A.JCARLSON@uci.edu>
	<5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>
	<5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com>
	<1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com>
	<43aa6ff70608141241w43b7b694k77e63ba6766a1f55@mail.gmail.com>
	<1cb725390608141320n11683af8q27a75309011a512c@mail.gmail.com>
	<43aa6ff70608141403i36dfeefcn2cb1aa7f803b5579@mail.gmail.com>
	<1cb725390608141418y4c111070l73554a2a959e5d72@mail.gmail.com>
Message-ID: <43aa6ff70608141423w64afca33uc284417cec4a62fe@mail.gmail.com>

On 8/14/06, Paul Prescod wrote:
> On 8/14/06, Collin Winter wrote:
> > On 8/14/06, Paul Prescod wrote:
> > > Second, we can decide to do the standardization at any point.
> >
> > Um, "at any point"? You mean it's conceivable that this
> > standardisation could come *after* Python ships with function
> > annotations?
>
> Sure. Why not?

Because not having standardised meanings at the same time as the feature becomes available says to developers, "don't use the built-in types in your annotations because we might give them a meaning later...or maybe we won't...but in the meantime, you're going to need to invent new spellings for lists, tuples, dicts, sets, strings, just in case".
As someone writing an annotation consumer, that comes across as an incredibly arbitrary decision that forces me to do a lot of extra work.

Collin Winter

From collinw at gmail.com  Mon Aug 14 23:59:38 2006
From: collinw at gmail.com (Collin Winter)
Date: Mon, 14 Aug 2006 16:59:38 -0500
Subject: [Python-3000] Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations)
Message-ID: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>

On 8/14/06, Paul Prescod wrote:
> Third, we absolutely need a standard for
> multiple independent annotations on a parameter. Using lists is a
> no-brainer. So let's do that.

The problem with using lists is that it's impossible for non-decorator annotation consumers to know which element "belongs" to them.

Way back in http://mail.python.org/pipermail/python-3000/2006-August/002865.html, Nick Coghlan said:

> However, what we're really talking about here is a scenario where you're
> defining your *own* custom annotation processor: you want the first part of
> the tuple in the expression handled by the type processing library, and the
> second part handled by the docstring processing library.
>
> Which says to me that the right solution is for the annotation to be split up
> into its constituent parts before the libraries ever see it.
>
> This could be done as Collin suggests by tampering with
> __signature__.annotations before calling each decorator, but I think it is
> cleaner to do it by defining a particular signature for decorators that are
> intended to process annotations.
>
> Specifically, such decorators should accept a separate dictionary to use in
> preference to the annotations on the function itself:
>
> process_function_annotations(f, annotations=None):
>     # Process the function f
>     # If annotations is not None, use it
>     # otherwise, get the annotations from f.__signature__

I've come to like this idea more and more.
Here's my stab at a dict-based convention for specifying annotations for decorator-style consumers:

There are several annotation consumers, docstring, typecheck and constrain_values. Respectively, these treat annotations as documentation; as restrictions on the type of an argument; as restrictions on the values of an argument. Each of these is defined something like

    def consumer(annotated_function, annotations=sentinel):
        ...

If the consumer isn't given an `annotations` parameter, it is free to assume it is the only consumer for the annotations on that function and is free to treat the annotation expressions however it sees fit. However, if it is given an `annotations` argument, it should observe those annotations and only those annotations.

The more complete example:

    @multiple_annotations(docstring, typecheck, constrain_values)
    def foo(a: {'docstring': "Frobnication count",
                'typecheck': Number,
                'constrain_values': range(3, 9)},
            b: {'typecheck': Number,
                # This can be only 4, 8 or 12
                'constrain_values': [4, 8, 12]}) -> {'typecheck': Number}

Here, multiple_annotations assumes that the annotation dicts are keyed on consumer.__name__; the test "if consumer.__name__ in per_parameter_annotations" should do nicely for figuring out whether a given consumer should be provided an `annotations` argument. (It is up to multiple_annotations() to decide whether "consumer.__name__ in per_parameter_annotations == False" should raise an exception.)
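The dispatch convention described above can be sketched as runnable code (a rough illustration only: multiple_annotations and the toy consumers here are hypothetical stand-ins, int replaces Number, and Python 3 annotation syntax is assumed):

```python
def multiple_annotations(*consumers):
    """Hand each consumer only the annotations keyed on its __name__."""
    def decorate(func):
        func.consumed = {}
        for consumer in consumers:
            picked = {param: ann[consumer.__name__]
                      for param, ann in func.__annotations__.items()
                      if isinstance(ann, dict) and consumer.__name__ in ann}
            func.consumed[consumer.__name__] = consumer(func, picked)
        return func
    return decorate

# Toy consumers: a real one would record docs, typecheck, etc.
def docstring(func, annotations):
    return annotations

def typecheck(func, annotations):
    return annotations

@multiple_annotations(docstring, typecheck)
def foo(a: {'docstring': "Frobnication count", 'typecheck': int},
        b: {'typecheck': int}):
    return a + b

assert foo.consumed['typecheck'] == {'a': int, 'b': int}
assert foo.consumed['docstring'] == {'a': "Frobnication count"}
```

Each consumer sees only the slice of the per-parameter dicts keyed on its own name, which is the "observe those annotations and only those annotations" rule above.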
Collin Winter From jimjjewett at gmail.com Tue Aug 15 00:03:17 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 14 Aug 2006 18:03:17 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <1cb725390608141351n78099df6s6bf4359758d18b10@mail.gmail.com> References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> <20060812205512.197A.JCARLSON@uci.edu> <5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com> <5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com> <1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com> <1cb725390608141351n78099df6s6bf4359758d18b10@mail.gmail.com> Message-ID: On 8/14/06, Paul Prescod wrote: > > > "This implies that the interpretation of built-in types would be > controlled > > > by Python's developers and documented in Python's documentation. > > It also implies that the interpretation of annotations made with a > > built-in type should be safe -- they shouldn't trigger any > > irreversible actions. > I disagree and don't think you can come up with a clear definition of > "irreversible" in any case. Is spitting out text to a stream "irreversible"? > I'd rather not complicate stuff. That part is admittedly a guideline for development of python, rather than with python. The question is what happens with something like def f(a:int): ... If the thing starts compiling (like Pyrex) to code which assumes an int and doesn't verify, that would be a disaster waiting to happen -- unless int were explicitly reserved to the python core more strongly than the proposed wording implies. > I'd be happy to remove the whole subtype clause. I don't care much either > way. But anyhow I (now) disagree that there is a problem as stated. If a > framework wants to use a subtype of list they just need to wrap it in a > top-level wrapper that makes the association. 
> def foo(a: xxx(mylist_subtype(a, b, c))):

mylist_subtype is as unique as an object (but not as a name); if xxx is sufficient disambiguation, then so is mylist_subtype on its own.

> > This implies that something other than lists should be used if the
> > annotations are not independent. The obvious candidates are tuples
> > and dicts, but this should be explicit (or explicitly not defined).
>
> The "dependence" between notations is totally up to the framework. To repeat
> the example:

For builtin lists, the meaning should be reserved to python core. What does the following mean?

    def f(a:[int, str])

I assume it doesn't mean a list of int and str (because lists are used for independent annotations). I assume it also doesn't mean "int _or_ str" because the annotations are independent. If the two are supposed to be used together, then they should be chained with something other than list.

> > The definition of a type as an annotation should probably be either
> > defined or explicitly undefined. Earlier discussions talked about
> > things like
> >
> > def f (a:int, b:(float | Decimal), c:[int, str, X]) -> str
>
> I think that's a separate (large!) PEP.

Agreed. But I think the PEP should explicitly reserve the (annotational) meaning of

(1) builtin and standard library types, such as int and Decimal
(2) the results of combining types with operators (such as |, +=, etc.)
(3) lists, tuples, and dictionaries of the above

It doesn't have to say what they mean, but it has to warn that a standard meaning is contemplated, and that 3rd parties should consider them reserved.
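The wrapping convention under discussion can be made concrete with a tiny sketch (the xxx wrapper is hypothetical, echoing the name used in the thread): by putting a built-in container inside a framework-owned type, the framework unambiguously claims that container's meaning, leaving the bare built-ins reserved.

```python
class xxx:
    """Hypothetical framework wrapper. A raw list in an annotation is
    reserved for Python itself; wrapping it in a framework-owned type
    makes the framework the authority on its meaning."""
    def __init__(self, payload):
        self.payload = payload

def foo(a: xxx([int, str])):
    return a

# The annotation is clearly owned by the xxx framework:
ann = foo.__annotations__['a']
assert isinstance(ann, xxx)
assert ann.payload == [int, str]
```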
-jJ

From jimjjewett at gmail.com  Tue Aug 15 00:22:45 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Mon, 14 Aug 2006 18:22:45 -0400
Subject: [Python-3000] Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations)
In-Reply-To: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com>
Message-ID: 

On 8/14/06, Collin Winter wrote:
> On 8/14/06, Paul Prescod wrote:
> The problem with using lists is that it's impossible for non-decorator
> annotation consumers to know which element "belongs" to them.

The ones whose type they own -- which is why I see at least some parallel to exceptions, and its inheritance-based semantics.

    def f(a:[mytype("asdfljasdf"),
             zope.mypackage.something(b,d,e),
             "a string",
             mytype([47]),
             15]):
        """Example of long compound annotations

        Maybe annotations this size should just be restricted to Signature
        modification instead of allowing them in the actual declaration?
        At least by style guides?
        """

By the defined meaning of list, these are 5 independent annotations.

Whoever defined mytype controls the meaning of the mytype annotations; anyone not familiar with that package should ignore them (and hope there were no side effects in the expressions that generated them).

zope.mypackage controls that annotation; anyone not familiar with that product should ignore it (and hope there were no side effects ...)

"a string" and 15 are builtin types -- so their semantics are defined by core python, which says that they are documentation only -- stripping them off or changing them wouldn't break a properly written program.

> Here, multiple_annotations assumes that the annotation dicts are keyed
> on consumer.__name__;

Too many consumers will call themselves "wrapper" or some such. You should key on the actual type object -- in which case you probably want isinstance to support subtypes.
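Keying on type objects and matching with isinstance, so that subtypes are included, might look like this (a sketch; the Doc and Check annotation types are hypothetical):

```python
class Doc(str):
    """Hypothetical documentation annotation (a str subtype)."""

class Check:
    """Hypothetical type-restriction annotation."""
    def __init__(self, t):
        self.t = t

def owned_by(annotations, owner_type):
    """Return only the annotations a consumer 'owns': exactly those
    that are instances of its type, subclasses included."""
    return [a for a in annotations if isinstance(a, owner_type)]

anns = [Doc("Frobnication count"), Check(int), 15]

assert owned_by(anns, Doc) == ["Frobnication count"]  # str equality
assert owned_by(anns, Check)[0].t is int
assert owned_by(anns, int) == [15]  # bare builtins stay with core python
```

Unlike the __name__-keyed dicts, two consumers can never collide here unless they literally share a type object.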
-jJ

From paul at prescod.net  Tue Aug 15 00:48:33 2006
From: paul at prescod.net (Paul Prescod)
Date: Mon, 14 Aug 2006 15:48:33 -0700
Subject: [Python-3000] Draft pre-PEP: function annotations
In-Reply-To: <43aa6ff70608141423w64afca33uc284417cec4a62fe@mail.gmail.com>
References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com>
	<20060812205512.197A.JCARLSON@uci.edu>
	<5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com>
	<5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com>
	<1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com>
	<43aa6ff70608141241w43b7b694k77e63ba6766a1f55@mail.gmail.com>
	<1cb725390608141320n11683af8q27a75309011a512c@mail.gmail.com>
	<43aa6ff70608141403i36dfeefcn2cb1aa7f803b5579@mail.gmail.com>
	<1cb725390608141418y4c111070l73554a2a959e5d72@mail.gmail.com>
	<43aa6ff70608141423w64afca33uc284417cec4a62fe@mail.gmail.com>
Message-ID: <1cb725390608141548l2cf6f484rd6cf909cdb3637e7@mail.gmail.com>

On 8/14/06, Collin Winter wrote:
> Because not having standardised meanings at the same time as the
> feature becomes available says to developers, "don't use the built-in
> types in your annotations because we might give them a meaning
> later...or maybe we won't...but in the meantime, you're going to need
> to invent new spellings for lists, tuples, dicts, sets, strings, just
> in case". As someone writing an annotation consumer, that comes across
> as an incredibly arbitrary decision that forces me to do a lot of
> extra work.

No, you aren't going to have to invent new spellings. As per my previous email, this should be allowed:

    def myfunc(NumTuples: [typepackage1(tuple(int)),
                           typepackage2("tuple(Number+)")]): ...

All you need to do is declare the fact that you are using the built-in types in a non-standard way by wrapping them in your own annotation constructor.

Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060814/206991d8/attachment.html From collinw at gmail.com Tue Aug 15 00:51:40 2006 From: collinw at gmail.com (Collin Winter) Date: Mon, 14 Aug 2006 17:51:40 -0500 Subject: [Python-3000] Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations) In-Reply-To: References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> Message-ID: <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com> On 8/14/06, Jim Jewett wrote: > On 8/14/06, Collin Winter wrote: > > The problem with using lists is that its impossible for non-decorator > > annotation consumers to know which element "belongs" to them. > > The ones whose type they own -- which is why I see at least some > parallel to exceptions, and its inheritance based semantics. > > def f(a:[mytype("asdfljasdf"), > zope.mypackage.something(b,d,e), > "a string", > mytype([47]), > 15): > > Whoever defined mytype controls the meaning of the mytype annotations; > anyone not familiar with that package should ignore them (and hope > there were no side effects in the expressions that generated them). > > zope.mypackage controls that annotation; anyone not familiar with that > product should ignore it (and hope there were no side effects ...) As hideous as I think this is from an aesthetics/visual noise standpoint, it's probably the only reliable way to let both decorator- and non-decorator-based consumers work. What would the rule be about top-level types? Would you have to use a list, or could a set or dict be used? 
Collin Winter From paul at prescod.net Tue Aug 15 01:18:14 2006 From: paul at prescod.net (Paul Prescod) Date: Mon, 14 Aug 2006 16:18:14 -0700 Subject: [Python-3000] Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations) In-Reply-To: <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com> Message-ID: <1cb725390608141618r11e61720y7ad3c1ab410dccc5@mail.gmail.com> On 8/14/06, Collin Winter wrote: > > What would the rule be about top-level types? Would you have to use a > list, or could a set or dict be used? I argue for restricting it to a list for the following reasons: 1. Better to just pick something for visual consistency (someone said they liked tuples but I find all of the rounded parens confusing) 2. May want to use other types for other meanings in the future. 3. What do you do with the keys of the dictionary? Is this back to connecting decorators to annotations by name or something? The string namespace is not very manageable. Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060814/4deed1fd/attachment.htm From guido at python.org Tue Aug 15 02:26:42 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 14 Aug 2006 17:26:42 -0700 Subject: [Python-3000] PEP3102 Keyword-Only Arguments; Signature In-Reply-To: References: Message-ID: On 8/14/06, Jim Jewett wrote: > On 8/14/06, Steven Bethard wrote: > > On 8/14/06, Guido van Rossum wrote: > > > I believe the PEP doesn't address the opposite use case: positional > > > arguments that should *not* be specified as keyword arguments. > > ... > > It would be really nice in the example above to mark ``self`` in > > ``__call__`` as a positional only argument.
> > Would this have to be in the standard function prologue, or would it > be acceptable to modify a function's Signature object? > > As I see it, each argument can be any combination of the following: > > positional > keyword > named > defaulted > annotated > > I can see some value in supporting all 32 possibilities, but doing it > directly as part of the def syntax might get awkward. Perhaps. Though you're making it seem worse than it is by adding annotated (which should be considered completely orthogonal to the rest, and may not combine with everything else). > Most arguments are both positional and keyword. The bare * will > support keyword-only, and you're asking for positional-only. (An > argument which is neither positional nor keyword doesn't make sense.) > > Today (except in extension code), an argument that isn't named only > appears courtesy of *args or **kwargs. > > Today, named + keyword <==> defaulted I'm not sure I follow. You seem to be perpetuating the eternal misunderstanding that from the caller's POV this is not a keyword argument: def foo(a): pass In fact, calling foo(a=1) is totally legal. > Today, arguments are not annotated. > > Would it be acceptable if functions contained a (possibly implicit) > Signature object, and the way to get the odd combinations were through > modifying that? > > For example: > > def unnamedargs(func): > for arg in func.Signature: > arg.name=None > return func > ... > @unnamedargs > def write(self, s): This seems a last-resort approach; I'd rather do something less drastic. Unfortunately the more I think about it the less I like using '**' without a following name for this feature. PS whenever you respond to something it becomes a new thread in Gmail. Is your mail app perhaps not properly inserting In-reply-to headers? Or do you forge a reply by creating a new message with the same subject and "Re:" prepended? 
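Jim's unnamedargs decorator above assumes a mutable Signature object attached to each function. No such API existed at the time, but the idea can be modelled with a couple of stub classes — Signature, Arg, and the signature attribute are all stand-ins invented here for illustration:

```python
# Toy model of the proposal quoted above: a function carries a Signature
# whose Arg entries can be edited after definition. Everything here is a
# stand-in -- at the time there was no mutable Signature object in Python.
class Arg:
    def __init__(self, name):
        self.name = name              # None will mean "positional-only"

class Signature:
    def __init__(self, names):
        self._args = [Arg(name) for name in names]
    def __iter__(self):
        return iter(self._args)

def unnamedargs(func):
    """Jim's decorator: mark every argument as unnamed."""
    for arg in func.signature:
        arg.name = None
    return func

def write(self, s):
    pass

write.signature = Signature(["self", "s"])    # attach the stub by hand
write = unnamedargs(write)
print([arg.name for arg in write.signature])  # [None, None]
```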
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Tue Aug 15 03:08:25 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 14 Aug 2006 21:08:25 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: Message-ID: <5.1.1.6.0.20060814174448.025d9018@sparrow.telecommunity.com> At 1:51 PM 8/14/2006 -0700, "Paul Prescod" wrote: >On 8/14/06, Jim Jewett wrote: > > The definition of a type as an annotation should probably be either > > defined or explicitly undefined. Earlier discussions talked about > > things like > > > > def f (a:int, b:(float | Decimal), c:[int, str, X]) ->str) > > >I think that's a separate (large!) PEP. This PEP should disallow frameworks >from inventing their own meaning for this syntax (requiring them to at least >wrap). Then Guido and crew can dig into this issue on their own schedule. I see we haven't made nearly as much progress on the concept of "no predefined semantics" as I thought we had. :( i.e., -1 on constraining what types mean. 
From ironfroggy at gmail.com Tue Aug 15 03:10:13 2006 From: ironfroggy at gmail.com (Calvin Spealman) Date: Mon, 14 Aug 2006 21:10:13 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <43aa6ff70608141241w43b7b694k77e63ba6766a1f55@mail.gmail.com> References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> <20060812205512.197A.JCARLSON@uci.edu> <5.1.1.6.0.20060813013329.0226d240@sparrow.telecommunity.com> <5.1.1.6.0.20060813125944.056a3f40@sparrow.telecommunity.com> <1cb725390608131057y122b0c0wf81611e136659793@mail.gmail.com> <43aa6ff70608141241w43b7b694k77e63ba6766a1f55@mail.gmail.com> Message-ID: <76fd5acf0608141810gf062eabh76b0ca92d61372b1@mail.gmail.com> On 8/14/06, Collin Winter wrote: > On 8/13/06, Paul Prescod wrote: > > "In order for processors of function annotations to work interoperably, they > > must use a common interpretation of objects used as annotations on a > > particular function. For example, one might interpret string annotations as > > docstrings. Another might interpret them as path segments for a web > > framework. For this reason, function annotation processors SHOULD avoid > > assigning processor-specific meanings to types defined outside of the > > processor's framework. For example, a Django processor could process > > annotations of a type defined in a Zope package, but Zope's creators should > > be considered the authorities on the type's meaning for the same reasons > > that they would be considered authorities on the semantics of classes or > > methods in their packages." > > The way I read this, it forces (more or less) each > annotation-consuming library to invent new ways to spell Python's > built-in types. > > I read all this as saying that annotation processors should avoid > using Python's lists, tuples and dicts in annotations (since whatever > semantics the Python developers come up with will inevitably be > incompatible with what some library writer needs/wants).
Each > processor library will then define my_processor.List, > my_processor.Tuple, my_processor.Dict, etc as alternate spellings for > [x, y, z], (x, y, z), {x: y} and so on. I'm sorry but I don't see the logic here. Why will all the annotation libraries need to invent stand-ins for the built-in types? They just shouldn't define any meaning to standard types as annotations, leaving the interpretation of int in 'def foo(a: int)' up to the python developers. The only thing I can figure is that you see this need in order for other annotation libraries to handle associating types with arguments, but there is evidence that this shouldn't be done directly with built-in type objects (unless defined by python itself). Using the types directly doesn't cover important use-cases like adapting, even tho we can expect it is safe with builtin types, we can not be sure of this with all types, so there is a good chance the type annotations will take the form of def foo(a: argtype(int)) def bar(b: argtype(Baz, adapter=Baz.adaptFrom)) which defines that foo takes an int object and bar takes a Baz instance, which can be adapted to with the classmethod Baz.adaptFrom. Maybe Baz' constructor takes a database connection and object ID, and would break just being passed a random object. In this case, we don't need to use my_anno.Integer or something like that, because we aren't (and shouldn't) use the built-in type objects directly as our annotation objects. I'll propose this as a new rule the PEP should define that annotation handling libraries should not only avoid expecting instances of built-in types as annotations (lists and strings, etc.) but also those types themselves (using the int object itself as an annotation). 
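As a rough illustration of the argtype() idea — argtype itself is hypothetical here, since Calvin is sketching a convention rather than a real library — such a marker class might look like:

```python
# Hedged sketch of the argtype() wrapper described above: the annotation is
# a marker object whose *type* tells consumers what it means, carrying the
# wrapped type plus an optional adapter. argtype is a hypothetical name.
class argtype:
    def __init__(self, typ, adapter=None):
        self.typ = typ
        self.adapter = adapter

    def check(self, value):
        """Return value if it already matches, adapting it if we can."""
        if isinstance(value, self.typ):
            return value
        if self.adapter is not None:
            return self.adapter(value)
        raise TypeError("expected %s, got %r" % (self.typ.__name__, value))

print(argtype(int).check(3))                    # already an int: 3
print(argtype(int, adapter=int).check("42"))    # adapted: 42
```

A consumer that dispatches on isinstance(annotation, argtype) can then safely ignore any annotation it does not recognize, which is the whole point of not using bare built-in types.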
It may seem terribly convenient to use types directly, but it's becoming more and more apparent that all annotations should be wrapped in something by which the meaning of the annotation can be reliably and safely determined by its type, and no built-in type really does that in an agreeable way. Also, Collin Winter said: > One particularly divisive issue is whether tuples should be treated as > fixed- or arbitrary-length containers. Concretely, does > "tuple(Number)" match only 1-tuples with a single Number element, or > does it match all tuples that have only Number elements? I would personally be completely averse to the use of any containers as a meaning of "This argument is a list/tuple of some specific types". On one hand, this is the realm of the individual annotation libraries, so it isn't even relevant to this conversation. However, when it is done, a specific type to represent the concept would be more prudent. For example, I would like to annotate with listOf(str, int) or tupleOf(multiple(bool)) to mean "A list of a str and an int" and "A tuple of multiple bool objects", respectively. From greg.ewing at canterbury.ac.nz Tue Aug 15 03:13:53 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 15 Aug 2006 13:13:53 +1200 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060814002014.02dbe9d0@sparrow.telecommunity.com> References: <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> <5.1.1.6.0.20060814002014.02dbe9d0@sparrow.telecommunity.com> Message-ID: <44E11FD1.1020201@canterbury.ac.nz> Phillip J. Eby wrote: > It can't be a "separate program altogether", since to get at the > annotations, the program must import the module that contains them. Why? I can imagine something like a documentation generator or static type checker that just parses the source, being careful not to execute anything.
Also, even if it does work by importing the module, how is the module being imported supposed to know which annotation processor is going to be processing its annotations, and therefore what generic methods need to be overridden, and how to go about doing that -- assuming there is no standardisation of any sort? -- Greg From greg.ewing at canterbury.ac.nz Tue Aug 15 03:19:26 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 15 Aug 2006 13:19:26 +1200 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060814002138.02909ad0@sparrow.telecommunity.com> References: <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com> <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> <44DD5DF0.40405@acm.org> <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com> <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> <5.1.1.6.0.20060814002138.02909ad0@sparrow.telecommunity.com> Message-ID: <44E1211E.5040308@canterbury.ac.nz> Phillip J. Eby wrote: > The examples there are very short > and simple; in fact the complete Message implementation, including > imports and overload declarations is only *6 lines long*. > > So, my only guess is that the people who looked at that skimmed right > past it, looking for something more complicated! If it really is that short and simple, why not just post the whole thing? Then there's no danger of anyone getting lost in parts of the documentation they're not supposed to be looking at. 
-- Greg From ncoghlan at gmail.com Tue Aug 15 03:25:57 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 15 Aug 2006 11:25:57 +1000 Subject: [Python-3000] Ctypes as cross-interpreter C calling interface In-Reply-To: References: <20060814172000.1717.863905740.divmod.quotient.22821@ohm> Message-ID: <44E122A5.6090203@gmail.com> Guido van Rossum wrote: > On 8/14/06, Jean-Paul Calderone wrote: >> On Mon, 14 Aug 2006 09:09:49 -0700, Guido van Rossum wrote: >>> On 8/14/06, Jean-Paul Calderone wrote: >>>> This is a joke, right? >>> Because it's a good idea to have to write separate wrappers >>> around every useful library for each dynamic language separately? >> If a project has done this successfully, I don't think I've seen it. Can >> you point out some examples where this has been accomplished in a useful >> form? The nearest thing I can think of is SWIG, which is basically a >> failure. > > SWIG is not my favorite (mostly because I don't like C++ much) but > it's used very effectively here at Google (for example); I wouldn't > dream of calling it a failure. I've found SWIG to be especially effective when using it to wrap a library I have control over, so I can tweak the interface to avoid stressing the code generator too much. Running it over arbitrary C libraries requires a fair bit of work defining the necessary typemaps (although you still have the benefit of writing the typemap for a given style of interface *once* instead of for every function that uses it). However, in the context of this discussion, a SWIG-like tool that produced pure Python ctypes-based code would be a vast improvement. Taking the SWIG typemaps for the Python C API as a starting point, you could even do it with SWIG itself (rather than reinventing the wheel, as codegen's components for parsing C header files appear to do). Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From pje at telecommunity.com Tue Aug 15 03:33:03 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 14 Aug 2006 21:33:03 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <44E11FD1.1020201@canterbury.ac.nz> References: <5.1.1.6.0.20060814002014.02dbe9d0@sparrow.telecommunity.com> <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> <5.1.1.6.0.20060814002014.02dbe9d0@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060814212620.025d8f70@sparrow.telecommunity.com> At 01:13 PM 8/15/2006 +1200, Greg Ewing wrote: >Phillip J. Eby wrote: > >>It can't be a "separate program altogether", since to get at the >>annotations, the program must import the module that contains them. > >Why? I can imagine something like a documentation >generator or static type checker that just parses the >source, being careful not to execute anything. How is such a thing going to know what doc("foo") means at the time the code is run? What about closures, dynamic imports, etc.? >Also, even if it does work by importing the module, >how is the module being imported supposed to know >which annotation processor is going to be processing >its annotations, and therefore what generic methods >need to be overridden, and how to go about doing >that -- assuming there is no standardisation of any >sort? Weak imports are a good solution for the case where interop is optional. You do something like: @whenImported('some.doc.processor') def registerDocHandler(processor): @processor.someOverloadedFunction.when(SomeType) def handleTypeDefinedByThisModule(...): ... The idea here being that the registration occurs if and only if the some.doc.processor module is imported during the lifetime of the program. 
See http://cheeseshop.python.org/pypi/Importing for a package that contains a non-decorator version of this functionality. Anyway, the idea here is that if you create a library with a bunch of annotation types in it, you use weak importing to optionally register handlers for whatever processors are out there that you want to support. Also, other people can of course define their own third-party glue modules that provide this kind of support for some given combination of annotation types and processors. From pje at telecommunity.com Tue Aug 15 03:37:47 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 14 Aug 2006 21:37:47 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <44E1211E.5040308@canterbury.ac.nz> References: <5.1.1.6.0.20060814002138.02909ad0@sparrow.telecommunity.com> <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com> <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> <44DD5DF0.40405@acm.org> <5.1.1.6.0.20060812113118.0293d2d8@sparrow.telecommunity.com> <5.1.1.6.0.20060812165132.0226e550@sparrow.telecommunity.com> <5.1.1.6.0.20060814002138.02909ad0@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060814213434.027739f0@sparrow.telecommunity.com> At 01:19 PM 8/15/2006 +1200, Greg Ewing wrote: >Phillip J. Eby wrote: > > > The examples there are very short >>and simple; in fact the complete Message implementation, including >>imports and overload declarations is only *6 lines long*. >>So, my only guess is that the people who looked at that skimmed right >>past it, looking for something more complicated! > >If it really is that short and simple, why not just post >the whole thing? Then there's no danger of anyone getting >lost in parts of the documentation they're not supposed >to be looking at. 
Here are the most relevant bits excerpted from the text: To create a new kind of metadata, we need to create a class that represents the metadata, and then add a method to the ``binding.declareAttribute()`` generic function. For our example, we'll create a ``Message`` metadata type that just prints a message when the metadata is registered:: >>> class Message(str): ... pass >>> def print_message(classobj, attrname, metadata): ... print metadata, "(%s.%s)" % (classobj.__name__, attrname) >>> binding.declareAttribute.addMethod(Message,print_message) Now, we'll see if it works:: >>> class Foo: pass >>> binding.declareAttribute(Foo, 'bar', Message("testing")) testing (Foo.bar) In addition to defining your own metadata types, ``declareAttribute()`` has built-in semantics for ``None`` and sequence types. The former is a no-op, and the latter re-invokes ``declareAttribute()`` on the sequence contents:: >>> binding.declareAttribute(Foo, 'baz', ... [Message('test1'), Message('test2')] ... ) test1 (Foo.baz) test2 (Foo.baz) >>> binding.declareAttribute(Foo, 'spam', None) # no-op From tim.peters at gmail.com Tue Aug 15 03:39:42 2006 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 14 Aug 2006 21:39:42 -0400 Subject: [Python-3000] threading, part 2 --- + a bit of ctypes FFI worry In-Reply-To: References: <1f7befae0608120329wc646164w25ca4875da4cc5c0@mail.gmail.com> Message-ID: <1f7befae0608141839vf45bddaw7a19701d766cb4af@mail.gmail.com> [Tim Peters] >> ... >> When the ctypes docs talk about passing and returning integers, they >> never explain what "integers" /means/, but it seems the docs >> implicitly have a 32-bit-only view of the world here. In reality >> "integer" seems to mean the native C `int` type. [Thomas Heller] > 'ctypes.c_int' and 'ctypes.c_long' correspond to the C 'int' and 'long' types. Sure, that's clear. It's where the docs talk about (the unqualified) "integers", and the quotes there aren't just to scare you ;-). 
Like in: http://starship.python.net/crew/theller/ctypes/tutorial.html near the end of section "Calling functions": Python integers, strings and unicode strings are the only objects that can directly be used as parameters in these function calls. What does the word "integers" /mean/ there? > If you think that the docs could be clearer, please suggest changes. I can't, because I don't know what was intended. Python integers come in two flavors, `int` and `long`, so I assumed at first that the "Python integers" in the above probably meant "a Python (short) int" (which is a C `long`). But writing the thread test using that assumption failed on some 64-bit buildbots. After staring at the specific ways it failed, my next guess was that by "Python integers" the docs don't really mean Python integers at all, but C's `int`. That's what convinced me to /try/ wrapping the thread id in ctypes.c_long(), and the test problems went away then, so I did too :-) I searched all the docs for the word "integers" and never found out what was intended. So you could search the docs for the same thing. Like, still in the tutorial, at the start of section "Return types": By default functions are assumed to return integers. Or in the reference docs: Note that all these functions are assumed to return integers, which is of course not always the truth, so you have to assign the correct restype attribute to use these functions. and the description of memmove(): memmove(dst, src, count) Same as the standard C memmove library function: copies count bytes from src to dst. dst and src must be integers or ... Python has at least three meanings for the word "integer" (short, long, & "either"), and C has at least 10 (signed & unsigned char, short, int, long, & long long), so the unqualified "integer" is highly ambiguous. While in many contexts that doesn't much matter, in ctypes it does.
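The ambiguity is easy to demonstrate: ctypes' integer types silently keep only the bits that fit. The sizes assumed below (32-bit C int; 64-bit C long on 64-bit Unix) hold on mainstream platforms but are checked rather than taken for granted:

```python
import ctypes

# c_int is C's int; c_long is C's long. A Python int that needs more than
# 32 bits is silently truncated by c_int -- the failure mode described
# above with 64-bit thread ids, until they were wrapped in ctypes.c_long()
# explicitly.
big = 2**40 + 7

if ctypes.sizeof(ctypes.c_int) == 4:        # true on mainstream platforms
    print(ctypes.c_int(big).value)          # 7: only the low 32 bits survive

if ctypes.sizeof(ctypes.c_long) == 8:       # typical 64-bit Unix
    print(ctypes.c_long(big).value == big)  # the value passes through intact
```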
From ncoghlan at gmail.com Tue Aug 15 03:44:10 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 15 Aug 2006 11:44:10 +1000 Subject: [Python-3000] PEP3102 Keyword-Only Arguments In-Reply-To: References: Message-ID: <44E126EA.5000200@gmail.com> Guido van Rossum wrote: >> It would be really nice in the example above to mark ``self`` in >> ``__call__`` as a positional only argument. > > But this is a rather unusual use case isn't it? It's due to the bound > methods machinery. Do you have other use cases? I would assume that > normally such wrappers take their own control arguments in the form of > keyword-only arguments (that are unlikely to conflict with arguments > of the wrapped method). > I'd like a syntax or convention for it so I can document the signature of functions written in C that accept positional-only arguments using Python's own function definition notation ;) I'd also like to be able to use it to say "I'm not sure about this parameter name yet, so don't rely on it staying the same!" while developing an API. However, I'm also wondering if we need an actual syntax, or if a simple convention would do the trick: start the names of positional-only arguments with an underscore. Then Steven's examples would become: >>> class Wrapper(object): ... def __init__(self, func): ... self.func = func ... def __call__(_self, *args, **kwargs): ... print 'calling wrapped function' ... return self.func(*args, **kwargs) ... def failUnlessRaises(_self, _excClass, _callableObj, *args, **kwargs): With the 'best practice' being that any function that accepts arbitrary kwargs should use an underscore on its named parameters. The only way to screw the latter example up would be for a caller to do: self.failUnlessRaises(TypeError, my_func, _callableObj=foo) And if the 'underscore indicates positional only' convention were adopted officially, it would be trivial for PyLint/PyChecker to flag any call that specifies a name starting with an underscore as a keyword argument. 
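Nick's convention in runnable form — the underscored parameters can only collide with a caller who deliberately spells out `_exc_class=...` as a keyword:

```python
# The underscore convention sketched above: the wrapper's own parameters
# start with "_", so arbitrary **kwargs destined for the wrapped callable
# cannot collide with them.
def fail_unless_raises(_exc_class, _callable_obj, *args, **kwargs):
    try:
        _callable_obj(*args, **kwargs)
    except _exc_class:
        return True
    raise AssertionError("%s not raised" % _exc_class.__name__)

# The callable under test is now free to use un-underscored names like
# 'exc_class' as its own keyword arguments:
def boom(exc_class=ValueError):
    raise exc_class("boom")

print(fail_unless_raises(ValueError, boom, exc_class=ValueError))
```

Without the underscores, passing exc_class=... through **kwargs would be impossible, since the wrapper would swallow it as its own parameter.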
Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Tue Aug 15 03:49:05 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 15 Aug 2006 11:49:05 +1000 Subject: [Python-3000] Python/C++ question In-Reply-To: References: <44DA6C01.2040904@acm.org> <44DF0800.4060204@acm.org> Message-ID: <44E12811.5090709@gmail.com> Georg Brandl wrote: > Guido van Rossum wrote: >> Implementation Language >> ================== >> >> Python 3000 will be implemented in C, and the implementation will be >> derived as an evolution of the Python 2 code base. This reflects my >> views (which I share with Joel Spolsky) on the dangers of complete >> rewrites. Since Python 3000 as a language is a relatively mild >> improvement on Python 2, we can gain a lot by not attempting to >> reimplement the language from scratch. I am not against parallel >> from-scratch implementation efforts, but my own efforts will be >> directed at the language and implementation that I know best. > > I had already added something to PEP 3099, but if you like that approach > better, I'll add that to PEP 3000. You can always keep both :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From tjreedy at udel.edu Tue Aug 15 03:53:27 2006 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 14 Aug 2006 21:53:27 -0400 Subject: [Python-3000] Python/C++ question References: <44DA6C01.2040904@acm.org> <44DF0800.4060204@acm.org> Message-ID: "Georg Brandl" wrote in message news:ebqi1f$80m$2 at sea.gmane.org... > Guido van Rossum wrote: >> Implementation Language >> ================== >> >> Python 3000 will be implemented in C, and the implementation will be >> derived as an evolution of the Python 2 code base. 
This reflects my >> views (which I share with Joel Spolsky) on the dangers of complete >> rewrites. Since Python 3000 as a language is a relatively mild >> improvement on Python 2, we can gain a lot by not attempting to >> reimplement the language from scratch. I am not against parallel >> from-scratch implementation efforts, but my own efforts will be >> directed at the language and implementation that I know best. > > I had already added something to PEP 3099, but if you like that approach > better, I'll add that to PEP 3000. Please add this. It clearly says what and why and will answer questions that are sure to come. I would leave a comment in 3099 also. tjr From alexander.belopolsky at gmail.com Tue Aug 15 04:15:36 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 14 Aug 2006 22:15:36 -0400 Subject: [Python-3000] [Python-Dev] Type of range object members In-Reply-To: <44E12B0A.9020907@gmail.com> References: <44E12B0A.9020907@gmail.com> Message-ID: <2DA248BC-5534-4CE5-A9C8-84259E8A71B2@local> On Aug 14, 2006, at 10:01 PM, Nick Coghlan wrote: > Guido van Rossum wrote: >> Methinks that as long as PyIntObject uses long (see intobject.h) >> there's no point in changing this to long. > > Those fields are going to have to change to Py_Object* eventually > if xrange() is going to become the range() replacement in Py3k. . . > In this case it will become indistinguishable from typedef struct { PyObject_HEAD PyObject *start, *stop, *step; /* not NULL */ } PySliceObject; See sliceobject.h . Would it make sense to unify rangeobject with PySliceObject? 
From ncoghlan at gmail.com Tue Aug 15 04:34:18 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 15 Aug 2006 12:34:18 +1000 Subject: [Python-3000] [Python-Dev] Type of range object members In-Reply-To: <2DA248BC-5534-4CE5-A9C8-84259E8A71B2@local> References: <44E12B0A.9020907@gmail.com> <2DA248BC-5534-4CE5-A9C8-84259E8A71B2@local> Message-ID: <44E132AA.1020900@gmail.com> Alexander Belopolsky wrote: > > On Aug 14, 2006, at 10:01 PM, Nick Coghlan wrote: > >> Guido van Rossum wrote: >>> Methinks that as long as PyIntObject uses long (see intobject.h) >>> there's no point in changing this to long. >> >> Those fields are going to have to change to Py_Object* eventually if >> xrange() is going to become the range() replacement in Py3k. . . >> > > In this case it will become indistinguishable from > > typedef struct { > PyObject_HEAD > PyObject *start, *stop, *step; /* not NULL */ > } PySliceObject; > > See sliceobject.h . Would it make sense to unify rangeobject with > PySliceObject? > Not really. The memory layouts may end up being the same in Py3k, but they're still different types. The major differences between the two types just happen to lie in the methods they support (as defined by the value of the type pointer in PyObject_HEAD), rather than the data they contain. Besides, the range object may actually keep the current optimised behaviour for dealing with PyInt values, only falling back to PyObject* if one of start, stop or step was too large to fit into a PyInt. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From greg.ewing at canterbury.ac.nz Tue Aug 15 04:49:42 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 15 Aug 2006 14:49:42 +1200 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060814212620.025d8f70@sparrow.telecommunity.com> References: <5.1.1.6.0.20060814002014.02dbe9d0@sparrow.telecommunity.com> <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> <5.1.1.6.0.20060811224658.0226da70@sparrow.telecommunity.com> <5.1.1.6.0.20060814002014.02dbe9d0@sparrow.telecommunity.com> <5.1.1.6.0.20060814212620.025d8f70@sparrow.telecommunity.com> Message-ID: <44E13646.9020709@canterbury.ac.nz> Phillip J. Eby wrote: > How is such a thing going to know what doc("foo") means at the time the > code is run? What about closures, dynamic imports, etc.? Annotations intended for such external processors would have to be designed not to rely on anything dynamic, i.e. be purely declarative. Maybe this is why we're having trouble communicating. You seem to be thinking of annotations purely as dynamic things that affect the execution of the program. I'm thinking of them as something that will just as likely be used in a declarative way, possibly by tools that don't execute the code at all, but do something entirely different with it. > Weak imports are a good solution for the case where interop is > optional. You do something like: > > @whenImported('some.doc.processor') > def registerDocHandler(processor): > @processor.someOverloadedFunction.when(SomeType) > def handleTypeDefinedByThisModule(...): > ... But this requires the module using the annotations to anticipate all the processors that will potentially process its annotations, and teach each one of them about itself. 
> Also, other people can of course define their own third-party > glue modules that provide this kind of support for some given > combination of annotation types and processors. I don't see how a third party can do this, because only the module containing the annotations can know what idiosynchratic scheme it's chosen for combining them. -- Greg > From guido at python.org Tue Aug 15 04:58:12 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 14 Aug 2006 19:58:12 -0700 Subject: [Python-3000] Python/C++ question In-Reply-To: References: <44DA6C01.2040904@acm.org> <44DF0800.4060204@acm.org> Message-ID: +1 On 8/14/06, Terry Reedy wrote: > > "Georg Brandl" wrote in message > news:ebqi1f$80m$2 at sea.gmane.org... > > Guido van Rossum wrote: > >> Implementation Language > >> ================== > >> > >> Python 3000 will be implemented in C, and the implementation will be > >> derived as an evolution of the Python 2 code base. This reflects my > >> views (which I share with Joel Spolsky) on the dangers of complete > >> rewrites. Since Python 3000 as a language is a relatively mild > >> improvement on Python 2, we can gain a lot by not attempting to > >> reimplement the language from scratch. I am not against parallel > >> from-scratch implementation efforts, but my own efforts will be > >> directed at the language and implementation that I know best. > > > > I had already added something to PEP 3099, but if you like that approach > > better, I'll add that to PEP 3000. > > Please add this. It clearly says what and why and will answer questions > that are sure to come. I would leave a comment in 3099 also. 
> > tjr > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Aug 15 05:00:32 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 14 Aug 2006 20:00:32 -0700 Subject: [Python-3000] PEP3102 Keyword-Only Arguments In-Reply-To: <44E126EA.5000200@gmail.com> References: <44E126EA.5000200@gmail.com> Message-ID: On 8/14/06, Nick Coghlan wrote: > However, I'm also wondering if we need an actual syntax, or if a simple > convention would do the trick: start the names of positional-only arguments > with an underscore. Hm... and perhaps we could forbid keyword arguments starting with an underscore in the call syntax? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Aug 15 05:01:57 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 14 Aug 2006 20:01:57 -0700 Subject: [Python-3000] threading, part 2 --- + a bit of ctypes FFI worry In-Reply-To: <1f7befae0608141839vf45bddaw7a19701d766cb4af@mail.gmail.com> References: <1f7befae0608120329wc646164w25ca4875da4cc5c0@mail.gmail.com> <1f7befae0608141839vf45bddaw7a19701d766cb4af@mail.gmail.com> Message-ID: Perhaps this thread can be moved back to python-dev? I'm not sure how relevant a discussion of the ambiguities in ctypes' docs is for Py3k. On 8/14/06, Tim Peters wrote: > [Tim Peters] > >> ... > >> When the ctypes docs talk about passing and returning integers, they > >> never explain what "integers" /means/, but it seems the docs > >> implicitly have a 32-bit-only view of the world here. In reality > >> "integer" seems to mean the native C `int` type. > > [Thomas Heller] > > 'ctypes.c_int' and 'ctypes.c_long' correspond to the C 'int' and 'long' types. > > Sure, that's clear.
It's where the docs talk about (the unqualified) > "integers", and the quotes there aren't just to scare you ;-). Like > in: > > http://starship.python.net/crew/theller/ctypes/tutorial.html > > near the end of section "Calling functions": > > Python integers, strings and unicode strings are the only objects that can > directly be used as parameters in these function calls. > > What does the word "integers" /mean/ there? > > > If you think that the docs could be clearer, please suggest changes. > > I can't, because I don't know what was intended. Python integers come > in two flavors, `int` and `long`, so I assumed at first that the > "Python integers" in the above probably meant "a Python (short) int" > (which is a C `long`). But writing the thread test using that > assumption failed on some 64-bit buildbots. After staring at the > specific ways it failed, my next guess was that by "Python integers" > the docs don't really mean Python integers at all, but C's `int`. > That's what convinced me to /try/ wrapping the thread id in > ctypes.c_long(), and the test problems went away then, so I did too > :-) > > I searched all the docs for the word "integers" and never found out > what was intended. So you could search the docs for the same thing. > Like, still in the tutorial, at the start of section "Return types": > > By default functions are assumed to return integers. > > Or in the reference docs: > > Note that all these functions are assumed to return integers, > which is of course > not always the truth, so you have to assign the correct restype > attribute to use > these functions. > > and the description of memmove(): > > memmove(dst, src, count) > > Same as the standard C memmove library function: copies count bytes from > src to dst. dst and src must be integers or ... 
> > Python has at least three meanings for the word "integer" (short, > long, & "either"), and C has at least 10 (signed & unsigned char, > short, int, long, & long long), so the unqualified "integer" is highly > ambiguous. While in many contexts that doesn't much matter, in ctypes > it does. > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Aug 15 05:04:27 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 14 Aug 2006 20:04:27 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060814174448.025d9018@sparrow.telecommunity.com> References: <5.1.1.6.0.20060814174448.025d9018@sparrow.telecommunity.com> Message-ID: On 8/14/06, Phillip J. Eby wrote: > At 1:51 PM 8/14/2006 -0700, "Paul Prescod" wrote: > >On 8/14/06, Jim Jewett wrote: > > > The definition of a type as an annotation should probably be either > > > defined or explicitly undefined. Earlier discussions talked about > > > things like > > > > > > def f (a:int, b:(float | Decimal), c:[int, str, X]) ->str) > > > > > >I think that's a separate (large!) PEP. This PEP should disallow frameworks > >from inventing their own meaning for this syntax (requiring them to at least > >wrap). Then Guido and crew can dig into this issue on their own schedule. > > I see we haven't made nearly as much progress on the concept of "no > predefined semantics" as I thought we had. :( > > i.e., -1 on constraining what types mean. Haven't I said that the whole time? I *thought* that Collin's PEP steered clear from the topic too. At the same time, does this preclude having some kind of "default" type notation in the standard library?
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From steven.bethard at gmail.com Tue Aug 15 05:11:40 2006 From: steven.bethard at gmail.com (Steven Bethard) Date: Mon, 14 Aug 2006 21:11:40 -0600 Subject: [Python-3000] PEP3102 Keyword-Only Arguments In-Reply-To: References: <44E126EA.5000200@gmail.com> Message-ID: [Steven Bethard] > It would be really nice in the example above to mark ``self`` in > ``__call__`` as a positional only argument. [Nick Coghlan] > However, I'm also wondering if we need an actual syntax, or if a simple > convention would do the trick: start the names of positional-only arguments > with an underscore. That would certainly be good enough for me. As long as it's documented and there's somewhere to point to when someone does it wrong, it solves my problem. [Guido van Rossum] > Hm... and perhaps we could forbid keyword arguments starting with an > underscore in the call syntax? -0. As long as the convention exists somewhere, I don't think this buys us too much. I think supplying a keyword argument when you should be using a positional is about the same level of willing-to-shoot-yourself-in-the-foot as using attributes that are supposed to be private (the other place where leading underscores are suggested). Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From aahz at pythoncraft.com Tue Aug 15 05:37:59 2006 From: aahz at pythoncraft.com (Aahz) Date: Mon, 14 Aug 2006 20:37:59 -0700 Subject: [Python-3000] PEP3102 Keyword-Only Arguments In-Reply-To: References: <44E126EA.5000200@gmail.com> Message-ID: <20060815033759.GA4078@panix.com> On Mon, Aug 14, 2006, Guido van Rossum wrote: > On 8/14/06, Nick Coghlan wrote: >> >> However, I'm also wondering if we need an actual syntax, or if a simple >> convention would do the trick: start the names of positional-only arguments >> with an underscore. > > Hm... 
and perhaps we could forbid keyword arguments starting with an > underscore in the call syntax? Do you mean forbid by convention or syntactically? I'm -1 on the latter; that would be far too much gratuitous code breakage. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." --Brian W. Kernighan From alexander.belopolsky at gmail.com Tue Aug 15 06:15:54 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 15 Aug 2006 00:15:54 -0400 Subject: [Python-3000] [Python-Dev] Type of range object members In-Reply-To: <44E132AA.1020900@gmail.com> References: <44E12B0A.9020907@gmail.com> <2DA248BC-5534-4CE5-A9C8-84259E8A71B2@local> <44E132AA.1020900@gmail.com> Message-ID: On Aug 14, 2006, at 10:34 PM, Nick Coghlan wrote: > Alexander Belopolsky wrote: [snip] >> Would it make sense to unify rangeobject with PySliceObject? > > Not really. The memory layouts may end up being the same in Py3k, > but they're still different types. The major differences between > the two types just happen to lie in the methods they support (as > defined by the value of the type pointer in PyObject_HEAD), rather > than the data they contain. The slice objects support a single method "indices", which I have to admit I had not seen until a minute ago. (I've grepped through the standard library and did not see it used anywhere). The slice attributes start/stop/step are probably more useful, but I don't see why those cannot be added to the range object. > > Besides, the range object may actually keep the current optimised > behaviour for dealing with PyInt values, only falling back to > PyObject* if one of start, stop or step was too large to fit into a > PyInt. How would that hurt reusing them for slicing?
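For anyone else meeting `slice.indices` for the first time: given a sequence length, it resolves a slice's start/stop/step into concrete, clamped values -- exactly the triple you would hand to `range()`. A quick illustration in plain Python (and, as Alexander suggests, range objects in today's Python do expose `start`/`stop`/`step` as attributes):

```python
s = slice(1, 10, 2)

# Resolve against a sequence of length 6: stop is clamped from 10 to 6.
print(s.indices(6))

# None bounds and negative steps are normalised the same way.
print(slice(None, None, -1).indices(5))

# The resolved triple is what you'd feed to range() to walk the
# selected indices of the sequence.
start, stop, step = s.indices(6)
print(list(range(start, stop, step)))

# Modern range objects carry their parameters as attributes.
r = range(2, 10, 3)
print(r.start, r.stop, r.step)
```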
From jcarlson at uci.edu Tue Aug 15 07:43:17 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Mon, 14 Aug 2006 22:43:17 -0700 Subject: [Python-3000] PEP3102 Keyword-Only Arguments In-Reply-To: <20060815033759.GA4078@panix.com> References: <20060815033759.GA4078@panix.com> Message-ID: <20060814223931.19BD.JCARLSON@uci.edu> Aahz wrote: > > On Mon, Aug 14, 2006, Guido van Rossum wrote: > > On 8/14/06, Nick Coghlan wrote: > >> > >> However, I'm also wondering if we need an actual syntax, or if a simple > >> convention would do the trick: start the names of positional-only arguments > >> with an underscore. > > > > Hm... and perhaps we could forbid keyword arguments starting with an > > underscore in the call syntax? > > Do you mean forbid by convention or syntactically? I'm -1 on the latter; > that would be far too much gratuitous code breakage. At least 40 examples of it being used in a keyword argument in the 2.5b2 standard library (so sayeth my regular expression of '\((.*?\s)?_\w*=' ). - Josiah From ncoghlan at gmail.com Tue Aug 15 08:44:25 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 15 Aug 2006 16:44:25 +1000 Subject: [Python-3000] PEP3102 Keyword-Only Arguments In-Reply-To: References: <44E126EA.5000200@gmail.com> Message-ID: <44E16D49.2010601@gmail.com> Steven Bethard wrote: > [Steven Bethard] >> It would be really nice in the example above to mark ``self`` in >> ``__call__`` as a positional only argument. > > [Nick Coghlan] >> However, I'm also wondering if we need an actual syntax, or if a simple >> convention would do the trick: start the names of positional-only >> arguments >> with an underscore. > > That would certainly be good enough for me. As long as it's > documented and there's somewhere to point to when someone does it > wrong, it solves my problem. 
Putting something in PEP 8's section on naming conventions should do the trick (along with updating the standard library so that things like UserDict that accept arbitrary **kwargs use it for their positional arguments). That would also serve as a reminder that the support for keyword arguments means that the parameter *names* are part of the public interface of a Python function along with their positions and types. > > [Guido van Rossum] >> Hm... and perhaps we could forbid keyword arguments starting with an >> underscore in the call syntax? > > -0. As long as the convention exists somewhere, I don't think this > buys us too much. I think supplying a keyword argument when you should > be using a positional is about the same level of > willing-to-shoot-yourself-in-the-foot as using attributes that are > supposed to be private (the other place where leading underscores are > suggested). That's exactly the comparison I was aiming for - you *can* if you really have to, but you also *shouldn't*. And if you do, you'd better include a comment explaining why you have to if you don't want any reviewers complaining about it ;) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From paul at prescod.net Tue Aug 15 15:56:18 2006 From: paul at prescod.net (Paul Prescod) Date: Tue, 15 Aug 2006 06:56:18 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060814174448.025d9018@sparrow.telecommunity.com> References: <5.1.1.6.0.20060814174448.025d9018@sparrow.telecommunity.com> Message-ID: <1cb725390608150656o2587c0ddx2af8e7df80f8e7b8@mail.gmail.com> On 8/14/06, Phillip J. Eby wrote: > > At 1:51 PM 8/14/2006 -0700, "Paul Prescod" wrote: > >On 8/14/06, Jim Jewett wrote: > > > The definition of a type as an annotation should probably be either > > > defined or explicitly undefined.
Earlier discussions talked about > > > things like > > > > > > def f (a:int, b:(float | Decimal), c:[int, str, X]) ->str) > > > > > >I think that's a separate (large!) PEP. This PEP should disallow > frameworks > >from inventing their own meaning for this syntax (requiring them to at > least > >wrap). Then Guido and crew can dig into this issue on their own schedule. > > I see we haven't made nearly as much progress on the concept of "no > predefined semantics" as I thought we had. :( > i.e., -1 on constraining what types mean. > > I don't understand what you're saying. 1. Do you (still?) agree that the meaning of the list type should be defined as a semantically neutral container for other annotations? 2. Do you (still?) agree that the meanings of ALL built-in types at the top-level should be reserved for the Python language designers and should not be randomly used by framework developers. In other words: the function type declaration syntax above should not be used by one third party type checker while another third-party type checker uses the same structure to mean something totally different. Note that I don't mind if they have conflicting semantics for the same expression as long as the end-user is forced to declare which semantic model they are using:

tc = typechecker.typecheck
tl = typelinter.check_types

def f(a: tc(int), b: tc(float | Decimal), c: tc([int, str, X])) -> tc(str)
def g(a: tl(int), b: tl(float | Decimal), c: tl([int, str, X])) -> tl(str)

3. Do you agree that 1. and 2. together promote the experimentation and variety that we need?

def f(a: [tc(int), tl("Integer")],
      b: [tc(float | Decimal), tl(Or("float", "Decimal"))],
      c: [tc([int, str, X]), tl(listOf("Integer", "string", "X"))]
      ) -> [tc(str), tl(str)]

Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060815/31a4e803/attachment.htm From paul at prescod.net Tue Aug 15 16:04:19 2006 From: paul at prescod.net (Paul Prescod) Date: Tue, 15 Aug 2006 07:04:19 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: References: <5.1.1.6.0.20060814174448.025d9018@sparrow.telecommunity.com> Message-ID: <1cb725390608150704x4ef5a9abm532cd7ebaae511d@mail.gmail.com> On 8/14/06, Guido van Rossum wrote: > > > Haven't I said that the whole time? I *thought* that Collin's PEP > steered clear from the topic too. At the same time, does this preclude > having some kind of "default" type notation in the standard library? The PEP steered TOO far clear of this topic. If it is a total free-for-all then when and if we do come up with a standard syntax (whether in 2006 or 2010) it will conflict with deployed code that used the same syntax to mean something different than the standard. And even if there is never, ever, going to be a standard, it must be possible for tools reading the annotations to know whether the user intended their markup to conform to metadata-syntax 1, where "int" means "32 bit int" or metadata syntax 2 where it means "arbitrary sized int". Similarly, they must know whether the annotator intended to use metadata syntax 1 where "tuple" means "fixed size, heterogeneous" or syntax 2 where it means "immutable list". Finally, there must be a standard way for attaching more than one annotation to a single parameter. The PEP did not define a syntax for that. I think that there must be enough standardized infrastructure that annotation processors can recognize the annotations that are applicable to them and act on them, even if the user has chosen to use more than one annotation scheme. Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060815/b8793f1c/attachment.html From collinw at gmail.com Tue Aug 15 16:06:28 2006 From: collinw at gmail.com (Collin Winter) Date: Tue, 15 Aug 2006 09:06:28 -0500 Subject: [Python-3000] Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations) In-Reply-To: <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com> Message-ID: <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> On 8/14/06, Collin Winter wrote: > On 8/14/06, Jim Jewett wrote: > > On 8/14/06, Collin Winter wrote: > > > The problem with using lists is that it's impossible for non-decorator > > > annotation consumers to know which element "belongs" to them. > > > > The ones whose type they own -- which is why I see at least some > > parallel to exceptions, and their inheritance-based semantics. > >
> > def f(a:[mytype("asdfljasdf"),
> >         zope.mypackage.something(b,d,e),
> >         "a string",
> >         mytype([47]),
> >         15]):
> >
> > Whoever defined mytype controls the meaning of the mytype annotations; > > anyone not familiar with that package should ignore them (and hope > > there were no side effects in the expressions that generated them). > > > > zope.mypackage controls that annotation; anyone not familiar with that > > product should ignore it (and hope there were no side effects ...) > > As hideous as I think this is from an aesthetics/visual noise > standpoint, it's probably the only reliable way to let both decorator- > and non-decorator-based consumers work. I've changed my mind. This idea isn't going to work at all. The sticking point is that while this might allow decorator and non-decorator-based consumers to operate side-by-side *within a single program*, it makes it impossible for things like pychecker or an optimising compiler to take advantage of the annotations.
Here's another stab at my earlier idea; here's the modified example:

@docstring
@typechecker
@constrain_values
def foo(a: {'doc': "Frobnication count",
            'type': Number,
            'constrain_values': range(3, 9)},
        b: {'type': Number,
            # This can be only 4, 8 or 12
            'constrain_values': [4, 8, 12]}) -> {'type': Number}

We're still using dicts to hold the annotations, but instead of having the dict keyed on the name (function.__name__) of the annotation consumer, the keys are arbitrary (for certain values of "arbitrary"). To enable both in-program and static analysis, the most prominent keys will be specified by the PEP. In this example, "type" and "doc" are reserved keys; anything that needs the intended type of an annotation will look at the "type" key, anything that's looking for special doc strings will look at the "doc" key. Any other consumers are free to define whatever keys they want (e.g., "constrain_values", above), so long as they stay away from the reserved strings. The dict form will be required, even if there's only one type of annotation. To modify the example above to only use typechecker():

@typechecker
def foo(a: {'type': Number},
        b: {'type': Number}) -> {'type': Number}

I'm going to raise the bar for future ideas on this subject: any proposals must be able to address the following use cases:

1) Static analysis tools (pychecker, optimising compilers, etc) must be able to use the annotations
2) Decorator-based annotation consumers must be able to use the annotations
3) Non-decorator-based annotation consumers (pydoc, etc) must be able to use the annotations

Proposals that do not address all of these will not be considered.
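Collin's reserved-key convention can be exercised in present-day Python, since dict literals are legal annotations and land in `func.__annotations__`. A hypothetical sketch (the `typechecker` decorator below is invented for illustration, not part of any proposal): each consumer reads only its own key and ignores the rest.

```python
import functools
import inspect

def typechecker(func):
    """Enforce the 'type' key of dict-valued annotations; other keys
    ('doc', 'constrain_values', ...) belong to other consumers."""
    hints = {name: ann["type"]
             for name, ann in func.__annotations__.items()
             if isinstance(ann, dict) and "type" in ann}
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            if name in hints and not isinstance(value, hints[name]):
                raise TypeError("%s must be %s, got %r"
                                % (name, hints[name].__name__, value))
        return func(*args, **kwargs)
    return wrapper

@typechecker
def foo(a: {"doc": "Frobnication count", "type": int},
        b: {"type": int, "constrain_values": [4, 8, 12]}) -> {"type": int}:
    return a + b

print(foo(2, 4))    # both arguments pass the 'type' check
```

A pydoc-style consumer would do the same with the "doc" key, without either consumer needing to know about the other.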
Collin Winter From p.f.moore at gmail.com Tue Aug 15 16:38:34 2006 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 15 Aug 2006 15:38:34 +0100 Subject: [Python-3000] Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations) In-Reply-To: <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com> <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> Message-ID: <79990c6b0608150738p37debcf4qfa97400d9c17ba52@mail.gmail.com> On 8/15/06, Collin Winter wrote: > Here's the modified example
>
> @docstring
> @typechecker
> @constrain_values
> def foo(a: {'doc': "Frobnication count",
>             'type': Number,
>             'constrain_values': range(3, 9)},
>         b: {'type': Number,
>             # This can be only 4, 8 or 12
>             'constrain_values': [4, 8, 12]}) -> {'type': Number}
>
I've been keeping out of this - I haven't followed the discussions, and I am certainly not up to speed on the various subtleties, but *surely* there's no intention that a monstrosity like this would count as a "normal" function definition in Py3K???!!!! > I'm going to raise the bar for future ideas on this subject: any > proposals must be able to address the following use cases: [...] > Proposals that do not address all of these will not be considered. Can I suggest a further constraint - anything that results in the definition of a simple 2-argument function not fitting on a single source line is probably unworkable in practice? Paul.
From paul at prescod.net Tue Aug 15 17:18:31 2006 From: paul at prescod.net (Paul Prescod) Date: Tue, 15 Aug 2006 08:18:31 -0700 Subject: [Python-3000] Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations) In-Reply-To: <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com> <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> Message-ID: <1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com> I totally do not understand the requirement for the dictionary and its extra overhead. On 8/15/06, Collin Winter wrote: > > > @typechecker > def foo(a: {'type': Number}, > b: {'type': Number}) -> {'type': Number} > > > I'm going to raise the bar for future ideas on this subject: any > proposals must be able to address the following use cases: > > 1) Static analysis tools (pychecker, optimising compilers, etc) must > be able to use the annotations > 2) Decorator-based annotation consumers must be able to use the > annotations > 3) Non-decorator-based annotation consumers (pydoc, etc) must be able > to use the annotations Consider the following syntax:

class MyType:
    def __init__(self, name):
        self.name = name

Number = MyType("Number")
Tuple = MyType("Tuple")

def foo(a: tc(Number)) -> Tuple(Number, Number)

1. Static analysis tools can deal with this as much as with ANY truly Pythonic syntax. Their ability to deal will depend (as in any syntax) on their ability to do module or whole-program analysis. In your syntax, or mine, "Number" could be defined dynamically. In either case, someone could say "Number = None" and confuse everything. 2. A decorator based analysis could look at __signatures__ and do what it needs. 3. Similarly for non-decorator analyzers. In fact, given that decorators are just syntactic sugar for function calls, I don't see why they should require special consideration at all.
If the syntax works well for non-decorator consumers then decorators will be just a special case. As far as static analysis tools: Python has never made major concessions to them. Minor concessions, yes. I'd ask that you add the following requirement:

* must define how multiple annotation syntaxes can assign potentially differing meanings to built-in types and objects, on the same parameter, without actually conflicting

My full program (meeting all requirements) follows. Paul Prescod ====

def simulate_signature(sig):
    "simulates the signature feature of Python 3000"
    def _(func):
        func.__signature__ = sig
        return func
    return _

def my_print_signature(func):
    "a demo decorator that prints signatures."
    if hasattr(func, "__signature__"):
        sig = func.__signature__
        [my_print_arg(name, value) for name, value in sig.items()]
    return func

def my_print_arg(name, annotation):
    """print a single argument's declaration, skipping unknown anno types."""
    if isinstance(annotation, list):
        [my_print_arg(name, anno) for anno in annotation]
    elif conformsToInterface(annotation, MyType):
        print name
        annotation.print_arg()

def conformsToInterface(object, interface):
    "naive implementation of interfaces"
    return isinstance(object, interface)

class MyType:
    def __init__(self, *children):
        self.children = children
    def print_arg(self):
        print self.children

# defined in your module. I have no knowledge of it
class YourType:
    def __init__(self, *stuff):
        pass

# a simple signature
# real syntax should be:
#     def foo(bar: MyType(int))
@simulate_signature({"bar": MyType(int)})
def foo(bar):
    return (bar, bar)

# use print signature decorator
# real syntax should be:
#     def foo2(bar: [MyType(int)...]) -> [MyType(...]
@my_print_signature
@simulate_signature({"bar": [MyType(int), YourType("int")],
                     "return": [MyType(tuple([int, int])),
                                YourType("tuple of int,int")]})
def foo2(bar):
    return (bar, bar)

# can also be used as non-decorator
for name, val in vars().items():
    my_print_signature(val)

-------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060815/6cee76da/attachment.html From collinw at gmail.com Tue Aug 15 17:36:25 2006 From: collinw at gmail.com (Collin Winter) Date: Tue, 15 Aug 2006 10:36:25 -0500 Subject: [Python-3000] Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations) In-Reply-To: <1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com> <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> <1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com> Message-ID: <43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com> On 8/15/06, Paul Prescod wrote: > I totally do not understand the requirement for the dictionary and its extra > overhead. Under your proposal, annotation consumer libraries have to provide wrappers for Python's built-in types, since the only way a library has of knowing whether it should process a given object is by applying a subclass test. Extending this same idea to static analysis tools, tools like pychecker or an optimising compiler would have to supply their own such wrapper classes. This would be a huge burden, not just on the authors of such tools, but also on those wishing to use these tools. I want people to be able to use Python's built-in types without ugly wrapper classes or any other similar impediments to their pre-existing Python workflow/thought patterns. Collin Winter From pje at telecommunity.com Tue Aug 15 18:05:22 2006 From: pje at telecommunity.com (Phillip J.
Eby) Date: Tue, 15 Aug 2006 12:05:22 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <1cb725390608150704x4ef5a9abm532cd7ebaae511d@mail.gmail.com> References: <5.1.1.6.0.20060814174448.025d9018@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060815120206.026052e8@sparrow.telecommunity.com> At 07:04 AM 8/15/2006 -0700, Paul Prescod wrote: >On 8/14/06, Guido van Rossum <guido at python.org> >wrote: >> >>Haven't I said that the whole time? I *thought* that Collin's PEP >>steered clear from the topic too. At the same time, does this preclude >>having some kind of "default" type notation in the standard library? > >The PEP steered TOO far clear of this topic. If it is a total free-for-all then >when and if we do come up with a standard syntax (whether in 2006 or 2010) >it will conflict with deployed code that used the same syntax to mean >something different than the standard. And even if there is never, ever, >going to be a standard, it must be possible for tools reading the >annotations to know whether the user intended their markup to conform to >metadata-syntax 1, where "int" means "32 bit int" or metadata syntax 2 >where it means "arbitrary sized int". Similarly, they must know whether >the annotator intended to use metadata syntax 1 where "tuple" means "fixed >size, heterogeneous" or syntax 2 where it means "immutable list". On the contrary - it is precisely this looseness that the PEP meant to specify, and that I support. The alternative is too restrictive. Meanwhile, the absence of predefined semantics does *not* preclude a default type notation existing in the standard library, any more than the absence of a predefined semantics for docstrings or function attributes prevents the stdlib from containing docstring processors or tools that operate on function attributes. From pje at telecommunity.com Tue Aug 15 18:09:48 2006 From: pje at telecommunity.com (Phillip J.
Eby) Date: Tue, 15 Aug 2006 12:09:48 -0400 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <1cb725390608150656o2587c0ddx2af8e7df80f8e7b8@mail.gmail.com> References: <5.1.1.6.0.20060814174448.025d9018@sparrow.telecommunity.com> <5.1.1.6.0.20060814174448.025d9018@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060815120531.02601dd8@sparrow.telecommunity.com> At 06:56 AM 8/15/2006 -0700, Paul Prescod wrote: >On 8/14/06, Phillip J. Eby ><pje at telecommunity.com> wrote: >>At 1:51 PM 8/14/2006 -0700, "Paul Prescod" >><paul at prescod.net> wrote: >> >On 8/14/06, Jim Jewett >> <jimjjewett at gmail.com> wrote: >> > > The definition of a type as an annotation should probably be either >> > > defined or explicitly undefined. Earlier discussions talked about >> > > things like >> > > >> > > def f (a:int, b:(float | Decimal), c:[int, str, X]) ->str) >> > >> > >> >I think that's a separate (large!) PEP. This PEP should disallow frameworks >> >from inventing their own meaning for this syntax (requiring them to at >> least >> >wrap). Then Guido and crew can dig into this issue on their own schedule. >> >>I see we haven't made nearly as much progress on the concept of "no >>predefined semantics" as I thought we had. :( > > >>i.e., -1 on constraining what types mean. > >I don't understand what you're saying. I'm saying that we don't need a predefined semantics for annotation objects of type 'type'; i.e. the PEP need not define what "a:int" means. I'm roughly +0 on having predefined semantics for annotation objects of type 'list' and 'str'. >1. Do you (still?) agree that the meaning of the list type should be >defined as a semantically neutral container for other annotations? I believe it should be a recommended best practice -- "defined" is too strong a word. >2. Do you (still?) agree that the meanings of ALL built-in types at the >top-level should be reserved for the Python language designers and should >not be randomly used by framework developers.
In other words: the function >type declaration syntax above should not be used by one third party type >checker while another third-party type checker uses the same structure to >mean something totally different. Note that I don't mind if they have >conflicting semantics for the same expression as long as the end-user is >forced to declare which semantic model they are using: I don't see a reason to require an explicit wrapper except as a disambiguator. That is, until you *actually* need them, discriminator-wrappers are a YAGNI. From paul at prescod.net Tue Aug 15 20:09:17 2006 From: paul at prescod.net (Paul Prescod) Date: Tue, 15 Aug 2006 11:09:17 -0700 Subject: [Python-3000] Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations) In-Reply-To: <43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com> <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> <1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com> <43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com> Message-ID: <1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com> On 8/15/06, Collin Winter wrote: > > On 8/15/06, Paul Prescod wrote: > > I totally do not understand the requirement for the dictionary and its > extra > > overhead. > > Under your proposal, annotation consumer libraries have to provide > wrappers for Python's built-in types, since the only way a library has > of knowing whether it should process a given object is by applying a > subclass test. > > Extending this same idea to static analysis tools, tools like > pychecker or an optimising compiler would have to supply their own > such wrapper classes. This would be a huge burden, not just on the > authors of such tools, but also on those wishing to use these tools. No, this is incorrect. Metadata is just metadata. Libraries act on metadata. 
There is a many to many relationship. You could go and define Collin's type metadata syntax. You create a library of wrappers (really you need only ONE wrapper). Then you could convince the writers of PyPy to use the same syntax. So there would be one set of annotations used by two libraries. Here's what the definition of the one wrapper could look like:

class my_type:
    def __init__(self, data):
        self.data = data

That's it. That's all you need to implement. > I want people to be able to use Python's built-in types without ugly > wrapper classes or any other similar impediments to their pre-existing > Python workflow/thought patterns. The wrapper class doesn't need to be ugly. Just:

from typecheck import my_type as t

def foo(a: t(int, int), b: t("abc")): ...

Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060815/6a95fa1a/attachment.html From collinw at gmail.com Tue Aug 15 20:28:24 2006 From: collinw at gmail.com (Collin Winter) Date: Tue, 15 Aug 2006 13:28:24 -0500 Subject: [Python-3000] Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations) In-Reply-To: <1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com> <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> <1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com> <43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com> <1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com> Message-ID: <43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com> On 8/15/06, Paul Prescod wrote: > On 8/15/06, Collin Winter wrote: > > Extending this same idea to static analysis tools, tools like > > pychecker or an optimising compiler would have to supply their own > > such wrapper classes.
This would be a huge burden, not just on the > > authors of such tools, but also on those wishing to use these tools. > > No, this is incorrect. Metadata is just metadata. Libraries act on metadata. > There is a many to many relationship. You could go and define Collin's type > metadata syntax. You create a library of wrappers (really you need only ONE > wrapper). Then you could convince the writers of PyPy to use the same > syntax. So there would be one set of annotations used by two libraries. If multiple libraries use the same wrappers, then I can't use more than one of these libraries at the same time. If a typechecking consumer, a docstring consumer and PyPy all use the same wrapper (or "syntax" -- you switch terms between sentences), then I can't have typechecking and docstrings on the same functions, and I can't do either if I'm running my program with PyPy. Collin Winter From paul at prescod.net Tue Aug 15 20:54:48 2006 From: paul at prescod.net (Paul Prescod) Date: Tue, 15 Aug 2006 11:54:48 -0700 Subject: [Python-3000] Draft pre-PEP: function annotations In-Reply-To: <5.1.1.6.0.20060815120531.02601dd8@sparrow.telecommunity.com> References: <5.1.1.6.0.20060814174448.025d9018@sparrow.telecommunity.com> <5.1.1.6.0.20060815120531.02601dd8@sparrow.telecommunity.com> Message-ID: <1cb725390608151154y6ef8138dy1b029f8b84339fa9@mail.gmail.com> On 8/15/06, Phillip J. Eby wrote: > > > I don't see a reason to require an explicit wrapper except as a > disambiguator. That is, until you *actually* need them, > discriminator-wrappers are a YAGNI. How will you know you "actually" need them until you run a tool on your code and it crashes or gives the wrong result? And what will you do then, go and clean up your code? And what if the libraries have defined no disambiguation syntax? Then what? Function attributes are at least disambiguated by name. You can't put a function attribute on a function without giving it a name.
We need at least this level of disambiguation for metadata. Docstrings have become something of a mess of various meanings. Back in the late 90s I attached XPaths to them and the Spark guy attached parser grammar instructions. Pydoc, pointed at one of my xpath-embedding classes, would produce useless gibberish. So in that sense there was a clash. Given that pydoc is seldom a mission critical part of any system, this is a minor issue. Confused type declarations could cause bigger problems, from crashed compilers to segmentation faults in interpreters. Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060815/92902a10/attachment.htm From paul at prescod.net Tue Aug 15 21:07:57 2006 From: paul at prescod.net (Paul Prescod) Date: Tue, 15 Aug 2006 12:07:57 -0700 Subject: [Python-3000] Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations) In-Reply-To: <43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com> <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> <1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com> <43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com> <1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com> <43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com> Message-ID: <1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com> On 8/15/06, Collin Winter wrote: > > If multiple libraries use the same wrappers, then I can't use more > than one of these libraries at the same time. If a typechecking > consumer, a docstring consumer and PyPy all use the same wrapper (or > "syntax" -- you switch terms between sentences), then I can't have > typechecking and docstrings on the same functions, and I can't do > either if I'm running my program with PyPy.
There is a MANY TO MANY relationship between syntaxes (as denoted by wrappers) and tools that work on those syntaxes. Think of it by analogy: there are programming languages and there are interpreters. Some programming languages run on multiple interpreters (e.g. Python on .NET, JVM, PyPy, CPython). Some interpreters run multiple languages (e.g. .NET, JVM). Some interpreters run a single language (CPython). Or another analogy from my domain: there are a variety of XML syntaxes. Some are designed for a single program. Others, like Atom, are designed for many, many programs. Also, some programs can handle a single input format. Others (like RSS/Atom readers) can consume many. A Typechecking consumer and a PyPy compiler consumer might work on the same annotations because they are both interested in TYPES (but doing different things with them). These type consumers might also choose to implement more than one type checking syntax, if there were a good reason that more than one arose (perhaps Unix types versus .NET types). A docstring consumer and a typechecking consumer would *by definition* use different syntaxes/frameworks/wrappers because the information that they are looking for is different! But there could be hundreds of docstring consumers (as there are today!). Docstrings are a special case because the syntax for them is fairly obvious (an unadorned string). Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060815/9fa6ca8b/attachment.html From collinw at gmail.com Tue Aug 15 21:13:16 2006 From: collinw at gmail.com (Collin Winter) Date: Tue, 15 Aug 2006 14:13:16 -0500 Subject: [Python-3000] Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations) In-Reply-To: <1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com> <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> <1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com> <43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com> <1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com> <43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com> <1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com> Message-ID: <43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com> On 8/15/06, Paul Prescod wrote: > A Typechecking consumer and a PyPy compiler consumer might work on the same > annotations because they are both interested in TYPES (but doing different > things with them). These type consumers might also choose to implement more > than one type checking syntax, if there were a good reason that more than > one arose (perhaps Unix types versus .NET types). > > A docstring consumer and a typechecking consumer would *by definition* use > different syntaxes/frameworks/wrappers because the information that they are > looking for is different! But there could be hundreds of docstring consumers > (as there are today!). Docstrings are a special case because the syntax for > them is fairly obvious (an unadorned string). So basically what you're saying is that there would be a more-or-less standard wrapper for each application of function annotations. 
How is this significantly better than my dict-based approach, which uses standardised dict keys to indicate the kind of metadata? Collin Winter From paul at prescod.net Tue Aug 15 21:30:36 2006 From: paul at prescod.net (Paul Prescod) Date: Tue, 15 Aug 2006 12:30:36 -0700 Subject: [Python-3000] Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations) In-Reply-To: <43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com> <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> <1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com> <43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com> <1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com> <43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com> <1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com> <43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com> Message-ID: <1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com> On 8/15/06, Collin Winter wrote: > > So basically what you're saying is that there would be a more-or-less > standard wrapper for each application of function annotations. No, I explicitly said that there may or may not arise standards based upon the existence or non-existence of community consensus and convergence of requirements. Just as there may or may not arise a standard Python web application framework depending on whether the community converges or does not. How is > this significantly better than my dict-based approach, which uses > standardised dict keys to indicate the kind of metadata? The dict-based approach introduces an extra namespace to manage. What if two different groups start fighting over the keyword "type" or "doc" or "lock"? 
Python already has a module system that allows you to use the word "type" and me to use the word "type" without conflict (though I can't guarantee that it won't be confusing!). Python's module system allows renaming and abbreviating: both valuable features. Also, the dict-based approach is just more punctuation to type. What is the dict equivalent for:

def foo(a: type(int)) -> type(int): ...

versus

def foo(a: {"type":int}) -> {"type": int}: ...

In my approach you could do this:

Int = type(int)
def foo(a: Int) -> Int: ...

Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060815/8e7c6d23/attachment.htm From collinw at gmail.com Tue Aug 15 22:13:19 2006 From: collinw at gmail.com (Collin Winter) Date: Tue, 15 Aug 2006 15:13:19 -0500 Subject: [Python-3000] Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations) In-Reply-To: <1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com> <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> <1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com> <43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com> <1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com> <43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com> <1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com> <43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com> <1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com> Message-ID: <43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com> On 8/15/06, Paul Prescod wrote: > On 8/15/06, Collin Winter wrote: > > How is > > this significantly better than my dict-based approach, which uses > > standardised dict keys to indicate the kind of metadata? > > The dict-based approach introduces an extra namespace to manage.
What if two > different groups start fighting over the keyword "type" or "doc" or "lock"? How do you foresee this arising? Do you think users will start wanting to apply several different typechecking systems to the same function? The idea behind these standard keys is to a) keep them limited in number, and b) keep them limited in scope. At the moment, I can only foresee two of these: "type" and "doc". My justification for "type" is that users won't be using multiple type systems on the same parameter (and if they are, that's their own problem); for "doc", it is that a docstring is just a Python string, and there's really only one way to look at that within the scope of documentation strings. Beyond these applications, the annotation consumers are on their own. Consumers that operate in the same domain may well coordinate their keys, and popular keys might make it into the list of standard keys (like the process for getting a module into the stdlib). I hope to have a second draft of the pre-PEP within a few days that includes this idea.
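Collin's dict-keyed convention can be sketched with the annotation syntax under discussion. The keys follow his message ("type", "doc", and the consumer-specific "constrain_values"); the divide function and the consumer loop are illustrative assumptions, not a settled standard:

```python
# A sketch of the dict-keyed convention: reserved keys "type" and "doc",
# plus a consumer-specific key ("constrain_values"). Nothing here is a
# finalized spec; the key names come from the message above.

def divide(a: {"type": int, "doc": "dividend"},
           b: {"type": int, "doc": "divisor",
               "constrain_values": lambda v: v != 0}):
    return a // b

# A documentation consumer reads only the "doc" key and ignores the rest:
docs = {name: ann["doc"]
        for name, ann in divide.__annotations__.items()
        if isinstance(ann, dict) and "doc" in ann}
print(docs)
```

The point of the reserved keys is exactly this kind of selective read: each consumer indexes the dict by the keys it understands and leaves the others alone.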
Collin Winter From ironfroggy at gmail.com Wed Aug 16 00:20:14 2006 From: ironfroggy at gmail.com (Calvin Spealman) Date: Tue, 15 Aug 2006 18:20:14 -0400 Subject: [Python-3000] Fwd: Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations) In-Reply-To: <76fd5acf0608151519k564723f9q2d409e3285a7918f@mail.gmail.com> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com> <43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com> <1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com> <43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com> <1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com> <43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com> <1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com> <43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com> <76fd5acf0608151519k564723f9q2d409e3285a7918f@mail.gmail.com> Message-ID: <76fd5acf0608151520n7692824asf06a849ac3114e5b@mail.gmail.com> On 8/15/06, Collin Winter wrote: > On 8/15/06, Paul Prescod wrote: > > On 8/15/06, Collin Winter wrote: > > > How is > > > this significantly better than my dict-based approach, which uses > > > standardised dict keys to indicate the kind of metadata? > > > > The dict-based approach introduces an extra namespace to manage. What if two > > different groups start fighting over the keyword "type" or "doc" or "lock"? > > How do you foresee this arising? Do you think users will start wanting > to apply several different typechecking systems to the same function? > > The idea behind these standard keys is to a) keep them limited in > number, and b) keep them limited in scope. At the moment, I can only > foresee two of these: "type" and "doc". 
My justification for "type" is > that users won't be using multiple type systems on the same parameter > (and if they are, that's their own problem); for "doc", it is that a > docstring is just a Python string, and there's really only one way to > look at that within the scope of documentation strings. > > Beyond these applications, the annotation consumers are on their own. > Consumers that operate in the same domain may well coordinate their > keys, and popular keys might make it into the list of standard keys > (like the process for getting a module into the stdlib). > > I hope to have a second draft of the pre-PEP within a few days that > includes this idea. > > Collin Winter The dictionary approach, although it is what I was originally planning to support, is just too ugly and too limited. String keys can be ambiguous, but objects cannot. The arguments against the better approaches, which you keep trying to repeat, just don't hold up. The non-dictionary, multiple annotation proposals can stand up to your requirements perfectly, and fulfill them even better than the dictionary approach. 1) Static analysis tools (pychecker, optimising compilers, etc) must be able to use the annotations As in any example given so far, the annotations would be instantiated within the function definition itself, which means the form 'def foo(a: Bar(baz))' is to be expected. This form could even be documented as the preferred way, as opposed to instantiating the annotation object beforehand and simply using its name in the function definition. This leads to simple parsing by external tools, which would be able to deduce what Bar is (because before that line there was a 'from bar import Bar'). Dictionary string keys are just too limited and offer too much chance for conflicts. Better to avoid them now than after there are established and conflicting libraries expecting different things.
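Calvin's object-based alternative, annotations as instances whose class identifies the owning consumer, can be sketched as follows. The class names (Bar, Doc) and the helper are hypothetical stand-ins for whatever a real consumer library would ship:

```python
# Hypothetical annotation classes: each consumer owns one, and simply
# ignores instances of classes it does not recognise.

class Bar:                 # stands in for a type-checking annotation
    def __init__(self, spec):
        self.spec = spec

class Doc:                 # stands in for a documentation annotation
    def __init__(self, text):
        self.text = text

def foo(a: Bar(int), b: Doc("a human-readable note")):
    ...

# Each consumer filters the annotations down to its own instances:
def annotations_of(func, cls):
    return {name: ann for name, ann in func.__annotations__.items()
            if isinstance(ann, cls)}

print(annotations_of(foo, Bar))   # only the Bar annotations
print(annotations_of(foo, Doc))   # only the Doc annotations
```

Because ownership is carried by the object's class rather than a string key, two independent libraries cannot collide the way two users of a "type" key can.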
2) Decorator-based annotation consumers must be able to use the annotations 3) Non-decorator-based annotation consumers (pydoc, etc) must be able to use the annotations A simple filter on the type of the annotations (perhaps via a helper function in a basic annotation utility library) will let any consumer pick out the kinds of annotations it needs. In the end, the biggest argument against the dictionary approach is that it is simply too ugly, and would be almost impossible to get around for even a single annotation on a parameter. From collinw at gmail.com Wed Aug 16 00:29:48 2006 From: collinw at gmail.com (Collin Winter) Date: Tue, 15 Aug 2006 17:29:48 -0500 Subject: [Python-3000] Fwd: Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations) In-Reply-To: <76fd5acf0608151520n7692824asf06a849ac3114e5b@mail.gmail.com> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com> <1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com> <43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com> <1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com> <43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com> <1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com> <43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com> <76fd5acf0608151519k564723f9q2d409e3285a7918f@mail.gmail.com> <76fd5acf0608151520n7692824asf06a849ac3114e5b@mail.gmail.com> Message-ID: <43aa6ff70608151529q18748348g3dce7c193450a0fb@mail.gmail.com> On 8/15/06, Calvin Spealman wrote: > On 8/15/06, Collin Winter wrote: >> 1) Static analysis tools (pychecker, optimising compilers, etc) must >> be able to use the annotations > > As in any example given so far, the annotations would be instantiated > within the function definition itself, which means the form 'def > foo(a: Bar(baz))' is to be expected.
This form could even be > documented as the preferred way, as opposed to instantiating the > annotation object beforehand and simply using its name in the > function definition. This leads to simple parsing by external tools, > which would be able to deduce what Bar is (because before that line > there was a 'from bar import Bar'). How exactly do they "deduce" what Bar is, just from the "from bar import Bar" line? pychecker would have to import and compile the Bar module first. What if being able to import bar depends on some import hooks that some other module (imported before bar) installed? I guess you'd have to follow the entire import graph just to make sure. Oh, and you'd have to end up running the module being analysed in case *it* installs some import hooks -- or maybe it defines Bar itself. Your proposal isn't workable. Collin Winter From jimjjewett at gmail.com Wed Aug 16 01:08:41 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 15 Aug 2006 19:08:41 -0400 Subject: [Python-3000] Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations) In-Reply-To: <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com> <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> Message-ID: On 8/15/06, Collin Winter wrote: > Here's another stab at my earlier idea: ... > We're still using dicts to hold the annotations, but instead of having > the dict keyed on the name (function.__name__) of the annotation > consumer, the keys are arbitrary (for certain values of "arbitrary"). > To enable both in-program and static analysis, the most prominent keys > will be specified by the PEP. In this example, "type" and "doc" are > reserved keys; anything that needs the intended type of an annotation > will look at the "type" key, anything that's looking for special doc > strings
Any other consumers are free to > define whatever keys they want (e.g., "constrain_values", above), so > long as they stay away from the reserved strings. That seems to get the worst of both worlds. Static tools will now know that something is intended to express type information, but still won't know whether it describes typical usage, an invariant, or an adapter that will make any argument work. Meanwhile, two different systems can still clash on "constrain_values" (as well as "type"), without the benefit of an actual type object (or a name associated with an import) to disambiguate. > 1) Static analysis tools (pychecker, optimising compilers, etc) must > be able to use the annotations If the ownership is by object type, then static tools can get at least a pretty good idea by looking at the name used to construct those types. Realistically, if >>> from zope.mypackage import something as anno1 ... >>> def f(a:anno1("asfd")) does not provide enough information, then nothing static ever will. 
-jJ From jimjjewett at gmail.com Wed Aug 16 01:22:24 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 15 Aug 2006 19:22:24 -0400 Subject: [Python-3000] Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations) In-Reply-To: <43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> <1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com> <43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com> <1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com> <43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com> <1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com> <43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com> <1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com> <43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com> Message-ID: On 8/15/06, Collin Winter wrote: > ... that users won't be using multiple type systems on the same parameter > (and if they are, that their own problem); for "doc" is that a > docstring is just a Python string, and there's really only own way to > look at that within the scope of documentation strings. oh ye of little cynicism. (1) I might well restrict *myself* to a single type system. But that doesn't mean I don't ever want to use someone else's modules, or that I don't want a doc tool to handle them. (2) doc strings already exist, and have already grown inconsistent microstructure. """one line summary -- may or may not include the call signature Longer documentation, which may or may not also include doctests or ReST or html or sample calls in a non-doctest format or magic tokens used by various frameworks, such as Design By Contract wrappers. Oh, and that first blank line? Some tools rely on it. Some functions don't use it. 
Of course, some functions don't use docstrings at all, because the writers are already afraid that a framework like unittest will misinterpret them.""" -jJ From tim.hochberg at ieee.org Wed Aug 16 01:26:30 2006 From: tim.hochberg at ieee.org (Tim Hochberg) Date: Tue, 15 Aug 2006 16:26:30 -0700 Subject: [Python-3000] Fwd: Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations) In-Reply-To: <43aa6ff70608151529q18748348g3dce7c193450a0fb@mail.gmail.com> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com> <1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com> <43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com> <1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com> <43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com> <1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com> <43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com> <76fd5acf0608151519k564723f9q2d409e3285a7918f@mail.gmail.com> <76fd5acf0608151520n7692824asf06a849ac3114e5b@mail.gmail.com> <43aa6ff70608151529q18748348g3dce7c193450a0fb@mail.gmail.com> Message-ID: Collin Winter wrote: > On 8/15/06, Calvin Spealman wrote: >> On 8/15/06, Collin Winter wrote: >>> 1) Static analysis tools (pychecker, optimising compilers, etc) must >>> be able to use the annotations >> As in any example given so far, the annotations would be instansiated >> within the function definition itself, which means the form 'def >> foo(a: Bar(baz))' is to be expected. This form could even be >> documented as the prefered way, as opposed to instansiating the >> annotation object before hand and simply using its name in the >> function definition. This leads to simple parsing by external tools, >> which would be able to deduce what bar is (because before that line >> there was an 'from bar import Bar'. > > How exactly do they "deduce" what Bar is, just from the "from bar > import Bar" line? 
pychecker would have to import and compile the Bar > module first. What if being able to import bar depends on some import > hooks that some other module (imported before bar) installed? I guess > you'd have to follow the entire import graph just to make sure. Oh, > and you'd have to end up running the module being analysed in case > *it* installs some import hooks -- or maybe it defines Bar itself. Why does PyChecker need to "deduce" what Bar is at all? Either bar.Bar is something that PyChecker knows about, because it indicates something that it knows how to check, or it's something it doesn't know about, in which case it can safely ignore it. I fail to see any significant difference in

def foo(a: Bar(baz)): ...

versus

def foo(a: {'Bar' : baz}): ...

except that the latter is harder to read and more prone to name collisions. > > Your proposal isn't workable. I, at least, fail to see why at this point -tim > From pje at telecommunity.com Wed Aug 16 02:17:29 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 15 Aug 2006 20:17:29 -0400 Subject: [Python-3000] Fwd: Conventions for annotation consumers In-Reply-To: Message-ID: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> At 05:29 PM 8/16/2006 -0500, "Collin Winter" wrote: >On 8/15/06, Calvin Spealman wrote: > > On 8/15/06, Collin Winter wrote: > >> 1) Static analysis tools (pychecker, optimising compilers, etc) must > >> be able to use the annotations > > > > As in any example given so far, the annotations would be instantiated > > within the function definition itself, which means the form 'def > > foo(a: Bar(baz))' is to be expected. This form could even be > > documented as the preferred way, as opposed to instantiating the > > annotation object beforehand and simply using its name in the > > function definition. This leads to simple parsing by external tools, > > which would be able to deduce what Bar is (because before that line > > there was a 'from bar import Bar').
> >How exactly do they "deduce" what Bar is, just from the "from bar >import Bar" line? pychecker would have to import and compile the Bar >module first. What if being able to import bar depends on some import >hooks that some other module (imported before bar) installed? I guess >you'd have to follow the entire import graph just to make sure. Oh, >and you'd have to end up running the module being analysed in case >*it* installs some import hooks -- or maybe it defines Bar itself. > >Your proposal isn't workable. By that logic, neither is Python. :) I think you mean the reverse; the proposal instead shows that requirement #1 is what's not workable here. I'm frankly baffled by the amount of "protect users from incompatibility" ranting that this issue has generated. If I wanted to use Java, I'd know where to find it. Guido has said time and again that Python's balance favors the individual developer at the expense of the group where "consenting adults" is concerned, and Py3K isn't intended to change that balance. Personally, I thought Guido's original proposal for function annotations, which included a __typecheck__ operator that was replaceable on a per-module basis (and defaulted to a no-op), was the perfect thing -- neither too much semantics nor too little. I'd like to have it back, please. :) From guido at python.org Wed Aug 16 02:21:27 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 15 Aug 2006 17:21:27 -0700 Subject: [Python-3000] threading, part 2 In-Reply-To: <1d85506f0608110033k2eac1f9h10908ddbef5db8c3@mail.gmail.com> References: <1d85506f0608101214g594d2dal282ab2ae60f29f11@mail.gmail.com> <1d85506f0608110033k2eac1f9h10908ddbef5db8c3@mail.gmail.com> Message-ID: On 8/11/06, tomer filiba wrote: > [Guido] > > I expect that Jython doesn't implement this; it doesn't handle ^C either AFAIK. > > threads are at most platform agnostic (old unices, embedded systems, etc.
> are not likely to have thread support) I'm not sure what "platform agnostic" means to you. I think you mean "a platform dependent optional feature"? > so keeping this in mind, and having interrupt_main part of the standard > thread API, which as you say, may not be implementation agnostic, > why is thread.raise_exc(id, excobj) a bad API? Because it is more general than interrupt_main(). I'm happy to declare the latter a CPython exclusive feature that not all other platforms may support even if they have threads. raise_exc() would have at best the same status; I imagine the set of platforms where it can be implemented is smaller than the set of platforms that can support interrupt_main(). > and as i recall, dotNET's Thread.AbortThread or whatever it's called > works that way (raising an exception in the other thread), so IronPython > for once, should be happy with it. But Jython? > by the way, is the GIL part of the python standard? i.e., does IronPython > implement it, although it shouldn't be necessary in dotNET? No. Neither Jython nor IronPython have it. But since the presence of the GIL is never directly detectable from Python code, I'm not sure how it *could* be part of the language standard. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Aug 16 02:23:42 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 15 Aug 2006 17:23:42 -0700 Subject: [Python-3000] threading, part 2 In-Reply-To: References: <1d85506f0608101214g594d2dal282ab2ae60f29f11@mail.gmail.com> <1d85506f0608110033k2eac1f9h10908ddbef5db8c3@mail.gmail.com> Message-ID: On 8/11/06, Jason Orendorff wrote: > On 8/11/06, tomer filiba wrote: > > why is thread.raise_exc(id, excobj) a bad API? > > It breaks seemingly innocent code in subtle ways. Worse, the breakage > will always be a race condition, so it'll be especially hard to > reproduce and debug. So is KeyboardInterrupt. But at least that can't be raised in threads. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Aug 16 02:28:21 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 15 Aug 2006 17:28:21 -0700 Subject: [Python-3000] threading, part 2 In-Reply-To: References: <20060811102346.EFC4.SLAWOMIR.NOWACZYK.847@student.lu.se> <20060811082620.192E.JCARLSON@uci.edu> Message-ID: On 8/11/06, Jason Orendorff wrote: > On 8/11/06, Josiah Carlson wrote: > > Slawomir Nowaczyk wrote: > > > But it should not be done lightly and never when the code is not > > > specifically expecting it. > > > > If you don't want random exceptions being raised in your threads, then > > don't use this method that is capable of raising exceptions somewhat > > randomly. > > I agree. The only question is how dire the warnings should be. > > I'll answer that question with another question: Are we going to make > the standard library robust against asynchronous exceptions? For > example, class Thread has an attribute __stopped that is set using > code similar to the example code I posted. An exception at just the > wrong time would kill the thread while leaving __stopped == False. > > Maybe that particular case is worth fixing, but to find and fix them > all? Better to put strong warnings on this one method: may cause > unpredictable brokenness. That is a rather special case since this code (unlike most stdlib code) can assume it won't get asynchronous exceptions like KeyboardInterrupt, since that can't be raised in threads. I expect that the unpredictable brokenness is even bigger in most user code -- *most* people can't write threadsafe code if their life depended on it. I believe the only exception I know is Tim Peters. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Aug 16 02:29:32 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 15 Aug 2006 17:29:32 -0700 Subject: [Python-3000] threading, part 2 In-Reply-To: <87fyg32oo8.fsf@qrnik.zagroda> References: <20060811102346.EFC4.SLAWOMIR.NOWACZYK.847@student.lu.se> <20060811082620.192E.JCARLSON@uci.edu> <87fyg32oo8.fsf@qrnik.zagroda> Message-ID: On 8/11/06, Marcin 'Qrczak' Kowalczyk wrote: > I do want asynchronous exceptions, but not anywhere, only in selected > regions (or excluding selected regions). This can be designed well. Please write a proto-PEP. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Aug 16 02:40:29 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 15 Aug 2006 17:40:29 -0700 Subject: [Python-3000] threading, part 2 In-Reply-To: References: <1d85506f0608111713m15cf2e67v8b94f06c928e9125@mail.gmail.com> Message-ID: On 8/14/06, Georg Brandl wrote: > Guido van Rossum wrote: > > On 8/11/06, tomer filiba wrote: > >> i mailed this to several people separately, but then i thought it could > >> benefit the entire group: > >> > >> http://sebulba.wikispaces.com/recipe+thread2 > >> > >> it's an implementation of the proposed "thread.raise_exc", through an extension > >> to the threading.Thread class. you can test it for yourself; if it proves useful, > >> it should be exposed as thread.raise_exc in the stdlib (instead of the ctypes > >> hack)... and of course it should be reflected in threading.Thread as well. > > > > Cool. Question: what's the problem with raising exception instances? > > Especially in the light of my proposal to use > > > > raise SomeException(42) > > > > in preference over (and perhaps exclusively instead of) > > > > raise SomeException, 42 > > > > in Py3k. The latter IMO is a relic from the days of string exceptions > > which are as numbered as they come.
:-) > > I think this is the answer: > > http://mail.python.org/pipermail/python-dev/2006-August/068165.html Hopefully we can fix this in 2.6 or 3.0. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Wed Aug 16 03:09:54 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 16 Aug 2006 13:09:54 +1200 Subject: [Python-3000] Function annotations considered obfuscatory (Re: Conventions for annotation consumers) In-Reply-To: <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com> <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> Message-ID: <44E27062.2040406@canterbury.ac.nz> Collin Winter wrote: > @docstring > @typechecker > @constrain_values > def foo(a: {'doc': "Frobnication count", > 'type': Number, > 'constrain_values': range(3, 9)}, > b: {'type': Number, > # This can be only 4, 8 or 12 > 'constrain_values': [4, 8, 12]}) -> {'type': Number} There's another thing that's bothering me about all this. The main reason Guido rejected the originally suggested syntax for function decorators was that it put too much stuff into the function header and obscured the signature. Now we seem to be about to open ourselves up to the same problem on an even bigger scale. Who can honestly say that the above function declaration is easy to read? To me it looks downright ugly. 
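Whatever one thinks of the readability, the consumer side of the dict convention is at least easy to write. Here is a minimal, hypothetical `typechecker` decorator that pulls out only the key it owns and ignores the rest (using `int` instead of the `Number` class from the example above, so the sketch runs standalone):

```python
import functools

def typechecker(func):
    # Pull out only the 'type' key; other keys belong to other consumers.
    hints = {name: ann["type"]
             for name, ann in func.__annotations__.items()
             if isinstance(ann, dict) and "type" in ann}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Positional arguments only -- enough for a sketch.
        for name, value in zip(func.__code__.co_varnames, args):
            if name in hints and not isinstance(value, hints[name]):
                raise TypeError("%s must be %s" % (name, hints[name].__name__))
        return func(*args, **kwargs)
    return wrapper

@typechecker
def foo(a: {"doc": "Frobnication count", "type": int},
        b: {"type": int}) -> {"type": int}:
    return a + b

print(foo(2, 3))   # 5
```

Calling `foo("x", 3)` raises TypeError, while the `doc` key is simply left alone for some other tool.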
-- Greg From ironfroggy at gmail.com Wed Aug 16 03:21:22 2006 From: ironfroggy at gmail.com (Calvin Spealman) Date: Tue, 15 Aug 2006 21:21:22 -0400 Subject: [Python-3000] Fwd: Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations) In-Reply-To: <43aa6ff70608151529q18748348g3dce7c193450a0fb@mail.gmail.com> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com> <43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com> <1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com> <43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com> <1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com> <43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com> <76fd5acf0608151519k564723f9q2d409e3285a7918f@mail.gmail.com> <76fd5acf0608151520n7692824asf06a849ac3114e5b@mail.gmail.com> <43aa6ff70608151529q18748348g3dce7c193450a0fb@mail.gmail.com> Message-ID: <76fd5acf0608151821m25ab8048ia6f3d6f288d59338@mail.gmail.com> On 8/15/06, Collin Winter wrote: > On 8/15/06, Calvin Spealman wrote: > > On 8/15/06, Collin Winter wrote: > >> 1) Static analysis tools (pychecker, optimising compilers, etc) must > >> be able to use the annotations > > > > As in any example given so far, the annotations would be instantiated > > within the function definition itself, which means the form 'def > > foo(a: Bar(baz))' is to be expected. This form could even be > > documented as the preferred way, as opposed to instantiating the > > annotation object beforehand and simply using its name in the > > function definition. This leads to simple parsing by external tools, > > which would be able to deduce what Bar is (because before that line > > there was a 'from bar import Bar'). > > How exactly do they "deduce" what Bar is, just from the "from bar > import Bar" line? pychecker would have to import and compile the Bar > module first.
What if being able to import bar depends on some import > hooks that some other module (imported before bar) installed? I guess > you'd have to follow the entire import graph just to make sure. Oh, > and you'd have to end up running the module being analysed in case > *it* installs some import hooks -- or maybe it defines Bar itself. > > Your proposal isn't workable. > > Collin Winter Any external tool which would need to analyze the annotations statically would either know what the module bar is and what the object bar.Bar is, or it would ignore it. Thus it has no need to import or statically parse the modules imported for annotation objects at all. For example, you may 'from annodoc import doc' and then 'def foo(a: doc("the only argument"))', so a documentation generator would be aware of what the annodoc module was and doesn't need to introspect it in order to understand the annotations. Your outright refusal to accept the arguments against these points of your proposal is dragging this discussion out to an unneeded length. The majority consensus is pointing away from the dictionary multi-annotations you try to propose or the leave-and-let-be standpoint you originally tried to keep, while type-based annotations seem much more well agreed upon and have more support. This continually stretching debate needs to reach a consensus, and the best-received idea might not be yours. We really need to see the PEP draft updated to reflect something of a solution to these issues, and there is much less debate than the volume of discussion would suggest, so the answers are clear enough to move forward with.
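The annodoc example above can be made concrete. The `annodoc` module is hypothetical, so the `doc` class is defined inline here:

```python
class doc:
    """Stand-in for the hypothetical annodoc.doc annotation."""
    def __init__(self, text):
        self.text = text

def foo(a: doc("the only argument")):
    pass

# A documentation generator only needs to recognize its own class
# and can ignore every annotation it does not own:
for name, ann in foo.__annotations__.items():
    if isinstance(ann, doc):
        print("%s: %s" % (name, ann.text))
```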
From greg.ewing at canterbury.ac.nz Wed Aug 16 03:32:51 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 16 Aug 2006 13:32:51 +1200 Subject: [Python-3000] Conventions for annotation consumers In-Reply-To: <1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com> <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> <1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com> <43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com> <1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com> <43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com> <1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com> <43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com> <1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com> Message-ID: <44E275C3.2070508@canterbury.ac.nz> Paul Prescod wrote: > What if > two different groups start fighting over the keyword "type" or "doc" or > "lock"? Python already has a module system that allows you to use the > word "type" and me to use the word "type" without conflict But, in general, performing this disambiguation requires executing the module that is making the annotations. For a processor that only wants to deal with the source, this is undesirable. 
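Greg's point can be shown in one short example: annotation expressions are ordinary expressions, evaluated when the `def` statement executes, so a tool that reads only the source cannot in general know what they denote:

```python
# The annotation below is an arbitrary expression; its value only
# exists once the 'def' statement has actually run.
tag = "doc"

def f(x: tag.upper()):
    pass

print(f.__annotations__["x"])  # 'DOC'
```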
-- Greg From ironfroggy at gmail.com Wed Aug 16 04:18:15 2006 From: ironfroggy at gmail.com (Calvin Spealman) Date: Tue, 15 Aug 2006 22:18:15 -0400 Subject: [Python-3000] Conventions for annotation consumers In-Reply-To: <44E275C3.2070508@canterbury.ac.nz> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> <1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com> <43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com> <1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com> <43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com> <1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com> <43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com> <1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com> <44E275C3.2070508@canterbury.ac.nz> Message-ID: <76fd5acf0608151918j3d572b7cq9d61b5170ce966a3@mail.gmail.com> On 8/15/06, Greg Ewing wrote: > Paul Prescod wrote: > > What if > > two different groups start fighting over the keyword "type" or "doc" or > > "lock"? Python already has a module system that allows you to use the > > word "type" and me to use the word "type" without conflict > > But, in general, performing this disambiguation requires > executing the module that is making the annotations. For > a processor that only wants to deal with the source, this > is undesirable. The path to the module should be considered more like a namespace identifier. When I see the annotation Number is in annolib.types, 'annolib.types' can be taken as a unique namespace identifier to understand the context of the name 'Number'. This doesn't need any processing of the annolib.types module itself, because the contents of that module are not important, only the name. 
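The "namespace identifier" approach can in fact be implemented without importing or executing anything, by pairing annotation names with the imports that introduced them via the ast module. A rough sketch — `annolib.types` need not exist or be importable:

```python
import ast

source = """
from annolib.types import Number

def foo(a: Number) -> Number:
    return a
"""

tree = ast.parse(source)

# Map local names to the module they were imported from.
origins = {}
for node in ast.walk(tree):
    if isinstance(node, ast.ImportFrom):
        for alias in node.names:
            origins[alias.asname or alias.name] = node.module

# Resolve simple-name annotations to fully qualified identifiers.
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        for arg in node.args.args:
            if isinstance(arg.annotation, ast.Name):
                name = arg.annotation.id
                print(arg.arg, "->", origins.get(name, "?") + "." + name)
```

This only handles the simplest spellings, of course, which is Collin's objection — but it shows how a tool can treat the dotted path purely as a label.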
From guido at python.org Wed Aug 16 06:04:41 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 15 Aug 2006 21:04:41 -0700 Subject: [Python-3000] Function annotations considered obfuscatory (Re: Conventions for annotation consumers) In-Reply-To: <44E27062.2040406@canterbury.ac.nz> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com> <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> <44E27062.2040406@canterbury.ac.nz> Message-ID: On 8/15/06, Greg Ewing wrote: > Collin Winter wrote: > > > @docstring > > @typechecker > > @constrain_values > > def foo(a: {'doc': "Frobnication count", > > 'type': Number, > > 'constrain_values': range(3, 9)}, > > b: {'type': Number, > > # This can be only 4, 8 or 12 > > 'constrain_values': [4, 8, 12]}) -> {'type': Number} > > There's another thing that's bothering me about all this. > The main reason Guido rejected the originally suggested > syntax for function decorators was that it put too much > stuff into the function header and obscured the signature. > > Now we seem to be about to open ourselves up to the > same problem on an even bigger scale. Who can honestly > say that the above function declaration is easy to read? > To me it looks downright ugly. It's a worst-case scenario suggesting how one could solve a very hairy problem. I don't expect that something this extreme will be at all common (otherwise I'd be against it too). PS.
http://meyerweb.com/eric/comment/chech.html -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Aug 16 06:13:11 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 15 Aug 2006 21:13:11 -0700 Subject: [Python-3000] Bound and unbound methods In-Reply-To: <44DFE092.8030604@canterbury.ac.nz> References: <44DF0D38.6070507@acm.org> <20060813102036.1985.JCARLSON@uci.edu> <44DF86AA.7050207@acm.org> <44DFE092.8030604@canterbury.ac.nz> Message-ID: On 8/13/06, Greg Ewing wrote: > Talin wrote: > > the compiler would note the combination of the attribute access and the > > call, and combine them into an opcode that skips the whole method > > creation step. > > Something like that could probably be made to work. You'd > want to be careful to do the optimisation only when the > attribute in question is an ordinary attribute, not > a property or other descriptor. > > I'm also -1 on eliminating bound methods entirely. > I worked through that idea in considerable depth during my > discussions with the author of Prothon, which was also to > have been without any notion of bound methods. The > consequences are further-reaching than you might think at > first. The bottom line is that without bound methods, > Python wouldn't really be Python any more. Right. I'm against anything that changes the current semantics. I'm all for a compiler optimization that turns "<obj>.<methodname>(<args>)" into a single opcode that somehow manages to avoid creating the bound object. As long as it also does the right thing in case the name refers to something that's not quite a standard method -- be it a class method or static method, or a class, or anything else callable (or even not callable :-). And it would be fine if that optimization wasn't used if there are keyword arguments, or *args or **kwds, or more than N arguments for some N > 3 or so. But, as Thomas says, it was tried before and didn't quite work.
Maybe we can borrow some ideas from IronPython, which boasts a 7x faster method call (or was it function call? it was a call anyway); and according to Jim Hugunin only half of that speed-up (on a linear or logarithmic scale? he didn't say) can be explained through the .NET JIT. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From paul at prescod.net Wed Aug 16 07:22:34 2006 From: paul at prescod.net (Paul Prescod) Date: Tue, 15 Aug 2006 22:22:34 -0700 Subject: [Python-3000] Fwd: Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations) In-Reply-To: <43aa6ff70608151529q18748348g3dce7c193450a0fb@mail.gmail.com> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com> <43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com> <1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com> <43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com> <1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com> <43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com> <76fd5acf0608151519k564723f9q2d409e3285a7918f@mail.gmail.com> <76fd5acf0608151520n7692824asf06a849ac3114e5b@mail.gmail.com> <43aa6ff70608151529q18748348g3dce7c193450a0fb@mail.gmail.com> Message-ID: <1cb725390608152222j32727946ob3c07e43fd004299@mail.gmail.com> On 8/15/06, Collin Winter wrote: > > > How exactly do they "deduce" what Bar is, just from the "from bar > import Bar" line? pychecker would have to import and compile the Bar > module first. What if being able to import bar depends on some import > hooks that some other module (imported before bar) installed? I guess > you'd have to follow the entire import graph just to make sure. Oh, > and you'd have to end up running the module being analysed in case > *it* installs some import hooks -- or maybe it defines Bar itself. 
The end-user and the type checker creator can negotiate the boundary between convenience and easy to parse syntax. At first the type checker creator might say that things must be in a very predictable form with no variants and no renames. Then they might do a bit more analysis and be able to handle renames. Then they might evolve towards whole-program analysis and be able to handle very complicated imports. Surely you know that decorators can also be renamed, imported, etc. Same with base classes (which are considered key to type checking). This is just how Python works. Where people need to use static subsets of Python (like RPython, or the "freeze" program or the compilable subset used by Jython) they just define the subset and move on. The languages' core behaviour is defined dynamically. Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060815/fa53a041/attachment.html From paul at prescod.net Wed Aug 16 07:34:46 2006 From: paul at prescod.net (Paul Prescod) Date: Tue, 15 Aug 2006 22:34:46 -0700 Subject: [Python-3000] Conventions for annotation consumers In-Reply-To: <44E275C3.2070508@canterbury.ac.nz> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> <1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com> <43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com> <1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com> <43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com> <1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com> <43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com> <1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com> <44E275C3.2070508@canterbury.ac.nz> Message-ID: <1cb725390608152234g4010aedbs9cb92c2c361b390f@mail.gmail.com> On 8/15/06, Greg Ewing wrote: > > Paul Prescod wrote: > > What if > > two different groups start 
fighting over the keyword "type" or "doc" or > > "lock"? Python already has a module system that allows you to use the > > word "type" and me to use the word "type" without conflict > > But, in general, performing this disambiguation requires > executing the module that is making the annotations. For > a processor that only wants to deal with the source, this > is undesirable. This is true for every proposal we've described. Proposal 1 is: Foo(int) Bar(module.type1) Proposal two is: {"Foo": int, "Bar": module.type1} In either case, "int" and "module.type1" can be rebound. To say otherwise is to change Python's evaluation model drastically. >>> int = None >>> float = file >>> Once you accept Python's dynamism, it makes sense to accept it both for the annotation "key" as for the "value". If you can convince Guido and the rest of the Python-dev team to reject it, then you can reject it for both equally. So the issue is a red herring. Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060815/02465dc1/attachment.htm From ironfroggy at gmail.com Wed Aug 16 07:48:00 2006 From: ironfroggy at gmail.com (Calvin Spealman) Date: Wed, 16 Aug 2006 01:48:00 -0400 Subject: [Python-3000] Bound and unbound methods In-Reply-To: References: <44DF0D38.6070507@acm.org> <20060813102036.1985.JCARLSON@uci.edu> <44DF86AA.7050207@acm.org> <44DFE092.8030604@canterbury.ac.nz> Message-ID: <76fd5acf0608152248j76f38d2x88ba241a8c66c835@mail.gmail.com> On 8/16/06, Guido van Rossum wrote: > On 8/13/06, Greg Ewing wrote: > > Talin wrote: > > > the compiler would note the combination of the attribute access and the > > > call, and combine them into an opcode that skips the whole method > > > creation step. > > > > Something like that could probably be made to work. 
You'd > > want to be careful to do the optimisation only when the > > attribute in question is an ordinary attribute, not > > a property or other descriptor. > > > > I'm also -1 on eliminating bound methods entirely. > > I worked through that idea in considerable depth during my > > discussions with the author of Prothon, which was also to > > have been without any notion of bound methods. The > > consequences are further-reaching than you might think at > > first. The bottom line is that without bound methods, > > Python wouldn't really be Python any more. > > > Right. I'm against anything that changes the current semantics. I'm > all for a compiler optimization that turns "<obj>.<methodname>(<args>)" into a single opcode that somehow manages to avoid creating the > bound object. As long as it also does the right thing in case the name > refers to something that's not quite a standard method -- be it a > class method or static method, or a class, or anything else callable > (or even not callable :-). And it would be fine if that optimization > wasn't used if there are keyword arguments, or *args or **kwds, or > more than N arguments for some N > 3 or so. > > But, as Thomas says, it was tried before and didn't quite work. Maybe > we can borrow some ideas from IronPython, which boasts a 7x faster > method call (or was it function call? it was a call anyway); and > according to Jim Hugunin only half of that speed-up (on a linear or > logarithmic scale? he didn't say) can be explained through the .NET > JIT. > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) Would a possible special method name __methodcall__ be accepted, where if it exists on a callable, you can expect to use it as __call__ but with the understanding that it accepts the instance as self when called in an optimizable form? This would reduce the method call to two attribute lookups before the call instead of an instantiation and all the heavy lifting currently done.
For normal functions, 'f.__methodcall__ is f.__call__' may be true, but the existence of that __methodcall__ name just gives you an extra contract. From jimjjewett at gmail.com Wed Aug 16 16:26:44 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 16 Aug 2006 10:26:44 -0400 Subject: [Python-3000] Function annotations considered obfuscatory (Re: Conventions for annotation consumers) In-Reply-To: References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com> <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> <44E27062.2040406@canterbury.ac.nz> Message-ID: On 8/16/06, Guido van Rossum wrote: > On 8/15/06, Greg Ewing wrote: [9 lines for a two argument def] > > There's another thing that's bothering me about all this. > > The main reason Guido rejected the originally suggested > > syntax for function decorators was that it put too much > > stuff into the function header and obscured the signature. > It's a worst-case scenario suggesting how one could solve a very hairy > problem. I don't expect that something this extreme will be at all > common (otherwise I'd be against it too). Yes and no; I don't think it will be that uncommon to have multiple annotations, somewhat similar to "public static final int". Also note that needing to disambiguate the annotations will tend to increase their length. I hope that needing more than one line per argument will be unusual, but needing more than one line for a definition may not be. That is one reason I wonder whether all annotations/modifications have to actually be part of the prologue, or whether they could be applied to the Signature afterwards.
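Applying annotations after the definition needs no new machinery if __annotations__ is an ordinary, mutable function attribute. A hypothetical sketch — the `annotate` name and the `returns` spelling are inventions here (since `return` is a reserved word):

```python
def annotate(**bindings):
    """Hypothetical decorator: attach annotations after the def."""
    def decorator(func):
        func.__annotations__.update(bindings)
        return func
    return decorator

@annotate(a=int, b=str, returns=bool)
def f(a, b):
    return a > 0

print(f.__annotations__)
```

This keeps the function header itself free of clutter, at the cost of moving the information somewhere a reader may not look first.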
-jJ From guido at python.org Wed Aug 16 16:45:46 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 16 Aug 2006 07:45:46 -0700 Subject: [Python-3000] Function annotations considered obfuscatory (Re: Conventions for annotation consumers) In-Reply-To: References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com> <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> <44E27062.2040406@canterbury.ac.nz> Message-ID: On 8/16/06, Jim Jewett wrote: > Yes and no; I don't think it will be that uncommon to have multiple > annotations, somewhat similar to "public static final int". Also note > that needing to disambiguate the annotations will tend to increase > their length. God save us from public static final int. > I hope that needing more than one line per argument will be unusual, > but needing more than one line for a definition may not be. I expect the latter will be too, as it would only matter for code that somehow straddles two or more frameworks. > That is one reason I wonder whether all annotations/modifications have > to actually be part of the prologue, or whether they could be applied > to the Signature afterwards. And how would that reduce the clutter? The information still has to be entered by the user, presumably with the same disambiguating tags, and some punctuation. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From collinw at gmail.com Wed Aug 16 17:09:39 2006 From: collinw at gmail.com (Collin Winter) Date: Wed, 16 Aug 2006 10:09:39 -0500 Subject: [Python-3000] Fwd: Conventions for annotation consumers In-Reply-To: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> Message-ID: <43aa6ff70608160809qb8882e1m6b471fda3eee8d10@mail.gmail.com> On 8/15/06, Phillip J. 
Eby wrote: > Personally, I thought Guido's original proposal for function annotations, > which included a __typecheck__ operator that was replaceable on a > per-module basis (and defaulted to a no-op), was the perfect thing -- > neither too much semantics nor too-little. I'd like to have it back, > please. :) I'd be perfectly happy to go back to talking about "type annotations", rather than the more general "function annotations", especially since most of the discussion thus far has been about how to do multiple things with annotations at the same time. Restricting annotations to type information would be fine by me. Collin Winter From guido at python.org Wed Aug 16 17:45:12 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 16 Aug 2006 08:45:12 -0700 Subject: [Python-3000] Conventions for annotation consumers In-Reply-To: <44E275C3.2070508@canterbury.ac.nz> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> <1cb725390608150818r5e95b3fdw31b6998bc32051bb@mail.gmail.com> <43aa6ff70608150836o8f9970dr3974935edefb9f3d@mail.gmail.com> <1cb725390608151109t8c58f89p43ba13472031201a@mail.gmail.com> <43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com> <1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com> <43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com> <1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com> <44E275C3.2070508@canterbury.ac.nz> Message-ID: On 8/15/06, Greg Ewing wrote: > But, in general, performing this disambiguation requires > executing the module that is making the annotations. For > a processor that only wants to deal with the source, this > is undesirable. Um, when did we start off in the direction of source-level processing of function annotations? Are we still talking about Python? I'm confused (especially since this thread seems to start in the middle).
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Aug 16 17:48:46 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 16 Aug 2006 08:48:46 -0700 Subject: [Python-3000] Bound and unbound methods In-Reply-To: <76fd5acf0608152248j76f38d2x88ba241a8c66c835@mail.gmail.com> References: <44DF0D38.6070507@acm.org> <20060813102036.1985.JCARLSON@uci.edu> <44DF86AA.7050207@acm.org> <44DFE092.8030604@canterbury.ac.nz> <76fd5acf0608152248j76f38d2x88ba241a8c66c835@mail.gmail.com> Message-ID: On 8/15/06, Calvin Spealman wrote: > On 8/16/06, Guido van Rossum wrote: > > Right. I'm against anything that changes the current semantics. I'm > > all for a compiler optimization that turns " . ( > > )" into a single opcode that somehow manages to avoid creating the > > bound object. As long as it also does the right thing in case the name > > refers to something that's not quite a standard method -- be it a > > class method or static method, or a class, or anything else callable > > (or even not callable :-). And it would be fine if that optimization > > wasn't used if there are keyword arguments, or *args or **kwds, or > > more than N arguments for some N > 3 or so. > > > > But, as Thomas says, it was tried before and didn't quite work. Maybe > > we can borrow some ideas from IronPython, which boasts a 7x faster > > method call (or was it function call? it was a call anyway); and > > according to Jim Hugunin only half of that speed-up (on a linear or > > logarithmic scale? he didn't say) can be explained through the .NET > > JIT. > Would a possible special method name __methodcall__ be accepted, where > if it exists on a callable, you can expect to use it as __call__ but > with the understanding that it accepts as self when called in > an optimizable form? This would reduce the method call to two > attribute lookups before the call instead of an instansiation and all > the heavy lifting currently done. 
For normal functions, > 'f.__methodcall__ is f.__call__' may be true, but the existance of > that __methodcall__ name just gives you an extra contract. I'd like to answer "no" (since I think this whole idea is not a very fruitful avenue) but frankly, I have no idea what you are trying to describe. Are you even aware of the descriptor protocol (__get__) and how it's used to create a bound method (or something else)? No reply is needed. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Wed Aug 16 18:35:00 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed, 16 Aug 2006 12:35:00 -0400 Subject: [Python-3000] Fwd: Conventions for annotation consumers In-Reply-To: <43aa6ff70608160809qb8882e1m6b471fda3eee8d10@mail.gmail.com > References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060816123318.02406018@sparrow.telecommunity.com> At 10:09 AM 8/16/2006 -0500, Collin Winter wrote: >On 8/15/06, Phillip J. Eby wrote: >>Personally, I thought Guido's original proposal for function annotations, >>which included a __typecheck__ operator that was replaceable on a >>per-module basis (and defaulted to a no-op), was the perfect thing -- >>neither too much semantics nor too-little. I'd like to have it back, >>please. :) > >I'd be perfectly happy to go back to talking about "type annotations", >rather than the more general "function annotations", especially since >most of the discussion thus far has been about how to multiple things >with annotations at the same time. Restricting annotations to type >information would be fine by me. Who said anything about restricting annotations to type information? I just said I liked Guido's original proposal better -- because it doesn't restrict a darned thing, and makes it clear that the semantics are up to you. The annotations of course should still be exposed as a function attribute. 
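As a footnote to the bound-methods subthread above: the descriptor protocol Guido refers to can be modelled in pure Python. This is a simplified model of what CPython does in C, not its actual implementation:

```python
class BoundMethod:
    """Pairs a function with the instance it was looked up on."""
    def __init__(self, func, obj):
        self.func, self.obj = func, obj
    def __call__(self, *args, **kwargs):
        return self.func(self.obj, *args, **kwargs)

class Function:
    """Model of how plain functions act as descriptors via __get__."""
    def __init__(self, func):
        self.func = func
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self.func                  # accessed on the class
        return BoundMethod(self.func, obj)    # accessed on an instance

class C:
    f = Function(lambda self, x: x * 2)

c = C()
print(c.f(21))        # 42
# A fresh bound-method object is built on every attribute access,
# which is exactly the allocation the proposed opcode would skip:
print(c.f is c.f)     # False
```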
From paul at prescod.net Wed Aug 16 18:38:21 2006 From: paul at prescod.net (Paul Prescod) Date: Wed, 16 Aug 2006 09:38:21 -0700 Subject: [Python-3000] Fwd: Conventions for annotation consumers In-Reply-To: <43aa6ff70608160809qb8882e1m6b471fda3eee8d10@mail.gmail.com> References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> <43aa6ff70608160809qb8882e1m6b471fda3eee8d10@mail.gmail.com> Message-ID: <1cb725390608160938h7ddcd317o39e21aac0416a432@mail.gmail.com> On 8/16/06, Collin Winter wrote: > > I'd be perfectly happy to go back to talking about "type annotations", > rather than the more general "function annotations", especially since > most of the discussion thus far has been about how to do multiple things > with annotations at the same time. Restricting annotations to type > information would be fine by me. I don't understand why we would want to go backwards. You wrote a PEP. We haven't suggested any major technical changes to it, rather just a few guidelines. How would restricting the domain of the PEP solve any issues about dynamicity? By the way, I think it may be naive to presume that there is only one relevant type system. People may well want to establish mappings from their types to programming language types. For example, to COM types, .NET types and Java types. 80% of these may be inferable from platform-independent declarations but the other 20% may require a second layer of platform-specific type declarations. Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-3000/attachments/20060816/bafe8aa0/attachment.html From collinw at gmail.com Wed Aug 16 18:41:38 2006 From: collinw at gmail.com (Collin Winter) Date: Wed, 16 Aug 2006 11:41:38 -0500 Subject: [Python-3000] Fwd: Conventions for annotation consumers In-Reply-To: <5.1.1.6.0.20060816123318.02406018@sparrow.telecommunity.com> References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> <5.1.1.6.0.20060816123318.02406018@sparrow.telecommunity.com> Message-ID: <43aa6ff70608160941j724324b2kd8653df2374778be@mail.gmail.com> On 8/16/06, Phillip J. Eby wrote: > At 10:09 AM 8/16/2006 -0500, Collin Winter wrote: > >On 8/15/06, Phillip J. Eby wrote: > >>Personally, I thought Guido's original proposal for function annotations, > >>which included a __typecheck__ operator that was replaceable on a > >>per-module basis (and defaulted to a no-op), was the perfect thing -- > >>neither too much semantics nor too-little. I'd like to have it back, > >>please. :) > > > >I'd be perfectly happy to go back to talking about "type annotations", > >rather than the more general "function annotations", especially since > >most of the discussion thus far has been about how to multiple things > >with annotations at the same time. Restricting annotations to type > >information would be fine by me. > > Who said anything about restricting annotations to type information? I > just said I liked Guido's original proposal better -- because it doesn't > restrict a darned thing, and makes it clear that the semantics are up to you. > > The annotations of course should still be exposed as a function attribute. Sorry, I meant "restrict" as in having it stated that the annotations are for typechecking, rather than attempting to support a dozen different uses simultaneously. 
The annotations would still be free-form, with the semantics up to whoever's implementing the __typecheck__ function, and Python itself wouldn't take any steps to enforce what can or can't go in the annotations. Is this more along the lines of what you meant? Collin Winter From pje at telecommunity.com Wed Aug 16 18:54:02 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed, 16 Aug 2006 12:54:02 -0400 Subject: [Python-3000] Fwd: Conventions for annotation consumers In-Reply-To: <43aa6ff70608160941j724324b2kd8653df2374778be@mail.gmail.co m> References: <5.1.1.6.0.20060816123318.02406018@sparrow.telecommunity.com> <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> <5.1.1.6.0.20060816123318.02406018@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060816124756.023cc448@sparrow.telecommunity.com> At 11:41 AM 8/16/2006 -0500, Collin Winter wrote: >On 8/16/06, Phillip J. Eby wrote: >>At 10:09 AM 8/16/2006 -0500, Collin Winter wrote: >> >On 8/15/06, Phillip J. Eby wrote: >> >>Personally, I thought Guido's original proposal for function annotations, >> >>which included a __typecheck__ operator that was replaceable on a >> >>per-module basis (and defaulted to a no-op), was the perfect thing -- >> >>neither too much semantics nor too-little. I'd like to have it back, >> >>please. :) >> > >> >I'd be perfectly happy to go back to talking about "type annotations", >> >rather than the more general "function annotations", especially since >> >most of the discussion thus far has been about how to multiple things >> >with annotations at the same time. Restricting annotations to type >> >information would be fine by me. >> >>Who said anything about restricting annotations to type information? I >>just said I liked Guido's original proposal better -- because it doesn't >>restrict a darned thing, and makes it clear that the semantics are up to you. >> >>The annotations of course should still be exposed as a function attribute. 
> >Sorry, I meant "restrict" as in having it stated that the annotations >are for typechecking, rather than attempting to support a dozen >different uses simultaneously. The annotations would still be >free-form, with the semantics up to whoever's implementing the >__typecheck__ function, and Python itself wouldn't take any steps to >enforce what can or can't go in the annotations. > >Is this more along the lines of what you meant? Yes, but it doesn't mean that the notion of "type" may not be fairly expansive. For example, I can foresee wanting to use this "type" information to manage marshalling from web forms or XML-RPC requests... defining command-line options and help... GUI field/widget information for command objects, and so on. In other words, I want open-ended annotation semantics to allow all sorts of metadata-driven behavior. I think the notion that there's a problem with "attempting to support a dozen different uses simultaneously" is a red herring. Docstrings and function attributes do just that, and civilization as we know it has not collapsed. From paul at prescod.net Wed Aug 16 18:55:31 2006 From: paul at prescod.net (Paul Prescod) Date: Wed, 16 Aug 2006 09:55:31 -0700 Subject: [Python-3000] Fwd: Conventions for annotation consumers In-Reply-To: <43aa6ff70608160941j724324b2kd8653df2374778be@mail.gmail.com> References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> <5.1.1.6.0.20060816123318.02406018@sparrow.telecommunity.com> <43aa6ff70608160941j724324b2kd8653df2374778be@mail.gmail.com> Message-ID: <1cb725390608160955y6a9776c8x4db1cab893a24875@mail.gmail.com> On 8/16/06, Collin Winter wrote: > > Sorry, I meant "restrict" as in having it stated that the annotations > are for typechecking, rather than attempting to support a dozen > different uses simultaneously. 
The annotations would still be > free-form, with the semantics up to whoever's implementing the > __typecheck__ function, and Python itself wouldn't take any steps to > enforce what can or can't go in the annotations. Nobody ever suggested that Python should take any steps to enforce what can or can't go in the annotations! It seems that we're inventing disagreement where there is none. All I ever suggested is a) that we put some guidelines in the spec *discouraging* people from using built-in Python types for their own private meanings without some kind of discriminator clarifying that they are doing so and b) that we define the shared meanings of a couple of useful types: lists and tuples. This leaves the Python development team the maximum latitude to specify the meanings for the other types (especially type type) later. Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060816/2505b89f/attachment.htm From guido at python.org Wed Aug 16 18:57:20 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 16 Aug 2006 09:57:20 -0700 Subject: [Python-3000] Fwd: Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations) In-Reply-To: <1cb725390608152222j32727946ob3c07e43fd004299@mail.gmail.com> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com> <1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com> <43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com> <1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com> <43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com> <76fd5acf0608151519k564723f9q2d409e3285a7918f@mail.gmail.com> <76fd5acf0608151520n7692824asf06a849ac3114e5b@mail.gmail.com> <43aa6ff70608151529q18748348g3dce7c193450a0fb@mail.gmail.com> <1cb725390608152222j32727946ob3c07e43fd004299@mail.gmail.com> Message-ID: There's much in 
this thread that I haven't followed, for lack of time. But it seems clear to me that you've wandered off the path now that you're discussing what should go into the annotations and how to make it so that multiple frameworks can coexist. I don't see how any of that can be analyzed up front -- you have to build an implementation and try to use it and *then* perhaps you can think about the problems that occur. Collin wrote a great PEP that doesn't commit to any kind of semantics for annotations. (I still have to read it more closely, but from skimming, it looks fine.) Let's focus some efforts on implementing that first, and see how we can use it, before we consider the use case of a framework for frameworks. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From paul at prescod.net Wed Aug 16 18:58:29 2006 From: paul at prescod.net (Paul Prescod) Date: Wed, 16 Aug 2006 09:58:29 -0700 Subject: [Python-3000] Fwd: Conventions for annotation consumers In-Reply-To: <1cb725390608160955y6a9776c8x4db1cab893a24875@mail.gmail.com> References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> <5.1.1.6.0.20060816123318.02406018@sparrow.telecommunity.com> <43aa6ff70608160941j724324b2kd8653df2374778be@mail.gmail.com> <1cb725390608160955y6a9776c8x4db1cab893a24875@mail.gmail.com> Message-ID: <1cb725390608160958s1c8985f3i432ac41cf30d570a@mail.gmail.com> I said "lists and tuples" where I meant "lists and strings". On 8/16/06, Paul Prescod wrote: > > On 8/16/06, Collin Winter wrote: > > > Sorry, I meant "restrict" as in having it stated that the annotations > > are for typechecking, rather than attempting to support a dozen > > different uses simultaneously. The annotations would still be > > free-form, with the semantics up to whoever's implementing the > > __typecheck__ function, and Python itself wouldn't take any steps to > > enforce what can or can't go in the annotations. 
> > > Nobody every suggested that Python should take any steps to enforce what > can or can't go in the annotations! It seems that we're inventing > disagreement where there is none. All I ever suggested is a) that we put > some guidelines in the spec *discouraging* people from using built-in Python > types for their own private meanings without some kind of discriminator > clarifying that they are doing so and b) that we define the shared meanings > of a couple of useful types: lists and tuples. This leaves the Python > development team the maximum latitude to specify the meanings for the other > types (especially type type) later. > > Paul Prescod > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060816/8810da78/attachment.html From jcarlson at uci.edu Wed Aug 16 19:03:05 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 16 Aug 2006 10:03:05 -0700 Subject: [Python-3000] Function annotations considered obfuscatory (Re: Conventions for annotation consumers) In-Reply-To: References: Message-ID: <20060816090147.19DA.JCARLSON@uci.edu> "Guido van Rossum" wrote: > On 8/16/06, Jim Jewett wrote: > > That is one reason I wonder whether all annotations/modifications have > > to actually be part of the prologue, or whether they could be applied > > to the Signature afterwards. > > And how would that reduce the clutter? The information still has to be > entered by the user, presumably with the same disambiguating tags, and > some punctuation. I'd say that pulling out annotations from the function signature, which was argued to be the most important piece of information about a function during the decorator discussion, could do at least as much to reduce clutter and increase readability and understandability, as anything else discussed with regards to the PEP so far. To pull back out that 9 line function... 
> @docstring > @typechecker > @constrain_values > def foo(a: {'doc': "Frobnication count", > 'type': Number, > 'constrain_values': range(3, 9)}, > b: {'type': Number, > # This can be only 4, 8 or 12 > 'constrain_values': [4, 8, 12]}) -> {'type': Number} First cleaning up the annotations to not use a dictionary: @docstring @typechecker @constrain_values def foo(a: [doc("frobnication count"), type(Number), constrain_values(range(3,9))], b: [type(Number), # This can be only 4, 8 or 12 constrain_values([4,8,12])]) -> type(Number): Now let's pull those annotations out of the signature... @docstring @typechecker @constrain_values @__signature__([doc("frobnication count"), type(Number), constrain_values(range(3,9))], [type(Number), # This can be only 4, 8 or 12 constrain_values((4,8,12))], returns=type(Number)) def foo(a, b): Ultimately the full function definition (including decorators) is just as cluttered, but now we can see that we have a function that takes two arguments, without having to scan for 'name:'. If it is necessary for someone to know what kinds of values, types, docs, etc., then they can use the documentation-producing tool that will hopefully come with their annotation consumer(s). - Josiah P.S. 
Then there is the blasphemous: @docstring(a="frobnication count") @typechecker(a=type(Number), b=type(Number)) @constrain_values(a=range(3,9), b=(4,8,12), returns=type(Number)) def foo(a, b): From jimjjewett at gmail.com Wed Aug 16 19:03:12 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 16 Aug 2006 13:03:12 -0400 Subject: [Python-3000] Function annotations considered obfuscatory (Re: Conventions for annotation consumers) In-Reply-To: References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com> <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> <44E27062.2040406@canterbury.ac.nz> Message-ID: On 8/16/06, Guido van Rossum wrote: > On 8/16/06, Jim Jewett wrote: > > I hope that needing more than one line per argument will be unusual, > > but needing more than one line for a definition may not be. > I expect the latter will be too, as it would only matter for code that > somehow straddles two or more frameworks. >>> def f(position:[int, "negative possible"]): ... "int" and the comment are both documentation which doesn't really need any framework. They are both things I would like to see when introspecting that particular function, though perhaps not when just scanning function defs. Together, they're already long enough that I would prefer to see any second argument on its own line. > > That is one reason I wonder whether all annotations/modifications have > > to actually be part of the prologue, or whether they could be applied > > to the Signature afterwards. > And how would that reduce the clutter? The information still has to be > entered by the user, presumably with the same disambiguating tags, and > some punctuation. The summary of a function shows up in its prologue, but the details span the next several lines (the full docstring and body suite). My feeling is that when annotations start to get long, they're details that should no longer be in the summary. 
-jJ From guido at python.org Wed Aug 16 19:13:42 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 16 Aug 2006 10:13:42 -0700 Subject: [Python-3000] Function annotations considered obfuscatory (Re: Conventions for annotation consumers) In-Reply-To: <20060816090147.19DA.JCARLSON@uci.edu> References: <20060816090147.19DA.JCARLSON@uci.edu> Message-ID: On 8/16/06, Josiah Carlson wrote: > > "Guido van Rossum" wrote: > > On 8/16/06, Jim Jewett wrote: > > > That is one reason I wonder whether all annotations/modifications have > > > to actually be part of the prologue, or whether they could be applied > > > to the Signature afterwards. > > > > And how would that reduce the clutter? The information still has to be > > entered by the user, presumably with the same disambiguating tags, and > > some punctuation. > > I'd say that pulling out annotations from the function signature, which > was argued to be the most important piece of information about a > function during the decorator discussion, could do at least as much to > reduce clutter and increase readability and understandability, as > anything else discussed with regards to the PEP so far. > > To pull back out that 9 line function... > > > @docstring > > @typechecker > > @constrain_values > > def foo(a: {'doc': "Frobnication count", > > 'type': Number, > > 'constrain_values': range(3, 9)}, > > b: {'type': Number, > > # This can be only 4, 8 or 12 > > 'constrain_values': [4, 8, 12]}) -> {'type': Number} > > First cleaning up the annotations to not use a dictionary: > > > @docstring > @typechecker > @constrain_values > def foo(a: [doc("frobination count"), > type(Number), > constrain_values(range(3,9))], > b: [type(Number), > # This can be only 4, 8 or 12 > constrain_values([4,8,12])]) -> type(Number): > > Now lets pull those annotations out of the signature... 
> > @docstring > @typechecker > @constrain_values > @__signature__([doc("frobnication count"), > type(Number), > constrain_values(range(3,9))], > [type(Number), > # This can be only 4, 8 or 12 > constrain_values((4,8,12))], returns=type(Number)) > def foo(a, b): I think you have just disproved your point. Apart from losing a few string quotes this is just as unreadable as the example you started with, and those string quotes were due to a different convention for multiple annotations, not due to moving the information into a decorator. > Ultimately the full function definition (including decorators) is just > as cluttered, but now we can see that we have a function that takes two > arguments, without having to scan for 'name:'. If it is necessary for > someone to know what kinds of values, types, docs, etc., then they can > use the documentation-producing tool that will hopefully come with their > annotation consumer(s). The whole point of putting decorators up front was so that they share prime real estate ("above the fold" if you will :-) with the function signature. Claiming that what's in the decorators doesn't distract from the def itself doesn't make it true. But, as I said 15 minutes ago, please stop worrying about this so much. Try to implement Collin's PEP (which doesn't have any constraints on the semantics or use of annotations). There's a Py3k sprint at Google (MV and NY) next week -- perhaps we can work on it there! 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Aug 16 19:18:47 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 16 Aug 2006 10:18:47 -0700 Subject: [Python-3000] Fwd: Conventions for annotation consumers In-Reply-To: <43aa6ff70608160941j724324b2kd8653df2374778be@mail.gmail.com> References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> <5.1.1.6.0.20060816123318.02406018@sparrow.telecommunity.com> <43aa6ff70608160941j724324b2kd8653df2374778be@mail.gmail.com> Message-ID: On 8/16/06, Collin Winter wrote: > Sorry, I meant "restrict" as in having it stated that the annotations > are for typechecking, rather than attempting to support a dozen > different uses simultaneously. The annotations would still be > free-form, with the semantics up to whoever's implementing the > __typecheck__ function, and Python itself wouldn't take any steps to > enforce what can or can't go in the annotations. -1. The annotations should be available for whatever the user wants to use them for. Remember, lots of folks do *not* use shared frameworks -- the only framework they care about is the one they write for themselves, and they should not feel guilty about using annotations for whatever metadata they need. To take up an old rule from the X11 world, the language should provide mechanism without policy. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From jcarlson at uci.edu Wed Aug 16 20:12:20 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 16 Aug 2006 11:12:20 -0700 Subject: [Python-3000] Function annotations considered obfuscatory (Re: Conventions for annotation consumers) In-Reply-To: References: <20060816090147.19DA.JCARLSON@uci.edu> Message-ID: <20060816102652.19E3.JCARLSON@uci.edu> "Guido van Rossum" wrote: > On 8/16/06, Josiah Carlson wrote: > > @docstring > > @typechecker > > @constrain_values > > @__signature__([doc("frobination count"), > > type(Number), > > constrain_values(range(3,9))], > > [type(Number), > > # This can be only 4, 8 or 12 > > constrain_values((4,8,12))], returns=type(Number)) > > def foo(a, b): > > I think you just have disproved your point. Apart from losing a few > string quotes this is just as unreadable as the example you started > with, and those string quotes were due to a different convention for > multiple annotations, not due to moving the information into a > descriptor. > > > Ultimately the full function definition (including decorators) is just > > as cluttered, but now we can see that we have a function that takes two > > arguments, without having to scan for 'name:' . If it is necessary for > > somone to know what kinds of values, types, docs, etc., then they can > > use the documentation-producing tool that will hopefully come with their > > annotation consumer(s). > > The whole point of putting decorators up front was so that they share > prime real estate ("above the fold" if you will :-) with the function > signature. Claiming that what's in the decorators doesn't distract > from the def itself doesn't make it true. From using Python, my brain has become trained to look for new indent levels, so when I'm looking for function definitions, this is what I see... 
@CRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAP @CRAPCRAPCRAPCRAPCRAPCRAPCRAP @CRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAP def foo(...): #stuff a = ... b = ... for ...: ... ... In my opinion, decorators that don't include their own indentation for readability do not distract from the def. I would imagine that many people (not just me) have trained themselves to look for new indent levels, and would agree at some level with this. Indents within decorators generally induce false positives during visual scanning, but aside from including a line in the Python style guide about not using multi-line decorators (and people being kind to readers of their code), there's not much we can do. > But, as I said 15 minutes ago, please stop worrying about this so > much. Try to implement Collin's PEP (which doesn't have any > constraints on the semantics or use of annotations). There's a Py3k > sprint at Google (MV and NY) next week -- perhaps we can work on it > there! I'm trying to keep function *signatures* readable. Including one *small* annotation per argument isn't a big deal, but when simple function signatures (from the def to the suite colon) start spanning multiple lines, they are getting both ungreppable and unreadable. My primary concern is users grepping, reading, and understanding. If annotations detract from any of those three, then the annotation is a waste of time (in my opinion). This was one of the concerns brought up in the decorator discussion, and why none of the decorator proposals that sat between the def and the closing paren even have typed-out examples listed as contenders on the PythonDecorators wiki (they each get a bullet list as to why they suck). But maybe I'm misremembering the discussion, maybe decorators make it very difficult to visually scan for function definitions, and maybe people want all that garbage in their function signature. 
- Josiah From guido at python.org Wed Aug 16 20:17:33 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 16 Aug 2006 11:17:33 -0700 Subject: [Python-3000] Function annotations considered obfuscatory (Re: Conventions for annotation consumers) In-Reply-To: <20060816102652.19E3.JCARLSON@uci.edu> References: <20060816090147.19DA.JCARLSON@uci.edu> <20060816102652.19E3.JCARLSON@uci.edu> Message-ID: On 8/16/06, Josiah Carlson wrote: > > "Guido van Rossum" wrote: > > On 8/16/06, Josiah Carlson wrote: > > > @docstring > > > @typechecker > > > @constrain_values > > > @__signature__([doc("frobination count"), > > > type(Number), > > > constrain_values(range(3,9))], > > > [type(Number), > > > # This can be only 4, 8 or 12 > > > constrain_values((4,8,12))], returns=type(Number)) > > > def foo(a, b): > > > > I think you just have disproved your point. Apart from losing a few > > string quotes this is just as unreadable as the example you started > > with, and those string quotes were due to a different convention for > > multiple annotations, not due to moving the information into a > > descriptor. > > > > > Ultimately the full function definition (including decorators) is just > > > as cluttered, but now we can see that we have a function that takes two > > > arguments, without having to scan for 'name:' . If it is necessary for > > > somone to know what kinds of values, types, docs, etc., then they can > > > use the documentation-producing tool that will hopefully come with their > > > annotation consumer(s). > > > > The whole point of putting decorators up front was so that they share > > prime real estate ("above the fold" if you will :-) with the function > > signature. Claiming that what's in the decorators doesn't distract > > from the def itself doesn't make it true. > > From using Python, my brain has become trained to look for new indent > levels, so when I'm looking for function definitions, this is what I see... 
> > @CRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAP > @CRAPCRAPCRAPCRAPCRAPCRAPCRAP > @CRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAPCRAP > def foo(...): > #stuff > a = ... > b = ... > for ...: > ... > ... Well, then the problem becomes finding the tiny 'def' between all that CRAP. > In my opinion, decorators that don't include their own indentation for > readability do not distract from the def. I would imagine that many > people (not just me) have trained themselves to look for new indent > levels, and would agree at some level with this. But notice that the example *did* include multi-line decorators with indented continuation lines. > Indents within decorators generally induce false positives during visual > scanning, but aside from including a line in the Python style guide > about not using multi-line decorators (and people being kind to readers > of their code), there's not much we can do. There's another style: type_a = {"foo": some_type_for_framework_foo, "bar": some_other_type, etc.} type_b = {...similar...} def my_fun(a: type_a, b: type_b) -> type_c: ... This works just as well for the list style of having multiple annotations. If you write a lot of code that uses multiple annotations, I'd be very surprised if there weren't a bunch of common combinations that could be shared like this. > > But, as I said 15 minutes ago, please stop worrying about this so > > much. Try to implement Collin's PEP (which doesn't have any > > constraints on the semantics or use of annotations). There's a Py3k > > sprint at Google (MV and NY) next week -- perhaps we can work on it > > there! > > I'm trying to keep function *signatures* readable. Including one *small* > annotation per argument isn't a big deal, but when simple function > signatures (from the def to the suite colon) start spanning multiple > lines, they are getting both ungreppable and unreadable. My primary > concern is users grepping, reading, and understanding. 
If annotations > detract from any of those three, then the annotation is a waste of time > (in my opinion). What exactly are you grepping for where a multi-line arglist would get in the way? The most complicated pattern for which I grep is probably something along the lines of '^def \w+\('. > This was one of the concerns brought up in the decorator discussion, and > why none of the decorator proposals that sat between the def and the > closing paren even have typed-out examples listed as contenders on the > PythonDecorators wiki (they each get a bullet list as to why they suck). > > But maybe I'm misremembering the discussion, maybe decorators make it > very difficult to visually scan for function definitions, and maybe > people want all that garbage in their function signature. They don't want it, but if they're forced to have it occasionally they'll cope. I still think you're way overestimating the importance of this use case. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From paul at prescod.net Thu Aug 17 02:11:34 2006 From: paul at prescod.net (Paul Prescod) Date: Wed, 16 Aug 2006 17:11:34 -0700 Subject: [Python-3000] Fwd: Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations) In-Reply-To: References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com> <43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com> <1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com> <43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com> <76fd5acf0608151519k564723f9q2d409e3285a7918f@mail.gmail.com> <76fd5acf0608151520n7692824asf06a849ac3114e5b@mail.gmail.com> <43aa6ff70608151529q18748348g3dce7c193450a0fb@mail.gmail.com> <1cb725390608152222j32727946ob3c07e43fd004299@mail.gmail.com> Message-ID: <1cb725390608161711v7c0a93b9i3f9e2032da9254af@mail.gmail.com> Okay, you're the boss. 
The conversation did go pretty far afield but the main thing I wanted was just that if a user wanted to have annotations from framework 1 and framework 2 they could reliably express that as def foo(a: [Anno1, Anno2]): All that that requires is a statement in the spec saying: "If you're processing annotations and you see an annotation you don't understand, skip it. And if you see a list, look inside it rather than processing it in some proprietary fashion." It kind of seemed obvious to me, but I guess everyone's ideas seem obvious to them. There were other secondary things I would have liked but this seemed like the minimum required to protect programmers from "greedy frameworks" that don't play nice in the face of unfamiliar annotations. On 8/16/06, Guido van Rossum wrote: > > There's much in this thread that I haven't followed, for lack of time. > > But it seems clear to me that you've wandered off the path now that > you're discussing what should go into the annotations and how to make > it so that multiple frameworks can coexist. > > I don't see how any of that can be analyzed up front -- you have to > build an implementation and try to use it and *then* perhaps you can > think about the problems that occur. > > Collin wrote a great PEP that doesn't commit to any kind of semantics > for annotations. (I still have to read it more closely, but from > skimming, it looks fine.) Let's focus some efforts on implementing > that first, and see how we can use it, before we consider the use case > of a framework for frameworks. > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-3000/attachments/20060816/b8481a6c/attachment.html From greg.ewing at canterbury.ac.nz Thu Aug 17 03:37:04 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 17 Aug 2006 13:37:04 +1200 Subject: [Python-3000] Function annotations considered obfuscatory (Re: Conventions for annotation consumers) In-Reply-To: References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608141551o2db297d8ue30c552a5eff5a95@mail.gmail.com> <43aa6ff70608150706q1ecea414ob6b339ceef95a4d9@mail.gmail.com> <44E27062.2040406@canterbury.ac.nz> Message-ID: <44E3C840.1000602@canterbury.ac.nz> Guido van Rossum wrote: > And how would that reduce the clutter? The information still has to be > entered by the user, presumably with the same disambiguating tags, and > some punctuation. But at least the function header itself would retain its wysiwyt[1] character of being mostly just a list of parameter names. -- [1] What You See Is What You Type From greg.ewing at canterbury.ac.nz Thu Aug 17 03:46:43 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 17 Aug 2006 13:46:43 +1200 Subject: [Python-3000] Fwd: Conventions for annotation consumers In-Reply-To: References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608151128u720b59ecq7d6831177452ebea@mail.gmail.com> <1cb725390608151207g36a4c692v2c2d49c1a3e821fa@mail.gmail.com> <43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com> <1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com> <43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com> <76fd5acf0608151519k564723f9q2d409e3285a7918f@mail.gmail.com> <76fd5acf0608151520n7692824asf06a849ac3114e5b@mail.gmail.com> <43aa6ff70608151529q18748348g3dce7c193450a0fb@mail.gmail.com> <1cb725390608152222j32727946ob3c07e43fd004299@mail.gmail.com> Message-ID: <44E3CA83.6030801@canterbury.ac.nz> Guido van Rossum wrote: > Collin wrote a great PEP that doesn't commit to any kind of 
semantics > for annotations. I think the argument started because Collin's PEP actually went further than that, and asserted that there wouldn't be any problems created by this lack of specification, for reasons which are highly debatable. Not surprisingly, a good deal of debate on that point ensued. If the PEP simply said something like "We'll look at this again after we've had some experience", it might not have been so controversial. -- Greg From talin at acm.org Thu Aug 17 08:21:44 2006 From: talin at acm.org (Talin) Date: Wed, 16 Aug 2006 23:21:44 -0700 Subject: [Python-3000] Fwd: Conventions for annotation consumers In-Reply-To: <43aa6ff70608160809qb8882e1m6b471fda3eee8d10@mail.gmail.com> References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> <43aa6ff70608160809qb8882e1m6b471fda3eee8d10@mail.gmail.com> Message-ID: <44E40AF8.8060400@acm.org> Collin Winter wrote: > On 8/15/06, Phillip J. Eby wrote: >> Personally, I thought Guido's original proposal for function annotations, >> which included a __typecheck__ operator that was replaceable on a >> per-module basis (and defaulted to a no-op), was the perfect thing -- >> neither too much semantics nor too little. I'd like to have it back, >> please. :) > > I'd be perfectly happy to go back to talking about "type annotations", > rather than the more general "function annotations", especially since > most of the discussion thus far has been about how to do multiple things > with annotations at the same time. Restricting annotations to type > information would be fine by me. I'd be happy to do that as well :) So far, there has been a great deal of confusion and disagreement about this proposal. Some people might be surprised by that - however, my point from the beginning is that this confusion and disagreement is *inherent* in the concept of function annotations as currently envisioned.
What the current discussion demonstrates is that the number of different ways in which function annotations can be used is far larger and more diverse than anticipated ("Never underestimate the creative power of an infinite number of monkeys".) Normally, this wouldn't be seen as a problem, but rather a strength of the design. Whenever you have a broad and diverse set of use cases for a given feature, that's usually an indication that the feature has been designed well. However, having a vast set of use cases only works if those use cases can have some degree of isolation from one another. If I write a decorator, I'm not too concerned about what other decorator classes may exist; I may not even be too concerned about what other decorators are applied to the same function as mine are. However, function annotations are a little different from the usual case. Specifically, they need to be fairly concise, otherwise they are obfuscatory (as someone pointed out). One of the ways of achieving this conciseness is to remove the requirement to explicitly identify each annotation, and instead allow the meanings of the annotations to be implicit. (i.e. the use of built-in types rather than a dictionary of key/value pairs.) The problem with implicit identification is that the category boundaries for each annotation are no longer clearly defined. This wouldn't be a problem if the number of use cases were small and widely separated. However, as the recent discussion has shown, the number of use cases is vast and diverse. This means that the implicitly defined categories are inevitably going to collide. What I and others are worried about is that it appears that we are heading in a direction in which different users of function annotations will be forced to jostle elbows with each other - where each consumer of annotations, instead of being able to develop their annotation system in private, will be forced to consider the other annotation systems that exist already.
For someone who is developing an annotation library that is intended for widespread use, the *mere existence* of other annotation libraries impacts their design and must be taken into account. I feel that this is an intolerable burden on the designers of such systems. Some have proposed resolving this by going back to explicit identification of annotations, either by keyword or by unique types. However, this destroys some of the conciseness and simplicity of the annotations, something which others have objected to. Personally, I think that the function annotation concept suffers from being too ambitious, attempting to be all things to all people. I don't think we really need docstring annotations - there are other ways to achieve the same effect. The same goes for type checkers and lint checkers and most of the other ideas for using annotations. All those things are nice, but if they never get done I'm not going to worry about it -- and none of these things are worth the level of madness and confusion generated by an N-way collision of incompatible frameworks. I'm going to take a somewhat hard line here, and say that if it were up to me, I would ask Phillip Eby exactly what annotation features he needs to make his overload dispatching mechanism work, and then I would restrict the PEP to just that. In other words, rather than saying "annotations can be anything the programmer wants", I would instead say "This set of annotations is used for dispatching, any other use of annotations is undefined." Which is not to say that a programmer can't make up their own -- but that programmer should have no expectations that their code is going to be able to interoperate with anyone else's. 
-- Talin From collinw at gmail.com Thu Aug 17 15:01:13 2006 From: collinw at gmail.com (Collin Winter) Date: Thu, 17 Aug 2006 08:01:13 -0500 Subject: [Python-3000] Fwd: Conventions for annotation consumers (was: Re: Draft pre-PEP: function annotations) In-Reply-To: <1cb725390608161711v7c0a93b9i3f9e2032da9254af@mail.gmail.com> References: <43aa6ff70608141459r434f7170sb725468c117ff080@mail.gmail.com> <43aa6ff70608151213v1ced455btdaadb51e1761738d@mail.gmail.com> <1cb725390608151230i298c1889gd93233db2b6f980a@mail.gmail.com> <43aa6ff70608151313h2b945032wa556903f3f9a44c3@mail.gmail.com> <76fd5acf0608151519k564723f9q2d409e3285a7918f@mail.gmail.com> <76fd5acf0608151520n7692824asf06a849ac3114e5b@mail.gmail.com> <43aa6ff70608151529q18748348g3dce7c193450a0fb@mail.gmail.com> <1cb725390608152222j32727946ob3c07e43fd004299@mail.gmail.com> <1cb725390608161711v7c0a93b9i3f9e2032da9254af@mail.gmail.com> Message-ID: <43aa6ff70608170601v4ef2435eq7824f35867767c7d@mail.gmail.com> On 8/16/06, Paul Prescod wrote: > Okay, you're the boss. The conversation did go pretty far afield but the > main thing I wanted was just that if a user wanted to have annotations from > framework 1 and framework 2 they could reliably express that as > > def foo(a: [Anno1, Anno2]): > > All that that requires is a statement in the spec saying: "If you're > processing annotations and you see an annotation you don't understand, skip > it. And if you see a list, look inside it rather than processing it in some > proprietary fashion." So, time for an embarrassing confession: I had a bit of a eureka moment this morning, and I think I finally understand where you were coming from with this idea. I honestly don't know what I thought you were proposing, but now that I get it, my old conception seems like rubbish. Consider the dict-based proposal withdrawn. 
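For concreteness, the skip-unknown convention Paul describes can be sketched as a small consumer function. The `TypeAnno` and `DocAnno` classes and the `collect` helper below are invented purely for illustration; they are not part of any proposal:

```python
class TypeAnno:
    """A type annotation one hypothetical framework understands."""
    def __init__(self, tp):
        self.tp = tp

class DocAnno:
    """A documentation annotation from a different hypothetical framework."""
    def __init__(self, text):
        self.text = text

def collect(annotation, wanted):
    """Gather the annotations this consumer understands.

    Following the proposed convention: unfamiliar annotations are
    skipped rather than rejected, and lists are searched rather than
    treated as opaque values.
    """
    if isinstance(annotation, list):
        found = []
        for item in annotation:
            found.extend(collect(item, wanted))
        return found
    if isinstance(annotation, wanted):
        return [annotation]
    return []  # not ours: skip it, don't raise

def foo(a: [TypeAnno(int), DocAnno("frobination count")]):
    return a

# A type-checking consumer sees only the TypeAnno; the DocAnno is ignored.
type_annos = collect(foo.__annotations__['a'], TypeAnno)
```

A documentation consumer would make the same `collect` call with `DocAnno` and never notice the type annotation sitting in the same list.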
Apologies for my part in dragging this discussion into a triple-digit comment count : ) Collin Winter From bmx007 at gmail.com Fri Aug 18 11:59:49 2006 From: bmx007 at gmail.com (bmx007) Date: Fri, 18 Aug 2006 10:59:49 +0100 Subject: [Python-3000] Fwd: Conventions for annotation consumers Message-ID: <3f2f9e8c0608180259ybfdf102r48eda9daafdebedf@mail.gmail.com> Hi, I haven't read the whole thread because it's pretty long, but if I have understood Paul correctly, my opinion (and why I use docstrings in my own typechecker module) is that it's a good idea not to mix a function's definition and its type. I think the difference between languages is not what they allow you to do, but how easy it is to write something and how easy it is to READ it (the read-factor is why I switched from p... to python). So separation of semantics and type is a good thing, because we don't usually need (as readers) to know both at the same time. So we can read what we want. As an example, consider a function def find(token, line): ... with strings as parameters and a boolean as return value. In almost all cases what we need to know is only the order of the parameters: do I call find(token, line) or find(line, token)? In this case I should be able to find the semantics of the parameters easily, and I don't care about their types (because I work in a context where I expect them). It is obviously easiest to find if there is no extra information, so the old way, def find(token, line), is best. And it's the same for types: when I care about the type it is usually a problem of consistency, and I don't care about the semantics. This occurs, for example, in "template" functions such as def max(x, y): ... where max could be int, int -> int float, float -> float string, string -> string and something like (or any equivalent) def max(x, y): :: int, int -> int :: float, float -> float :: string, string -> string is (I think) easy to read and write, and we can easily skip the information we don't care about.
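The separate-line `::` notation above is nothing today's compiler can parse, but the same information can live in the docstring and be read back at runtime. A rough sketch of that idea (the `parse_sigs` helper and its exact format are invented here, not from any posted module):

```python
def parse_sigs(func):
    """Extract ':: args -> result' type lines from a function's docstring."""
    sigs = []
    for raw in (func.__doc__ or "").splitlines():
        line = raw.strip()
        if line.startswith("::"):
            args, _, result = line[2:].partition("->")
            sigs.append(([a.strip() for a in args.split(",")],
                         result.strip()))
    return sigs

def max2(x, y):
    """Return the larger of x and y.

    :: int, int -> int
    :: float, float -> float
    :: string, string -> string
    """
    return x if x >= y else y

sigs = parse_sigs(max2)
```

A reader who doesn't care about types never looks past the first docstring line, which is exactly the separation being argued for above.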
Maxime From ncoghlan at gmail.com Fri Aug 18 17:14:16 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 19 Aug 2006 01:14:16 +1000 Subject: [Python-3000] Bound and unbound methods In-Reply-To: References: <44DF0D38.6070507@acm.org> <20060813102036.1985.JCARLSON@uci.edu> <44DF86AA.7050207@acm.org> <44DFE092.8030604@canterbury.ac.nz> <76fd5acf0608152248j76f38d2x88ba241a8c66c835@mail.gmail.com> Message-ID: <44E5D948.7070503@gmail.com> Guido van Rossum wrote: >> Would a possible special method name __methodcall__ be accepted, where >> if it exists on a callable, you can expect to use it as __call__ but >> with the understanding that it accepts as self when called in >> an optimizable form? This would reduce the method call to two >> attribute lookups before the call instead of an instantiation and all >> the heavy lifting currently done. For normal functions, >> 'f.__methodcall__ is f.__call__' may be true, but the existence of >> that __methodcall__ name just gives you an extra contract. > > I'd like to answer "no" (since I think this whole idea is not a very > fruitful avenue) but frankly, I have no idea what you are trying to > describe. Are you even aware of the descriptor protocol (__get__) and > how it's used to create a bound method (or something else)? > > No reply is needed. If I understand Calvin right, the best speed-up we could get for the status quo is for the "METHOD_CALL" opcode to: 1. Do a lookup that bypasses the descriptor machinery (i.e. any __get__ methods are not called at this point) 2. If the object is a function object, invoke __call__ directly, supplying the instance as the first argument 3. If the object is a classmethod object, invoke __call__ directly, supplying the class as the first argument 4. If the object is a staticmethod object, invoke __call__ directly, without supplying any extra arguments 5. If the object has a __get__ method, call it and invoke __call__ on the result 6.
Otherwise, invoke __call__ on the object (Caveat: this omits details of the lookup process regarding how descriptors are handled that an actual implementation would need to deal with). I think what Calvin is suggesting is, instead of embedding all those special cases in the op code, allow a descriptor to define __methodcall__ as an optimised combination of calling __get__ and then invoking __call__ on the result. Then the sequence of events in the op code would be to: 1. Do a lookup that bypasses the descriptor machinery 2. If the object defines it, invoke __methodcall__ directly, supplying the instance as the first argument and the class as the second argument (similar to __get__), followed by the args tuple as the 3rd argument and the keyword dictionary as the 4th argument. 3. If the object doesn't define __methodcall__, but has a __get__ method, then call it and invoke __call__ on the result 4. Otherwise, invoke __call__ on the returned object For example, on a function object, __methodcall__ would look like: def __methodcall__(self, obj, cls, args, kwds): if obj is None: raise TypeError("Cannot call unbound method") return self(obj, *args, **kwds) On a class method descriptor: def __methodcall__(self, obj, cls, args, kwds): return self._function(cls, *args, **kwds) On a static method descriptor: def __methodcall__(self, obj, cls, args, kwds): return self._function(*args, **kwds) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Fri Aug 18 18:18:39 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 19 Aug 2006 02:18:39 +1000 Subject: [Python-3000] Fwd: Conventions for annotation consumers In-Reply-To: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> Message-ID: <44E5E85F.6080508@gmail.com> Phillip J.
Eby wrote: > I'm frankly baffled by the amount of "protect users from incompatibility" > ranting that this issue has generated. If I wanted to use Java, I'd know > where to find it. Guido has said time and again that Python's balance > favors the individual developer at the expense of the group where > "consenting adults" is concerned, and Py3K isn't intended to change that > balance. I actually thought Collin's approach in the PEP was reasonable (deferring the details of combining annotations until we had some more experience with how they could be made useful in practice). Some of the wording was a little strong (suggesting that the conventions would *never* be developed), but the idea was sound. To try and put this in perspective: 1. I believe argument annotations have the most potential to be beneficial when used in conjunction with a single decorator chosen or written by the developer to support things like Foreign Function Interface type mapping (PyObjC, ctypes, XML-RPC, etc), or function overloading (RuleDispatch, etc). 2. If a developer wishes to use multiple annotations together, they can define their own annotation processing decorator that invokes the necessary operations using non-annotation based APIs provided by the appropriate framework, many of which already exist, and will continue to exist in Py3k due to the need to be able to process functions which have not been annotated at all (such as functions written in C). 3. The question has been raised as to whether or not there is a practical way for a developer to use annotations that make sense to a *static* analysis tool that doesn't actually execute the Python code. If someone figures out a way to handle the last point *without* compromising the ease of use for annotations designed to handle point 1, all well and good. Otherwise, I'd call YAGNI. OK, annotations wouldn't be useful for tools like pychecker in that case.
So be it - to be really useful for a tool like pychecker they'd have to be ubiquitous, and that's really not Python any more. All that said, I'm still not entirely convinced that function annotations are a good idea in the first place - I'm inclined to believe that signature objects providing a "bind" method that returns a dictionary mapping the method call's arguments to the function's named parameters will prove far more useful. With this approach, the 'annotations' would continue to be supplied as arguments to decorator factories instead of as expressions directly in the function header. IOW, I've yet to see any use case that is significantly easier to write with function annotations instead of decorator arguments, and several cases where function annotations are significantly worse. For one thing, function annotations are useless for decorating a function that was defined elsewhere, whereas it doesn't matter where the function came from when using decorator arguments. The latter also has a major benefit in unambiguously associating each annotation with the decorator that is the intended consumer. 
Consider an extreme example Josiah used elsewhere in this discussion: > @docstring > @typechecker > @constrain_values > def foo(a: [doc("frobination count"), > type(Number), > constrain_values(range(3,9))], > b: [type(Number), > # This can be only 4, 8 or 12 > constrain_values([4,8,12])]) -> type(Number): Here's how it looks with decorator factories instead: # Using keyword arguments @docstring(a="frobination count") @typechecker(a=Number, b=Number, _return=Number) @constrain_values(a=range(3,9), b=[4,8,12]) def foo(a, b): # the code # Using positional arguments @docstring("frobination count") @typechecker(Number, Number, _return=Number) @constrain_values(range(3,9), [4,8,12]) def foo(a, b): # the code All the disambiguation cruft is gone, the association between the decorators and the values they are processing is clear, the expressions are split naturally across the different decorator lines, and the basic signature is found easily by scanning for the last line before the indented section. The _return=Number is a bit ugly, but that could be handled by syntactic sugar that processed a "->expr" in a function call as equivalent to "return=expr" (i.e. adding the result of the expression to the keywords dictionary under the key "return"). Another advantage of the decorator-with-arguments approach is that you can call the decorator factory once, store the result in a variable, and then reuse that throughout your module, which is harder with annotations directly in the function header (which means that you can only share single annotations, not combinations of annotations). For example: floats2_to_float2tuple = typechecker(float, float, _return=(float, float)) @floats2_to_float2tuple def cartesian_to_polar(x, y): return math.sqrt(x*x + y*y), math.atan2(y, x) @floats2_to_float2tuple def polar_to_cartesian(r, theta): return r*math.cos(theta), r*math.sin(theta) Cheers, Nick. 
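A minimal runnable version of the keyword-style `typechecker` factory sketched above, simplified to call-time `isinstance` checks (the `_return` handling and the proposed `->` sugar are left out; this is only an illustrative sketch, not any framework's actual API):

```python
import functools
import inspect

def typechecker(**expected):
    """Decorator factory: map keyword arguments onto the decorated
    function's parameters by name and check values at call time."""
    def decorator(func):
        sig = inspect.signature(func)
        @functools.wraps(func)
        def wrapper(*args, **kwds):
            bound = sig.bind(*args, **kwds)
            for name, value in bound.arguments.items():
                want = expected.get(name)
                if want is not None and not isinstance(value, want):
                    raise TypeError("%s must be %s, got %r"
                                    % (name, want.__name__, value))
            return func(*args, **kwds)
        return wrapper
    return decorator

@typechecker(a=int, b=int)
def add(a, b):
    return a + b
```

Note how the factory's keyword arguments associate with the function's parameters purely by name, which is the unambiguous association argued for above.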
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From rrr at ronadam.com Sat Aug 19 12:29:31 2006 From: rrr at ronadam.com (Ron Adam) Date: Sat, 19 Aug 2006 05:29:31 -0500 Subject: [Python-3000] Fwd: Conventions for annotation consumers In-Reply-To: <44E5E85F.6080508@gmail.com> References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> <44E5E85F.6080508@gmail.com> Message-ID: Nick Coghlan wrote: [Clipped other good points.] > 3. The question has been raised as to whether or not there is a practical way > for a developer to use annotations that make sense to a *static* analysis tool > that doesn't actually execute the Python code > > If someone figures out a way to handle the last point *without* compromising > the ease of use for annotations designed to handle point 1, all well and good. > Otherwise, I'd call YAGNI. OK, annotations wouldn't be useful for tools like > pychecker in that case. So be it - to be really useful for a tool like > pychecker they'd have to be ubiquitous, and that's really not Python any more. Something I've been looking for is an alternate way to generate function signatures that are closer to those used in the documents. Where help(str.find) gives: find(...) S.find(sub [,start [,end]]) -> int Return the lowest index in S where substring sub is found, such that sub is contained within s[start,end]. Optional arguments start and end are interpreted as in slice notation. Return -1 on failure. But I am wondering if the annotations could help with both pydoc and pychecker. Then maybe function specifications could be generated and look more like ... str.find(sub:IsString [,start:IsInt [,end:IsInt]]) -> IsInt instead of just... find(...) [See below where I'm going with this.] 
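Generating the richer signature line Ron asks for is mechanical once annotations are introspectable. A sketch, using the introspection API that later grew out of this discussion (the `describe` formatting is invented, and plain defaults stand in for the `[,start [,end]]` optional-argument brackets):

```python
import inspect

def describe(func):
    """Render 'name(arg:Anno, ...) -> Anno' from a function's annotations."""
    sig = inspect.signature(func)
    parts = []
    for name, param in sig.parameters.items():
        if param.annotation is not inspect.Parameter.empty:
            anno = getattr(param.annotation, '__name__', repr(param.annotation))
            parts.append("%s:%s" % (name, anno))
        else:
            parts.append(name)
    tail = ""
    if sig.return_annotation is not inspect.Signature.empty:
        ret = getattr(sig.return_annotation, '__name__',
                      repr(sig.return_annotation))
        tail = " -> " + ret
    return "%s(%s)%s" % (func.__name__, ", ".join(parts), tail)

# An annotated stand-in for str.find, for illustration only.
def find(sub: str, start: int = 0, end: int = None) -> int:
    ...

sig_line = describe(find)
```

Something like this could back both pydoc output and a checker's view of the declared types.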
> All that said, I'm still not entirely convinced that function annotations are > a good idea in the first place - I'm inclined to believe that signature > objects providing a "bind" method that returns a dictionary mapping the method > call's arguments to the function's named parameters will prove far more > useful. With this approach, the 'annotations' would continue to be supplied as > arguments to decorator factories instead of as expressions directly in the > function header. IOW, I've yet to see any use case that is significantly > easier to write with function annotations instead of decorator arguments, and > several cases where function annotations are significantly worse. > > For one thing, function annotations are useless for decorating a function that > was defined elsewhere, whereas it doesn't matter where the function came from > when using decorator arguments. The latter also has a major benefit in > unambiguously associating each annotation with the decorator that is the > intended consumer. I've been thinking about this also. It seems maybe there is an effort to separate the "meta-data" and the "use of meta-data" a bit too finely. So what you then get is a lock-and-key effect where the decorators that use the meta-data and the meta-data itself are separate, but at the same time, strongly associated by location (module) and developer. This may be a bit overstated in order to describe it, but I do think it's a concern as well. But this is also probably more of a style of use issue than an issue with annotations themselves. The meta-data can also *be* the validator. So instead of just using Float, Int, Long, etc... and writing a smart validator to read and check each of those, you can just call the meta-data directly with each related argument to validate, modify, or do whatever to it. So this ...
> @docstring > @typechecker > @constrain_values > def foo(a: [doc("frobination count"), > type(Number), > constrain_values(range(3,9))], > b: [type(Number), > # This can be only 4, 8 or 12 > constrain_values([4,8,12])]) -> type(Number): could be reduced to ... (removing redundant checks as well) from metalib import * @callmeta def foo( a: [ SetDoc("frobination count"), InRange(3,9) ], b: InSet([4,8,12]) ) -> IsNumber: # code Which isn't too bad. Or even as positional decorator arguments... from metalib import * @callmeta( [SetDoc("frobination count"), InRange(3,9)], InSet([4,8,12]), IsNumber ) def foo(a, b): # code Both of these are very similar. The callmeta decorator would be implemented differently, but by using the validators as the meta-data, it makes both versions easier to read and use. IMHO of course. The metalib routines could be something (roughly) like... def IsNumber(arg): return type(arg) in (float, int, long) def IsString(arg): return type(arg) in (str, unicode) def InSet(list_): def inset(arg): return arg in list_ return inset def InRange(start, stop): def inrange(arg): return start <= arg <= stop return inrange etc... (Or it might be better for them to be objects.) Anyway, it's very late and I'm probably overlooking something, and I haven't actually tried any of these, so your mileage may vary. ;-) Cheers, Ron From pedronis at strakt.com Sat Aug 19 12:54:19 2006 From: pedronis at strakt.com (Samuele Pedroni) Date: Sat, 19 Aug 2006 12:54:19 +0200 Subject: [Python-3000] signature annotation in the function signature or a separate line In-Reply-To: References: <20060816090147.19DA.JCARLSON@uci.edu> <20060816102652.19E3.JCARLSON@uci.edu> Message-ID: <44E6EDDB.9070604@strakt.com> Guido van Rossum wrote: >>But maybe I'm misremembering the discussion, maybe decorators make it >>very difficult to visually scan for function definitions, and maybe >>people want all that garbage in their function signature.
> > > They don't want it, but if they're forced to have it occasionally > they'll cope. I still think you're way overestimating the importance > of this use case. > Given that the meaning of annotations is meant not to be predefined, given that people are coming up with arbitrarily verbose examples thereof, given the precedent of type-inferenced languages that use a separate line for optional type information, I think devising a way to have the annotation on a different line with a decorator-like introduction instead of mixed with the function head would be saner: One possibility would be to have a syntax for signature expressions and then allow them as decorators with the obvious effect of attaching themselves: @sig int,int -> int def f(a,b): return a+b or with optional argument names: @sig a: int,b: int -> int def f(a,b): return a+b sig expressions (possibly with parens) would be first class and be able to appear anywhere an expression is allowed; they would produce an object embedding the signature information. So both of these would be possible: @typecheck @sig int,int -> int def f(a,b): return a+b @typecheck(sig int,int -> int) def f(a,b): return a+b For example, having first-class signatures would help express reflective queries on overloaded/generic functions nicely, etc... regards. From guido at python.org Sat Aug 19 17:09:53 2006 From: guido at python.org (Guido van Rossum) Date: Sat, 19 Aug 2006 08:09:53 -0700 Subject: [Python-3000] int-long unification Message-ID: Martin, I've thought about it more, and I think it's fine to use a single type. It will surely simplify many things, and that alone might help us win back some of the inefficiency this introduces. And it is best for Python-level users. Are you interested in doing this at the Google sprint next week? Here's how I would approach it: 0. Benchmark. (Py3k is slower than 2.5 at the moment, I don't know why.) I would pick the benchmark that showed the biggest sensitivity in your recent comparisons.
1. Completely gut intobject.[ch], making all PyInt APIs equivalent to the corresponding PyLong APIs (through macros if possible). The PyInt macros become functions. I'm not sure whether it would be better for PyInt_Check() to always return False or to always return True. In bltinmodule, export "int" as an alias for "long". 2. Bang on the rest of the code until it compiles and passes all unit tests (except the 5 that I haven't managed to fix yet -- test_class, test_descr, test_minidom, and the two etree tests). (Right now many more are broken due to the elimination of has_key; I'll fix these over the weekend.) 3. Go over much of the C code where it special-cases PyInt and PyLong separately, and change this to only use the PyLong calls. Keep the unittests working. 4. Benchmark. 5. Introduce some optimizations into longobject.c, e.g. a cache for small ints (like we had in intobject.c), and perhaps a special representation for values less than maxint (or for anything that fits in a long long). Or anything else you can think of. 6. Benchmark. 7. Repeat from 5 until satisfied. At this point I wouldn't rip out the PyInt APIs; leaving them in aliased to PyLong APIs for a while will let us put off the work on some of the more obscure extension modules. What do you think? 
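Step 5's small-int cache is the same trick intobject.c already plays: preallocate one object per common small value and hand the shared object out, so arithmetic that lands in that range allocates nothing. The real version is C; the pure-Python sketch below only illustrates the idea, with a one-element list standing in for the C-level number box (the cache bounds shown match what CPython eventually settled on):

```python
_CACHE_LO, _CACHE_HI = -5, 257   # cache values -5 through 256
_small_ints = {}

def make_int(value):
    """Return a boxed integer, sharing one box per small value."""
    if _CACHE_LO <= value < _CACHE_HI:
        box = _small_ints.get(value)
        if box is None:
            box = _small_ints[value] = [value]  # create the shared box once
        return box
    return [value]  # large values get a fresh box every time

a = make_int(100)
b = make_int(100)
big1 = make_int(10**9)
big2 = make_int(10**9)
```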
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From ncoghlan at gmail.com Sat Aug 19 19:54:00 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 20 Aug 2006 03:54:00 +1000 Subject: [Python-3000] signature annotation in the function signature or a separate line In-Reply-To: <44E6EDDB.9070604@strakt.com> References: <20060816090147.19DA.JCARLSON@uci.edu> <20060816102652.19E3.JCARLSON@uci.edu> <44E6EDDB.9070604@strakt.com> Message-ID: <44E75038.1090007@gmail.com> Samuele Pedroni wrote: > Guido van Rossum wrote: >>> But maybe I'm misremembering the discussion, maybe decorators make it >>> very difficult to visually scan for function definitions, and maybe >>> people want all that garbage in their function signature. >> >> They don't want it, but if they're forced to have it occasionally >> they'll cope. I still think you're way overestimating the importance >> of this use case. >> > > Given that the meaning of annotations is meant not to be predefined, > given that people are coming up with arbitrarily verbose examples > thereof, given the precedent of type-inferenced languages > that use a separate line for optional type information, I think > devising a way to have the annotation on a different line > with a decorator-like introduction instead of mixed with > the function head would be saner: > > One possibility would be to have a syntax for signature expressions > and then allow them as decorators with the obvious effect of attaching > themselves: > > @sig int,int -> int > def f(a,b): > return a+b > > or with optional argument names: > > @sig a: int,b: int -> int > def f(a,b): > return a+b > > sig expressions (possibly with parens) would be first class > and be able to appear anywhere an expression is allowed, > they would produce an object embedding the signature information.
What would a separate sig expression buy you over defining "->expr" as a special form of keyword argument that binds to the keyword name "return" in the dictionary for storing extra keyword arguments? With the argument based approach, the two above examples would look like: @sig(int, int, ->int) def f(a,b): return a+b @sig(a=int, b=int, ->int) def f(a,b): return a+b The implementation of sig might look something like: def sig(*args, **kwds): def annotator(f): # Assume bind() is defined to pass through any # 'return' binding into the returned mapping # Otherwise, it uses normal parameter binding notes = f.__signature__.bind(*args, **kwds) f.__signature__.annotations = notes return f return annotator The longer this discussion goes on, the more convinced I become that making it easier to write decorator factories that produce decorators that map the factory's arguments to the decorated function's parameters is a better idea than adding function annotations directly to the function signature. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From pedronis at strakt.com Sat Aug 19 19:59:23 2006 From: pedronis at strakt.com (Samuele Pedroni) Date: Sat, 19 Aug 2006 19:59:23 +0200 Subject: [Python-3000] signature annotation in the function signature or a separate line In-Reply-To: <44E75038.1090007@gmail.com> References: <20060816090147.19DA.JCARLSON@uci.edu> <20060816102652.19E3.JCARLSON@uci.edu> <44E6EDDB.9070604@strakt.com> <44E75038.1090007@gmail.com> Message-ID: <44E7517B.9020300@strakt.com> Nick Coghlan wrote: > > What would a separate sig expression buy you over defining "->expr" as a > special form of keyword argument that binds to the keyword name "return" > in the dictionary for storing extra keyword arguments? 
seems to me a quirky addition of sugar, also could not be limited; I prefer going the full length and supporting argument name introduction with : etc. as shown in the example. But it seems we agree that interspersing the annotation in the main head of the function is not such a great idea after all. From pedronis at strakt.com Sat Aug 19 20:08:08 2006 From: pedronis at strakt.com (Samuele Pedroni) Date: Sat, 19 Aug 2006 20:08:08 +0200 Subject: [Python-3000] signature annotation in the function signature or a separate line In-Reply-To: <44E7517B.9020300@strakt.com> References: <20060816090147.19DA.JCARLSON@uci.edu> <20060816102652.19E3.JCARLSON@uci.edu> <44E6EDDB.9070604@strakt.com> <44E75038.1090007@gmail.com> <44E7517B.9020300@strakt.com> Message-ID: <44E75388.2080907@strakt.com> Samuele Pedroni wrote: > Nick Coghlan wrote: > >>What would a separate sig expression buy you over defining "->expr" as a >>special form of keyword argument that binds to the keyword name "return" >>in the dictionary for storing extra keyword arguments? > > > seems to me a quirky addition of sugar, also could not be limited; I > prefer going the full length and supporting argument name introduction > with : etc. as shown in the example. > to be more precise, I find: @sig a: int, b: int -> int more readable and to the point than: @sig(a=int,b=int,->int). First-class sig expressions can have rules to leave out parens as genexp etc. Also it can be extended to support attaching annotations to * and ** args. It would be hard to devise separate sugar for those. > But it seems we agree that interspersing the annotation in the main > head of the function is not such a great idea after all.
> _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/pedronis%40strakt.com From brett at python.org Sat Aug 19 21:06:02 2006 From: brett at python.org (Brett Cannon) Date: Sat, 19 Aug 2006 12:06:02 -0700 Subject: [Python-3000] int-long unification In-Reply-To: References: Message-ID: On 8/19/06, Guido van Rossum wrote: > > Martin, > > I've thought about it more, and I think it's fine to use a single > type. It will surely simplify many things, and that alone might help > us win back some of the inefficiency this introduces. And it is best > for Python-level users. Woohoo! I totally support this idea (along with anything else that comes up to simplify the C API; I almost feel like we need a dumbed-down API along with the full-powered API behind it). I also support Martin doing the work =) (but that's mostly because I know he is in a good position to do it well). -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060819/c6bd844a/attachment.htm From paul at prescod.net Sat Aug 19 21:19:54 2006 From: paul at prescod.net (Paul Prescod) Date: Sat, 19 Aug 2006 12:19:54 -0700 Subject: [Python-3000] Fwd: Conventions for annotation consumers In-Reply-To: References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> <44E5E85F.6080508@gmail.com> Message-ID: <1cb725390608191219q71cf34dfi7a6a892c1fd9eddf@mail.gmail.com> On 8/19/06, Ron Adam wrote: > > @callmeta > def foo( a: [ SetDoc("frobination count"), InRange(3,9) ], > b: InSet([4,8,12]) ) > -> IsNumber: > # code What extra information or value does the callmeta decorator provide? For the sake of argument, I'll presume it has some useful function. Even so, it doesn't make sense to explicitly attach it to every function.
Imagine a hundred such functions in a module. Would it be better to do this: @callmeta def func1(..): ... @callmeta def func2(..): ... @callmeta def func3(..): ... @callmeta def func4(..): ... @callmeta def func5(..): ... Or to do this: func1(...):... func2(...):... func3(...):... func4(...):... func5(...):... callmeta() Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060819/c0ac90cc/attachment.html From rrr at ronadam.com Sun Aug 20 00:27:06 2006 From: rrr at ronadam.com (Ron Adam) Date: Sat, 19 Aug 2006 17:27:06 -0500 Subject: [Python-3000] Fwd: Conventions for annotation consumers In-Reply-To: <1cb725390608191219q71cf34dfi7a6a892c1fd9eddf@mail.gmail.com> References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> <44E5E85F.6080508@gmail.com> <1cb725390608191219q71cf34dfi7a6a892c1fd9eddf@mail.gmail.com> Message-ID: Paul Prescod wrote: > On 8/19/06, *Ron Adam* > wrote: > > @callmeta > def foo( a: [ SetDoc("frobination count"), InRange(3,9) ], > b: InSet([4,8,12]) ) > -> IsNumber: > # code > > > What extra information or value does the callmeta decorator provide? For > the sake of argument, I'll presume it has some useful function. Even so, > it doesn't make sense to explictly attach it to every function. The callmeta decorator wouldn't provide any extra information itself, all it does is decorate(wrap) the functions so that the meta data gets called. It activates the meta data calls. > Imagine a hundred such functions in a module. Would it be better to do this: > > @callmeta > def func1(..): ... > > @callmeta > def func2(..): ... > > @callmeta > def func3(..): ... > > @callmeta > def func4(..): ... > > @callmeta > def func5(..): ... Isn't this the same? > Or to do this: > > func1(...):... > > func2(...):... > > func3(...):... > > func4(...):... > > func5(...):... > > callmeta() So here callmeta() wraps all the functions to activate the meta data? 
That should also work if you want to activate all the functions or a large list of functions with meta data. It could just skip those without callable meta data. > Paul Prescod From bob at redivi.com Sun Aug 20 01:57:47 2006 From: bob at redivi.com (Bob Ippolito) Date: Sat, 19 Aug 2006 16:57:47 -0700 Subject: [Python-3000] int-long unification In-Reply-To: References: Message-ID: <6a36e7290608191657h36645421u3b0859dc504c40b3@mail.gmail.com> On 8/19/06, Brett Cannon wrote: > > On 8/19/06, Guido van Rossum wrote: > > Martin, > > > > I've thought about it more, and I think it's fine to use a single > > type. It will surely simplify many things, and that alone might help > > us win back some of the inefficiency this introduces. And it is best > > for Python-level users. > > > Woohoo! I totally support this idea (along with anything else that comes up > to simplify the C API; I almost feel like we need a dumbed-down API along > with the full-powered API behind it). I also support Martin doing the work > =) (but that's mostly because I know he is in a good position to do it > well). The easiest thing we could do to simplify extension writing would be to supply a script that generates extension source and a setup.py from a generic template. The template would demonstrate the current best practices for defining a function, a constant, an Exception subclass, and a class that wraps a C struct with a method or two. 
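The module-level alternative Paul and Ron discuss in the exchange above — one call after all the defs, instead of a decorator on each function — could be sketched as follows. `callmeta` and the `checks` attribute are hypothetical names invented for this sketch, not an existing API:

```python
# One callmeta(namespace) call wraps every function in the namespace that
# carries callable check metadata; functions without metadata are skipped.
import functools

def callmeta(namespace):
    def make_wrapper(func, checks):
        @functools.wraps(func)
        def wrapper(*args, **kwds):
            for check in checks:          # run each check before the call
                check(*args, **kwds)
            return func(*args, **kwds)
        return wrapper
    for name, obj in list(namespace.items()):
        checks = getattr(obj, "checks", None)
        if callable(obj) and checks:
            namespace[name] = make_wrapper(obj, checks)

def positive(x):
    if x <= 0:
        raise ValueError("x must be positive")

def double(x):
    return x * 2
double.checks = [positive]

callmeta(globals())  # wraps double; positive itself has no metadata
```

After the `callmeta(globals())` call, `double(3)` still returns 6, while `double(0)` raises `ValueError` from the attached check.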
-bob From paul at prescod.net Sun Aug 20 02:06:23 2006 From: paul at prescod.net (Paul Prescod) Date: Sat, 19 Aug 2006 17:06:23 -0700 Subject: [Python-3000] Fwd: Conventions for annotation consumers In-Reply-To: References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> <44E5E85F.6080508@gmail.com> <1cb725390608191219q71cf34dfi7a6a892c1fd9eddf@mail.gmail.com> Message-ID: <1cb725390608191706l113558f7h98e6810ddc422d2a@mail.gmail.com> On 8/19/06, Ron Adam wrote: > > > The callmeta decorator wouldn't provide any extra information itself, > all it does is decorate(wrap) the functions so that the meta data gets > called. It activates the meta data calls. I think we're using the word "metadata" differently. In my universe, metadata is a form of data and you don't "call" data. You just assert it. I think that what you are trying to do is USE metadata as a form of runtime precondition. That's totally fine as long as we are clear that there are many uses for metadata that do not require anything to "happen" during the function's instantiation. A documentation annotation or annotation to map to a foreign type system are examples. So the decorator is allowed but optional. Given that that's the case, I guess I don't understand the virtue of bringing decorators into the picture. Yes, they are one consumer of metadata. Module-scoped functions are another. Application scoped functions are another. Third party data extraction programs are another. Decorators working with metadata are just special cases of runtime processors of it. Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-3000/attachments/20060819/e291e9b7/attachment.html From rrr at ronadam.com Sun Aug 20 05:17:28 2006 From: rrr at ronadam.com (Ron Adam) Date: Sat, 19 Aug 2006 22:17:28 -0500 Subject: [Python-3000] Fwd: Conventions for annotation consumers In-Reply-To: <1cb725390608191706l113558f7h98e6810ddc422d2a@mail.gmail.com> References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> <44E5E85F.6080508@gmail.com> <1cb725390608191219q71cf34dfi7a6a892c1fd9eddf@mail.gmail.com> <1cb725390608191706l113558f7h98e6810ddc422d2a@mail.gmail.com> Message-ID: Paul Prescod wrote: > On 8/19/06, *Ron Adam* > wrote: > > > The callmeta decorator wouldn't provide any extra information itself, > all it does is decorate(wrap) the functions so that the meta data gets > called. It activates the meta data calls. > > > I think we're using the word "metadata" differently. In my universe, > metadata is a form of data and you don't "call" data. You just assert > it. I think that what you are trying to do is USE metadata as a form of > runtime precondition. Yes, I am extending the term in this case to include the details of implementing the meta-data. If you describe something in enough detail it might as well be Python code. And if it is Python code, well why not make use of that? Each of these describes "some info" to greater detail. (1) "some info" (2) some_info = "Brief description of some_info." (3) some_info = """ Detailed description of what some info is, and how to use it. Pseudo code for how to implement the some_info property. (pseudo code ...) """ (4) some_info = """ Detailed description of what some info is, and how to use it. # example python code to implement the # some_info property of x. def some_info_foo(x): ... """ (5) def some_info(x): """ some info - description of what some_info is. """ This last one describes it so precisely it can actually be called if it is desired. So why not use it?
> That's totally fine as long as we are clear that > there are many uses for metadata that do not require anything to > "happen" during the function's instantiation. A documentation annotation > or annotation to map to a foreign type system are examples. So the > decorator is allowed but optional. Yes, it is optional in this case too. Just because it's callable, doesn't mean it has to be called to be used. I could just as well use the doc attribute of the meta function, or the function name in any way I want and ignore the code completely. > Given that that's the case, I guess I > don't understand the virtue of bringing decorators into the picture. > Yes, they are one consumer of metadata. Module-scoped functions are > another. Application scoped functions are another. Third party data > extraction programs are another. Decorators working with metadata are > just special cases of runtime processors of it. > > Paul Prescod Decorators reduce repetition and put their labels before a function instead of after it. But they aren't required at *any* time because you can do the same thing without them, but they can help make the code more readable. @decorator def foo(x): # code is the same as... def foo(x): # code foo = decorator(foo) The name is repeated three times in the non-decorator version and because it is located after the function, it might not be noticed. Other than that, they are never required. (Unless I'm unawares of special cases.) (Not meaning to start a decorator discussion, just clarifying.) 
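Ron's equivalence above is easy to check: applying a decorator with `@` does exactly what the explicit rebinding does. A small runnable illustration (the names here are invented for the example):

```python
# The @ form and the explicit rebinding produce identical behavior.
def decorator(func):
    def wrapper(x):
        return func(x) * 2
    return wrapper

@decorator
def foo(x):
    return x + 1

def bar(x):
    return x + 1
bar = decorator(bar)   # same effect as the @decorator line; the name
                       # 'bar' appears three times, as Ron notes

print(foo(3), bar(3))  # 8 8
```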
Cheers, Ron From ncoghlan at gmail.com Sun Aug 20 06:45:43 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 20 Aug 2006 14:45:43 +1000 Subject: [Python-3000] Fwd: Conventions for annotation consumers In-Reply-To: <1cb725390608191706l113558f7h98e6810ddc422d2a@mail.gmail.com> References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> <44E5E85F.6080508@gmail.com> <1cb725390608191219q71cf34dfi7a6a892c1fd9eddf@mail.gmail.com> <1cb725390608191706l113558f7h98e6810ddc422d2a@mail.gmail.com> Message-ID: <44E7E8F7.30101@gmail.com> Paul Prescod wrote: > Given that that's the case, I guess I > don't understand the virtue of bringing decorators into the picture. > Yes, they are one consumer of metadata. Module-scoped functions are > another. Application scoped functions are another. Third party data > extraction programs are another. Decorators working with metadata are > just special cases of runtime processors of it. The reason I believe decorators are relevant is because the question that has caused this discussion to go on for so long is one of *disambiguation*. That is, there are *lots* of different reasons for annotating a function signature, so how does a programmer indicate which particular interpretation is the one they mean? Obviously, you can say, "I'm using the signature annotations in my module for purpose X". However, a later maintainer of your module may go "but I wanted to use those annotations for purpose Y!". Without function signature annotations in the syntax, *this is not a problem*. The One Obvious Way to implement both purpose X and purpose Y is as decorator factories that accept as arguments the information corresponding to each of the function parameters. Multiple decorators can already be stacked on a single function, and the names of the different decorators allow the different uses to be easily distinguished using the full power of Python's variable namespaces.
If signature annotations are added to the language, however, you have a new way of doing things: put the information in the signature annotations and write a decorator that consumes the signature information. And if two different utilities do that, then you have a conflict, and have to invent a mechanism for resolving it. And this disambiguation has to happen for each individual signature annotation instead of being done once for the whole function as would be the case with using separate decorators. So, as far as I can see, adding signature annotations doesn't let us do anything that can't already be done with less ambiguity using decorator factories that accept the appropriate arguments. Samuele's idea of "signature expressions" (i.e. a literal or builtin function for producing objects that describe a function's signature) seems like a *much* more fruitful avenue for exploration, as it would provide a genuine increase in expressiveness (decorator factories would be able to accept a single signature argument instead of separate arguments that then need to be mapped to the relevant function parameter). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From guido at python.org Sun Aug 20 17:04:33 2006 From: guido at python.org (Guido van Rossum) Date: Sun, 20 Aug 2006 08:04:33 -0700 Subject: [Python-3000] signature annotation in the function signature or a separate line In-Reply-To: <44E6EDDB.9070604@strakt.com> References: <20060816090147.19DA.JCARLSON@uci.edu> <20060816102652.19E3.JCARLSON@uci.edu> <44E6EDDB.9070604@strakt.com> Message-ID: On 8/19/06, Samuele Pedroni wrote: > Given that the meaning of annotations is meant not be predefined, Not sure what that has to do with it. > given that people are comining with arbitrarely verbose examples > thereof, Which I believe are worst-case scenarios and not what we'll see in practice. 
> given the precedent of type-inferred languages > that use a separate line for optional type information Can you show us an example or two? > I think > devising a way to have the annotation on a different line > with a decorator-like introduction instead of mixed with > the function head would be saner: > > One possibility would be to have a syntax for signature expressions > and then allow them as decorators with the obvious effect of attaching > themselves: > > @sig int,int -> int > def f(a,b): > return a+b One problem with this is that for larger argument lists it's hard for the (human) reader to match types up with arguments. In general I don't like having two parallel lists of things that must be matched up; I'd much rather have a single list containing all the info packed together. > or, with optional argument names: > > @sig a: int, b: int -> int > def f(a,b): > return a+b This seems like it would merely move the problem to the previous line; it doesn't solve the problem that the signature becomes unreadable when the type expressions are long lists or dicts. My own recommended solution for long signatures is to generously use the Python equivalent of 'typedef'; instead of writing def f(a: [PEAK("some peakish expression here"), Zope("some zopeish expression here")], b: [...more of the same...]) -> [PEAK("..."), Zope("...")]: return a+b I think most cases can be made a lot more readable by saying type_a = [PEAK("some peakish expression here"), Zope("some zopeish expression here")] type_b = [...more of the same...] type_f = [PEAK("..."), Zope("...")] def f(a: type_a, b: type_b) -> type_f: return a+b especially since I expect that in many cases there will be typedefs that can be shared between multiple signatures. > sig expressions (possibly with parens) would be first class > and be able to appear anywhere an expression is allowed, > they would produce an object embedding the signature information.
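Guido's 'typedef' style above can be written out with the function-annotation syntax that later became valid Python 3 (PEP 3107). In this sketch, PEAK and Zope are stand-in annotation factories invented for the example, not the real frameworks:

```python
# Annotation 'typedefs': name the annotation objects once, share them
# across signatures, and refer to them by name in the def line.
def PEAK(text):
    return ("peak", text)

def Zope(text):
    return ("zope", text)

type_a = [PEAK("some peakish expression here"),
          Zope("some zopeish expression here")]
type_f = [PEAK("..."), Zope("...")]

def f(a: type_a, b: type_a) -> type_f:
    return a + b

# The shared typedefs stay visible through the function's annotations:
print(f.__annotations__["return"] is type_f)  # True
```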
I think it's a good idea to have a way to produce a signature object without tying it to a function definition; but I'd rather not introduce any new syntax for just this purpose. For purely positional signatures, this could be done using a built-in function, e.g. s = sig(int, int, returns=int) I'm not sure what to do to create signatures that include the variable names, the best I can come up with is s = sig(('a', int), ('b', int), returns=int) (Note that you can't use keyword parameters because that would lose the ordering of the parameters. Possibly signatures could be limited to describing parameters that are purely positional and parameters that are purely keyword but no mixed-mode parameters? Nah, too restrictive.) But I still don't want to introduce new syntax just for this. In extreme cases you can always define a dummy function and extract its __signature__ object. > So both of these would be possible: > > @typecheck > @sig int,int -> int > def f(a,b): > return a+b > > @typecheck(sig int,int -> int) > def f(a,b): > return a+b I'm not sure we need more ways to express the same thing. :-) > For example having first-class signatures would help express nicely > reflective queries on overloaded/generic functions, etc... Agreed. But I think there's a way without forcing the annotations out of the 'def' line. -- --Guido van Rossum (home page: http://www.python.org/~guido/) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060820/84237306/attachment.html From g.brandl at gmx.net Sun Aug 20 17:21:35 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 20 Aug 2006 17:21:35 +0200 Subject: [Python-3000] raise with traceback? Message-ID: Hi, as raise ValueError, "something went wrong" is going to go away, how will one raise with a custom traceback? 
The obvious raise ValueError("something went wrong"), traceback or something more esoteric like raise ValueError("something went wrong") with traceback ? Georg From guido at python.org Sun Aug 20 17:53:49 2006 From: guido at python.org (Guido van Rossum) Date: Sun, 20 Aug 2006 08:53:49 -0700 Subject: [Python-3000] raise with traceback? In-Reply-To: References: Message-ID: The 'with' syntax is attractive because it will flag all unconverted code as a syntax error. I wonder if "raise ValueError" should still be allowed (as equivalent to "raise ValueError()") or whether it should be disallowed. --Guido On 8/20/06, Georg Brandl wrote: > Hi, > > as > > raise ValueError, "something went wrong" > > is going to go away, how will one raise with a custom traceback? > The obvious > > raise ValueError("something went wrong"), traceback > > or something more esoteric like > > raise ValueError("something went wrong") with traceback > > ? > > Georg > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From tjreedy at udel.edu Sun Aug 20 18:08:55 2006 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 20 Aug 2006 12:08:55 -0400 Subject: [Python-3000] raise with traceback? References: Message-ID: "Guido van Rossum" wrote in message news:ca471dc20608200853i318d1051kc8cc8cfff1b7eb0a at mail.gmail.com... > I wonder if "raise ValueError" should still be allowed (as equivalent > to "raise ValueError()") or whether it should be disallowed. +1 for disallow. raise is a simple rule to remember. Having VE == VE() in certain contexts is/would be like having s.len == s.len() or func == func() (a moderately frequent newbie request) in certain contexts.
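For reference, the question Georg raises here was eventually answered in Python 3 not with new raise syntax but with an explicit method call, `with_traceback()`; the following is a sketch of that later-adopted spelling, not something decided in this thread:

```python
# Attaching a saved traceback to a new exception with with_traceback().
def fail():
    raise ValueError("original failure")

try:
    fail()
except ValueError as exc:
    saved_tb = exc.__traceback__   # keep the traceback for later

try:
    raise ValueError("re-raised later").with_traceback(saved_tb)
except ValueError as exc:
    names = []
    tb = exc.__traceback__
    while tb is not None:          # walk the chained traceback entries
        names.append(tb.tb_frame.f_code.co_name)
        tb = tb.tb_next
```

The walked chain still reaches the frame of fail(), showing the original traceback was preserved through the re-raise.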
Plus, why encourage less-helpful, no message exceptions ;-) Terry Jan Reedy From guido at python.org Sun Aug 20 18:10:55 2006 From: guido at python.org (Guido van Rossum) Date: Sun, 20 Aug 2006 09:10:55 -0700 Subject: [Python-3000] Google Sprint Ideas Message-ID: I've created a wiki page with some ideas for Python 3000 things we could do at the Google sprint (starting Monday). See: http://wiki.python.org/moin/GoogleSprintPy3k For general info about this sprint -- it's not too late to come! -- see: http://wiki.python.org/moin/GoogleSprint -- --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at python.org Sun Aug 20 18:11:32 2006 From: barry at python.org (Barry Warsaw) Date: Sun, 20 Aug 2006 12:11:32 -0400 Subject: [Python-3000] raise with traceback? In-Reply-To: References: Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 20, 2006, at 11:53 AM, Guido van Rossum wrote: > The 'with' syntax is attractive because it will flag all unconverted > code as a syntax error. > > I wonder if "raise ValueError" should still be allowed (as equivalent > to "raise ValueError()") or that it should be disallowed. I say keep it. I don't see much value in requiring empty parentheses, except maybe to keep my left pinkie limber. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBROiJtXEjvBPtnXfVAQKkQwP/WTYvfFYYlA5ukmDmvTg3G5BVCYEyC8hQ 8jZXfnzm0j8PdCGJp2ym16ux0+MIRsMx1taU0VGRpULF4hPfRPHG92EQm/YDRGBm 1X5fXNmQ2sbMAb84GqO6HiQxbUkP70Zu5DbgQj3pCqCO3oJLuqXie1gj5neezBoR lj2yQHiUnP8= =JFG+ -----END PGP SIGNATURE----- From g.brandl at gmx.net Sun Aug 20 18:12:48 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 20 Aug 2006 18:12:48 +0200 Subject: [Python-3000] raise with traceback? In-Reply-To: References: Message-ID: Terry Reedy wrote: > "Guido van Rossum" wrote in message > news:ca471dc20608200853i318d1051kc8cc8cfff1b7eb0a at mail.gmail.com... 
>> I wonder if "raise ValueError" should still be allowed (as equivalent >> to "raise ValueError()") or that it should be disallowed. > > +1 for disallow. > > raise is a simple rule to remember. > > Having VE == VE() in certain contexts is/would be like haveing s.len == > s.len() or func == func() (a moderately frequent newbie request) in certain > contexts. > > Plus, why encourage less-helpful, no message exceptions ;-) Some exceptions don't need a message, such as StopIteration, and other possibly user-defined ones meant to be caught immediately in surrounding code. Though I agree that it makes explanations (and probably some bits of code) easier to only allow instances after raise. Georg From martin at v.loewis.de Sun Aug 20 18:30:23 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 20 Aug 2006 18:30:23 +0200 Subject: [Python-3000] int-long unification In-Reply-To: References: Message-ID: <44E88E1F.6010607@v.loewis.de> Guido van Rossum schrieb: > Are you interested in doing this at the Google sprint next week? Sure; I hadn't any special plans so far. > What do you think? Sounds good. There are two problems I see: - how to benchmark? - there are subtle details in the API that require changes to extension code. In particular, PyInt_AsLong currently cannot fail, but can fail with a range error after the unification. However, to evaluate the performance, it is possible to work around that. For this specific problem, I would propose to introduce another API, say int PyLong_ToLong(PyObject* val, long* result); which will return true(1) for success, and set an exception in case of a failure. 
Then, we get long PyLong_AsLong(PyObject *val) { long result; if(!PyLong_ToLong(val, &result))return -1; return result; } and perhaps long PyInt_AsLong(PyObject* val) { long result; if(!PyLong_ToLong(val, &result)) Py_FatalError("old-style integer conversion failed"); return result; } Regards, Martin From guido at python.org Sun Aug 20 18:43:05 2006 From: guido at python.org (Guido van Rossum) Date: Sun, 20 Aug 2006 09:43:05 -0700 Subject: [Python-3000] int-long unification In-Reply-To: <44E88E1F.6010607@v.loewis.de> References: <44E88E1F.6010607@v.loewis.de> Message-ID: On 8/20/06, "Martin v. L?wis" wrote: > Guido van Rossum schrieb: > > Are you interested in doing this at the Google sprint next week? > > Sure; I hadn't any special plans so far. > > > What do you think? > > Sounds good. There are two problems I see: > > - how to benchmark? We could possibly do a lot of int allocations and deallocations in a temporary extension module. > - there are subtle details in the API that require changes > to extension code. In particular, PyInt_AsLong currently > cannot fail, but can fail with a range error after the > unification. > > However, to evaluate the performance, it is possible to work > around that. > > For this specific problem, I would propose to introduce > another API, say > > int PyLong_ToLong(PyObject* val, long* result); > > which will return true(1) for success, and set an exception > in case of a failure. Then, we get > > long PyLong_AsLong(PyObject *val) > { > long result; > if(!PyLong_ToLong(val, &result))return -1; > return result; > } > > and perhaps > > long PyInt_AsLong(PyObject* val) > { > long result; > if(!PyLong_ToLong(val, &result)) > Py_FatalError("old-style integer conversion failed"); > return result; > } The fatal error strikes me as unpleasant. Perhaps PyInt_Check[Exact] should return false if the value won't fit in a C long? Or perhaps we could just return -sys.maxint-1?
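The semantics Martin proposes for PyLong_ToLong — succeed when the value fits in a C long, signal a range error otherwise — can be modeled at the Python level. This is only an illustration of the intended behavior; the real change is in the C API:

```python
# Python-level model of a range-checked int -> C long conversion.
import ctypes

C_LONG_BITS = 8 * ctypes.sizeof(ctypes.c_long)
C_LONG_MAX = 2 ** (C_LONG_BITS - 1) - 1
C_LONG_MIN = -C_LONG_MAX - 1

def to_c_long(value):
    # Mirrors the proposed PyLong_ToLong: either the value fits, or the
    # caller gets an explicit error instead of silent truncation.
    if not (C_LONG_MIN <= value <= C_LONG_MAX):
        raise OverflowError("Python int too large to convert to C long")
    return value
```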
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From jimjjewett at gmail.com Sun Aug 20 20:12:29 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 20 Aug 2006 14:12:29 -0400 Subject: [Python-3000] Fwd: Conventions for annotation consumers In-Reply-To: <1cb725390608191706l113558f7h98e6810ddc422d2a@mail.gmail.com> References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> <44E5E85F.6080508@gmail.com> <1cb725390608191219q71cf34dfi7a6a892c1fd9eddf@mail.gmail.com> <1cb725390608191706l113558f7h98e6810ddc422d2a@mail.gmail.com> Message-ID: On 8/19/06, Paul Prescod wrote: > On 8/19/06, Ron Adam wrote: > ... don't understand the virtue of bringing > decorators into the picture. Yes, they are > one consumer of metadata. They aren't being brought in as sample *consumers*; they are being suggested as *producers* of metadata. The following works to assert the data >>> def f(a, b): ... >>> f.a=int We're discussing the alternative of >>> def f(a:int, b): which is better for some things -- but much worse for others; if the metadata is any longer than int, it is almost certainly worse. So (I believe) he is suggesting that we just reuse decorator syntax >>> @sig(a=int) ... def f(a, b): This keeps the single function declaration line short and sweet, reflecting (modulo "self" and a colon) how it is actually called. It gets the annotations (including type information) up where they should be, but they don't overwhelm the variable names. Whether to also add signature expressions (to make @sig decorators easier to write) is a separate question; the key point is not to mess with the one-line function summary. 
-jJ From osantana at gmail.com Sun Aug 20 21:03:36 2006 From: osantana at gmail.com (Osvaldo Santana) Date: Sun, 20 Aug 2006 16:03:36 -0300 Subject: [Python-3000] Google Sprint Ideas In-Reply-To: References: Message-ID: Hi Guido, On 8/20/06, Guido van Rossum wrote: > I've created a wiki page with some ideas for Python 3000 things we > could do at the Google sprint (starting Monday). See: > > http://wiki.python.org/moin/GoogleSprintPy3k I'm interested in contributing to the task "Rewrite import in Python (Brett Cannon)". I've started to study the Python import mechanism at interpreter startup to understand how it works (http://pythonologia.org/python_import/) and I have some ideas for this rewrite too. I'll have full time at my job to work on this. Thanks, Osvaldo -- Osvaldo Santana Neto Python for Maemo developer icq, url = (11287184, "http://www.pythonbrasil.com.br") From guido at python.org Sun Aug 20 21:59:01 2006 From: guido at python.org (Guido van Rossum) Date: Sun, 20 Aug 2006 12:59:01 -0700 Subject: [Python-3000] Google Sprint Ideas In-Reply-To: References: Message-ID: Excellent! I'm adding Brett to the CC's. Can you update the wiki page adding your name to that task? Are you coming to the sprint in person or are you just going to be sprinting at your own place? --Guido On 8/20/06, Osvaldo Santana wrote: > Hi Guido, > > On 8/20/06, Guido van Rossum wrote: > > I've created a wiki page with some ideas for Python 3000 things we > > could do at the Google sprint (starting Monday). See: > > > > http://wiki.python.org/moin/GoogleSprintPy3k > > I'm interested in contributing to the task "Rewrite import in Python > (Brett Cannon)". > > I've started to study the Python import mechanism at interpreter > startup to understand how it works > (http://pythonologia.org/python_import/) and I have some ideas for > this rewrite too. > > I'll have full time at my job to work on this.
> > Thanks, > Osvaldo > > -- > Osvaldo Santana Neto > Python for Maemo developer > icq, url = (11287184, "http://www.pythonbrasil.com.br") > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From paul at prescod.net Sun Aug 20 22:07:18 2006 From: paul at prescod.net (Paul Prescod) Date: Sun, 20 Aug 2006 13:07:18 -0700 Subject: [Python-3000] Fwd: Conventions for annotation consumers In-Reply-To: References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> <44E5E85F.6080508@gmail.com> <1cb725390608191219q71cf34dfi7a6a892c1fd9eddf@mail.gmail.com> <1cb725390608191706l113558f7h98e6810ddc422d2a@mail.gmail.com> Message-ID: <1cb725390608201307i2b4a2711y7679279b8b2fc871@mail.gmail.com> On 8/20/06, Jim Jewett wrote: > > We're discussing the alternative of > > >>> def f(a:int, b): > > which is better for some things -- but much worse for others; if the > metadata is any longer than int, it is almost certainly worse. So (I > believe) he is suggesting that we just reuse decorator syntax > > >>> @sig(a=int) > ... def f(a, b): I don't believe that's true, because this is the syntax he showed: > @callmeta > def foo( a: [ SetDoc("frobination count"), InRange(3,9) ], > b: InSet([4,8,12]) ) > -> IsNumber: I guess I still don't really understand what he's getting at or what the value of @callmeta is in that example. It just seems like extra noise with no value to me... Ron: what *precisely* does the @callmeta decorator do? If you can express it in code, so much the better. Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060820/e8257c01/attachment.htm From osantana at gmail.com Sun Aug 20 22:27:31 2006 From: osantana at gmail.com (Osvaldo Santana) Date: Sun, 20 Aug 2006 17:27:31 -0300 Subject: [Python-3000] Google Sprint Ideas In-Reply-To: References: Message-ID: On 8/20/06, Guido van Rossum wrote: > Excellent! I'm adding Brett to the CC's. Cool. 
Has Brett planned anything for this rewrite? > Can you update the wiki page adding your name to that task? Done. > Are you coming to the sprint in person > or are you just going to be sprinting at your own place? I'll sprint at my job. I can access IRC from there. Thanks, Osvaldo -- Osvaldo Santana Neto (aCiDBaSe) icq, url = (11287184, "http://www.pythonbrasil.com.br") From nnorwitz at gmail.com Sun Aug 20 22:51:40 2006 From: nnorwitz at gmail.com (Neal Norwitz) Date: Sun, 20 Aug 2006 16:51:40 -0400 Subject: [Python-3000] Google Sprint Ideas In-Reply-To: References: Message-ID: On 8/20/06, Osvaldo Santana wrote: > On 8/20/06, Guido van Rossum wrote: > > Excellent! I'm adding Brett to the CC's. > > Cool. Has Brett planned anything for this rewrite? I'm not sure exactly what you are asking. It's mostly planned to be a re-implementation of the current behaviour in Python. Hopefully various corner cases will be cleaned up/documented, and some of the differences between importing a file from the file system and from a zip package will be smoothed over. I don't think he's started any of this yet, beyond looking at the PyPy implementation. It helps him in his work to sandbox Python. Also, various optimizations or playing with different semantics become much easier if import is implemented in Python. n From p.f.moore at gmail.com Sun Aug 20 22:52:56 2006 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 20 Aug 2006 21:52:56 +0100 Subject: [Python-3000] Google Sprint Ideas In-Reply-To: References: Message-ID: <79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com> On 8/20/06, Guido van Rossum wrote: > I've created a wiki page with some ideas for Python 3000 things we > could do at the Google sprint (starting Monday). See: > > http://wiki.python.org/moin/GoogleSprintPy3k I notice that one of the items on there is "Work on the new I/O library (I have much interest in this but need help -- Guido)".
I also have an interest in this, although I won't be at the sprint (and in general have very little time for coding these days, unfortunately). Is there any description of the plans for the new I/O library anywhere? I assume that ultimately there will be a PEP, but in the meantime, I recall very little in the way of details having been discussed. Paul. From rrr at ronadam.com Sun Aug 20 23:01:46 2006 From: rrr at ronadam.com (Ron Adam) Date: Sun, 20 Aug 2006 16:01:46 -0500 Subject: [Python-3000] Fwd: Conventions for annotation consumers In-Reply-To: <1cb725390608201307i2b4a2711y7679279b8b2fc871@mail.gmail.com> References: <5.1.1.6.0.20060815200505.026038e8@sparrow.telecommunity.com> <44E5E85F.6080508@gmail.com> <1cb725390608191219q71cf34dfi7a6a892c1fd9eddf@mail.gmail.com> <1cb725390608191706l113558f7h98e6810ddc422d2a@mail.gmail.com> <1cb725390608201307i2b4a2711y7679279b8b2fc871@mail.gmail.com> Message-ID: Paul Prescod wrote: > I guess I still don't really understand what he's getting at or what the > value of @callmeta is in that example. It just seems like extra noise > with no value to me... > > Ron: what *precisely* does the @callmeta decorator do? If you can > express it in code, so much the better. > > Paul Prescod > Here's a working example. @callmeta could be named something else, like @asserter, @checker, or whatever. And it should do more checks to avoid non-callable annotations and to keep from writing over pre-existing annotations, etc... As I said, this could all be put in a module, and it's easy to create new assert tests without having to know about decorators or any special classes. Ron # ----- Some assert test functions.
def IsAny(arg):
    pass

def IsNumber(arg):
    assert type(arg) in (int, long, float), \
        "%r is not a number" % arg

def IsInt(arg):
    assert type(arg) in (int, long), \
        "%r is not an Int" % arg

def IsFloat(arg):
    assert isinstance(arg, float), \
        "%r is not a float" % arg

def InRange(start, stop):
    def inrange(arg):
        assert start <= arg <= stop, \
            "%r is not in range %r through %r" % (arg, start, stop)
    return inrange

def InSet(list_):
    s = set(list_)
    def inset(arg):
        assert arg in s, \
            "%r is not in %r" % (arg, s)
    return inset

# ------- The add-annotation decorator.

def annotate(**kwds):
    def setter(func):
        func.__signature__ = dict()
        func.__signature__['annotations'] = kwds
        return func
    return setter

# ------ The do-asserts decorator.

def callmeta(f):
    def new_f(*args, **kwds):
        d = dict(zip(f.func_code.co_varnames, args))
        d.update(kwds)
        tests = f.__signature__['annotations']
        for key in d:
            if key != 'returns':
                tests[key](d[key])
        result = f(*args, **kwds)
        if 'returns' in tests:
            tests['returns'](result)
        return result
    new_f.func_name = f.func_name
    return new_f

# --------- Examples of using callable annotations.

@callmeta
@annotate(a=IsAny, b=IsInt, returns=IsInt)
def add(a, b):
    return a + b

print add(1, 4)

@callmeta
@annotate(a=IsInt, b=IsInt, returns=IsInt)
def add(a, b):
    return a + b

print add(1, 4.1)  # assertion error here.

# which could also be...
"""
@callmeta
def add(a: IsInt, b: IsInt) -> IsInt:
    return a + b
"""

From qrczak at knm.org.pl Sun Aug 20 23:06:28 2006 From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk) Date: Sun, 20 Aug 2006 23:06:28 +0200 Subject: [Python-3000] int-long unification In-Reply-To: (Guido van Rossum's message of "Sun, 20 Aug 2006 09:43:05 -0700") References: <44E88E1F.6010607@v.loewis.de> Message-ID: <871wrbyv57.fsf@qrnik.zagroda> "Guido van Rossum" writes: > The fatal error strikes me as unpleasant. Perhaps PyInt_Check[Exact] > should return false if the value won't fit in a C long? Maybe. > Or perhaps we could just return -sys.maxint-1?
This would be a bad idea: some errors in user programs would yield nonsensical results or be masked instead of being signalled with exceptions. I made C macros for the following patterns of extracting C integers in my language:

1. If the object is an integer with its value in the given range, put the value into a C integer variable. Otherwise fail with an exception which tells that the value is out of range (it includes the value, the range, and a string explaining what this value represents), or that it is not an integer.
2. As above, but the range is the full range of the C type.
3. As above, but the low end is 0 or given explicitly and the high end is the range of the C type.

Only in rare cases did I need to separate checking whether the number is in the given range from extracting the value under the assumption that it has been checked earlier. Sometimes the action performed for an out-of-range value is different than throwing an exception, but this is rare too. The C type can be smaller or larger than the threshold which separates the representations of small integers and big integers in my runtime (which in my case is 1 bit smaller than some C type, so it never matches exactly). This is handled transparently by these C macros. I always try to find out the maximum sensible range of the given parameter.
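To make pattern 1 concrete, here is a rough Python rendering of the idea (Marcin's originals are C macros; the helper name and message wording here are invented for illustration):

```python
def extract_int(value, lo, hi, what):
    # Pattern 1: accept only an integer within [lo, hi]; otherwise raise
    # a uniform, informative error -- the same exception type whether the
    # value is slightly or wildly out of range.
    if not isinstance(value, int):
        raise TypeError("%s must be an integer, got %r" % (what, value))
    if not lo <= value <= hi:
        raise ValueError("%s must be between %d and %d, but %d was given"
                         % (what, lo, hi, value))
    return value

# e.g. a Unicode code point, as in the unichr() comparison that follows:
code = extract_int(0x41, 0, 0x10FFFF, "character code")
```

The point is that a huge value and a slightly-too-big value take the same error path, so callers never see an OverflowError leak through where a range error was meant.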
For example:

- bzip2, compression parameters (verbosity 0..4, compression level 1..9, work factor 1..250), gzip similarly - case 1
- Python's unichr(): character code 0..0x10FFFF - case 1
- conversions int<->str, base 2..36 - case 1
- seeking into files - cases 2 and 3
- curses, color pair number 0..PAIR_NUMBER(A_COLOR) - case 1
- curses, screen coordinates and character counts - case 3
- curses, KEY_F(n) 0..63 - case 1
- sockets, address family code 0..AF_MAX or 0..255 - case 1
- sockets, port number 0..65535 - case 1
- sockets, socket type code and protocol number - case 3
- readline, function code in keymap 0..255 (or 0..KEYMAP_SIZE-2, but KEYMAP_SIZE is always 257) - case 1
- readline, repetition count of commands - case 2
- readline, rl_display_match_list, screen width 0..INT_MAX-2 - case 1
- readline, history entry positions - case 3
- readline, terminal width & height - case 3
- kill() and waitpid(), pid - case 3 (starting from 1 for an individual process or 2 for process group)
- kill(), signal number 0..NSIG-1 or 0.._NSIG-1 or 0..32 - case 1

The effect when writing a C extension is that the same C code works no matter what the relation is between the ranges of the target C type and int / size_t. Python had to code extraction of the seeking offset specially because off_t may be larger, and silently assumes that the sensible ranges of pid_t, uid_t etc. are the same as of C int.
The visible effect is that Python has inconsistent exceptions:

>>> unichr(0x123456)
ValueError: unichr() arg not in range(0x110000)

(wide Python build)

>>> unichr(0x1234567890)
OverflowError: long int too large to convert to int

Kogut is consistent here:

> Char 0x123456
Value out of range: character code must be between 0 and 1114111, but 1193046 was given

> Char 0x1234567890
Value out of range: character code must be between 0 and 1114111, but 78187493520 was given

Python:

>>> posix.kill(0, 128)
OSError: [Errno 22] Invalid argument
>>> posix.kill(0, 2**32)
OverflowError: long int too large to convert to int

Kogut:

> SignalProcess #group (SystemSignal 128)
Value out of range: signal number must be between 0 and 64, but 128 was given

> SignalProcess #group (SystemSignal (2 %Power 32))
Value out of range: signal number must be between 0 and 64, but 4294967296 was given

The same applies in the other direction, converting from C. C in Python:

#ifdef HAVE_LARGEFILE_SUPPORT
    PyStructSequence_SET_ITEM(v, 1,
        PyLong_FromLongLong((PY_LONG_LONG)st.st_ino));
#else
    PyStructSequence_SET_ITEM(v, 1, PyInt_FromLong((long)st.st_ino));
#endif

C in Kogut:

KO_INT(ko_value_of_file_status(this)->st_ino)

This is a C expression returning the equivalent of PyObject *, taking sizeof the argument into account.

--
 __("< Marcin Kowalczyk
 \__/ qrczak at knm.org.pl
  ^^ http://qrnik.knm.org.pl/~qrczak/

From martin at v.loewis.de Sun Aug 20 23:10:41 2006 From: martin at v.loewis.de ("Martin v. Löwis") Date: Sun, 20 Aug 2006 23:10:41 +0200 Subject: [Python-3000] Ctypes as cross-interpreter C calling interface In-Reply-To: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com> References: <1cb725390608092219v695b7f24t92534d3aa444ca8c@mail.gmail.com> Message-ID: <44E8CFD1.9090403@v.loewis.de> Paul Prescod schrieb: > Thanks for everyone who contributed.
It seems that the emerging > consensus (bar a security question from Guido) is that ctypes is the way > forward for calling C code in Python 3000. I don't think that can ever work (so I don't participate in that consensus). There are too many issues with C that make ctypes not general enough.

a) it requires code to be packaged in a DLL; static libraries are not supported (conceptually)
b) it requires you to know the layout of data structures, or at least to duplicate declarations in Python. As the layout of the same structure may change over time or across implementations (e.g. FILE in stdio), you can never get good platform coverage.
c) A good deal of C API is through macros, for various usages (symbolic constants, function inlining, customization/configuration/conditional compilation)
d) No real support for C++ (where there are even more ABI issues: (multiple) inheritance, vtables, constructors, operator overload, templates, ...)

To access a C API, the only "right" way is to use a C compiler. ctypes is for people who want to avoid using a C compiler at all costs. Regards, Martin From seojiwon at gmail.com Sun Aug 20 23:52:32 2006 From: seojiwon at gmail.com (Jiwon Seo) Date: Sun, 20 Aug 2006 14:52:32 -0700 Subject: [Python-3000] Keyword Only Argument Message-ID: For the implementation of PEP 3102 - Keyword-Only Arguments, it would be nice to have an (abstract) data structure representing the signature of a function. Currently, the code object only has the # of arguments and the # of default values, so if we want to allow something like

def foo(a, b=10, *, c, d): ...

or

def foo(a, b=10, *, c, d=20): ...

a signature data structure will be very helpful. A signature data structure is roughly described in http://mail.python.org/pipermail/python-3000/2006-April/001249.html , but has anyone got a detailed idea or implemented it (it doesn't matter how naive the implementation is)? Brett, is that document the most recent one describing the signature data structure?
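[As a present-day aside: the syntax sketched above was eventually adopted, and later Python versions grew exactly this kind of signature object in the inspect module (PEP 362). A quick Python 3 illustration:]

```python
import inspect

# Parameters after the bare * can only be passed by keyword.
def foo(a, b=10, *, c, d=20):
    return (a, b, c, d)

result = foo(1, c=3)      # positional c, e.g. foo(1, 10, 3), is a TypeError

# The kind of signature data structure asked about here:
sig = inspect.signature(foo)
kinds = {name: p.kind.name for name, p in sig.parameters.items()}
# kinds records that c and d are KEYWORD_ONLY while a and b are not.
```

The signature object also carries each parameter's default, which is what lets `d` be optional even though it is keyword-only.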
-Jiwon From jcarlson at uci.edu Mon Aug 21 00:47:52 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Sun, 20 Aug 2006 15:47:52 -0700 Subject: [Python-3000] signature annotation in the function signature or a separate line In-Reply-To: References: <44E6EDDB.9070604@strakt.com> Message-ID: <20060820152716.1A09.JCARLSON@uci.edu> "Guido van Rossum" wrote: > > given the precedent of type inferenced languages > > that use a separate line for optional type information > > Can you show us an example or two? C/C++ probably doesn't count, being that type information is required, but one can relocate type information to other lines...

void cross(inp1, inp2, inpl1, inpl2, outp)
    double* inp1;
    double* inp2;
    long inpl1;
    long inpl2;
    double* outp
{
    /* body goes here */
}

- Josiah From free.condiments at gmail.com Mon Aug 21 01:26:57 2006 From: free.condiments at gmail.com (Sam Pointon) Date: Mon, 21 Aug 2006 00:26:57 +0100 Subject: [Python-3000] signature annotation in the function signature or a separate line In-Reply-To: References: <20060816090147.19DA.JCARLSON@uci.edu> <20060816102652.19E3.JCARLSON@uci.edu> <44E6EDDB.9070604@strakt.com> Message-ID: On 20/08/06, Guido van Rossum wrote: > On 8/19/06, Samuele Pedroni wrote: > > given the precedent of type inferenced languages > > that use a separate line for optional type information > > Can you show us an example or two? Haskell:

map :: (a -> b) -> [a] -> [b]
map f xs = ...

Note that type information can also be contained in an expression (and by extension on the same line), though the convention for defined functions is to have it on a separate line. This type information is not quite 100% optional - there are some corner-cases where the typechecker needs a shove in the correct direction, or the inferred type could be too general.
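[For comparison, a rough Python analogue of the Haskell separate-line convention, written after the fact: the annotation dict is simply assigned after the def, and the strings are Haskell-style notation that nothing checks.]

```python
def map_(f, xs):
    return [f(x) for x in xs]

# Type information on its own "line", detached from the def, in the
# spirit of the Haskell signature above:
map_.__annotations__ = {"f": "a -> b", "xs": "[a]", "return": "[b]"}

doubled = map_(lambda x: x * 2, [1, 2, 3])   # [2, 4, 6]
```

Functions have a writable `__annotations__` attribute, so tools can attach or read this metadata without the inline syntax being involved at all.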
--Sam From guido at python.org Mon Aug 21 01:27:15 2006 From: guido at python.org (Guido van Rossum) Date: Sun, 20 Aug 2006 16:27:15 -0700 Subject: [Python-3000] Google Sprint Ideas In-Reply-To: <79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com> References: <79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com> Message-ID: On 8/20/06, Paul Moore wrote: > On 8/20/06, Guido van Rossum wrote: > > I've created a wiki page with some ideas for Python 3000 things we > > could do at the Google sprint (starting Monday). See: > > > > http://wiki.python.org/moin/GoogleSprintPy3k > > I notice that one of the items on there is "Work on the new I/O > library (I have much interest in this but need help -- Guido)". I also > have an interest in this, although I won't be at the sprint (and in > general have very little time for coding these days, unfortunately). > > Is there any description of the plans for the new I/O library > anywhere? I assume that ultimately there will be a PEP, but in the > meantime, I recall very little in the way of details having been > discussed. Without endorsing every detail of his design, tomer filiba has written several blog (?) entries about this, the latest being http://sebulba.wikispaces.com/project+iostack+v2 . You can also look at sandbox/sio/sio.py in svn. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From talin at acm.org Mon Aug 21 01:42:08 2006 From: talin at acm.org (Talin) Date: Sun, 20 Aug 2006 16:42:08 -0700 Subject: [Python-3000] Google Sprint Ideas In-Reply-To: References: <79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com> Message-ID: <44E8F350.8070509@acm.org> Guido van Rossum wrote: > On 8/20/06, Paul Moore wrote: > Without endorsing every detail of his design, tomer filiba has written > several blog (?) entries about this, the latest being > http://sebulba.wikispaces.com/project+iostack+v2 . You can also look > at sandbox/sio/sio.py in svn. 
One comment after reading this: If we're going to re-invent the Java/C# i/o library, could we at least use the same terminology? In particular, the term "Layer" has connotations which may be confusing in this context - I would prefer something like "Adapter" or "Filter". Also, I notice that this proposal removes what I consider to be a nice feature of Python, which is that you can take a plain file object and iterate over the lines of the file -- it would require a separate line buffering adapter to be created. I think I understand the reasoning behind this - in a world with multiple text encodings, the definition of "line" may not be so simple. However, I would assume that the "built-in" streams would support the most basic, least-common-denominator encodings for convenience. -- Talin From greg.ewing at canterbury.ac.nz Mon Aug 21 03:03:27 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 21 Aug 2006 13:03:27 +1200 Subject: [Python-3000] raise with traceback? In-Reply-To: References: Message-ID: <44E9065F.7030802@canterbury.ac.nz> Terry Reedy wrote: > "Guido van Rossum" wrote in message > news:ca471dc20608200853i318d1051kc8cc8cfff1b7eb0a at mail.gmail.com... > >>I wonder if "raise ValueError" should still be allowed (as equivalent >>to "raise ValueError()") or that it should be disallowed. > > +1 for disallow. Seems like that would break a lot of code with no obvious way of flagging things which need to be changed. Also it would preclude the possibility of any future optimisation to avoid instantiating the exception when its value isn't needed. 
-- Greg From guido at python.org Mon Aug 21 03:06:02 2006 From: guido at python.org (Guido van Rossum) Date: Sun, 20 Aug 2006 18:06:02 -0700 Subject: [Python-3000] Google Sprint Ideas In-Reply-To: <44E8F350.8070509@acm.org> References: <79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com> <44E8F350.8070509@acm.org> Message-ID: On 8/20/06, Talin wrote: > Guido van Rossum wrote: > > On 8/20/06, Paul Moore wrote: > > > Without endorsing every detail of his design, tomer filiba has written > > several blog (?) entries about this, the latest being > > http://sebulba.wikispaces.com/project+iostack+v2 . You can also look > > at sandbox/sio/sio.py in svn. > > One comment after reading this: If we're going to re-invent the Java/C# > i/o library, could we at least use the same terminology? In particular, > the term "Layer" has connotations which may be confusing in this context > - I would prefer something like "Adapter" or "Filter". That's an example of what I meant when I said "without endorsing every detail". I don't know which terminology C++ uses beyond streams. I think Java uses Streams for the lower-level stuff and Reader/Writer for the higher-level stuff -- or is it the other way around? > Also, I notice that this proposal removes what I consider to be a nice > feature of Python, which is that you can take a plain file object and > iterate over the lines of the file -- it would require a separate line > buffering adapter to be created. I think I understand the reasoning > behind this - in a world with multiple text encodings, the definition of > "line" may not be so simple. However, I would assume that the "built-in" > streams would support the most basic, least-common-denominator encodings > for convenience. First time I noticed that. But perhaps it's the concept of "plain file object" that changed? 
My own hierarchy (which I arrived at without reading tomer's proposal) is something like this: (1) Basic level (implemented in C) -- open, close, read, write, seek, tell. Completely unbuffered, maps directly to system calls. Does binary I/O only. (2) Buffering. Implements the same API as (1) but adds buffering. This is what one normally uses for binary file I/O. It builds on (1), but can also be built on raw sockets instead. It adds an API to inquire about the amount of buffered data, a flush() method, and ways to change the buffer size. (3) Encoding and line endings. Implements a somewhat different API, for reading/writing text files; the API resembles Python 2's I/O library more. This is where readline() and next() giving the next line are implemented. It also does newline translation to/from the platform's native convention (CRLF or LF, or perhaps CR if anyone still cares about Mac OS <= 9) and Python's convention (always \n). I think I want to put these two features (encoding and line endings) in the same layer because they are both text related. Of course you can specify ASCII or Latin-1 to effectively disable the encoding part. Does this make more sense? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From talin at acm.org Mon Aug 21 03:34:28 2006 From: talin at acm.org (Talin) Date: Sun, 20 Aug 2006 18:34:28 -0700 Subject: [Python-3000] Google Sprint Ideas In-Reply-To: References: <79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com> <44E8F350.8070509@acm.org> Message-ID: <44E90DA4.1040203@acm.org> Guido van Rossum wrote: > On 8/20/06, Talin wrote: >> Guido van Rossum wrote: >> > On 8/20/06, Paul Moore wrote: >> >> > Without endorsing every detail of his design, tomer filiba has written >> > several blog (?) entries about this, the latest being >> > http://sebulba.wikispaces.com/project+iostack+v2 . You can also look >> > at sandbox/sio/sio.py in svn. 
>> >> One comment after reading this: If we're going to re-invent the Java/C# >> i/o library, could we at least use the same terminology? In particular, >> the term "Layer" has connotations which may be confusing in this context >> - I would prefer something like "Adapter" or "Filter". > > That's an example of what I meant when I said "without endorsing every > detail". > > I don't know which terminology C++ uses beyond streams. I think Java > uses Streams for the lower-level stuff and Reader/Writer for the > higher-level stuff -- or is it the other way around? Well, the situation with Java is kind of complex. There are two sets of stream classes, but rather than classifying them as "low-level" and "high-level", a better classification is "old" and "new". The old classes (InputStream/OutputStream) are byte-oriented, whereas the newer ones (Reader/Writer) are character-oriented. It is not the case, however, that the character-oriented interface sits on top of the byte-oriented interface - rather, both interfaces are implemented by a number of different back ends. For purposes of Python, it probably makes more sense to look at the .Net System.IO.Stream. (As a general rule, the .Net classes are refactored versions of the Java classes, which is both good and bad. It's best to study both if one is looking for inspiration.) Hmmm, apparently the .Net documentation *does* use the term 'layer' to describe one stream wrapping another - which I still find strange. To my mind, the term 'layer' can either describe a particular design stratum within an architecture - such as the 'device layer' of an operating system - or it can describe a portion of a document, such as a drawing layer in a CAD program. I don't normally think of a single instance of a class wrapping another instance as constituting a "layer" - I usually use the term "adapter" or "proxy" to describe that case. (OK, so I'm pedantic about naming.
Now you know why one of my side projects is writing an online programmer's thesaurus -- using Python/TurboGears of course!) >> Also, I notice that this proposal removes what I consider to be a nice >> feature of Python, which is that you can take a plain file object and >> iterate over the lines of the file -- it would require a separate line >> buffering adapter to be created. I think I understand the reasoning >> behind this - in a world with multiple text encodings, the definition of >> "line" may not be so simple. However, I would assume that the "built-in" >> streams would support the most basic, least-common-denominator encodings >> for convenience. > > First time I noticed that. But perhaps it's the concept of "plain file > object" that changed? My own hierarchy (which I arrived at without > reading tomer's proposal) is something like this: > > (1) Basic level (implemented in C) -- open, close, read, write, seek, > tell. Completely unbuffered, maps directly to system calls. Does > binary I/O only. > > (2) Buffering. Implements the same API as (1) but adds buffering. This > is what one normally uses for binary file I/O. It builds on (1), but > can also be built on raw sockets instead. It adds an API to inquire > about the amount of buffered data, a flush() method, and ways to > change the buffer size. > > (3) Encoding and line endings. Implements a somewhat different API, > for reading/writing text files; the API resembles Python 2's I/O > library more. This is where readline() and next() giving the next line > are implemented. It also does newline translation to/from the > platform's native convention (CRLF or LF, or perhaps CR if anyone > still cares about Mac OS <= 9) and Python's convention (always \n). I > think I want to put these two features (encoding and line endings) in > the same layer because they are both text related. Of course you can > specify ASCII or Latin-1 to effectively disable the encoding part. > > Does this make more sense? 
I understood that much -- this is pretty much the way everyone does things these days (our own custom stream library at work looks pretty much like this too.) The question I was wondering is, will the built-in 'file' function return an object of level 3? -- Talin From alexander.belopolsky at gmail.com Mon Aug 21 05:36:09 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 21 Aug 2006 03:36:09 +0000 (UTC) Subject: [Python-3000] Google Sprint Ideas References: <79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com> <44E8F350.8070509@acm.org> Message-ID: Guido van Rossum python.org> writes: [snip] >>> Without endorsing every detail of his design, tomer filiba has written >>> several blog (?) entries about this, the latest being >>> http://sebulba.wikispaces.com/project+iostack+v2 . You can also look >>> at sandbox/sio/sio.py in svn. [snip] > > That's an example of what I meant when I said "without endorsing every > detail". Here is another detail that I would like to see addressed. The new API does not seem to provide for a way to read data directly into an existing object without creating an intermediate bytes object. Python 2.x has an undocumented readinto method that allows reading data directly into an object that supports the buffer protocol. For Py3k, I would like to suggest a buffer protocol modelled after the iovec structure that is used by the readv system call. On many systems readv is more efficient than repeated calls to read, and I think Py3k will benefit from direct access to that feature.
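[The readinto pattern referred to above, sketched with an in-memory stream for illustration: `io.BytesIO` stands in for a real file or socket, and modern Python's io objects expose readinto officially.]

```python
import io

stream = io.BytesIO(b"hello world")
buf = bytearray(5)              # caller-owned, preallocated buffer

# Fill buf in place -- no intermediate bytes object is created.
n = stream.readinto(buf)

# n is the number of bytes actually read; buf now holds b"hello".
```

The caller keeps ownership of `buf` and can hand the same buffer to repeated reads, which is exactly the allocation-avoiding behaviour being requested for the new API.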
From martin at v.loewis.de Mon Aug 21 06:01:00 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 21 Aug 2006 06:01:00 +0200 Subject: [Python-3000] Google Sprint Ideas In-Reply-To: References: <79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com> <44E8F350.8070509@acm.org> Message-ID: <44E92FFC.9080407@v.loewis.de> Alexander Belopolsky schrieb: > For Py3k, I would like to suggest a buffer protocol modelled > after iovec structure that is used by the readv system call. > On many systems readv is more efficient than repeated calls > to read and I think Py3k will benefit from a direct access to > that feature. -1. It's difficult to use, and I question that there is any benefit. I believe readv is there primarily for symmetry with writev and hasn't any sensible uses on its own. writev is there so you can add additional headers/trailers around data blocks you received from higher layers. I even doubt that exposing writev in Python would make a measurable performance difference. Regards, Martin From guido at python.org Mon Aug 21 06:32:18 2006 From: guido at python.org (Guido van Rossum) Date: Sun, 20 Aug 2006 21:32:18 -0700 Subject: [Python-3000] Google Sprint Ideas In-Reply-To: <44E90DA4.1040203@acm.org> References: <79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com> <44E8F350.8070509@acm.org> <44E90DA4.1040203@acm.org> Message-ID: On 8/20/06, Talin wrote: > Guido van Rossum wrote: > > On 8/20/06, Talin wrote: > >> Guido van Rossum wrote: > >> > On 8/20/06, Paul Moore wrote: > >> > >> > Without endorsing every detail of his design, tomer filiba has written > >> > several blog (?) entries about this, the latest being > >> > http://sebulba.wikispaces.com/project+iostack+v2 . You can also look > >> > at sandbox/sio/sio.py in svn. > >> > >> One comment after reading this: If we're going to re-invent the Java/C# > >> i/o library, could we at least use the same terminology? 
In particular, > >> the term "Layer" has connotations which may be confusing in this context > >> - I would prefer something like "Adapter" or "Filter". > > > > That's an example of what I meant when I said "without endorsing every > > detail". > > > > I don't know which terminology C++ uses beyond streams. I think Java > > uses Streams for the lower-level stuff and Reader/Writer for the > > higher-level stuff -- or is it the other way around? > > Well, the situation with Java is kind of complex. There are two sets of > stream classes, but rather than classifying them as "low-level" and > "high-level", a better classification is "old" and "new". The old > classes (InputStream/OutputStream) are byte-oriented, whereas the newer > ones (Reader/Writer) are character-oriented. It it not the case, > however, that the character-oriented interface sits on top of the > byte-oriented interface - rather, both interfaces are implemented by a > number of different back ends. How sure are you of all that? I always thought that these have about the same age, and that the main distinction is byte vs. char orientation. Also, the InputStreamReader class clearly sits on top of the InputStream class (but surprisingly recommends that for efficiency you do buffering on the reader side instead of on the stream side -- should we consider this for Python too?). And FileReader is a subclass of InputStreamReader. (OK, further investigation does show that FileInputStream exists since JDK 1.0 while InputStreamReader exists since JDK 1.1. But there's much newer Java I/O in the "nio" package, and there's work going on for "nio2", JSR 203.) > For purposes of Python, it probably makes more sense to look at the .Net > System.IO.Stream. (As a general rule, the .Net classes are refactored > versions of the Java classes, which is both good and bad. It's best to > study both if one is looking for inspiration.) Perhaps you can tell us more about that? 
I've used the Java I/O system sufficiently to have a feel for how it is actually used, which helps me find my way in the docs; but for .NET I fear that I would have to go on a sabbatical to make sense of it. And I don't have time for that. > Hmmm, apparently the .Net documentation *does* use the term 'layer' to > describe one stream wrapping another - which I still find strange. To my > mind, the term 'layer' can either describe a particular design stratum > within an architecture - such as the 'device layer' of an operating > system - or it can describe a portion of a document, such as a drawing > layer in a CAD program. It's used whenever you could draw a diagram of several layers of software sitting on top of each other. Perhaps usually layers are bigger (like device layers) but I see nothing wrong with declaring that Python I/O consists of three layers. > I don't normally think of a single instance of a > class wrapping another instance as constituting a "layer" - I usually > use the term "adapter" or "proxy" to describe that case. > > (OK, so I'm pedantic about naming. Now you know why one of my side > projects is writing an online programmer's thesaurus -- using > Python/TurboGears of course!) Wouldn't it make more sense to contribute to Wikipedia at this point? > >> Also, I notice that this proposal removes what I consider to be a nice > >> feature of Python, which is that you can take a plain file object and > >> iterate over the lines of the file -- it would require a separate line > >> buffering adapter to be created. I think I understand the reasoning > >> behind this - in a world with multiple text encodings, the definition of > >> "line" may not be so simple. However, I would assume that the "built-in" > >> streams would support the most basic, least-common-denominator encodings > >> for convenience. > > > > First time I noticed that. But perhaps it's the concept of "plain file > > object" that changed?
My own hierarchy (which I arrived at without > > reading tomer's proposal) is something like this: > > > > (1) Basic level (implemented in C) -- open, close, read, write, seek, > > tell. Completely unbuffered, maps directly to system calls. Does > > binary I/O only. > > > > (2) Buffering. Implements the same API as (1) but adds buffering. This > > is what one normally uses for binary file I/O. It builds on (1), but > > can also be built on raw sockets instead. It adds an API to inquire > > about the amount of buffered data, a flush() method, and ways to > > change the buffer size. > > > > (3) Encoding and line endings. Implements a somewhat different API, > > for reading/writing text files; the API resembles Python 2's I/O > > library more. This is where readline() and next() giving the next line > > are implemented. It also does newline translation to/from the > > platform's native convention (CRLF or LF, or perhaps CR if anyone > > still cares about Mac OS <= 9) and Python's convention (always \n). I > > think I want to put these two features (encoding and line endings) in > > the same layer because they are both text related. Of course you can > > specify ASCII or Latin-1 to effectively disable the encoding part. > > > > Does this make more sense? > > I understood that much -- this is pretty much the way everyone does > things these days (our own custom stream library at work looks pretty > much like this too.) So you have the buffering between the binary I/O and the text I/O too? > The question I was wondering is, will the built-in 'file' function > return an object of level 3? I am hoping to get rid of 'file' altogether. Instead, I want to go back to 'open'. Calling open() with a binary mode argument would return a layer 2 or layer 1 (if unbuffered) object; calling it with a text mode would return a layer 3 object. 
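With hindsight, the three layers sketched here are essentially what later shipped as Python 3's io module (io.FileIO for layer 1, io.BufferedReader/BufferedWriter for layer 2, io.TextIOWrapper for layer 3). The following illustration uses that later API, which did not exist when this thread was written, with an in-memory buffer standing in for a real file:

```python
import io

# Layer 1: raw unbuffered binary I/O  -> io.FileIO (maps to system calls)
# Layer 2: buffering                  -> io.BufferedReader / io.BufferedWriter
# Layer 3: encoding + newline policy  -> io.TextIOWrapper

binary = io.BytesIO()  # in-memory stand-in for layers 1+2
text = io.TextIOWrapper(binary, encoding='utf-8', newline='\n')
text.write('caf\u00e9\n')  # layer 3 encodes to bytes before handing down
text.flush()
assert binary.getvalue() == b'caf\xc3\xa9\n'
```

Note how disabling encoding concerns is just a matter of passing encoding='ascii' or 'latin-1' to the text layer, exactly as described above.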
open() would grow additional keyword parameters to specify the encoding, the desired newline translation, and perhaps other aspects of the layering that might need control. BTW in response to Alexander Belopolsky: yes, I would like to continue support for something like readinto() by layer 1 and maybe 2 (perhaps even more flexible, e.g. specifying a buffer and optional start and end indices). I don't think it makes sense for layer 3 since strings are immutable. I agree with Martin von Loewis that a readv() style API would be impractical (and I note that Alexander doesn't provide any use case beyond "it's more efficient"). A use case that I do think is important is reading encoded text data asynchronously from a socket. This might mean that layers 2 and 3 may have to be aware of the asynchronous (non-blocking or timeout-driven) nature of the I/O; reading from layer 3 should give as many characters as possible without blocking for I/O more than the specified timeout. We should also decide how asynchronous I/O calls report "no more data" -- exceptions are inefficient and cause clumsy code, but if we return "", how can we tell that apart from EOF? Perhaps we can use None to indicate "no more data available without blocking", continuing "" to indicate EOF. (The other way around makes just as much sense but would be a bigger break with Python's past than this particular issue is worth to me.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From alexander.belopolsky at gmail.com Mon Aug 21 06:43:53 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 21 Aug 2006 00:43:53 -0400 Subject: [Python-3000] Google Sprint Ideas In-Reply-To: <44E92FFC.9080407@v.loewis.de> References: <79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com> <44E8F350.8070509@acm.org> <44E92FFC.9080407@v.loewis.de> Message-ID: On Aug 21, 2006, at 12:01 AM, Martin v. 
Löwis wrote: > Alexander Belopolsky schrieb: >> For Py3k, I would like to suggest a buffer protocol modelled >> after iovec structure that is used by the readv system call. >> On many systems readv is more efficient than repeated calls >> to read and I think Py3k will benefit from a direct access to >> that feature. > > -1 What is this -1 for: a) buffer protocol in Py3k? b) multisegment buffer protocol? c) readinto that supports multisegment buffers? Note that in 2.x buffer protocol is multisegment, but readinto only supports single-segment buffers. > It's difficult to use, and I question that there is any > benefit. I often deal with the system (kx.com) that represents matrices as nested lists (1d lists of floats are contiguous). My matrices are stored on disk as C-style 2d arrays. If readinto supported multisegment buffers, I would be able to update in-memory data from files on disk just with a call to it. Currently I have to do it in a loop. > I believe readv is there primarily for symmetry with > writev and hasn't any sensible uses on its own. writev is > there so you can add additional headers/trailers around data > blocks you received from higher layers. I even doubt that > exposing writev in Python would make a measurable performance > difference. I did not suggest to expose anything in Python. AFAIK, the buffer protocol is a C API only. From talin at acm.org Mon Aug 21 07:41:11 2006 From: talin at acm.org (Talin) Date: Sun, 20 Aug 2006 22:41:11 -0700 Subject: [Python-3000] Google Sprint Ideas In-Reply-To: References: <79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com> <44E8F350.8070509@acm.org> <44E90DA4.1040203@acm.org> Message-ID: <44E94777.9010601@acm.org> Guido van Rossum wrote: > On 8/20/06, Talin wrote: >> Guido van Rossum wrote: > How sure are you of all that? I always thought that these have about > the same age, and that the main distinction is byte vs. char > orientation. 
Also, the InputStreamReader class clearly sits on top of > the InputStream class (but surprisingly recommends that for efficiency > you do buffering on the reader side instead of on the stream side -- > should we consider this for Python too?). And FileReader is a subclass > of InputStreamReader. (OK, further investigation does show that > FileInputStream exists since JDK 1.0 while InputStreamReader exists > since JDK 1.1. But there's much newer Java I/O in the "nio" package, > and there's work going on for "nio2", JSR 203.) Admittedly my Java knowledge is somewhat old - I spent 2 years programming Java in the ".com era" (2000 - 2001). I remember when the new reader classes came out in JDK 1.1. So "old" and "new" are somewhat relative here. From the point of view of JDK1.5 they are probably indistinguishable as to age :) >> For purposes of Python, it probably makes more sense to look at the .Net >> System.IO.Stream. (As a general rule, the .Net classes are refactored >> versions of the Java classes, which is both good and bad. It's best to >> study both if one is looking for inspiration.) > > Perhaps you can tell us more about that? I've used the Java I/O system > sufficiently to have a feel for how it is actually used, which helps > me find my way in the docs; but for .NET I fear that I would have to > go on a sabbatical to make sense of it. And I don't have time for > that. Try this page. This will at least give you a start: http://msdn2.microsoft.com/en-us/library/system.io.streamreader_members.aspx Here's an excerpt from the "Read" method (reformatted by me): StreamReader.Read () -- Reads the next character from the input stream and advances the character position by one character. StreamReader.Read( Char[], Int32, Int32 ) -- Reads a maximum of count characters from the current stream into buffer, beginning at index. 
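Python never needed the extra index/count parameters on readinto() that the .Net signature carries; a hypothetical helper (read_at is an invented name, and memoryview is an anachronism relative to this thread) shows how a buffer slice achieves the same effect:

```python
import io

def read_at(stream, buf, index, count):
    # Mirrors .Net's Read(buffer, index, count): a memoryview slice lets
    # plain readinto() target a sub-range of buf, so readinto itself
    # needs no extra start/end parameters.
    return stream.readinto(memoryview(buf)[index:index + count])

src = io.BytesIO(b'abcdef')
buf = bytearray(8)
n = read_at(src, buf, 2, 4)   # fill buf[2:6] only
assert n == 4
assert bytes(buf) == b'\x00\x00abcd\x00\x00'
```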
To my >> mind, the term 'layer' can either describe a particular design stratum >> within an architecture - such as the 'device layer' of an operating >> system - or it can describe a portion of a document, such as a drawing >> layer in a CAD program. > > It's used whenever you could draw a diagram of several layers of > software sitting on top of each other. Perhaps usually layers are > bigger (like device layers) but I see nothing wrong with declaring > that Python I/O consists of three layers. > >> I don't normally think of a single instance of a >> class wrapping another instance as constituting a "layer" - I usually >> use the term "adapter" or "proxy" to describe that case. >> >> (OK, so I'm pedantic about naming. Now you know why one of my side >> projects is writing an online programmer's thesaurus -- using >> Python/TurboGears of course!) > > Wouldn't it make more sense to contribute to wikipedia at this point? Off topic :) Seriously, though, what I am doing is very different from Wikipedia, and much more like WordNet - that is, I have a database that represents semantic relations between words, and an AJAX GUI that allows editing of those relationships. Mostly it works, but I still need a way for people to create accounts. (Source browsable at http://www.viridia.org/hg/ if interested.) >> >> Also, I notice that this proposal removes what I consider to be a nice >> >> feature of Python, which is that you can take a plain file object and >> >> iterate over the lines of the file -- it would require a separate line >> >> buffering adapter to be created. I think I understand the reasoning >> >> behind this - in a world with multiple text encodings, the >> definition of >> >> "line" may not be so simple. However, I would assume that the >> "built-in" >> >> streams would support the most basic, least-common-denominator >> encodings >> >> for convenience. >> > >> > First time I noticed that. But perhaps it's the concept of "plain file >> > object" that changed? 
My own hierarchy (which I arrived at without >> > reading tomer's proposal) is something like this: >> > >> > (1) Basic level (implemented in C) -- open, close, read, write, seek, >> > tell. Completely unbuffered, maps directly to system calls. Does >> > binary I/O only. >> > >> > (2) Buffering. Implements the same API as (1) but adds buffering. This >> > is what one normally uses for binary file I/O. It builds on (1), but >> > can also be built on raw sockets instead. It adds an API to inquire >> > about the amount of buffered data, a flush() method, and ways to >> > change the buffer size. >> > >> > (3) Encoding and line endings. Implements a somewhat different API, >> > for reading/writing text files; the API resembles Python 2's I/O >> > library more. This is where readline() and next() giving the next line >> > are implemented. It also does newline translation to/from the >> > platform's native convention (CRLF or LF, or perhaps CR if anyone >> > still cares about Mac OS <= 9) and Python's convention (always \n). I >> > think I want to put these two features (encoding and line endings) in >> > the same layer because they are both text related. Of course you can >> > specify ASCII or Latin-1 to effectively disable the encoding part. >> > >> > Does this make more sense? >> >> I understood that much -- this is pretty much the way everyone does >> things these days (our own custom stream library at work looks pretty >> much like this too.) > > So you have the buffering between the binary I/O and the text I/O too? Theoretically, yes - you can plug in a buffer in-between them if you want. It doesn't do this by default however (our needs are somewhat specialized.) >> The question I was wondering is, will the built-in 'file' function >> return an object of level 3? > > I am hoping to get rid of 'file' altogether. Instead, I want to go > back to 'open'. 
Calling open() with a binary mode argument would > return a layer 2 or layer 1 (if unbuffered) object; calling it with a > text mode would return a layer 3 object. open() would grow additional > keyword parameters to specify the encoding, the desired newline > translation, and perhaps other aspects of the layering that might need > control. > > BTW in response to Alexander Belopolsky: yes, I would like to continue > support for something like readinto() by layer 1 and maybe 2 (perhaps > even more flexible, e.g. specifying a buffer and optional start and > end indices). I don't think it makes sense for layer 3 since strings > are immutable. I agree with Martin von Loewis that a readv() style API > would be impractical (and I note that Alexander doesn't provide any > use case beyond "it's more efficient"). Note that the .Net API in the example above supports this. > A use case that I do think is important is reading encoded text data > asynchronously from a socket. This might mean that layers 2 and 3 may > have to be aware of the asynchronous (non-blocking or timeout-driven) > nature of the I/O; reading from layer 3 should give as many characters > as possible without blocking for I/O more than the specified timeout. > We should also decide how asynchronous I/O calls report "no more data" > -- exceptions are inefficient and cause clumsy code, but if we return > "", how can we tell that apart from EOF? Perhaps we can use None to > indicate "no more data available without blocking", continuing "" to > indicate EOF. (The other way around makes just as much sense but would > be a bigger break with Python's past than this particular issue is > worth to me.) > From ncoghlan at gmail.com Mon Aug 21 12:03:46 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 21 Aug 2006 20:03:46 +1000 Subject: [Python-3000] int-long unification In-Reply-To: <44E88E1F.6010607@v.loewis.de> References: <44E88E1F.6010607@v.loewis.de> Message-ID: <44E98502.5000203@gmail.com> Martin v. 
Löwis wrote: > Guido van Rossum schrieb: >> Are you interested in doing this at the Google sprint next week? > > Sure; I hadn't any special plans so far. > >> What do you think? > > Sounds good. There are two problems I see: > > - how to benchmark? > > - there are subtle details in the API that require changes > to extension code. In particular, PyInt_AsLong currently > cannot fail, but can fail with a range error after the > unification. PyInt_AsLong can already fail with OverflowError - pass it a PyLong object and it will try to convert it using the nb_int slot and PyLong_AsLong. PyInt_AsLong is actually somewhat misnamed - it is really PyNumber_AsLong, since it accepts arbitrary objects and coerces them to integers via __int__, instead of just accepting PyInt instances. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From qrczak at knm.org.pl Mon Aug 21 13:11:12 2006 From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk) Date: Mon, 21 Aug 2006 13:11:12 +0200 Subject: [Python-3000] int-long unification In-Reply-To: <44E98502.5000203@gmail.com> (Nick Coghlan's message of "Mon, 21 Aug 2006 20:03:46 +1000") References: <44E88E1F.6010607@v.loewis.de> <44E98502.5000203@gmail.com> Message-ID: <8764gmpcmn.fsf@qrnik.zagroda> Nick Coghlan writes: > PyInt_AsLong can already fail with OverflowError > it accepts arbitrary objects and coerces them to integers via > __int__, instead of just accepting PyInt instances. If it calls __int__, it can fail with any exception resulting from user code. Grepping sources (2.4.2) reveals that usages are split into 4 groups: 1. Calling PyInt_AsLong only after PyInt_Check succeeds. 2. Handling the case when PyInt_AsLong returns -1 and PyErr_Occurred(), or just when PyErr_Occurred(). 3. Doing both (e.g. Modules/mmapmodule.c). The test is superfluous but harmless. 4. Doing neither (e.g. 
Modules/parsermodule.c, Modules/posixmodule.c, Modules/selectmodule.c and possibly more). This is potentially buggy. -- __("< Marcin Kowalczyk \__/ qrczak at knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/ From krstic at solarsail.hcs.harvard.edu Mon Aug 21 13:16:03 2006 From: krstic at solarsail.hcs.harvard.edu (Ivan Krstić) Date: Mon, 21 Aug 2006 07:16:03 -0400 Subject: [Python-3000] Google Sprint Ideas In-Reply-To: References: <79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com> <44E8F350.8070509@acm.org> Message-ID: <44E995F3.3090208@solarsail.hcs.harvard.edu> Alexander Belopolsky wrote: > The new API does not seem to provide for a way to read > data directly into an existing object without creating > an intermediate bytes object. This is among the several things that Itamar Shtull-Trauring mentioned during his PyCon 2005 talk on 'Fast Networking with Python': http://ln-s.net/D+u While not affecting the new I/O stack design directly, addressing some of the other ways Itamar lists for improving Python's network efficiency (deep support for buffers, non-copying split(), Array.Array extensions, etc) are things we should probably discuss here. -- Ivan Krstić | GPG: 0x147C722D From g.brandl at gmx.net Mon Aug 21 21:07:44 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 21 Aug 2006 21:07:44 +0200 Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by 65.57.245.11 In-Reply-To: <20060821191023.31522.47467@ximinez.python.org> References: <20060821191023.31522.47467@ximinez.python.org> Message-ID: python.org Webmaster wrote: > Dear Wiki user, > > You have subscribed to a wiki page or wiki category on "PythonInfo Wiki" for change notification. 
> > The following page has been changed by 65.57.245.11: > http://wiki.python.org/moin/GoogleSprintPy3k > > ------------------------------------------------------------------------------ > > * See PEP PEP:3100 for more ideas > > - * Make zip() an iterator (like itertools.zip()) > + * Make zip() an iterator (like itertools.izip()) > + > + * Make map() and filter() iterators and make them stop at the end of the shortest input (like zip()) instead of at the end of the longest input May I suggest an additional keyword(-only?) argument to get the old behavior, stopping at the end of the longest input? Georg From collinw at gmail.com Mon Aug 21 21:12:02 2006 From: collinw at gmail.com (Collin Winter) Date: Mon, 21 Aug 2006 14:12:02 -0500 Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by 65.57.245.11 In-Reply-To: References: <20060821191023.31522.47467@ximinez.python.org> Message-ID: <43aa6ff70608211212o271f4c7bxca5108107931e077@mail.gmail.com> On 8/21/06, Georg Brandl wrote: > python.org Webmaster wrote: > > - * Make zip() an iterator (like itertools.zip()) > > + * Make zip() an iterator (like itertools.izip()) > > + > > + * Make map() and filter() iterators and make them stop at the end of the shortest input (like zip()) instead of at the end of the longest input > > May I suggest an additional keyword(-only?) argument to get the old behavior, > stopping at the end of the longest input? I thought map() and filter() were going away in Py3k? Did that change? 
Collin Winter From guido at python.org Mon Aug 21 21:14:54 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 21 Aug 2006 12:14:54 -0700 Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by 65.57.245.11 In-Reply-To: References: <20060821191023.31522.47467@ximinez.python.org> Message-ID: On 8/21/06, Georg Brandl wrote: > > + * Make map() and filter() iterators and make them stop at the end of the shortest input (like zip()) instead of at the end of the longest input > > May I suggest an additional keyword(-only?) argument to get the old behavior, > stopping at the end of the longest input? I'd rather not. Why, apart from backwards compatibility? I'd like map(f, a, b) to be the same as (f(*x) for x in zip(a, b)) so we have to explain less. (And I think even map(f, *args) === (f(*x) for x in zip(*args)).) The right way to write code that works in 2.6 and 3.0 is to only use inputs of the same length. Perhaps there could be (or is there already?) a helper in itertools that iterates over multiple iterables padding the shorter inputs with None to the length of the longest one. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Aug 21 21:16:21 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 21 Aug 2006 12:16:21 -0700 Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by 65.57.245.11 In-Reply-To: <43aa6ff70608211212o271f4c7bxca5108107931e077@mail.gmail.com> References: <20060821191023.31522.47467@ximinez.python.org> <43aa6ff70608211212o271f4c7bxca5108107931e077@mail.gmail.com> Message-ID: On 8/21/06, Collin Winter wrote: > I thought map() and filter() were going away in Py3k? Did that change? I still find them useful when using a built-in function, and unlike reduce(), I have no trouble reading and understanding such code. 
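The equivalence Guido describes is exactly what Python 3 ended up with: map() truncates at the shortest input, just like zip(), so the two spellings agree even for unequal-length inputs. A quick check under that later behavior:

```python
def f(x, y):
    return x + y

a = [1, 2, 3]
b = [10, 20, 30, 40]  # longer input: the extra element is ignored

# map(f, a, b) and the genexp over zip(a, b) produce identical results
assert list(map(f, a, b)) == [f(*x) for x in zip(a, b)] == [11, 22, 33]
```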
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From collinw at gmail.com Mon Aug 21 21:20:15 2006 From: collinw at gmail.com (Collin Winter) Date: Mon, 21 Aug 2006 14:20:15 -0500 Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by 65.57.245.11 In-Reply-To: References: <20060821191023.31522.47467@ximinez.python.org> <43aa6ff70608211212o271f4c7bxca5108107931e077@mail.gmail.com> Message-ID: <43aa6ff70608211220i28bc20a5r4d5fe3b66740873d@mail.gmail.com> On 8/21/06, Guido van Rossum wrote: > On 8/21/06, Collin Winter wrote: > > I thought map() and filter() were going away in Py3k? Did that change? > > I still find them useful when using a built-in function, and unlike > reduce(), I have no trouble reading and understanding such code. You might want to remove them from PEP 3100, as it still lists them under "To be removed". From guido at python.org Mon Aug 21 21:21:20 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 21 Aug 2006 12:21:20 -0700 Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by 65.57.245.11 In-Reply-To: <43aa6ff70608211220i28bc20a5r4d5fe3b66740873d@mail.gmail.com> References: <20060821191023.31522.47467@ximinez.python.org> <43aa6ff70608211212o271f4c7bxca5108107931e077@mail.gmail.com> <43aa6ff70608211220i28bc20a5r4d5fe3b66740873d@mail.gmail.com> Message-ID: On 8/21/06, Collin Winter wrote: > On 8/21/06, Guido van Rossum wrote: > > On 8/21/06, Collin Winter wrote: > > > I thought map() and filter() were going away in Py3k? Did that change? > > > > I still find them useful when using a built-in function, and unlike > > reduce(), I have no trouble reading and understanding such code. > > You might want to remove them from PEP 3100, as it still lists them > under "To be removed". With three question marks. 
:-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik.johansson at gmail.com Mon Aug 21 21:28:30 2006 From: fredrik.johansson at gmail.com (Fredrik Johansson) Date: Mon, 21 Aug 2006 21:28:30 +0200 Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by 65.57.245.11 In-Reply-To: References: <20060821191023.31522.47467@ximinez.python.org> Message-ID: <3d0cebfb0608211228i2369dc8dq431d8c94216b8d60@mail.gmail.com> On 8/21/06, Guido van Rossum wrote: > Perhaps there could be (or is there already?) a helper in itertools > that iterates over multiple iterables padding the shorter inputs with > None to the length of the longest one. I think the most convenient solution would be to handle this with a keyword argument to zip(), i.e., zip(a, b, pad=True). Fredrik Johansson From guido at python.org Mon Aug 21 21:53:23 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 21 Aug 2006 12:53:23 -0700 Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by 65.57.245.11 In-Reply-To: <3d0cebfb0608211228i2369dc8dq431d8c94216b8d60@mail.gmail.com> References: <20060821191023.31522.47467@ximinez.python.org> <3d0cebfb0608211228i2369dc8dq431d8c94216b8d60@mail.gmail.com> Message-ID: On 8/21/06, Fredrik Johansson wrote: > On 8/21/06, Guido van Rossum wrote: > > Perhaps there could be (or is there already?) a helper in itertools > > that iterates over multiple iterables padding the shorter inputs with > > None to the length of the longest one. > > I think the most convenient solution would be to handle this with a > keyword argument to zip(), i.e., zip(a, b, pad=True). First you'll have to show me a real use case where this behavior is actually needed. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Mon Aug 21 21:57:26 2006 From: martin at v.loewis.de (martin at v.loewis.de) Date: Mon, 21 Aug 2006 21:57:26 +0200 Subject: [Python-3000] Google Sprint Ideas In-Reply-To: References: <79990c6b0608201352i74e5def4t16e944db7de22768@mail.gmail.com> <44E8F350.8070509@acm.org> <44E92FFC.9080407@v.loewis.de> Message-ID: <1156190246.44ea1026ae2a3@www.domainfactory-webmail.de> Quoting Alexander Belopolsky: > > Alexander Belopolsky schrieb: > >> For Py3k, I would like to suggest a buffer protocol modelled > >> after iovec structure that is used by the readv system call. > > > > -1 > > What is this -1 for: > > a) buffer protocol in Py3k? > b) multisegment buffer protocol? > c) readinto that supports multisegment buffers? b and c; I don't have an opinion on a. > I did not suggest to expose anything in Python. AFAIK, the buffer > protocol is a C API only. Ah; now that the IO library will likely be 100% pure Python, this needs thought. Regards, Martin From fredrik.johansson at gmail.com Mon Aug 21 22:35:37 2006 From: fredrik.johansson at gmail.com (Fredrik Johansson) Date: Mon, 21 Aug 2006 22:35:37 +0200 Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by 65.57.245.11 In-Reply-To: References: <20060821191023.31522.47467@ximinez.python.org> <3d0cebfb0608211228i2369dc8dq431d8c94216b8d60@mail.gmail.com> Message-ID: <3d0cebfb0608211335h38ddfc87hc582e086e3b03f93@mail.gmail.com>
> > First you'll have to show me a real use case where this behavior is > actually needed. I didn't suggest that this feature is needed. But if it is, extending zip() to handle both cases hardly seems to add more cruft to the language than adding a whole new function (stuffed away in a library where not even the language's creator remembers whether it exists :-). Fredrik Johansson From guido at python.org Mon Aug 21 22:40:47 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 21 Aug 2006 13:40:47 -0700 Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by 65.57.245.11 In-Reply-To: <3d0cebfb0608211335h38ddfc87hc582e086e3b03f93@mail.gmail.com> References: <20060821191023.31522.47467@ximinez.python.org> <3d0cebfb0608211228i2369dc8dq431d8c94216b8d60@mail.gmail.com> <3d0cebfb0608211335h38ddfc87hc582e086e3b03f93@mail.gmail.com> Message-ID: On 8/21/06, Fredrik Johansson wrote: > On 8/21/06, Guido van Rossum wrote: > > On 8/21/06, Fredrik Johansson wrote: > > > On 8/21/06, Guido van Rossum wrote: > > > > Perhaps there could be (or is there already?) a helper in itertools > > > > that iterates over multiple iterables padding the shorter inputs with > > > > None to the length of the longest one. > > > > > > I think the most convenient solution would be to handle this with a > > > keyword argument to zip(), i.e., zip(a, b, pad=True). > > > > First you'll have to show me a real use case where this behavior is > > actually needed. > > I didn't suggest that this feature is needed. But if it is, extending > zip() to handle both cases hardly seems to add more cruft to the > language than adding a whole new function (stuffed away in a library > where not even the language's creator remembers whether it exists :-). I beg to disagree. In general I don't like flag arguments that modify the behavior of a call, when in practice the flag value passed will nearly always be a constant. That's why we have e.g. find() and rfind(), not find(..., fromright=False). 
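The helper Guido wonders about did not exist at the time; it eventually shipped as itertools.izip_longest in Python 2.6 (zip_longest in 3.x). Its padding behavior can be sketched in a few lines (pad_zip is a hypothetical name):

```python
def pad_zip(*iterables, fillvalue=None):
    # Yield tuples like zip(), but pad exhausted inputs with fillvalue
    # until the longest input runs out -- what became zip_longest().
    iterators = [iter(it) for it in iterables]
    while iterators:
        row, exhausted = [], 0
        for it in iterators:
            try:
                row.append(next(it))
            except StopIteration:
                row.append(fillvalue)
                exhausted += 1
        if exhausted == len(iterators):
            return
        yield tuple(row)

assert list(pad_zip([1, 2, 3], 'ab')) == [(1, 'a'), (2, 'b'), (3, None)]
```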
Also, I'd like to call YAGNI (and stop wasting everybody's time) unless a good use case is brought up. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jcarlson at uci.edu Mon Aug 21 23:21:30 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Mon, 21 Aug 2006 14:21:30 -0700 Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be? In-Reply-To: <44E950B2.4060305@acm.org> References: <44E950B2.4060305@acm.org> Message-ID: <20060821081944.1A0F.JCARLSON@uci.edu> Talin wrote: [snip] > I've been thinking about the transition to unicode strings, and I want > to put forward a notion that might allow the transition to be done > gradually instead of all at once. > > The idea would be to temporarily introduce a new name for 8-bit strings > - let's call it "ascii". An "ascii" object would be exactly the same as > today's 8-bit strings. There are two parts to the unicode conversion; all literals are unicode, and we don't have strings anymore, we have bytes. Without offering the bytes object, then people can't really convert their code. String literals can be handled with the -U command line option (and perhaps having the interpreter do the str=unicode assignment during startup). In any case, as I look at Py3k and the future of Python, in each release, I ask "what are the compelling features that make me want to upgrade?" In each of the 1.5-2.5 series that I've looked at, each has had some compelling feature or another that has basically required that I upgrade, or seriously consider upgrading (bugfixes for stuff that has bitten me, new syntax that I use, significant increases in speed, etc.) . As we approach Py3k, I again ask, "what are the compelling features?" Wholesale breakage of anything that uses ascii strings as text or binary data? A completely changed IO stack (requiring re-learning of everything known about Python IO)? 
Dictionary .keys(), .values(), and .items() being their .iter*() equivalents (making it just about impossible to optimize for Py3k dictionary behavior now)? I understand getting rid of the cruft, really I do (you should see some cruft I've been replacing lately). But some of that cruft is useful, or really, some of that cruft has no alternative currently, which will require significant rewrites of user code when Py3k is released. When everyone has to rewrite their code, they are going to ask, "Why don't I just stick with the maintenance 2.x? It's going to be maintained for a few more years yet, and I don't need to rewrite all of my disk IO, strings in dictionary code, etc." I will be right along with them (no offense intended to those currently working towards py3k). I can code defensively against buffer-saturating DOS attacks with my socket code, but I can't code defensively to handle some (never mind all) of the changes and incompatibilities that Py3k will bring. Here's my suggestion: every feature, syntax, etc., that is slated for Py3k, let us release bit by bit in the 2.x series. That lets the 2.x series evolve into the 3.x series in a somewhat more natural way than the currently proposed *everything breaks*. If it takes 1, 2, 3, or 10 more releases in the 2.x series to get to all of the 3.x features, great. At least people will have a chance to convert, or at least write correct code for the future. Say 2.6 gets bytes and special factories (or a special encoding argument) for file/socket to return bytes instead of strings, and only accept bytes objects to .write() methods (unless an encoding on the file, etc., was previously given). Given these bytes objects, it may even make sense to offer the .readinto() method that Alex B has been asking for (which would make 3 built-in objects that could reasonably support readinto: bytes, array, mmap). If the IO library is available for 2.6, toss that in there, or offer it in PyPI as an evolving library. 
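The readinto() pattern Josiah describes, filling a pre-allocated buffer such as an array in place, is spelled like this in later Python (an anachronism relative to this 2006 thread, where the method was still being argued for):

```python
import array
import io

# Pre-allocate a typed buffer and fill its memory directly with
# readinto(), avoiding the intermediate string that read() would create.
src = io.BytesIO(array.array('i', [1, 2, 3, 4]).tobytes())
buf = array.array('i', [0, 0, 0, 0])
n = src.readinto(buf)

assert n == 4 * buf.itemsize
assert buf.tolist() == [1, 2, 3, 4]
```

The same call works for any writable object supporting the buffer protocol, which is how bytes-like buffers, array, and mmap all end up as reasonable readinto targets.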
I would suggest pushing off the dict changes until 2.7 or later, as there are 340+ examples of dict.keys() in the Python 2.5b2 standard library, at least half of which are going to need to be changed to list(dict.keys()) or otherwise. The breakage in user code will likely be at least as substantial. Those are just examples that come to mind now, but I'm sure there are others changes with similar issues. - Josiah From exarkun at divmod.com Mon Aug 21 23:38:17 2006 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Mon, 21 Aug 2006 17:38:17 -0400 Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be? In-Reply-To: <20060821081944.1A0F.JCARLSON@uci.edu> Message-ID: <20060821213817.1717.1725966885.divmod.quotient.28023@ohm> On Mon, 21 Aug 2006 14:21:30 -0700, Josiah Carlson wrote: > >Talin wrote: >[snip] >> I've been thinking about the transition to unicode strings, and I want >> to put forward a notion that might allow the transition to be done >> gradually instead of all at once. >> >> The idea would be to temporarily introduce a new name for 8-bit strings >> - let's call it "ascii". An "ascii" object would be exactly the same as >> today's 8-bit strings. > >There are two parts to the unicode conversion; all literals are unicode, >and we don't have strings anymore, we have bytes. Without offering the >bytes object, then people can't really convert their code. String >literals can be handled with the -U command line option (and perhaps >having the interpreter do the str=unicode assignment during startup). > A third step would ease this transition significantly: a unicode_literals __future__ import. > >Here's my suggestion: every feature, syntax, etc., that is slated for >Py3k, let us release bit by bit in the 2.x series. That lets the 2.x >series evolve into the 3.x series in a somewhat more natural way than >the currently proposed *everything breaks*. 
If it takes 1, 2, 3, or 10 >more releases in the 2.x series to get to all of the 3.x features, great. >At least people will have a chance to convert, or at least write correct >code for the future. This really seems like the right idea. "Shoot the moon" upgrades are almost always worse than incremental upgrades. The incremental path is better for everyone involved. For developers of Python, it gets more people using and providing feedback on the new features being developed. For developers with Python, it keeps the scope of a particular upgrade more manageable, letting them focus on a much smaller set of changes to be made to their application. Jean-Paul From guido at python.org Tue Aug 22 02:36:41 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 21 Aug 2006 17:36:41 -0700 Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be? In-Reply-To: <20060821081944.1A0F.JCARLSON@uci.edu> References: <44E950B2.4060305@acm.org> <20060821081944.1A0F.JCARLSON@uci.edu> Message-ID: On 8/21/06, Josiah Carlson wrote: > As we approach Py3k, I again ask, "what are the compelling features?" > Wholesale breakage of anything that uses ascii strings as text or binary > data? A completely changed IO stack (requiring re-learning of everything > known about Python IO)? Dictionary .keys(), .values(), and .items() > being their .iter*() equivalents (making it just about impossible to > optimize for Py3k dictionary behavior now)? I guess py3k is not for you yet. That's a totally defensible point of view, and that's why there will be Python 2.6, 2.7, 2.8 and 2.9 (probably) which will gradually close the gap, after which you will have the choice of maintaining 2.9 yourself or making the switch. :-) > I understand getting rid of the cruft, really I do (you should see some > cruft I've been replacing lately).
But some of that cruft is useful, or > really, some of that cruft has no alternative currently, which will > require significant rewrites of user code when Py3k is released. When > everyone has to rewrite their code, they are going to ask, "Why don't I > just stick with the maintenance 2.x? It's going to be maintained for a > few more years yet, and I don't need to rewrite all of my disk IO, > strings in dictionary code, etc." I will be right along with them (no > offense intended to those currently working towards py3k). And yet offense is taken. Have you watched the video of my Py3k talk? Search for it on Google Video. > I can code defensively against buffer-saturating DOS attacks with my > socket code, but I can't code defensively to handle some (never mind all) > of the changes and incompatibilities that Py3k will bring. And that's why there will be conversion tools and aids. > Here's my suggestion: every feature, syntax, etc., that is slated for > Py3k, let us release bit by bit in the 2.x series. That lets the 2.x > series evolve into the 3.x series in a somewhat more natural way than > the currently proposed *everything breaks*. If it takes 1, 2, 3, or 10 > more releases in the 2.x series to get to all of the 3.x features, great. > At least people will have a chance to convert, or at least write correct > code for the future. That will happen, whenever possible. For other features it is infeasible. > Say 2.6 gets bytes and special factories (or a special encoding argument) > for file/socket to return bytes instead of strings, and only accept > bytes objects to .write() methods (unless an encoding on the file, etc., > was previously given). Given these bytes objects, it may even make sense > to offer the .readinto() method that Alex B has been asking for (which > would make 3 built-in objects that could reasonably support readinto: > bytes, array, mmap).
> > If the IO library is available for 2.6, toss that in there, or offer it > in PyPI as an evolving library. Could do. > I would suggest pushing off the dict changes until 2.7 or later, as > there are 340+ examples of dict.keys() in the Python 2.5b2 standard > library, at least half of which are going to need to be changed to > list(dict.keys()) or otherwise. The breakage in user code will likely > be at least as substantial. Perhaps you want to help write the transition PEP? > Those are just examples that come to mind now, but I'm sure there are > other changes with similar issues. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From murman at gmail.com Tue Aug 22 05:07:04 2006 From: murman at gmail.com (Michael Urman) Date: Mon, 21 Aug 2006 22:07:04 -0500 Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by 65.57.245.11 In-Reply-To: References: <20060821191023.31522.47467@ximinez.python.org> Message-ID: On 8/21/06, Guido van Rossum wrote: > I'd like map(f, a, b) to be the same as (f(*x) for x in zip(a, b)) > so we have to explain less. (And I think even map(f, *args) === (f(*x) > for x in zip(*args)).) Should map(None, a, b) == zip(a, b), leaving python with multiple ways to do one thing? Or should the surprising but useful map(None, ...) behavior disappear or become even more surprising by padding? Is there any reason at all for map to take multiple sequences now that we have starmap and (i)zip?
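The equivalence Guido proposes can be checked directly; here is a sketch in modern syntax (f, a, and b are made-up names, not from the thread):

```python
from itertools import starmap

a, b = [1, 2, 3], [10, 20, 30]

def f(x, y):
    return x + y

# Guido's proposed identity: map(f, a, b) == (f(*x) for x in zip(a, b))
left = list(map(f, a, b))
right = [f(*x) for x in zip(a, b)]
assert left == right == [11, 22, 33]

# starmap covers the already-zipped case Michael mentions:
assert list(starmap(f, zip(a, b))) == [11, 22, 33]
```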
-- Michael Urman http://www.tortall.net/mu/blog From collinw at gmail.com Tue Aug 22 05:16:58 2006 From: collinw at gmail.com (Collin Winter) Date: Mon, 21 Aug 2006 22:16:58 -0500 Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by 65.57.245.11 In-Reply-To: References: <20060821191023.31522.47467@ximinez.python.org> Message-ID: <43aa6ff70608212016m683c3b8ci9803c31858c937e7@mail.gmail.com> On 8/21/06, Michael Urman wrote: > On 8/21/06, Guido van Rossum wrote: > > I'd like map(f, a, b) to be the same as to (f(*x) for x in zip(a, b)) > > so we have to explain less. (And I think even map(f, *args) === (f(*x) > > for x in zip(*args)).) > > Should map(None, a, b) == zip(a, b), leaving python with multiple ways > to do one thing? Or should the surprising but useful map(None, ...) > behavior disappear or become even more surprising by padding? Is there > any reason at all for map to take multiple sequences now that we have > starmap and (i)zip? FWIW, I'm ambivalent as to whether map() accepts multiple sequences, but I'm strongly in favor of map(None, ....) disappearing. Similarly, I'd want to see filter(None, ...) go away, too; fastpathing the case of filter(bool, ....) will achieve the same performance benefit. Collin Winter From tjreedy at udel.edu Tue Aug 22 05:19:30 2006 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 21 Aug 2006 23:19:30 -0400 Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be? References: <44E950B2.4060305@acm.org> <20060821081944.1A0F.JCARLSON@uci.edu> Message-ID: "Guido van Rossum" wrote in message news:ca471dc20608211736h5f8903cctc92c60c5bd6e538e at mail.gmail.com... > On 8/21/06, Josiah Carlson wrote: >> When >> everyone has to rewrite their code, they are going to ask, "Why don't I >> just stick with the maintenance 2.x? It's going to be maintained for a >> few more years yet, and I don't need to rewrite all of my disk IO, >> strings in dictionary code, etc. 
I will be right along with them. Many apps never will be converted, just as there are still things running under 1.5 and all versions since. The changeover to writing new stuff in 3.x will be at least somewhat gradual, as such things always are, and that is a good thing, lest the issue tracker be flooded with more items than can be dealt with. > Have you watched the video of my Py3k talk? > Search for it on Google Video Searching Guido Python returns http://video.google.com/videoplay?docid=-6459339159268485356 It pretty well summarizes the results of discussion here up to a month ago. Terry Jan Reedy From guido at python.org Tue Aug 22 05:55:49 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 21 Aug 2006 20:55:49 -0700 Subject: [Python-3000] [PythonInfo Wiki] Update of "GoogleSprintPy3k" by 65.57.245.11 In-Reply-To: <43aa6ff70608212016m683c3b8ci9803c31858c937e7@mail.gmail.com> References: <20060821191023.31522.47467@ximinez.python.org> <43aa6ff70608212016m683c3b8ci9803c31858c937e7@mail.gmail.com> Message-ID: On 8/21/06, Collin Winter wrote: > On 8/21/06, Michael Urman wrote: > > On 8/21/06, Guido van Rossum wrote: > > > I'd like map(f, a, b) to be the same as (f(*x) for x in zip(a, b)) > > > so we have to explain less. (And I think even map(f, *args) === (f(*x) > > > for x in zip(*args)).) > > > > Should map(None, a, b) == zip(a, b), leaving python with multiple ways > > to do one thing? Or should the surprising but useful map(None, ...) > > behavior disappear or become even more surprising by padding? Is there > > any reason at all for map to take multiple sequences now that we have > > starmap and (i)zip? > > FWIW, I'm ambivalent as to whether map() accepts multiple sequences, > but I'm strongly in favor of map(None, ....) disappearing. Similarly, > I'd want to see filter(None, ...) go away, too; fastpathing the case > of filter(bool, ....) will achieve the same performance benefit. I think map(f, a, b, ...) and filter(p, a, b, ...)
should stay, but the None cases should be gotten rid of. I don't want to move starmap() out of itertools into builtins. I expect that filter(bool, a) is fast enough without greasing the tracks, but if you don't, feel free to benchmark it. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Aug 23 03:32:39 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 22 Aug 2006 18:32:39 -0700 Subject: [Python-3000] Droping find/rfind? Message-ID: At today's sprint, one of the volunteers completed a patch to rip out find() and rfind(), replacing all calls with index()/rindex(). But now I'm getting cold feet -- is this really a good idea? (It's been listed in PEP 3100 for a long time, but I haven't thought about it much, really.) What do people think? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.peters at gmail.com Wed Aug 23 03:47:18 2006 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 22 Aug 2006 21:47:18 -0400 Subject: [Python-3000] Droping find/rfind? In-Reply-To: References: Message-ID: <1f7befae0608221847t55c09c57r64cd65511b51f6d4@mail.gmail.com> [Guido van Rossum] > At today's sprint, one of the volunteers completed a patch to rip out > find() and rfind(), replacing all calls with index()/rindex(). But now > I'm getting cold feet -- is this really a good idea? (It's been listed > in PEP 3100 for a long time, but I haven't thought about it much, > really.) > > What do people think? I'd rather toss index/rindex myself, although I understand that [r]find's -1 return value for "not found" can trip up newbies. Like I care ;-) If you decide to toss [r]find anyway, I'd rather see "not found" be spelled with an exception more specific than ValueError (who knows what all "except ValueError:" is going to catch? /Just/ that the substring wasn't found? Ya, that's something to bet your life on ;-)). 
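The newbie trap Tim alludes to is worth spelling out; a minimal demonstration (the example strings are mine, not from the thread):

```python
s = "spam and eggs"

# find() returns -1 for "not found", which is true in a boolean context,
# while a legitimate match at index 0 is false -- the classic trap:
assert s.find("spam") == 0       # found, yet 0 is falsy
assert s.find("bacon") == -1     # missing, yet -1 is truthy
if s.find("bacon"):
    pass                         # this branch runs even though "bacon" is absent

# index() raises instead, so the mistake cannot pass silently:
try:
    s.index("bacon")
except ValueError:
    pass                         # "not found" must be handled explicitly
```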
From jcarlson at uci.edu Wed Aug 23 04:38:47 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Tue, 22 Aug 2006 19:38:47 -0700 Subject: [Python-3000] Droping find/rfind? In-Reply-To: References: Message-ID: <20060822191712.1A39.JCARLSON@uci.edu> "Guido van Rossum" wrote: > At today's sprint, one of the volunteers completed a patch to rip out > find() and rfind(), replacing all calls with index()/rindex(). But now > I'm getting cold feet -- is this really a good idea? (It's been listed > in PEP 3100 for a long time, but I haven't thought about it much, > really.) > > What do people think? I have code for Python 2.x that uses [r]find, but have been transitioning some of it to use [r]partition instead (writing implementations based on [r]find, but it could have just as easily used [r]split). Ultimately I think that an unambiguous 'find without slicing' is useful. One of the issues with the -1 return on find failure is that it is ambiguous, one must really check for a -1 return. Here's an API that is non-ambiguous: x.search(y, start=0, stop=sys.maxint, count=sys.maxint) Which will return a list of up to count non-overlapping examples of y in x from start to stop. On failure, it returns an empty list. This particular API is at least as powerful as the currently existing [r]find one, is unambiguous, etc. It also has a not accidental similarity to x.split(y, count=sys.maxint), which has served Python for quite a while, though this would differ in that rather than always returning a list of at least 1, it could return an empty list. Its functionality is somewhat mirrored by re.finditer, but the above search function can be easily turned into rsearch, whereas re is forward-only. 
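Josiah's proposed x.search(y, start, stop, count) could be prototyped in pure Python along these lines (the names and defaults come from his post; the body is my guess at the described semantics, not his code):

```python
def search(x, y, start=0, stop=None, count=None):
    """Return up to `count` non-overlapping indices of y in x[start:stop].
    An empty list means "not found" -- no ambiguous -1 sentinel."""
    if stop is None:
        stop = len(x)
    results = []
    i = start
    while count is None or len(results) < count:
        i = x.find(y, i, stop)
        if i == -1:
            break
        results.append(i)
        i += len(y)   # skip past the match: non-overlapping occurrences
    return results

assert search("abcabcabc", "abc") == [0, 3, 6]
assert search("abcabcabc", "xyz") == []          # unambiguous failure
assert search("abcabcabc", "abc", count=2) == [0, 3]
```

Reversing the scan order gives the rsearch variant Josiah contrasts with re's forward-only matching.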
If I were in a position to suggest a change, I would agree with Tim's feeling that [r]index should go before [r]find, but I also think that [r]find could be made unambiguous; the above being an example of such, but one that I'm not going to push for except as an example of an unambiguous implementation. - Josiah From jack at psynchronous.com Wed Aug 23 06:41:48 2006 From: jack at psynchronous.com (Jack Diederich) Date: Wed, 23 Aug 2006 00:41:48 -0400 Subject: [Python-3000] Droping find/rfind? In-Reply-To: References: Message-ID: <20060823044148.GR5772@performancedrivers.com> On Tue, Aug 22, 2006 at 06:32:39PM -0700, Guido van Rossum wrote: > At today's sprint, one of the volunteers completed a patch to rip out > find() and rfind(), replacing all calls with index()/rindex(). But now > I'm getting cold feet -- is this really a good idea? (It's been listed > in PEP 3100 for a long time, but I haven't thought about it much, > really.) > > What do people think? Looking at my own code I use find() in two cases 1) in an "if" clause where "in" or startswith() would be appropriate This code was written when I started with python and is closer to C++ or perl or was a literal translation of a snippet of C++ or perl 2) where try/except around index() would work just fine and partition would be even better. eg/ try: parts.append(text[text.index('himom'):]) except ValueError: pass This is 50 uses of find/rfind in 70 KLOCs of python. Considering I would be better off not using find() in the places I do use it I would be happy to see it go. -Jack From g.brandl at gmx.net Wed Aug 23 08:45:00 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 23 Aug 2006 08:45:00 +0200 Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <1f7befae0608221847t55c09c57r64cd65511b51f6d4@mail.gmail.com> References: <1f7befae0608221847t55c09c57r64cd65511b51f6d4@mail.gmail.com> Message-ID: Tim Peters wrote: > [Guido van Rossum] >> At today's sprint, one of the volunteers completed a patch to rip out >> find() and rfind(), replacing all calls with index()/rindex(). But now >> I'm getting cold feet -- is this really a good idea? (It's been listed >> in PEP 3100 for a long time, but I haven't thought about it much, >> really.) >> >> What do people think? > > I'd rather toss index/rindex myself, although I understand that > [r]find's -1 return value for "not found" can trip up newbies. Like I > care ;-) Perhaps a search() method, like Josiah proposed, makes sense. > If you decide to toss [r]find anyway, I'd rather see "not found" be > spelled with an exception more specific than ValueError (who knows > what all "except ValueError:" is going to catch? /Just/ that the > substring wasn't found? Ya, that's something to bet your life on > ;-)). Seriously, this is something I have thought of from time to time: an exception's "source", so that you could say try: x = int(some expression) except ValueError from int: do something Obviously, it's too much work to add such a thing though. Georg From holmesbj.dev at gmail.com Wed Aug 23 08:46:22 2006 From: holmesbj.dev at gmail.com (Brian Holmes) Date: Tue, 22 Aug 2006 23:46:22 -0700 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <20060823044148.GR5772@performancedrivers.com> References: <20060823044148.GR5772@performancedrivers.com> Message-ID: On 8/22/06, Jack Diederich wrote: > > On Tue, Aug 22, 2006 at 06:32:39PM -0700, Guido van Rossum wrote: > > At today's sprint, one of the volunteers completed a patch to rip out > > find() and rfind(), replacing all calls with index()/rindex(). But now > > I'm getting cold feet -- is this really a good idea? (It's been listed > > in PEP 3100 for a long time, but I haven't thought about it much, > > really.)
> > > What do people think? > Looking at my own code I use find() in two cases > > 1) in an "if" clause where "in" or startswith() would be appropriate > This code was written when I started with python and is closer to > C++ or perl or was a literal translation of a snippet of C++ or perl > > 2) where try/except around index() would work just fine and partition > would be even better. eg/ > try: > parts.append(text[text.index('himom'):]) > except ValueError: pass > > This is 50 uses of find/rfind in 70 KLOCs of python. Considering I would > be better off not using find() in the places I do use it I would be happy > to see it go. > > -Jack Even after reading Terry Reedy's arguments, I don't see why we need to remove this option. Let both exist. I'd prefer grandfathering something like this and leaving it in, even if it wouldn't be there had we known everything from the start. I just don't think it's worth causing people grief in porting to Py3k for something so trivial. I support fixing things in Py3k that are real improvements, but this doesn't really seem like it's worth the trade off. - Brian From holmesbj.dev at gmail.com Wed Aug 23 08:50:53 2006 From: holmesbj.dev at gmail.com (Brian Holmes) Date: Tue, 22 Aug 2006 23:50:53 -0700 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <20060822191712.1A39.JCARLSON@uci.edu> References: <20060822191712.1A39.JCARLSON@uci.edu> Message-ID: On 8/22/06, Josiah Carlson wrote: > > > "Guido van Rossum" wrote: > > At today's sprint, one of the volunteers completed a patch to rip out > > find() and rfind(), replacing all calls with index()/rindex(). But now > > I'm getting cold feet -- is this really a good idea?
(It's been listed > > in PEP 3100 for a long time, but I haven't thought about it much, > > really.) > > > > What do people think? [snip] One of the issues with the -1 return on find failure is that it is > ambiguous, one must really check for a -1 return. Here's an API that is > non-ambiguous: > x.search(y, start=0, stop=sys.maxint, count=sys.maxint) > > Which will return a list of up to count non-overlapping examples of y in > x from start to stop. On failure, it returns an empty list. This > particular API is at least as powerful as the currently existing [r]find > one, is unambiguous, etc. It also has a not accidental similarity to > x.split(y, count=sys.maxint), which has served Python for quite a while, > though this would differ in that rather than always returning a list of > at least 1, it could return an empty list. > > Its functionality is somewhat mirrored by re.finditer, but the above > search function can be easily turned into rsearch, whereas re is > forward-only. [snip] - Josiah +1 I think that would make a great addition to Py3k, or even 2.6. - Brian From greg.ewing at canterbury.ac.nz Wed Aug 23 09:35:00 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 23 Aug 2006 19:35:00 +1200 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <20060822191712.1A39.JCARLSON@uci.edu> References: <20060822191712.1A39.JCARLSON@uci.edu> Message-ID: <44EC0524.2060206@canterbury.ac.nz> Josiah Carlson wrote: > One of the issues with the -1 return on find failure is that it is > ambiguous, one must really check for a -1 return.
Here's an API that is > non-ambiguous: An alternative would be to return None for not found. It wouldn't solve the problem of people using the return value as a boolean, but at least you'd get an exception if you tried to use the not-found value as an index. Or maybe it could return index values as a special int subclass that always tests true even when it's zero... -- Greg From jjl at pobox.com Wed Aug 23 13:04:56 2006 From: jjl at pobox.com (John J Lee) Date: Wed, 23 Aug 2006 12:04:56 +0100 (GMT Standard Time) Subject: [Python-3000] Droping find/rfind? In-Reply-To: <44EC0524.2060206@canterbury.ac.nz> References: <20060822191712.1A39.JCARLSON@uci.edu> <44EC0524.2060206@canterbury.ac.nz> Message-ID: On Wed, 23 Aug 2006, Greg Ewing wrote: > Josiah Carlson wrote: > >> One of the issues with the -1 return on find failure is that it is >> ambiguous, one must really check for a -1 return. Here's an API that is >> non-ambiguous: > > An alternative would be to return None for not found. > It wouldn't solve the problem of people using the > return value as a boolean, but at least you'd get > an exception if you tried to use the not-found value > as an index. > > Or maybe it could return index values as a special > int subclass that always tests true even when it's > zero... How about returning a str.NotFound object? John From ncoghlan at gmail.com Wed Aug 23 13:43:12 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 23 Aug 2006 21:43:12 +1000 Subject: [Python-3000] Droping find/rfind? In-Reply-To: References: Message-ID: <44EC3F50.3040609@gmail.com> Guido van Rossum wrote: > At today's sprint, one of the volunteers completed a patch to rip out > find() and rfind(), replacing all calls with index()/rindex(). But now > I'm getting cold feet -- is this really a good idea? (It's been listed > in PEP 3100 for a long time, but I haven't thought about it much, > really.) > > What do people think? 
> I'd be more interested in a patch that replaced standard library uses of find()/rfind() with either "if sub in string" or partition()/rpartition(). Replacing usage of find() for slicing purposes is one of the big reasons the latter methods were added, after all. I also like Josiah's idea of replacing find() with a search() method that returned an iterator of indices, so that you can do: for idx in string.search(sub): # Process the indices (if any) Then you have 5 substring searching mechanisms for different use cases: sub in s (simple containment test) s.index(sub) (first index, exception if not found) s.search(sub) (iterator of indices, empty if not found) s.partition(sep) (split on first occurrence of substring) s.split(sep) (split on all occurrences of substring) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From barry at python.org Wed Aug 23 14:37:08 2006 From: barry at python.org (Barry Warsaw) Date: Wed, 23 Aug 2006 08:37:08 -0400 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <20060823044148.GR5772@performancedrivers.com> References: <20060823044148.GR5772@performancedrivers.com> Message-ID: <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org> I agree with Tim -- if we have to get rid of one of them, let's get rid of index/rindex and keep find/rfind. Catching the exception is much less convenient than testing for -1. -Barry From guido at python.org Wed Aug 23 16:20:54 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 23 Aug 2006 07:20:54 -0700 Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org> References: <20060823044148.GR5772@performancedrivers.com> <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org> Message-ID: On 8/23/06, Barry Warsaw wrote: > I agree with Tim -- if we have to get rid of one of them, let's get > rid of index/rindex and keep find/rfind. Catching the exception is > much less convenient than testing for -1. But the -1 is very error-prone, as many have experienced. Also, many uses of find() should be replaced by 'in' (long ago, 'in' only accepted one-character strings on the left and find() was the best alternative) or partition(). To the folks asking for it to stay because it's harmless: in py3k I want to rip out lots of "harmless" to make the language smaller. A smaller language is also a feature, and a very important one -- a frequent complaint I hear is that over time the language has lost some of its original smallness, which reduces some of the reasons why people were attracted to it in the first place. (Also, removing features makes room for new ones -- Bertrand Meyer, Eiffel's creator, often asks users demanding a new feature to point out which feature they are willing to drop to make room.) I don't want Python to become like Emacs, which I still use, but generally don't recommend to new developers any more... If you haven't grown up with it, its current state is hard to understand and hard to defend. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From steven.bethard at gmail.com Wed Aug 23 16:31:55 2006 From: steven.bethard at gmail.com (Steven Bethard) Date: Wed, 23 Aug 2006 08:31:55 -0600 Subject: [Python-3000] Droping find/rfind? In-Reply-To: References: <20060823044148.GR5772@performancedrivers.com> <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org> Message-ID: On 8/23/06, Guido van Rossum wrote: > On 8/23/06, Barry Warsaw wrote: > > I agree with Tim -- if we have to get rid of one of them, let's get > > rid of index/rindex and keep find/rfind. 
Catching the exception is > > much less convenient than testing for -1. > > But the -1 is very error-prone, as many have experienced. Also, many > uses of find() should be replaced by 'in' (long ago, 'in' only > accepted one-character strings on the left and find() was the best > alternative) or partition(). FWLIW, I only started using Python at the tail end of 2.2, so the 'in' started working with substrings pretty early for me. I do a fair bit of work with text (my research is in natural language processing) and yet I have exactly zero instances of [r]find() in my code. So I at least wouldn't miss them if they were gone. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From phd at mail2.phd.pp.ru Wed Aug 23 16:44:33 2006 From: phd at mail2.phd.pp.ru (Oleg Broytmann) Date: Wed, 23 Aug 2006 18:44:33 +0400 Subject: [Python-3000] Droping find/rfind? In-Reply-To: References: <20060823044148.GR5772@performancedrivers.com> <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org> Message-ID: <20060823144432.GA10709@phd.pp.ru> On Wed, Aug 23, 2006 at 07:20:54AM -0700, Guido van Rossum wrote: > in py3k I > want to rip out lots of "harmless" to make the language smaller. A > smaller language is also a feature, and a very important one -- a > frequent complaint I hear is that over time the language has lost some > of its original smallness, which reduces some of the reasons why > people were attracted to it in the first place. IMHO find() is not a part of the language - it is a part of the standard library. When people complain about the *language* they AFAIU mean "print >>", [list comprehension], iterators, generators and (generator expressions), @decorators, "with", "case"... Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. 
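For reference, the find() replacements advocated at various points in this thread -- 'in', startswith(), and partition() -- look like this in practice (the example strings are mine):

```python
text = "himom: hello"

# Membership and prefix tests instead of comparing find()'s result:
assert "himom" in text            # replaces text.find("himom") != -1
assert text.startswith("himom")   # replaces text.find("himom") == 0

# partition() instead of try/except around index() for slicing:
head, sep, tail = text.partition("himom")
if sep:                           # a non-empty sep means the substring was found
    assert head == "" and tail == ": hello"
```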
From guido at python.org Wed Aug 23 17:18:03 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 23 Aug 2006 08:18:03 -0700 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <20060823144432.GA10709@phd.pp.ru> References: <20060823044148.GR5772@performancedrivers.com> <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org> <20060823144432.GA10709@phd.pp.ru> Message-ID: That's too narrow a view on the language. Surely the built-in types (especially those with direct compiler support, like literal notations) are part of the language. The people who complain most frequently about Python getting too big aren't language designers, they are users (e.g. scientists) and to them it doesn't matter what technically is or isn't in the language -- it's the complete set of tools they have to deal with. That doesn't include all of the standard library, but it surely includes the built-in types and their behavior! Otherwise the int/long and str/unicode unifications wouldn't be language changes either... -Guido On 8/23/06, Oleg Broytmann wrote: > On Wed, Aug 23, 2006 at 07:20:54AM -0700, Guido van Rossum wrote: > > in py3k I > > want to rip out lots of "harmless" to make the language smaller. A > > smaller language is also a feature, and a very important one -- a > > frequent complaint I hear is that over time the language has lost some > > of its original smallness, which reduces some of the reasons why > > people were attracted to it in the first place. > > IMHO find() is not a part of the language - it is a part of the standard > library. When people complain about the *language* they AFAIU mean "print >>", > [list comprehension], iterators, generators and (generator expressions), > @decorators, "with", "case"... > > Oleg. > -- > Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru > Programmers don't die, they just GOSUB without RETURN. 
> -- --Guido van Rossum (home page: http://www.python.org/~guido/) From phd at mail2.phd.pp.ru Wed Aug 23 17:28:15 2006 From: phd at mail2.phd.pp.ru (Oleg Broytmann) Date: Wed, 23 Aug 2006 19:28:15 +0400 Subject: [Python-3000] Droping find/rfind? In-Reply-To: References: <20060823044148.GR5772@performancedrivers.com> <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org> <20060823144432.GA10709@phd.pp.ru> Message-ID: <20060823152815.GA17442@phd.pp.ru> On Wed, Aug 23, 2006 at 08:18:03AM -0700, Guido van Rossum wrote: > That's too narrow a view on the language. I narrowed it on purpose for this discussion. > Surely the built-in types > (especially those with direct compiler support, like literal > notations) are part of the language. And still I believe they are two different markets, and you cannot trade features between them. I am sure it would be hard to buy space for new language (in that narrow sense) features by removing methods from the standard types. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From barry at python.org Wed Aug 23 17:52:35 2006 From: barry at python.org (Barry Warsaw) Date: Wed, 23 Aug 2006 11:52:35 -0400 Subject: [Python-3000] Droping find/rfind? In-Reply-To: References: <20060823044148.GR5772@performancedrivers.com> <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org> <20060823144432.GA10709@phd.pp.ru> Message-ID: <13DEBA81-AE71-4E2C-BD5C-AC152747BFF2@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 23 Aug 2006 08:18:03 -0700 "Guido van Rossum" wrote: > That's too narrow a view on the language. Surely the built-in types > (especially those with direct compiler support, like literal > notations) are part of the language.
The people who complain most > frequently about Python getting too big aren't language designers, > they are users (e.g. scientists) and to them it doesn't matter what > technically is or isn't in the language -- it's the complete set of > tools they have to deal with. That doesn't include all of the standard > library, but it surely includes the built-in types and their behavior! > Otherwise the int/long and str/unicode unifications wouldn't be > language changes either... Oleg has a point though. Speaking generally, the perception of "bigness" comes down to how much you can -- and /have/ to -- keep in your head at one time while programming or reading code. Python's traditionally made excellent choices here. The language is small enough to keep in your head but the library is huge. I don't know about anybody else, but my aging brain can't keep much of the library in its RAM so I'm highly dependent on help() and the library reference manual to find things when I need them. But I almost never have to look up a particular language feature, and this was one of the primary reasons I switched from Perl to Python over a decade ago. To me, Python's growth with the last few releases is felt more deeply with language features than with library improvements. Features like list comprehensions, generators and generator expressions, and decorators have all been ingrained, and while they originally felt "big", they are now common tools I reach for and intuitively understand. Some of the 2.5 features such as 'with', relative imports, and conditional expressions haven't reached that level of comfort and make Python feel "big" to me again. There are some counterexamples: built-in sets, while making a library feature a built-in type, makes Python feel a bit smaller because sets are such a natural concept and code using them looks cleaner. For Python 3000, integrating ints and longs will definitely do this, as will (I suspect) making all strings unicode with a (probably rarely used) byte type.
So the question is where string methods like index and find fall. To me, they don't feel like language features. Built-in types fall somewhere in between language features and library. Their /presence/ is a language feature but what you can do with them seems more library-ish to me. For me, the reason is that I can easily keep in my head that I have strings to represent text, ints, longs, etc. to represent numbers, sets, dicts, lists, and tuples to represent collections, etc. But I may not remember exactly how to use str.find() or dict.setdefault() because I use them more rarely (which doesn't mean they're unimportant!). I know they're there and I vaguely remember how to use them, so when I need them, it's off to the library reference or help() for a quick refresher. This suggests to me that a guiding principle ought to be reducing language features without losing important functionality, just as the int/long, str/unicode, all-newstyle classes work is doing. Here you're trying to polish the conceptual edges off the language, compound-W'ing the language warts, and generally streamlining the language so it can more easily fit in your head. Where it comes to the library, I think we ought to concentrate on reducing duplication. TOOWTDI. Get rid of the User* modules. If I need to do web-stuff, do I need urllib, urllib2, urlparse, or what? etc. As for the built-in types, let's reduce duplication here too, so if there's a better way of e.g. doing what find, rfind, index, and rindex do, then let's remove them and encourage the other uses. dict.has_key() is a perfect example here. 'in' replaces many of the use cases for str.find and friends, but not all. Maybe str.partition completes the picture, though I don't have enough experience with them to know. Anyway, enough blathering. Those are my thoughts.
For this specific case, maybe we really don't need any of [r]find() and [r]index(), but if the choice comes down to one or the other, I still find catching the exception less convenient than checking a return value. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBROx5w3EjvBPtnXfVAQI03QP/X9KyJabidsid1Vu01PWQZ0Op2ZvoMWyg b9VQrS94auA/AQD9zg6SoBQaPIIGLAWg6Oh4FjkiuuCwhsb96YHjGdiSE510VfjW R6qXg9beWTaafJVtzkjCLn0Gu+H5R9EdWnLGvwdVvF2ASPwfrZ2N0G6k/daQlCNk 3G5ucal/Jug= =vwWM -----END PGP SIGNATURE----- From steven.bethard at gmail.com Wed Aug 23 18:05:11 2006 From: steven.bethard at gmail.com (Steven Bethard) Date: Wed, 23 Aug 2006 10:05:11 -0600 Subject: [Python-3000] DictMixin (WAS: Droping find/rfind?) Message-ID: On 8/23/06, Barry Warsaw wrote: > Where it comes to the library, I think we ought to concentrate on > reducing duplication. TOOWTDI. Get rid of the User* modules. Generally a good idea, but we still need somewhere to put DictMixin. It's too bad you can't just use the unbound methods like:: dict.update(dict-like-object, *args, **kwargs) or we could drop DictMixin entirely. Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From jimjjewett at gmail.com Wed Aug 23 19:08:57 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 23 Aug 2006 13:08:57 -0400 Subject: [Python-3000] DictMixin (WAS: Droping find/rfind?) In-Reply-To: References: Message-ID: On 8/23/06, Barry Warsaw wrote: > Where it comes to the library, I think we ought to concentrate on > reducing duplication. TOOWTDI. Get rid of the User* modules. Until it is possible to inherit from multiple extension types, there will be a need to mimic inheritance with delegation; User* provides a useful pattern.
-jJ From jjl at pobox.com Wed Aug 23 19:47:14 2006 From: jjl at pobox.com (John J Lee) Date: Wed, 23 Aug 2006 18:47:14 +0100 (GMT Standard Time) Subject: [Python-3000] Droping find/rfind? In-Reply-To: References: <20060822191712.1A39.JCARLSON@uci.edu> <44EC0524.2060206@canterbury.ac.nz> Message-ID: On Wed, 23 Aug 2006, John J Lee wrote: [...] >> An alternative would be to return None for not found. >> It wouldn't solve the problem of people using the >> return value as a boolean, but at least you'd get >> an exception if you tried to use the not-found value >> as an index. >> >> Or maybe it could return index values as a special >> int subclass that always tests true even when it's >> zero... > > How about returning a str.NotFound object? Whoops, scratch that, doesn't solve anything more than returning None. John From steven.bethard at gmail.com Wed Aug 23 20:29:26 2006 From: steven.bethard at gmail.com (Steven Bethard) Date: Wed, 23 Aug 2006 12:29:26 -0600 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org> References: <20060823044148.GR5772@performancedrivers.com> <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org> Message-ID: On 8/23/06, Barry Warsaw wrote: > I agree with Tim -- if we have to get rid of one of them, let's get > rid of index/rindex and keep find/rfind. Catching the exception is > much less convenient than testing for -1. Could you post a simple example or two? I keep imagining things like:: index = text.index(...) if 0 <= index: ... do something with index ... else: ... which looks about the same as:: try: index = text.index(...) ... do something with index ... except ValueError: ... Is it just that a lot of the else clauses are empty? STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. 
--- Bucky Katt, Get Fuzzy From jcarlson at uci.edu Wed Aug 23 20:52:54 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 23 Aug 2006 11:52:54 -0700 Subject: [Python-3000] Droping find/rfind? In-Reply-To: References: <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org> Message-ID: <20060823114629.1A57.JCARLSON@uci.edu> "Steven Bethard" wrote: > > On 8/23/06, Barry Warsaw wrote: > > I agree with Tim -- if we have to get rid of one of them, let's get > > rid of index/rindex and keep find/rfind. Catching the exception is > > much less convenient than testing for -1. > > Could you post a simple example or two? I keep imagining things like:: > > index = text.index(...) > if 0 <= index: > ... do something with index ... > else: > ... A more-often-used style is... index = text.find(...) if index >= 0: ... Compare this with the use of index: try: index = text.index(...) except ValueError: pass else: ... or even index = 0 while 1: index = text.find(..., index) if index == -1: break ... compared with index = 0 while 1: try: index = text.index(..., index) except ValueError: break ... > try: > index = text.index(...) > ... do something with index ... > except ValueError: > ... In these not uncommon cases, the use of str.index and having to catch ValueError is cumbersome (in terms of typing, indentation, etc.), and is about as susceptible to bugs as str.find, which you have shown by putting "... do something with index ..." in the try clause, rather than the else clause. - Josiah From steven.bethard at gmail.com Wed Aug 23 21:07:49 2006 From: steven.bethard at gmail.com (Steven Bethard) Date: Wed, 23 Aug 2006 13:07:49 -0600 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <20060823114629.1A57.JCARLSON@uci.edu> References: <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org> <20060823114629.1A57.JCARLSON@uci.edu> Message-ID: Steven Bethard wrote: > Could you post a simple example or two? Josiah Carlson wrote: > index = text.find(...) > if index >= 0: > ... 
> [snip] > index = 0 > while 1: > index = text.find(..., index) > if index == -1: > break > ... > Thanks. So with your search() function, these would be something like: indices = text.search(pattern, count=1) if indices: index, = indices ... and for index in text.search(pattern): ... if I understood the proposal right. Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From paul at prescod.net Wed Aug 23 21:12:31 2006 From: paul at prescod.net (Paul Prescod) Date: Wed, 23 Aug 2006 12:12:31 -0700 Subject: [Python-3000] Droping find/rfind? In-Reply-To: References: <20060823044148.GR5772@performancedrivers.com> <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org> Message-ID: <1cb725390608231212x1fbd3492jb4e9e3f0fcccee77@mail.gmail.com> Just throwing it out but what about something like: found, index = text.index("abc") if found: doSomething(index) If you were confident that the index was in there you would do something more like this: something = text[text.index("abc")[1]:] (although there are clearer ways to do that) On 8/23/06, Steven Bethard wrote: > > On 8/23/06, Barry Warsaw wrote: > > I agree with Tim -- if we have to get rid of one of them, let's get > > rid of index/rindex and keep find/rfind. Catching the exception is > > much less convenient than testing for -1. > > Could you post a simple example or two? I keep imagining things like:: > > index = text.index(...) > if 0 <= index: > ... do something with index ... > else: > ... > > which looks about the same as:: > > try: > index = text.index(...) > ... do something with index ... > except ValueError: > ... > > Is it just that a lot of the else clauses are empty? > > STeVe > -- > I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a > tiny blip on the distant coast of sanity. 
> --- Bucky Katt, Get Fuzzy -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060823/71692915/attachment.html From g.brandl at gmx.net Wed Aug 23 21:36:12 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 23 Aug 2006 21:36:12 +0200 Subject: [Python-3000] Droping find/rfind? In-Reply-To: References: <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org> <20060823114629.1A57.JCARLSON@uci.edu> Message-ID: Steven Bethard wrote: > Steven Bethard wrote: >> Could you post a simple example or two? > > Josiah Carlson wrote: >> index = text.find(...) >> if index >= 0: >> ... >> > [snip] >> index = 0 >> while 1: >> index = text.find(..., index) >> if index == -1: >> break >> ... >> > > Thanks. So with your search() function, these would be something like: > > indices = text.search(pattern, count=1) > if indices: > index, = indices > ... Or even indices = text.search(pattern, count=1) for index in indices: ... Georg From g.brandl at gmx.net Wed Aug 23 21:36:50 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 23 Aug 2006 21:36:50 +0200 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <1cb725390608231212x1fbd3492jb4e9e3f0fcccee77@mail.gmail.com> References: <20060823044148.GR5772@performancedrivers.com> <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org> <1cb725390608231212x1fbd3492jb4e9e3f0fcccee77@mail.gmail.com> Message-ID: Paul Prescod wrote: > Just throwing it out but what about something like: > > found, index = text.index("abc") > > if found: > doSomething(index) -1. str.index()'s semantics should not be different from list.index().
Georg From jcarlson at uci.edu Wed Aug 23 21:56:21 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 23 Aug 2006 12:56:21 -0700 Subject: [Python-3000] Droping find/rfind? In-Reply-To: References: <20060823114629.1A57.JCARLSON@uci.edu> Message-ID: <20060823123719.1A5D.JCARLSON@uci.edu> "Steven Bethard" wrote: > Steven Bethard wrote: > > Could you post a simple example or two? > > Josiah Carlson wrote: > > index = text.find(...) > > if index >= 0: > > ... > > > [snip] > > index = 0 > > while 1: > > index = text.find(..., index) > > if index == -1: > > break > > ... > > Thanks. So with your search() function, these would be something like: > > indices = text.search(pattern, count=1) > if indices: > index, = indices > ... > > and > > for index in text.search(pattern): > ... > > if I understood the proposal right. Yes, you understood my (strawman) proposal correctly. The former could even be shortened to: for index in text.search(pattern, count=1): ... ... if there wasn't an else clause in the original search. Note that my point in the proposing of search was to say: 1. [r]index is cumbersome 2. [r]find can be error-prone for newbies due to the -1 return 3. the functionality seems to be useful (otherwise neither would exist) 4. let us unambiguate [r]find if possible, because it is the better of the two (in my opinion) 5. or instead of 4, replace both of them with search People seem to like the #5 option, even though it was not my intent in posting search originally. Given that some people like it, I'm now of the opinion that if [r]find is going, then certainly [r]index should go because it suffers from being more cumbersome to use and has a similar class of bugs, and if both go, then we should have something to replace them. As a replacement, search lacks the exception annoyance of index, has an unambiguous return value, and naturally supports iterative find calls.
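To make the strawman concrete: str.search() was never implemented, so the function below is only a sketch of the proposal as described in this thread, with the name, the `count` limit, and the non-overlapping advance all inferred from the discussion rather than taken from any real patch:

```python
def search(text, pattern, count=None):
    """Sketch of the proposed search(): return the offsets of up to
    `count` non-overlapping matches (all of them when count is None).
    An empty list is unambiguous -- no -1 sentinel, no ValueError."""
    indices = []
    start = 0
    while count is None or len(indices) < count:
        index = text.find(pattern, start)
        if index == -1:
            break
        indices.append(index)
        start = index + max(len(pattern), 1)  # step past this match
    return indices
```

With this shape, `search(text, pat, count=1)` stands in for the find/-1 test, and `for index in search(text, pat): ...` replaces the manual while loop.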
Given search as a potential replacement, about the only question is whether count should default to sys.maxint or 1. The original description included count=sys.maxint, but if we want to use it as a somewhat drop-in replacement for find and index, then it would make more sense for it to have count=1 as a default, with some easy-to-access count argument to make it find all of them. - Josiah From jcarlson at uci.edu Wed Aug 23 22:22:42 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 23 Aug 2006 13:22:42 -0700 Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be? In-Reply-To: References: <20060821081944.1A0F.JCARLSON@uci.edu> Message-ID: <20060823125951.1A60.JCARLSON@uci.edu> "Guido van Rossum" wrote: > And yet offense is taken. Have you watched the video of my Py3k talk? > Search for it on Google Video. I spent some time yesterday and watched it. All I was proposing is that similar to Perl 5 and 6, users of Python 2.x may not feel an overwhelming desire to move to Python 3.x, because there will be so many incompatibilities. I understand that the point of Python 3.x is to allow for a one-time (at least for now) breakage of the backwards compatibility of the language to get rid of the crap; "Backwards incompatible changes are allowed in Python 3000, but not to excess." While each individual change to the language is relatively minor by itself, putting them all together is effectively one big backwards incompatible change. Take the standard library reorganization for example. I am 100% in favor of reorganizing it, but if it is all moved at once, then people can't write code for the future, until it arrives. But if we were to create a mapping of new names -> old names, then an import hook could be written, and people could start using the new package names in 2.6.
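A minimal sketch of that mapping idea — the entry and helper name below are invented for illustration (no actual renames had been decided at this point), and a full PEP 302 hook would hang off sys.meta_path rather than being an explicit function:

```python
import importlib
import sys

# Hypothetical new-name -> old-name mapping; the real Py3k renames
# had not been settled when this thread was written.
_RENAMES = {
    "compat_io": "io",  # invented example entry
}

def import_compat(name):
    """Import `name`, falling back on the old spelling from _RENAMES,
    and cache the module under the new name so later lookups of the
    new name find the same module object."""
    module = importlib.import_module(_RENAMES.get(name, name))
    sys.modules.setdefault(name, module)
    return module
```

Because sys.modules is consulted before the import machinery runs, registering the module under the new name means subsequent plain imports of that name succeed as well.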
The intent of my post was to say that all of us want Py3k to succeed, but I believe that in order for it to succeed, breakage from the 2.x series should be gradual, in a similar way to how 2.x -> 2.x+1 breakage has been gradual. I believe we agree on this basic point except for one thing; according to your talk and your posts here, you want Py3k alpha in the next year or two, while I'm thinking that Py3k alpha should come somewhere after 2.6 and probably 2.7, maybe even after 2.8 or 2.9, depending on how quickly the 2.x series is transitioned. Having a Py3k in development really just makes maintenance (bug fixing, etc.) more of a burden. > Perhaps you want to help write the transition PEP? I'll see what I can hack up next week (I have an advancement talk tomorrow that I really should be preparing for). - Josiah From bjourne at gmail.com Wed Aug 23 23:01:23 2006 From: bjourne at gmail.com (=?ISO-8859-1?Q?BJ=F6rn_Lindqvist?=) Date: Wed, 23 Aug 2006 23:01:23 +0200 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <20060823114629.1A57.JCARLSON@uci.edu> References: <0C9E2042-0040-4123-BA6D-2780FD49F194@python.org> <20060823114629.1A57.JCARLSON@uci.edu> Message-ID: <740c3aec0608231401q18ca271o72157213855e7e17@mail.gmail.com> On 8/23/06, Josiah Carlson wrote: > or even > > index = 0 > while 1: > index = text.find(..., index) > if index == -1: > break > ... > compared with > > index = 0 > while 1: > try: > index = text.index(..., index) > except ValueError: > break > ... You are supposed to use the in operator: index = 0 while 1: if not "something" in text[index:]: break IMHO, removing find() is good because index() does the same job without violating the Samurai Principle (http://c2.com/cgi/wiki?SamuraiPrinciple). It would be interesting to see the patch that replaced find() with index(); did it really make the code more cumbersome?
-- mvh Björn From guido at python.org Wed Aug 23 23:18:59 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 23 Aug 2006 14:18:59 -0700 Subject: [Python-3000] find -> index patch Message-ID: Here's the patch (by Hasan Diwan, BTW) for people's perusal. It just gets rid of all *uses* of find/rfind from Lib; it doesn't actually modify stringobject.c or unicodeobject.c. It doesn't use [r]partition(); someone could look for opportunities to use that separately. -- --Guido van Rossum (home page: http://www.python.org/~guido/) -------------- next part -------------- A non-text attachment was scrubbed... Name: rfind2rindex_find2index.pat Type: application/octet-stream Size: 80147 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20060823/a8f58010/attachment-0001.obj From jack at psynchronous.com Wed Aug 23 23:39:25 2006 From: jack at psynchronous.com (Jack Diederich) Date: Wed, 23 Aug 2006 17:39:25 -0400 Subject: [Python-3000] find -> index patch In-Reply-To: References: Message-ID: <20060823213924.GS5772@performancedrivers.com> On Wed, Aug 23, 2006 at 02:18:59PM -0700, Guido van Rossum wrote: > Here's the patch (by Hasan Diwan, BTW) for people's perusal. It just > gets rid of all *uses* of find/rfind from Lib; it doesn't actually > modify stringobject.c or unicodeobject.c. It doesn't use > [r]partition(); someone could look for opportunities to use that > separately. > Is this a machine generated patch? Replacing all calls to find with try: i = text.index(sep) except: i = -1 has a Yuck factor of -1000. Some of the excepts specify ValueError, but still. -Jack
It just > gets rid of all *uses* of find/rfind from Lib; it doesn't actually > modify stringobject.c or unicodeobject.c. It doesn't use > [r]partition(); someone could look for opportunities to use that > separately. > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) There's a bug in the Lib/idlelib/configHandler.py patch, likely 6 unintended bugs exposed in Lib/idlelib/PyParse.py (which are made worse by the patch), Lib/idlelib/CallTips.py is broken, 4 examples in Lib/ihooks.py don't require the try/except clause (it is prefixed with a containment test), Lib/cookielib.py has two new bugs, ... I stopped at Lib/string.py. Also, there are inconsistent uses of bare except and except ValueError clauses. The patch shouldn't be applied for many reasons, not the least of which is because it breaks currently working code; it offers poorly-styled code of the form: try:... = str.index(...) except:...=-1 ...that looks to have been done by a script, it has inconsistent style compared to the code it replaces, etc. - Josiah From jcarlson at uci.edu Wed Aug 23 23:53:05 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 23 Aug 2006 14:53:05 -0700 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <740c3aec0608231401q18ca271o72157213855e7e17@mail.gmail.com> References: <20060823114629.1A57.JCARLSON@uci.edu> <740c3aec0608231401q18ca271o72157213855e7e17@mail.gmail.com> Message-ID: <20060823143116.1A63.JCARLSON@uci.edu> "BJörn Lindqvist" wrote: > > On 8/23/06, Josiah Carlson wrote: > > > or even > > > > index = 0 > > while 1: > > index = text.find(..., index) > > if index == -1: > > break > > ... > > compared with > > > > index = 0 > > while 1: > > try: > > index = text.index(..., index) > > except ValueError: > > break > > ... > > You are supposed to use the in operator: > > index = 0 > while 1: > if not "something" in text[index:]: > break This can also lead to O(n^2) running time, causes unnecessary string allocation, memory copies, etc.
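The slicing-free version of that loop — what the find()-with-a-start-index idiom quoted above amounts to, wrapped here as a generator purely for illustration (this is not code from the thread) — scans the original string in place, so the whole traversal stays linear:

```python
def iter_find(text, pattern):
    """Yield the offset of every (possibly overlapping) occurrence of
    pattern, feeding find()'s start argument forward instead of
    re-slicing text on each pass."""
    index = text.find(pattern)
    while index != -1:
        yield index
        index = text.find(pattern, index + 1)
```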
If I saw that in real code, I'd probably lose respect for the author of that module and/or package. > IMHO, removing find() is good because index() does the same job > without violating the Samurai Principle > (http://c2.com/cgi/wiki?SamuraiPrinciple). It would be interesting to > see the patch that replaced find() with index(), did it really make > the code more cumbersome? Everywhere there is a test for index==str.find(...), it needs to be replaced with a try/except clause. That's a cumbersome translation if there ever was one. - Josiah From hasan.diwan at gmail.com Thu Aug 24 00:09:35 2006 From: hasan.diwan at gmail.com (Hasan Diwan) Date: Wed, 23 Aug 2006 15:09:35 -0700 Subject: [Python-3000] find -> index patch In-Reply-To: <20060823143606.1A66.JCARLSON@uci.edu> References: <20060823143606.1A66.JCARLSON@uci.edu> Message-ID: <2cda2fc90608231509n7dc5a47bg5adfd2b790e29681@mail.gmail.com> On 23/08/06, Josiah Carlson wrote: > > ...that looks to have been done by a script, it has inconsistent style > compared to the code it replaces, etc. > I made the minimal change that implements the functionality suggested; in terms of find/rfind, they return -1. The least painful way to replace it with index is: try: i=str.index(foo) except ValueError: i = -1 As for the plain except clauses, that was just laziness on my part. It's not meant to be stylistically consistent or beautiful, rather it is meant to be functional and as a starting point. Feel free to change/rewrite the patch. The GENERAL CASE, i.e. one that is applicable throughout the code, is the try/except clause shown above. -- Cheers, Hasan Diwan -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060823/6e57fa69/attachment.html From g.brandl at gmx.net Thu Aug 24 00:52:16 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 24 Aug 2006 00:52:16 +0200 Subject: [Python-3000] find -> index patch In-Reply-To: <20060823143606.1A66.JCARLSON@uci.edu> References: <20060823143606.1A66.JCARLSON@uci.edu> Message-ID: Josiah Carlson wrote: > "Guido van Rossum" wrote: >> Here's the patch (by Hasan Diwan, BTW) for people's perusal. It just >> gets rid of all *uses* of find/rfind from Lib; it doesn't actually >> modify stringobject.c or unicodeobject.c. It doesn't use >> [r]partition(); someone could look for opportunities to use that >> separately. >> >> -- >> --Guido van Rossum (home page: http://www.python.org/~guido/) > > There's a bug in the Lib/idlelib/configHandler.py patch, likely 6 > unintended bugs exposed in Lib/idlelib/PyParse.py (which are made worse by > the patch), Are the bugs there in current code too? You should then report them. > Lib/idlelib/CallTips.py is broken, 4 examples in > Lib/ihooks.py don't require the try/except clause (it is prefixed with a > containment test), Lib/cookielib.py has two new bugs, ... > > I stopped at Lib/string.py > > Also, there are inconsistent uses of bare except and except ValueError > clauses. Not speaking of the inconsistent use of spaces vs. tabs ;) Another newly-introduced bug: - p = str.rfind('\n', 0, p-1) + 1 + try:p = str.rindex('\n', 0, p-1) + 1 + except:p=-1 Georg From jcarlson at uci.edu Thu Aug 24 01:30:37 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 23 Aug 2006 16:30:37 -0700 Subject: [Python-3000] find -> index patch In-Reply-To: References: <20060823143606.1A66.JCARLSON@uci.edu> Message-ID: <20060823162756.1A6C.JCARLSON@uci.edu> Georg Brandl wrote: > > Josiah Carlson wrote: > > "Guido van Rossum" wrote: > >> Here's the patch (by Hasan Diwan, BTW) for people's perusal.
It just > >> gets rid of all *uses* of find/rfind from Lib; it doesn't actually > >> modify stringobject.c or unicodeobject.c. It doesn't use > >> [r]partition(); someone could look for opportunities to use that > >> separately. > >> > >> -- > >> --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > There's a bug in the Lib/idlelib/configHandler.py patch, likely 6 > > unintended bugs exposed in Lib/idlelib/PyParse.py (which are made worse by > > the patch), > > Are the bugs there in current code too? You should then report them. Maybe, maybe not. I'll have to look (but not today). > > Lib/idlelib/CallTips.py is broken, 4 examples in > > Lib/ihooks.py don't require the try/except clause (it is prefixed with a > > containment test), Lib/cookielib.py has two new bugs, ... > > > > I stopped at Lib/string.py > > > > Also, there are inconsistent uses of bare except and except ValueError > > clauses. > > Not speaking of the inconsistent use of spaces vs. tabs ;) > > Another newly-introduced bug: > > - p = str.rfind('\n', 0, p-1) + 1 > + try:p = str.rindex('\n', 0, p-1) + 1 > + except:p=-1 That was the "likely 6 unintended bugs in Lib/idlelib/PyParse.py". - Josiah From jcarlson at uci.edu Thu Aug 24 01:39:03 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 23 Aug 2006 16:39:03 -0700 Subject: [Python-3000] find -> index patch In-Reply-To: <2cda2fc90608231509n7dc5a47bg5adfd2b790e29681@mail.gmail.com> References: <20060823143606.1A66.JCARLSON@uci.edu> <2cda2fc90608231509n7dc5a47bg5adfd2b790e29681@mail.gmail.com> Message-ID: <20060823163043.1A6F.JCARLSON@uci.edu> "Hasan Diwan" wrote: > On 23/08/06, Josiah Carlson wrote: > > > > ...that looks to have been done by a script, it has inconsistent style > > compared to the code it replaces, etc. > > > > I made the minimal change that implements the functionality suggested, in > terms of find/rfind, they return -1.
The least painful way to replace it > with index is: > > try: > i=str.index(foo) > except ValueError: > i = -1 > > As for the plain except clauses, that was just laziness on my part. It's not > meant to be stylistically consistent or beautiful, rather it is meant to be > functional and as a starting point. Feel free to change/rewrite the patch. > The GENERAL CASE, i.e. one that is applicable throughout the code is the > try/except clauses shown above. If find is to be replaced, it should be replaced with something that isn't as cumbersome to use as index, and shouldn't be done in a bulk replacement attempt; as you have shown, doing so can lead to unintended new bugs and the possible perpetuation of old bugs. When Raymond Hettinger did the same thing to replace some examples of find with partition, a similar first pass over his proposed patch also turned up a handful of new and perpetuated bugs. I'm also not going to fix the patch because I don't believe that replacing find with index is the correct course of action, for the few reasons I've laid out in the current and previous messages on the topic. - Josiah From tjreedy at udel.edu Thu Aug 24 02:03:38 2006 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 23 Aug 2006 20:03:38 -0400 Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be? References: <20060821081944.1A0F.JCARLSON@uci.edu> <20060823125951.1A60.JCARLSON@uci.edu> Message-ID: "Josiah Carlson" wrote in message news:20060823125951.1A60.JCARLSON at uci.edu... > The intent of my post was to say that all of us want Py3k to succeed, I should hope that we all do. > but I believe that in order for it to succeed that breakage from the 2.x > series should be gradual, in a similar way to how 2.x -> 2.x+1 breakage > has been gradual.
Given that the rate of intentional breakage in the core language (including builtins) has been very minimal, this would take a couple of decades, which to my mind would be a failure. > I believe we agree on this basic point To the contrary, you seem to have a basic disagreement with the plan to make all the core language changes at once and to clear the decks of old baggage so we can move forward with a leaner language that is a bit easier to learn and remember. > according to your talk and your posts here, you want Py3k alpha > in the next year or two, while I'm thinking that Py3k alpha should come > somewhere after 2.6 and probably 2.7, maybe even after 2.8 or 2.9, Whereas I wish it were already out and would be delighted to see it early next year. Some of the changes have already been put off for at least five years and, to me, are overdue. Terry Jan Reedy From tjreedy at udel.edu Thu Aug 24 02:27:26 2006 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 23 Aug 2006 20:27:26 -0400 Subject: [Python-3000] Droping find/rfind? References: <20060823044148.GR5772@performancedrivers.com> Message-ID: "Brian Holmes" wrote in message news:e3c648160608222346g4587d55eiff521787ca4d915f at mail.gmail.com... >Even after reading Terry Reedy's arguments, I don't see why we need to > >remove this option. Since this is my first post in this current thread, you either meant someone else or are remembering my posts about in- and out-of-band error signaling from the last time we discussed this. > Let both exist. I'd prefer grandfathering something like this and > leaving it >in, even if it wouldn't be there had we known everything from > the start. One point of the 3.0 cleanup is to remove or change things that we definitely would not do today. When I learned Python, both the find/index duplication and the in-band same-type Unix/Cism -1 return stuck out to me like sore thumbs. So I would either 1. just remove find() and leave index(); or 2.
change find()'s error return to None, and remove index(); or possibly consider Josiah's idea of 3. remove both in favor of an index generator. I am strongly -1 on leaving both as they are. Terry Jan Reedy From jack at psynchronous.com Thu Aug 24 02:39:48 2006 From: jack at psynchronous.com (Jack Diederich) Date: Wed, 23 Aug 2006 20:39:48 -0400 Subject: [Python-3000] find -> index patch In-Reply-To: References: Message-ID: <20060824003948.GT5772@performancedrivers.com> On Wed, Aug 23, 2006 at 02:18:59PM -0700, Guido van Rossum wrote: > Here's the patch (by Hasan Diwan, BTW) for people's perusal. It just > gets rid of all *uses* of find/rfind from Lib; it doesn't actually > modify stringobject.c or unicodeobject.c. It doesn't use > [r]partition(); someone could look for opportunities to use that > separately. I made a go at doing an idiomatic conversion of the first few modules tagged by 'grep find( *.py' in Lib, patch attached. WOW, I love partition. In all the instances that weren't a simple "in" test I ended up using [r]partition. In some cases one of the returned strings gets thrown away but in those cases it is guaranteed to be small. The new code is usually smaller than the old and generally clearer. ex/ cgi.py - i = p.find('=') - if i >= 0: - name = p[:i].strip().lower() - value = p[i+1:].strip() + (name, sep_found, value) = p.partition('=') + if (sep_found): + name = name.strip().lower() + value = value.strip() If folks like the way this partial set looks I'll convert the rest.
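For readers who had not yet used the then-new methods Jack leans on: [r]partition() always returns a 3-tuple, and the middle element doubles as a found/not-found flag, which is what lets the converted cgi.py code above test `sep_found` instead of comparing an index against -1. A quick demonstration (not part of the patch):

```python
# partition() always returns a 3-tuple; the middle element is the
# separator itself on a match and '' otherwise.
name, sep, value = "key=val".partition("=")
assert (name, sep, value) == ("key", "=", "val")

# No match: everything lands in the first slot and sep is empty,
# so `if sep:` works as the found test.
assert "no separator here".partition("=") == ("no separator here", "", "")

# rpartition() scans from the right; the tail ends up in the last slot.
assert "a?b?c".rpartition("?") == ("a?b", "?", "c")
```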
-Jack -------------- next part -------------- Index: Lib/CGIHTTPServer.py =================================================================== --- Lib/CGIHTTPServer.py (revision 51530) +++ Lib/CGIHTTPServer.py (working copy) @@ -106,16 +106,9 @@ def run_cgi(self): """Execute a CGI script.""" dir, rest = self.cgi_info - i = rest.rfind('?') - if i >= 0: - rest, query = rest[:i], rest[i+1:] - else: - query = '' - i = rest.find('/') - if i >= 0: - script, rest = rest[:i], rest[i:] - else: - script, rest = rest, '' + (rest, sep, query) = rest.rpartition('?') + (rest, sep, script) = rest.partition('/') + rest = sep + rest # keep the slash scriptname = dir + '/' + script scriptfile = self.translate_path(scriptname) if not os.path.exists(scriptfile): Index: Lib/asynchat.py =================================================================== --- Lib/asynchat.py (revision 51530) +++ Lib/asynchat.py (working copy) @@ -125,14 +125,13 @@ # collect data to the prefix # 3) end of buffer does not match any prefix: # collect data - terminator_len = len(terminator) - index = self.ac_in_buffer.find(terminator) - if index != -1: + (data, term_found, more_data) = self.ac_in_buffer.partition(terminator) + if term_found: # we found the terminator - if index > 0: + if data: # don't bother reporting the empty string (source of subtle bugs) - self.collect_incoming_data (self.ac_in_buffer[:index]) - self.ac_in_buffer = self.ac_in_buffer[index+terminator_len:] + self.collect_incoming_data(data) + self.ac_in_buffer = more_data # This does the Right Thing if the terminator is changed here. 
                    self.found_terminator()
                else:
Index: Lib/cookielib.py
===================================================================
--- Lib/cookielib.py (revision 51530)
+++ Lib/cookielib.py (working copy)
@@ -531,8 +531,10 @@
         return True
     if not is_HDN(A):
         return False
-    i = A.rfind(B)
-    if i == -1 or i == 0:
+    if (not B):
+        return False
+    (before_B, sep, after_B) = A.rpartition(B)
+    if not sep or not before_B:
         # A does not have form NB, or N is the empty string
         return False
     if not B.startswith("."):
@@ -595,7 +597,7 @@
     """
     erhn = req_host = request_host(request)
-    if req_host.find(".") == -1 and not IPV4_RE.search(req_host):
+    if "." not in req_host and not IPV4_RE.search(req_host):
         erhn = req_host + ".local"
     return req_host, erhn
@@ -616,16 +618,12 @@
 def request_port(request):
     host = request.get_host()
-    i = host.find(':')
-    if i >= 0:
-        port = host[i+1:]
-        try:
-            int(port)
-        except ValueError:
-            _debug("nonnumeric port: '%s'", port)
-            return None
-    else:
-        port = DEFAULT_HTTP_PORT
+    port = host.partition(':')[-1] or DEFAULT_HTTP_PORT
+    try:
+        int(port)
+    except ValueError:
+        _debug("nonnumeric port: '%s'", port)
+        return None
     return port
 # Characters in addition to A-Z, a-z, 0-9, '_', '.', and '-' that don't
@@ -676,13 +674,9 @@
     '.local'
     """
-    i = h.find(".")
-    if i >= 0:
-        #a = h[:i] # this line is only here to show what a is
-        b = h[i+1:]
-        i = b.find(".")
-        if is_HDN(h) and (i >= 0 or b == "local"):
-            return "."+b
+    (a, sep, b) = h.partition(".")
+    if sep and is_HDN(h) and ("." in b or b == "local"):
+        return "."+b
     return h
 def is_third_party(request):
@@ -986,11 +980,9 @@
             # XXX This should probably be compared with the Konqueror
             # (kcookiejar.cpp) and Mozilla implementations, but it's a
             # losing battle.
-            i = domain.rfind(".")
-            j = domain.rfind(".", 0, i)
-            if j == 0: # domain like .foo.bar
-                tld = domain[i+1:]
-                sld = domain[j+1:i]
+            (extra, dot, tld) = domain.rpartition(".")
+            (extra, dot, sld) = extra.rpartition(".")
+            if not extra: # domain like .foo.bar
                 if sld.lower() in ("co", "ac", "com", "edu", "org", "net",
                                    "gov", "mil", "int", "aero", "biz", "cat",
                                    "coop", "info", "jobs", "mobi", "museum",
                                    "name", "pro",
@@ -1002,7 +994,7 @@
                 undotted_domain = domain[1:]
             else:
                 undotted_domain = domain
-            embedded_dots = (undotted_domain.find(".") >= 0)
+            embedded_dots = ("." in undotted_domain)
             if not embedded_dots and domain != ".local":
                 _debug(" non-local domain %s contains no embedded dot",
                        domain)
@@ -1024,8 +1016,7 @@
             if (cookie.version > 0 or
                 (self.strict_ns_domain & self.DomainStrictNoDots)):
                 host_prefix = req_host[:-len(domain)]
-                if (host_prefix.find(".") >= 0 and
-                    not IPV4_RE.search(req_host)):
+                if ("." in host_prefix and not IPV4_RE.search(req_host)):
                     _debug(" host prefix %s for domain %s contains a dot",
                            host_prefix, domain)
                     return False
@@ -1462,13 +1453,13 @@
         else:
             path_specified = False
             path = request_path(request)
-            i = path.rfind("/")
-            if i != -1:
+            (path, sep, dummy) = path.rpartition("/")
+            if sep:
                 if version == 0:
                     # Netscape spec parts company from reality here
-                    path = path[:i]
+                    pass
                 else:
-                    path = path[:i+1]
+                    path = path + sep
             if len(path) == 0:
                 path = "/"
         # set default domain
Index: Lib/cgi.py
===================================================================
--- Lib/cgi.py (revision 51530)
+++ Lib/cgi.py (working copy)
@@ -340,10 +340,10 @@
     key = plist.pop(0).lower()
     pdict = {}
     for p in plist:
-        i = p.find('=')
-        if i >= 0:
-            name = p[:i].strip().lower()
-            value = p[i+1:].strip()
+        (name, sep_found, value) = p.partition('=')
+        if (sep_found):
+            name = name.strip().lower()
+            value = value.strip()
         if len(value) >= 2 and value[0] == value[-1] == '"':
             value = value[1:-1]
             value = value.replace('\\\\', '\\').replace('\\"', '"')
Index:
Lib/ConfigParser.py
===================================================================
--- Lib/ConfigParser.py (revision 51530)
+++ Lib/ConfigParser.py (working copy)
@@ -468,9 +468,9 @@
                     if vi in ('=', ':') and ';' in optval:
                         # ';' is a comment delimiter only if it follows
                         # a spacing character
-                        pos = optval.find(';')
-                        if pos != -1 and optval[pos-1].isspace():
-                            optval = optval[:pos]
+                        (new_optval, sep, comment) = optval.partition(';')
+                        if (sep and new_optval[-1:].isspace()):
+                            optval = new_optval
                     optval = optval.strip()
                     # allow empty values
                     if optval == '""':
@@ -599,14 +599,13 @@
         if depth > MAX_INTERPOLATION_DEPTH:
             raise InterpolationDepthError(option, section, rest)
         while rest:
-            p = rest.find("%")
-            if p < 0:
+            (before, sep, after) = rest.partition('%')
+            if (not sep):
                 accum.append(rest)
                 return
-            if p > 0:
-                accum.append(rest[:p])
-                rest = rest[p:]
-            # p is no longer used
+            elif (after):
+                accum.append(before)
+                rest = sep + after
             c = rest[1:2]
             if c == "%":
                 accum.append("%")
From jimjjewett at gmail.com Thu Aug 24 03:10:40 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 23 Aug 2006 21:10:40 -0400 Subject: [Python-3000] Droping find/rfind? In-Reply-To: References: <20060823044148.GR5772@performancedrivers.com> Message-ID: On 8/23/06, Terry Reedy wrote: > 2. change find()'s error return to None, and remove index(); +1 It is particularly unfortunate that the error code of -1 is a valid index.

    >>> substring = string[string.find(marker):]

will silently produce garbage. > or possibly consider Josiah's idea of > 3. remove both in favor of an index generator. The strawman seemed clumsy, but maybe it will grow on me. -jJ From greg.ewing at canterbury.ac.nz Thu Aug 24 03:36:25 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 24 Aug 2006 13:36:25 +1200 Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <20060823123719.1A5D.JCARLSON@uci.edu> References: <20060823114629.1A57.JCARLSON@uci.edu> <20060823123719.1A5D.JCARLSON@uci.edu> Message-ID: <44ED0299.7040204@canterbury.ac.nz> Josiah Carlson wrote: > Given search as a potential replacement, about the only question is > whether count should default to sys.maxint or 1. Do you think that there will be many use cases for count values *other* than 1 or sys.maxint? If not, it might be more sensible to have two functions, search() and searchall(). And while we're on this, what about list.index? Should it also be replaced with list.search or whatever as well? -- Greg From tdelaney at avaya.com Thu Aug 24 04:05:08 2006 From: tdelaney at avaya.com (Delaney, Timothy (Tim)) Date: Thu, 24 Aug 2006 12:05:08 +1000 Subject: [Python-3000] Droping find/rfind? Message-ID: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com> Nick Coghlan wrote: > I also like Josiah's idea of replacing find() with a search() method > that returned an iterator of indices, so that you can do: > > for idx in string.search(sub): > # Process the indices (if any) Need to be careful with this - the original search proposal returned a list, which could be tested for a boolean value - hence: if not string.search(sub): pass but if an iterator were returned, I think we would want to be able to perform the same test i.e. search would have to return an iterator that had already performed the initial search, with __nonzero__ reflecting the result of that search. I do think that returning an iterator is better due to the fact that most uses of search() would only care about the first returned index. Tim Delaney From jcarlson at uci.edu Thu Aug 24 04:14:08 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 23 Aug 2006 19:14:08 -0700 Subject: [Python-3000] Droping find/rfind? 
In-Reply-To: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com> References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com> Message-ID: <20060823191222.1A76.JCARLSON@uci.edu> "Delaney, Timothy (Tim)" wrote: > > Nick Coghlan wrote: > > > I also like Josiah's idea of replacing find() with a search() method > > that returned an iterator of indices, so that you can do: > > > > for idx in string.search(sub): > > # Process the indices (if any) > > Need to be careful with this - the original search proposal returned a > list, which could be tested for a boolean value - hence: > > if not string.search(sub): > pass > > but if an iterator were returned, I think we would want to be able to > perform the same test i.e. search would have to return an iterator that > had already performed the initial search, with __nonzero__ reflecting > the result of that search. I do think that returning an iterator is > better due to the fact that most uses of search() would only care about > the first returned index. ... which is why there is a count argument, that I have recently suggested default to 1. - Josiah From jcarlson at uci.edu Thu Aug 24 04:21:22 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 23 Aug 2006 19:21:22 -0700 Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be? In-Reply-To: References: <20060823125951.1A60.JCARLSON@uci.edu> Message-ID: <20060823185143.1A73.JCARLSON@uci.edu> "Terry Reedy" wrote: > "Josiah Carlson" wrote in message > news:20060823125951.1A60.JCARLSON at uci.edu... > > The intent of my post was to say that all of us want Py3k to succeed, > > I should hope that we all do. > > > but I believe that in order for it to succeed that breakage from the 2.x > > series should be gradual, in a similar way to how 2.x -> 2.x+1 breakage > > has been gradual. 
> > Given that the rate of intentional breakage in the core language (including > builtins) has been very minimal, this would take a couple of decades, which > to my mind would be a failure. If we could stick with a 12-18 month release schedule, using deprecation and removal in subsequent releases, every removal could happen in 2-3 years. 2.6 could offer every feature of 3.0 (except for backwards-incompatible syntax), warning of removal or relocation (in the case of stdlib reorganization), 3.0 could handle all of the actual syntax changes. > > I believe we agree on this basic point > > To the contrary, you seem to have a basic disagreement with the plan to > make all the core language changes at once and to clear the decks of old > baggage so we can move forward with a leaner language that is a bit easier > to learn and remember. I disagree with the "all the changes at once", but if Guido didn't agree with a gradual upgrade path, then the 2.6-2.9 series wouldn't even be considered as options, and we'd be looking at 3.0 coming out after 2.5, and there not being a 2.6 . Since 2.6 is planned, and other 2.x releases are at least possible (if not expected), then I must agree with someone, as my desires haven't previously been sufficient to change Python release expectations. > > according to your talk and your posts here, you want Py3k alpha > > in the next year or two, while I'm thinking that Py3k alpha should come > > somewhere after 2.6 and probably 2.7, maybe even after 2.8 or 2.9, > > Whereas I wish it were already out and would be delighted to see it early > next year. Some of the changes have already been put off for at least five > years and, to me, are overdue. As a daily abuser of Python, I've not found the language to be lacking in any area significant enough, or even having too many overlapping features sufficient to warrant such widespread language breakage.
We disagree on this point, and that's fine, as long as Guido agrees that 2.6+ make sense, which he does, and states as much in his talk and all relevant postings I've seen, then I don't need to drug him. He also agrees that 3.0 should come out sooner rather than later, but that's not going to stop me from attempting to make the case that 3.0 is going to be generally unused until later gradual 2.6+ releases close the gap and make the transition more natural. But hey, I'm just a guy who writes software who is going to have to transition and maintain it. Obviously there can't be too many of us, go ahead and break the language, I'm sure everyone will be happy to upgrade to 3.0, you won't even need to maintain the 2.x series, really. - Josiah From martin at v.loewis.de Thu Aug 24 04:24:17 2006 From: martin at v.loewis.de (martin at v.loewis.de) Date: Thu, 24 Aug 2006 04:24:17 +0200 Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be? In-Reply-To: <20060823185143.1A73.JCARLSON@uci.edu> References: <20060823125951.1A60.JCARLSON@uci.edu> <20060823185143.1A73.JCARLSON@uci.edu> Message-ID: <1156386257.44ed0dd1cf737@www.domainfactory-webmail.de> Zitat von Josiah Carlson : > > To the contrary, you seem to have a basic disagreement with the plan to > > make all the core language changes at once and to clear the decks of old > > baggage so we can move forward with a learner language that is a bit easier > > to learn and remember. > > I disagree with the "all the changes at once", but if Guido didn't agree > with a gradual upgrade path, then the 2.6-2.9 series wouldn't even be > considered as options, and we'd be looking at 3.0 coming out after 2.5, > and there not being a 2.6 . Since 2.6 is planned, and other 2.x > releases are at least possible (if not expected), then I must agree with > someone, as my desires haven't previously been sufficient to change > Python release expectations. That conclusion is invalid. 2.6, 2.7, ... 
are not made to gradually move towards 3.0, but because it is anticipated that 3.0 will not be adopted immediately, but, say, 3.2 might be. To provide new features for 2.x users, new 2.x releases need to be made (of course, the features added to, say, 2.7 will likely also be added to, say, 3.3). Regards, Martin From guido at python.org Thu Aug 24 04:39:29 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 23 Aug 2006 19:39:29 -0700 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <20060823191222.1A76.JCARLSON@uci.edu> References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com> <20060823191222.1A76.JCARLSON@uci.edu> Message-ID: I don't find the current attempts to come up with a better substring search API useful. We did a lot of thinking about this not too long ago, and the result was the addition of [r]partition() to 2.5 and the intent to drop [r]find() from py3k as both redundant with [r]index() and error-prone (I think I just found another bug in logging.__init__.py:

    def _fixupChildren(self, ph, alogger):
        """
        Ensure that children of the placeholder ph are connected to the
        specified logger.
        """
        #for c in ph.loggers:
        for c in ph.loggerMap.keys():
            if string.find(c.parent.name, alogger.name) <> 0:
                alogger.parent = c.parent
                c.parent = alogger

This is either a really weird way of writing "if not c.parent.name.startswith(alogger.name):", or a bug which was intending to write "if alogger.name in c.parent.name:" . I appreciate the criticism on the patch -- clearly it's not ready to go in, and more work needs to be put in to actually *improve* the code, using [r]partition() where necessary, etc. But I'm strengthened in the conclusion that find() is way overused and we don't need yet another search primitive. TOOWTDI.
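The equivalence read into that logging snippet can be spot-checked; a small illustrative harness (the helper names are invented, and the str method form of find is used in place of the old `string` module function):

```python
def weird(parent_name, logger_name):
    # the idiom from the quoted logging code: find(...) != 0
    return parent_name.find(logger_name) != 0

def readable(parent_name, logger_name):
    # the startswith() spelling Guido suggests it amounts to
    return not parent_name.startswith(logger_name)

# prefix match, non-match, and exact match all agree
for pair in [("a.b.c", "a.b"), ("x.y", "a.b"), ("a.b", "a.b")]:
    assert weird(*pair) == readable(*pair)
```

find() returns 0 exactly when the needle is a prefix, so `find(...) != 0` lumps "found later" together with "not found at all" — which is why the startswith() reading is the plausible one.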
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From holmesbj.dev at gmail.com Thu Aug 24 05:38:08 2006 From: holmesbj.dev at gmail.com (Brian Holmes) Date: Wed, 23 Aug 2006 20:38:08 -0700 Subject: [Python-3000] Droping find/rfind? In-Reply-To: References: <20060823044148.GR5772@performancedrivers.com> Message-ID: On 8/23/06, Terry Reedy wrote: > > > "Brian Holmes" wrote in message > news:e3c648160608222346g4587d55eiff521787ca4d915f at mail.gmail.com... > > >Even after reading Terry Reedy's arguments, I don't see why we need to > > >remove this option. > > Since this is my first post in this current thread, you either meant > someone else or are remembering my posts about in- and out-of-band error > signaling from the last time we discussed this. > My reference was to this post: http://mail.python.org/pipermail/python-dev/2005-August/055717.html - Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060823/7848eb65/attachment.html From talin at acm.org Thu Aug 24 05:38:06 2006 From: talin at acm.org (Talin) Date: Wed, 23 Aug 2006 20:38:06 -0700 Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be? In-Reply-To: <20060823185143.1A73.JCARLSON@uci.edu> References: <20060823125951.1A60.JCARLSON@uci.edu> <20060823185143.1A73.JCARLSON@uci.edu> Message-ID: <44ED1F1E.1080307@acm.org> Josiah Carlson wrote: > "Terry Reedy" wrote: >> "Josiah Carlson" wrote in message >> news:20060823125951.1A60.JCARLSON at uci.edu... >>> The intent of my post was to say that all of us want Py3k to succeed, >> I should hope that we all do. >> >>> but I believe that in order for it to succeed that breakage from the 2.x >>> series should be gradual, in a similar way to how 2.x -> 2.x+1 breakage >>> has been gradual. 
>> Given that the rate of intentional breakage in the core language (including >> builtins) has been very minimal, this would take a couple of decades, which >> to my mind would be a failure. > > If we could stick with a 12-18 month release schedule, using deprecation > and removal in subsequent releases, every removal could happen in 2-3 > years. 2.6 could offer every feature of 3.0 (except for > backwards-incompatible syntax), warning of removal or relocation (in the > case of stdlib reorganization), 3.0 could handle all of the actual > syntax changes. 2.6 should also include a powerful 'lint' option that detects use of features not compatible with 3.0. Something like "from __future__ import pedantic" or something along those lines. -- Talin From jcarlson at uci.edu Thu Aug 24 07:07:29 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 23 Aug 2006 22:07:29 -0700 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <44ED0299.7040204@canterbury.ac.nz> References: <20060823123719.1A5D.JCARLSON@uci.edu> <44ED0299.7040204@canterbury.ac.nz> Message-ID: <20060823220213.1A7C.JCARLSON@uci.edu> Greg Ewing wrote: > Josiah Carlson wrote: > > Given search as a potential replacement, about the only question is > > whether count should default to sys.maxint or 1. > > Do you think that there will be many use cases for > count values *other* than 1 or sys.maxint? If not, > it might be more sensible to have two functions, > search() and searchall(). I have used str.split with counts != 1 or sys.maxint, and I would guess that there would be similar use-cases. > And while we're on this, what about list.index? > Should it also be replaced with list.search or > whatever as well? To be consistent from a sequence operation perspective, I would say yes, though I have so rarely used list.index(), I'm hard-pressed to have much of an opinion.
- Josiah From jcarlson at uci.edu Thu Aug 24 07:20:43 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 23 Aug 2006 22:20:43 -0700 Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be? In-Reply-To: <1156386257.44ed0dd1cf737@www.domainfactory-webmail.de> References: <20060823185143.1A73.JCARLSON@uci.edu> <1156386257.44ed0dd1cf737@www.domainfactory-webmail.de> Message-ID: <20060823203502.1A79.JCARLSON@uci.edu> martin at v.loewis.de wrote: > > Zitat von Josiah Carlson : > > > > To the contrary, you seem to have a basic disagreement with the plan to > > > make all the core language changes at once and to clear the decks of old > > > baggage so we can move forward with a learner language that is a bit easier > > > to learn and remember. > > > > I disagree with the "all the changes at once", but if Guido didn't agree > > with a gradual upgrade path, then the 2.6-2.9 series wouldn't even be > > considered as options, and we'd be looking at 3.0 coming out after 2.5, > > and there not being a 2.6 . Since 2.6 is planned, and other 2.x > > releases are at least possible (if not expected), then I must agree with > > someone, as my desires haven't previously been sufficient to change > > Python release expectations. > > That conclusion is invalid. 2.6, 2.7, ... are not made to gradually > move towards 3.0, but because it is anticipated that 3.0 will not > be adopted immediately, but, say, 3.2 might be. To provide new > features for 2.x users, new 2.x releases need to be made > (of course, the features added to, say, 2.7 will likely also > be added to, say, 3.3). See Guido's reply here: http://mail.python.org/pipermail/python-3000/2006-August/003105.html Specifically his response to the "Here's my suggestion:" paragraph.
Unless I completely misunderstood his response, and his later asking whether I want to help author the transition PEP (presumably for at least dict.keys(), but more likely from 2.x to 3.x), I can't help but believe that he also wants at least an attempt at some gradual change for users with cold feet about breaking everything in one go. Also, in the talk he gave at Google on July 21, somewhere around the 7:45-11 minute mark, he talks about how 3.x features are to be backported to 2.7 or so, specifically so that there is a larger subset of Python that will run in both 2.x and 3.x . Smells like an attempt at gradual migration to me. - Josiah From steven.bethard at gmail.com Thu Aug 24 08:31:54 2006 From: steven.bethard at gmail.com (Steven Bethard) Date: Thu, 24 Aug 2006 00:31:54 -0600 Subject: [Python-3000] find -> index patch In-Reply-To: <20060824003948.GT5772@performancedrivers.com> References: <20060824003948.GT5772@performancedrivers.com> Message-ID: On 8/23/06, Jack Diederich wrote: > On Wed, Aug 23, 2006 at 02:18:59PM -0700, Guido van Rossum wrote: > > Here's the patch (by Hasan Diwan, BTW) for people's perusal. It just > > gets rid of all *uses* of find/rfind from Lib; it doesn't actually > > modify stringobject.c or unicodeobject.c. It doesn't use > > [r]partition()'; someone could look for opportunities to use that > > separately. > > I make a go at doing an idiomatic convertion of the first few modules > tagged by 'grep find( *.py' in Lib, patch attached. > > WOW, I love partition. After looking at your patch, I have to agree. The new code is *way* more readable. Nice work! STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity.
--- Bucky Katt, Get Fuzzy From ncoghlan at gmail.com Thu Aug 24 11:38:44 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 24 Aug 2006 19:38:44 +1000 Subject: [Python-3000] find -> index patch In-Reply-To: <20060824003948.GT5772@performancedrivers.com> References: <20060824003948.GT5772@performancedrivers.com> Message-ID: <44ED73A4.40208@gmail.com> Jack Diederich wrote: > If folks like the way this partial set looks I'll convert the rest. +1 from here (beautifying the standard lib was one of the justifications for partition, after all).

> ------------------------------------------------------------------------
>
> Index: Lib/CGIHTTPServer.py
> ===================================================================
> --- Lib/CGIHTTPServer.py (revision 51530)
> +++ Lib/CGIHTTPServer.py (working copy)
> @@ -106,16 +106,9 @@
>      def run_cgi(self):
>          """Execute a CGI script."""
>          dir, rest = self.cgi_info
> -        i = rest.rfind('?')
> -        if i >= 0:
> -            rest, query = rest[:i], rest[i+1:]
> -        else:
> -            query = ''
> -        i = rest.find('/')
> -        if i >= 0:
> -            script, rest = rest[:i], rest[i:]
> -        else:
> -            script, rest = rest, ''
> +        (rest, sep, query) = rest.rpartition('?')
> +        (rest, sep, script) = rest.partition('/')
> +        rest = sep + rest # keep the slash

rest & script are back to front on the second line of the new bit. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Thu Aug 24 11:45:58 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 24 Aug 2006 19:45:58 +1000 Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be?
In-Reply-To: <20060823203502.1A79.JCARLSON@uci.edu> References: <20060823185143.1A73.JCARLSON@uci.edu> <1156386257.44ed0dd1cf737@www.domainfactory-webmail.de> <20060823203502.1A79.JCARLSON@uci.edu> Message-ID: <44ED7556.7010306@gmail.com> Josiah Carlson wrote: > Also, in the talk he gave at Google on July 21, somewhere around the > 7:45-11 minute mark, he talks about how 3.x features are to be > backported to 2.7 or so, specifically so that there is a larger subset > of Python that will run in both 2.x and 3.x . Smells like an attempt at > gradual migration to me. He also said that he doesn't expect Python 3.0 to see widespread usage, with a relatively rapid evolution to 3.1 (and possibly even 3.2). I don't think there's really that much disagreement here - the difference is that Guido wants to get 3.0 out early so that we *know* what the eventual target is for later 2.x releases. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From fredrik at pythonware.com Thu Aug 24 12:35:54 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 24 Aug 2006 12:35:54 +0200 Subject: [Python-3000] find -> index patch References: Message-ID: Guido van Rossum wrote: > Here's the patch (by Hasan Diwan, BTW) for people's perusal. It just > gets rid of all *uses* of find/rfind from Lib; it doesn't actually > modify stringobject.c or unicodeobject.c. It doesn't use > [r]partition()'; someone could look for opportunities to use that > separately. since most of the changes appear to be variations of the pattern

- index = foo.find(bar)
+ try:
+     index = foo.index(bar)
+ except:
+     index = -1

it sure looks like the "get rid of find; it's the same thing as index" idea might be somewhat misguided. I think I'm "idea".find("good") on this one. better use this energy on partitionifying the 2.6 standard library instead.
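The mechanical rewrite Fredrik objects to, next to the partition() form it arguably should be, for a typical split-at-first-colon task (values are illustrative):

```python
host = "example.com:8080"

# the mechanical find -> index conversion: same sentinel, more ceremony
try:
    i = host.index(':')
except ValueError:
    i = -1
port_a = host[i + 1:] if i >= 0 else ""

# the partition form: no sentinel, no exception handling
_, sep, port_b = host.partition(':')

assert port_a == port_b == "8080"
```

Both spellings still funnel through an index or a sentinel; partition() sidesteps the index entirely, which is the point of "partitionifying" instead.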
From fredrik at pythonware.com Thu Aug 24 12:51:20 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 24 Aug 2006 12:51:20 +0200 Subject: [Python-3000] Droping find/rfind? References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com><20060823191222.1A76.JCARLSON@uci.edu> Message-ID: Guido van Rossum wrote:

> for c in ph.loggerMap.keys():
>     if string.find(c.parent.name, alogger.name) <> 0:
>         alogger.parent = c.parent
>         c.parent = alogger
>
> This is either a really weird way of writing "if not
> c.parent.name.startswith(alogger.name):"

weird, indeed, but it could be a premature attempt to optimize away the slicing for platforms that don't have "startswith" (it doesn't look like a bug, afaict). (on the other hand, "s[:len(t)] == t" is usually faster than "s.startswith(t)" for short prefixes, so maybe someone should have done a bit more benchmarking...) (which reminds me that speeding up handling of optional arguments to C functions would be an even better use of this energy) From walter at livinglogic.de Thu Aug 24 12:56:37 2006 From: walter at livinglogic.de (Walter Dörwald) Date: Thu, 24 Aug 2006 12:56:37 +0200 Subject: [Python-3000] Droping find/rfind? In-Reply-To: References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com> <20060823191222.1A76.JCARLSON@uci.edu> Message-ID: <44ED85E5.1000005@livinglogic.de> Guido van Rossum wrote:

> I don't find the current attempts to come up with a better substring
> search API useful.
>
> [...]
>
> I appreciate the criticism on the patch -- clearly it's not ready to
> go in, and more work needs to be put in to actually *improve* the
> code, using [r]partition() where necessary, etc. But I'm strenghtened
> in the conclusion that find() is way overused and we don't need yet
> another search primitive. TOOWTDI.

I don't see what's wrong with find() per se.
IMHO in the following use case find() is the best option: Find the occurrences of "{foo bar}" patterns in the string and return both parts as a tuple. Return (None, "text") for the parts between the patterns, i.e. for 'foo{spam eggs}bar{foo bar}' return [(None, 'foo'), ('spam', 'eggs'), (None, 'bar'), ('foo', 'bar')] Using find(), the code looks like this:

def splitfind(s):
    pos = 0
    while True:
        posstart = s.find("{", pos)
        if posstart < 0:
            break
        posarg = s.find(" ", posstart)
        if posarg < 0:
            break
        posend = s.find("}", posarg)
        if posend < 0:
            break
        prefix = s[pos:posstart]
        if prefix:
            yield (None, prefix)
        yield (s[posstart+1:posarg], s[posarg+1:posend])
        pos = posend+1
    rest = s[pos:]
    if rest:
        yield (None, rest)

Using index() looks worse to me. The code is buried under the exception handling:

def splitindex(s):
    pos = 0
    while True:
        try:
            posstart = s.index("{", pos)
        except ValueError:
            break
        try:
            posarg = s.index(" ", posstart)
        except ValueError:
            break
        try:
            posend = s.index("}", posarg)
        except ValueError:
            break
        prefix = s[pos:posstart]
        if prefix:
            yield (None, prefix)
        yield (s[posstart+1:posarg], s[posarg+1:posend])
        pos = posend+1
    rest = s[pos:]
    if rest:
        yield (None, rest)

Using partition() might have a performance problem if the input string is long. Servus, Walter From ncoghlan at gmail.com Thu Aug 24 13:48:22 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 24 Aug 2006 21:48:22 +1000 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <44ED85E5.1000005@livinglogic.de> References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com> <20060823191222.1A76.JCARLSON@uci.edu> <44ED85E5.1000005@livinglogic.de> Message-ID: <44ED9206.1080306@gmail.com> Walter Dörwald wrote:

> Guido van Rossum wrote:
>
>> I don't find the current attempts to come up with a better substring
>> search API useful.
>>
>> [...]
>> I appreciate the criticism on the patch -- clearly it's not ready to
>> go in, and more work needs to be put in to actually *improve* the
>> code, using [r]partition() where necessary, etc. But I'm strenghtened
>> in the conclusion that find() is way overused and we don't need yet
>> another search primitive. TOOWTDI.
>
> I don't see what's wrong with find() per se. IMHO in the following use
> case find() is the best option: Find the occurrences of "{foo bar}"
> patterns in the string and return both parts as a tuple. Return (None,
> "text") for the parts between the patterns, i.e. for
> 'foo{spam eggs}bar{foo bar}'
> return
> [(None, 'foo'), ('spam', 'eggs'), (None, 'bar'), ('foo', 'bar')]

With a variety of "view types", that work like the corresponding builtin type, but reference the original data structure instead of creating copies, then you could use partition without having to worry about poor performance on large strings:

def splitview(s):
    rest = strview(s)
    while 1:
        prefix, found, rest = rest.partition("{")
        if prefix:
            yield (None, str(prefix))
        if not found:
            break
        first, found, rest = rest.partition(" ")
        if not found:
            break
        second, found, rest = rest.partition("}")
        if not found:
            break
        yield (str(first), str(second))

Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From fredrik at pythonware.com Thu Aug 24 14:33:02 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 24 Aug 2006 14:33:02 +0200 Subject: [Python-3000] Droping find/rfind?
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com> <20060823191222.1A76.JCARLSON@uci.edu> <44ED85E5.1000005@livinglogic.de> <44ED9206.1080306@gmail.com> Message-ID: Nick Coghlan wrote: > With a variety of "view types", that work like the corresponding builtin type, > but reference the original data structure instead of creating copies support for string views would require some serious interpreter surgery, though, and probably break quite a few extensions... From mcherm at mcherm.com Thu Aug 24 14:44:50 2006 From: mcherm at mcherm.com (Michael Chermside) Date: Thu, 24 Aug 2006 05:44:50 -0700 Subject: [Python-3000] find -> index patch Message-ID: <20060824054450.x8w46l05kz488004@login.werra.lunarpages.com> Jack Diederich writes: > I make a go at doing an idiomatic convertion [...] patch attached. > > WOW, I love partition. In all the instances that weren't a simple "in" > test I ended up using [r]partition. In some cases one of the returned > strings gets thrown away but in those cases it is guaranteed to be small. > The new code is usually smaller than the old and generally clearer. Wow. That's just beautiful. This has now convinced me that dumping [r]find() (at least!) and pushing people toward using partition will result in pain in the short term (of course), and beautiful, readable code in the long term. > If folks like the way this partial set looks I'll convert the rest. Please do! Even if we *retain* [r]find(), this is still better code. And I'm personally going to stop using [r]find() in my own code starting today. -- Michael Chermside From fredrik at pythonware.com Thu Aug 24 15:48:57 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 24 Aug 2006 15:48:57 +0200 Subject: [Python-3000] find -> index patch References: <20060824054450.x8w46l05kz488004@login.werra.lunarpages.com> Message-ID: Michael Chermside wrote: >> WOW, I love partition. 
In all the instances that weren't a simple "in" >> test I ended up using [r]partition. In some cases one of the returned >> strings gets thrown away but in those cases it is guaranteed to be small. >> The new code is usually smaller than the old and generally clearer. > > Wow. That's just beautiful. This has now convinced me that dumping > [r]find() (at least!) and pushing people toward using partition will > result in pain in the short term (of course), and beautiful, readable > code in the long term. note that partition provides an elegant solution to an important *subset* of all problems addressed by find/index. just like lexical scoping vs. default arguments and map vs. list comprehensions, it doesn't address all problems right out of the box, and shouldn't be advertised as doing that. From gmccaughan at synaptics-uk.com Thu Aug 24 16:21:11 2006 From: gmccaughan at synaptics-uk.com (Gareth McCaughan) Date: Thu, 24 Aug 2006 15:21:11 +0100 Subject: [Python-3000] find -> index patch In-Reply-To: References: <20060824054450.x8w46l05kz488004@login.werra.lunarpages.com> Message-ID: <200608241521.13007.gmccaughan@synaptics-uk.com> Fredrik Lundh wrote: > note that partition provides an elegant solution to an important *subset* of all > problems addressed by find/index. > > just like lexical scoping vs. default arguments and map vs. list comprehensions, > it doesn't address all problems right out of the box, and shouldn't be advertised > as doing that. Sure, but partition + "in" (now that it works as an arbitrary substring test) seem to cover a very large subset of the things you'd want to do with find: enough that having only index available for the remaining cases is unlikely to hurt much (apart from the important issue of backward compatibility, but this *is* py3k). I'm having trouble thinking of any plausible counterexamples, though I'm sure there must be some. 
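To make the contrast concrete, here is a hedged sketch (mine, not from the thread) of one small task written find()-style and partition()-style, plus one residual case where the *position* itself is wanted and index()/find() still applies. The `get_host_*` names are invented for illustration:

```python
def get_host_find(url):
    # find()-style: manual offsets and -1 sentinel checks
    i = url.find("//")
    rest = url[i + 2:] if i >= 0 else url
    j = rest.find("/")
    return rest if j < 0 else rest[:j]

def get_host_partition(url):
    # partition()-style: no offsets, no sentinels
    _, sep, rest = url.partition("//")
    host, _, _ = (rest if sep else url).partition("/")
    return host

assert get_host_find("http://example.com/x") == "example.com"
assert get_host_partition("http://example.com/x") == "example.com"
assert get_host_partition("example.com/x") == "example.com"

# Residual case: when you need the offset itself, e.g. to compare the
# locations of two different separators, partition doesn't help:
s = "key=value;rest"
assert s.index("=") < s.index(";")
```

The partition version also degrades gracefully on malformed input, since missing separators yield empty strings rather than -1 offsets that must be checked before slicing.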
-- g From nnorwitz at gmail.com Thu Aug 24 16:25:04 2006 From: nnorwitz at gmail.com (Neal Norwitz) Date: Thu, 24 Aug 2006 10:25:04 -0400 Subject: [Python-3000] Droping find/rfind? In-Reply-To: References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com> <20060823191222.1A76.JCARLSON@uci.edu> Message-ID: On 8/24/06, Fredrik Lundh wrote: > > (which reminds me that speeding up handling of optional arguments to C functions > would be an even better use of this energy) If this patch: http://python.org/sf/1107887 is integrated with some of my current work, it should do the job nicely. IIRC the patch uses a big switch which sped things up, but Raymond didn't like it (I think more on a conceptual basis). I don't think it slowed things down measurably. My new approach has been to add a C function pointer to PyCFunction and some other 'function' objects that can dispatch to an appropriate function in ceval.c that does the right thing. I define a bunch of little methods that are determined when the function is created and only do what's necessary depending on the ml_flags. It could be expanded to look at other things. The current work hasn't produced any measurable changes in perf, but I've only gotten rid of a few comparisons and/or a possible function call (if it isn't inlined). If I merge these two approaches, I should be able to speed up cases like you describe. n From guido at python.org Thu Aug 24 16:27:11 2006 From: guido at python.org (Guido van Rossum) Date: Thu, 24 Aug 2006 07:27:11 -0700 Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be? In-Reply-To: <20060823203502.1A79.JCARLSON@uci.edu> References: <20060823185143.1A73.JCARLSON@uci.edu> <1156386257.44ed0dd1cf737@www.domainfactory-webmail.de> <20060823203502.1A79.JCARLSON@uci.edu> Message-ID: On 8/23/06, Josiah Carlson wrote: > Specifically his response to the "Here's my suggestion:" paragraph.
> Unless I completely misunderstood his response, and his later asking > whether I want to help author the transition PEP (presumably for at > least dict.keys(), but more likely from 2.x to 3.x), I can't help but > believe that he also wants at least an attempt at some gradual change > for users with cold feet about breaking everything in one go. > > Also, in the talk he gave at Google on July 21, somewhere around the > 7:45-11 minute mark, he talks about how 3.x features are to be > backported to 2.7 or so, specifically so that there is a larger subset > of Python that will run in both 2.x and 3.x . Smells like an attempt at > gradual migration to me. Since you're trying to channel me, and I'm right here listening to you (and annoyed that you are wasting my time), I need to clarify. What I *don't* want to happen is that Python 2.6, 2.7, and so on keep changing the language from under users' feet, requiring constant code changes to keep up, so that by the time the 2.9 -> 3.0 transition comes it will feel pretty much the same as 2.4 -> 2.5. That would be bad because it would mean that for every transition users would have to make a lot of changes. (Pretty much the only changes like that planned are increasing deprecation warnings for string exceptions, and making 'with' and 'as' unconditional keywords in 2.6.) 3.0 (or 3.2) will feel like a big change and will require a combination of automatic and manual explicit conversion, sometimes guided by warnings produced by Python 2.x in "future-proof-lint" mode (see (a) below). What I *do* want to do is: (a) Add an option to Python 2.6 or 2.7 that starts spewing out warnings about certain things that will change semantics in 3.0 and are hard to detect by source code inspection alone, just like the current -Q option. This could detect uses of range(), zip() or dict.keys() result values incompatible with the iterators or views that these will return in 3.0.
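An illustrative instance of the kind of use such a lint mode would flag (my example, shown with 3.0 semantics):

```python
d = {"b": 2, "a": 1}

ks = d.keys()
# In 2.x this was a list, so in-place mutation like ks.sort() worked;
# in 3.0 keys() returns a view with no sort() method, which is exactly
# the sort of change that is hard to spot by source inspection alone.
assert not hasattr(ks, "sort")

# The forward-compatible spelling behaves identically in both worlds:
assert sorted(d) == ["a", "b"]
```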
But there will be no pressure to change such code before the 3.0 transition, and those warnings will be off by default. (b) Provide access to the new syntax, without dropping the old syntax, whenever it can be done without introducing new keywords, or through __future__ syntax. But these approaches alone cannot cover all cases. While we can probably backport the new I/O library, there won't be a way to test it in a world where str and unicode are the same (unless your app runs on Jython or IronPython). The str/unicode unification and the int/long unification, taking just two examples, just can't be backported to Python 2.x, since they require pervasive and deep changes to the implementation everywhere. Another change that is unlikely to be available in 2.x is the rationalization of comparisons. In 3.0, "1 < 'abc'" will raise a TypeError; there's just no way to backport this behavior, since again it requires pervasive changes to the implementation. I know that you are dreaming of a world where all transitions are easy. But it's just a dream. 3.0 will require hard work and for many large apps it will take years to migrate -- the best approach is probably to make it coincide with a planned major rewrite of the app. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Thu Aug 24 16:33:12 2006 From: martin at v.loewis.de (martin at v.loewis.de) Date: Thu, 24 Aug 2006 16:33:12 +0200 Subject: [Python-3000] find -> index patch In-Reply-To: <20060824003948.GT5772@performancedrivers.com> References: <20060824003948.GT5772@performancedrivers.com> Message-ID: <1156429992.44edb8a8e5794@www.domainfactory-webmail.de> Zitat von Jack Diederich : > + if (sep_found): This should be if sep_found: > If folks like the way this partial set looks I'll convert the rest. Otherwise, it looks fine. 
Martin From thomas at python.org Thu Aug 24 16:55:51 2006 From: thomas at python.org (Thomas Wouters) Date: Thu, 24 Aug 2006 16:55:51 +0200 Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be? In-Reply-To: References: <20060823185143.1A73.JCARLSON@uci.edu> <1156386257.44ed0dd1cf737@www.domainfactory-webmail.de> <20060823203502.1A79.JCARLSON@uci.edu> Message-ID: <9e804ac0608240755x1d5b4406r902d3154157f9fd9@mail.gmail.com> On 8/24/06, Guido van Rossum wrote: > I know that you are dreaming of a world where all transitions are > easy. But it's just a dream. 3.0 will require hard work and for many > large apps it will take years to migrate -- the best approach is > probably to make it coincide with a planned major rewrite of the app. I agree with everything you said, except this. Yes, Python 2.x -> 3.x will always be a large step, no matter which 'x' you take. That shouldn't (and doesn't, so far) mean you can't write code that works fine in both 2.x and 3.x, and transitioning applications from 2.x-only code to 2.x-and-3.x code could then be done incrementally. It would probably need support from future 2.x releases in order to make that possible, but it shouldn't affect 3.x. It will still be a rather big effort from applications, but not any bigger than porting to 3.x in the first place. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060824/00c0f085/attachment.htm From jcarlson at uci.edu Thu Aug 24 18:01:27 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Thu, 24 Aug 2006 09:01:27 -0700 Subject: [Python-3000] [Python-Dev] What should the focus for 2.6 be?
In-Reply-To: References: <20060823203502.1A79.JCARLSON@uci.edu> Message-ID: <20060824084759.1A82.JCARLSON@uci.edu> "Guido van Rossum" wrote: > What I *do* want to do is: > > (a) Add an option to Python 2.6 or 2.7 that starts spewing out > warnings about certain things that will change semantics in 3.0 and > are hard to detect by source code inspection alone, just like the > current -Q option. This could detect uses of range(), zip() or > dict.keys() result values incompatible with the iterators or views > that these will return in 3.0. But there will be no pressure to change > such code before the 3.0 transition, and those warnings will be off by > default. > > (b) Provide access to the new syntax, without dropping the old syntax, > whenever it can be done without introducing new keywords, or through > __future__ syntax. Both of these things are also what I want. > But these approaches alone cannot cover all cases. While we can > probably backport the new I/O library, there won't be a way to test it > in a world where str and unicode are the same (unless your app runs on > Jython or IronPython). The str/unicode unification and the int/long > unification, taking just two examples, just can't be backported to > Python 2.x, since they require pervasive and deep changes to the > implementation everywhere. > > Another change that is unlikely to be available in 2.x is the > rationalization of comparisons. In 3.0, "1 < 'abc'" will raise a > TypeError; there's just no way to backport this behavior, since again > it requires pervasive changes to the implementation. > > I know that you are dreaming of a world where all transitions are > easy. But it's just a dream. 3.0 will require hard work and for many > large apps it will take years to migrate -- the best approach is > probably to make it coincide with a planned major rewrite of the app. 
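The comparison rationalization Guido mentions is easy to demonstrate under 3.0 semantics (my snippet; it runs on any 3.x interpreter):

```python
# 2.x imposed an arbitrary but total cross-type ordering, so "1 < 'abc'"
# quietly evaluated to True; under 3.0 semantics the same expression
# raises instead, and there is no way to backport that without touching
# comparison dispatch throughout the implementation.
try:
    result = 1 < "abc"
except TypeError:
    result = "unorderable"
assert result == "unorderable"
```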
> Easy change would be nice, but working towards everyone having an easy transition would take quite a bit of time and effort, more time and effort than I think *anyone* is really willing to put forward. What I want is for the transition not to be hard. Backporting new modules is one way of doing this; offering an import hook to gain access to a new standard library organization (wxPython uses a method of renaming objects that has worked quite well in their wx namespace transition, which might be usable here), deprecation warnings, __future__, etc., are others. All of these are mechanisms that I see as steps towards making the 2.x -> 3.x transition not quite so hard. Ultimately the features/syntax/semantics that cannot be backported will make the last transition hill a bit tougher to climb than the previous 2.x->2.x+1 ones, but people should have had ample warning for the most part, and I hope won't have terrible difficulties for the final set of changes necessary to go from 2.x to 3.x. - Josiah From jimjjewett at gmail.com Thu Aug 24 18:37:35 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu, 24 Aug 2006 12:37:35 -0400 Subject: [Python-3000] sort vs order (was: What should the focus for 2.6 be?) Message-ID: On 8/24/06, Guido van Rossum wrote: > Another change that is unlikely to be available in 2.x is the > rationalization of comparisons. In 3.0, "1 < 'abc'" will raise a > TypeError; there's just no way to backport this behavior, since again > it requires pervasive changes to the implementation. I still believe that this breaks an important current use case for sorting, but maybe the right answer is a different (but similar) API. Given an arbitrary collection of objects, I want to be able to order them in a consistent manner, at least within a single interpreter session. (Consistency across sessions/machines/persistence/etc would be even better, but isn't essential.) The current sort method works pretty well; the new one wouldn't.
It would be enough (and arguably an improvement, because of broken objects) if there were a consistent_order equivalent that just caught the TypeError and then tried a fallback for you until it found an answer. -jJ From guido at python.org Thu Aug 24 18:44:48 2006 From: guido at python.org (Guido van Rossum) Date: Thu, 24 Aug 2006 09:44:48 -0700 Subject: [Python-3000] sort vs order (was: What should the focus for 2.6 be?) In-Reply-To: References: Message-ID: For doctests etc., it's easy to create a consistent order: sorted(X, key=lambda x: (str(type(x)), x)) This sorts by the name of the type first, then by value within each type. This is assuming the type itself is sortable -- in 3.0, many types won't be sortable, e.g. dicts. (Even in 2.x, sets implement < so differently that a list of sets is likely to cause problems when sorting.) --Guido On 8/24/06, Jim Jewett wrote: > On 8/24/06, Guido van Rossum wrote: > > Another change that is unlikely to be available in 2.x is the > > rationalization of comparisons. In 3.0, "1 < 'abc'" will raise a > > TypeError; there's just no way to backport this behavior, since again > > it requires pervasive changes to the implementation. > > I still believe that this breaks an important current use case for > sorting, but maybe the right answer is a different (but similar) API. > > Given an arbitrary collection of objects, I want to be able to order > them in a consistent manner, at least within a single interpreter > session. (Consistency across sessions/machines/persistence/etc would > be even better, but isn't essential.) > > The current sort method works pretty well; the new one wouldn't. It > would be enough (and arguably an improvement, because of broken > objects) if there were a consistent_order equivalent that just caught > the TypeError and then tried a fallback for you until it found an > answer.
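Jim's catch-the-TypeError idea can be sketched as a comparison function (illustrative; `consistent_cmp` is my name, and the type-name/id() fallback is just one possible tiebreak, stable only within a single interpreter session):

```python
import functools

def consistent_cmp(a, b):
    try:
        if a < b:
            return -1
        if b < a:
            return 1
        return 0
    except TypeError:
        # The values don't compare: fall back to the type name, then
        # id() for otherwise-incomparable values of the same type.
        ka = (type(a).__name__, id(a))
        kb = (type(b).__name__, id(b))
        return (ka > kb) - (ka < kb)

items = [3, "b", 1, "a", 3 + 4j]
ordered = sorted(items, key=functools.cmp_to_key(consistent_cmp))
# Natural order survives within each comparable group; the lone complex
# number sorts first because "complex" precedes "int" and "str":
assert ordered == [3 + 4j, 1, 3, "a", "b"]
```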
> > -jJ > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From david.nospam.hopwood at blueyonder.co.uk Thu Aug 24 22:41:43 2006 From: david.nospam.hopwood at blueyonder.co.uk (David Hopwood) Date: Thu, 24 Aug 2006 21:41:43 +0100 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <44ED85E5.1000005@livinglogic.de> References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com> <20060823191222.1A76.JCARLSON@uci.edu> <44ED85E5.1000005@livinglogic.de> Message-ID: <44EE0F07.5030005@blueyonder.co.uk> Walter Dörwald wrote: [...]
> Using find(), the code looks like this:
>
> def splitfind(s):
>     pos = 0
>     while True:
>         posstart = s.find("{", pos)
>         if posstart < 0:
>             break
>         posarg = s.find(" ", posstart)
>         if posarg < 0:
>             break
>         posend = s.find("}", posarg)
>         if posend < 0:
>             break
>         prefix = s[pos:posstart]
>         if prefix:
>             yield (None, prefix)
>         yield (s[posstart+1:posarg], s[posarg+1:posend])
>         pos = posend+1
>     rest = s[pos:]
>     if rest:
>         yield (None, rest)
>
> Using index() looks worse to me. The code is buried under the exception
> handling:
>
> def splitindex(s):
>     pos = 0
>     while True:
>         try:
>             posstart = s.index("{", pos)
>         except ValueError:
>             break
>         try:
>             posarg = s.index(" ", posstart)
>         except ValueError:
>             break
>         try:
>             posend = s.index("}", posarg)
>         except ValueError:
>             break

try:
    posstart = s.index("{", pos)
    posarg = s.index(" ", posstart)
    posend = s.index("}", posarg)
except ValueError:
    break

is shorter and clearer than the version using 'find'. -- David Hopwood From mcherm at mcherm.com Thu Aug 24 23:45:24 2006 From: mcherm at mcherm.com (Michael Chermside) Date: Thu, 24 Aug 2006 14:45:24 -0700 Subject: [Python-3000] sort vs order (was: What should the focus for 2.6 be?) Message-ID: <20060824144524.cz3o2mv4iv40w40k@login.werra.lunarpages.com> Jim Jewett writes: > Given an arbitrary collection of objects, I want to be able to order > them in a consistent manner, at least within a single interpreter > session.
I think this meets your specifications:

>>> myList = [2.5, 17, object(), 3+4j, 'abc']
>>> myList.sort(key=id)

I prefer Guido's suggestion (key=lambda x: (str(type(x)), x)), but it doesn't handle types that are not comparable (like the complex number I included to be perverse). Frankly, I don't know why you have an "arbitrary collection of objects" -- the only things I have ever dealt with that handled truly _arbitrary_ collections of objects were garbage collectors and generic caching mechanisms. In either case you really *wouldn't* care how things sorted so long as it was consistent, and then sorting by id works nicely. Of course, I doubt this is what you're doing because if you REALLY had arbitrary objects (including uncomparable things like complex numbers) then you would already need to be doing this today and your code wouldn't even need to be modified when you upgraded to 3.0. -- Michael Chermside From greg.ewing at canterbury.ac.nz Fri Aug 25 02:24:06 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 25 Aug 2006 12:24:06 +1200 Subject: [Python-3000] Droping find/rfind? In-Reply-To: References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com> <20060823191222.1A76.JCARLSON@uci.edu> Message-ID: <44EE4326.1070604@canterbury.ac.nz> Fredrik Lundh wrote: > (on the other hand, "s[:len(t)] == t" is usually faster than "s.startswith(t)" for short > prefixes, That's surprising. Any idea why this might be? -- Greg From thomas at python.org Fri Aug 25 02:46:08 2006 From: thomas at python.org (Thomas Wouters) Date: Thu, 24 Aug 2006 20:46:08 -0400 Subject: [Python-3000] Removing 'old-style' ('simple') slices from Py3K. Message-ID: <9e804ac0608241746n7de7c161yd40f6bb4c3061ab6@mail.gmail.com> I spent my time at the Google sprint working on removing simple slices from Py3k, in the p3yk-noslice branch. The work is pretty much done, except for some minor details and finishing touches.
There are a few items that should probably be discussed, though. The state of the tree:

- The SLICE, STORE_SLICE and DELETE_SLICE opcodes (all 4 versions of each) are eradicated. This even freed up a local (register) variable in PyEval_EvalFrameEx(), and probably resulted in a speedup of the bytecode loop. I didn't measure it, though.
- Various types that didn't support extended slicing had such support added:
  - UserList, UserString, MutableUserString
  - structseq (what os.stat and time.localtime and such return)
  - sre_parse.SubPattern (well, more or less)
  - buffer
  - bytes
  - mmap.mmap
- Various types that supported extended slicing now specialcase simple slicing, for extra speed (list, string, unicode, array, tuple)
- the ctypes 'Array' and 'Pointer' types support slicing with slice-objects, but only with step = 1
- The __getslice__, __setslice__ and __delslice__ slots aren't created anymore, for C types.
- The PySequence_GetSlice, PySequence_SetSlice and PySequence_DelSlice no longer try to access the sq_slice and sq_ass_slice PySequenceMethods members. They did already fall back to the mp_subscript and mp_ass_subscript PyMappingMethods members.
- All tests pass, with only the expected changes to any tests.
- The PySequenceMethods struct's 'sq_slice' and 'sq_ass_slice' members are unused and have been renamed
- PyMapping_Check() now returns true for any type with a PyMappingMethods.mp_subscript filled, not just those without a PySequence.sq_slice. One test had to be adjusted for that -- execfile("", {}, ()) now raises a different error, so it now tests execfile("", {}, 42)
- There's no way to figure out the size of a Py_ssize_t from Python code, now. test_support was using a simple-slice to figure it out. I'm not sure if there's really a reason to do it -- I don't quite understand the use of it.
- It's still lacking tests for the extended-slicing abilities of buffer, mmap.mmap, structseq, UserList and UserString.
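With the slice opcodes and the __getslice__ family gone, every subscript reaches __getitem__ as a single object. A toy class (mine) shows what actually gets passed on an interpreter that has made this switch, i.e. any modern 3.x:

```python
class Echo:
    # Hand back whatever the interpreter passes through mp_subscript.
    def __getitem__(self, item):
        return item

e = Echo()
assert e[5] == 5
assert e[1:10:2] == slice(1, 10, 2)
# e[:] and e[::] arrive as the same slice object; the "simple slice"
# versus "extended slice" distinction no longer exists at this level.
assert e[:] == e[::] == slice(None, None, None)
```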
I think the extended-slicing support as well as the simple-slice special-casing should be ported to 2.6. Are there any objections to that? It means, in some cases, a bit of code duplication, but it would make 's[::]' almost as fast as 's[:]' for those types. I also think it may be worthwhile to switch to always using slice objects in Python 2.6 or 2.7. It would mean we can remove the 12 bytecodes for slicing, plus the associated code in the main bytecode loop. We can still call sq_slice/sq_ass_slice if step is None. The main issue is that it might be a net slowdown for slicing (but speedup for all other operations), and that it is no longer possible to see the difference between obj[:] and obj[::]. I personally think code that treats those two (significantly) differently is insane. Now that all those types have mp_subscript defined, we could remove sq_item and sq_ass_item as well. I'm not entirely sure I see all the implications of that, though. The C code does quite a lot of indexing of tuples and lists, and those are indexed using Py_ssize_t's directly. Going through a PyObject for that may be too cumbersome. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060824/c60c127a/attachment.htm From tim.peters at gmail.com Fri Aug 25 03:01:20 2006 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 24 Aug 2006 21:01:20 -0400 Subject: [Python-3000] Droping find/rfind?
In-Reply-To: <44EE4326.1070604@canterbury.ac.nz> References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com> <20060823191222.1A76.JCARLSON@uci.edu> <44EE4326.1070604@canterbury.ac.nz> Message-ID: <1f7befae0608241801y3b285a12wc27cda5d25949fe0@mail.gmail.com> [Fredrik Lundh] >> (on the other hand, "s[:len(t)] == t" is usually faster than "s.startswith(t)" for short >> prefixes, [Greg Ewing] > That's surprising. Any idea why this might be? Perhaps it has to do with the rest of his message ;-): >> (which reminds me that speeding up handling of optional arguments >> to C functions would be an even better use of this energy) From greg.ewing at canterbury.ac.nz Fri Aug 25 03:15:01 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 25 Aug 2006 13:15:01 +1200 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <1f7befae0608241801y3b285a12wc27cda5d25949fe0@mail.gmail.com> References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com> <20060823191222.1A76.JCARLSON@uci.edu> <44EE4326.1070604@canterbury.ac.nz> <1f7befae0608241801y3b285a12wc27cda5d25949fe0@mail.gmail.com> Message-ID: <44EE4F15.2070301@canterbury.ac.nz> Tim Peters wrote: > Perhaps it has to do with the rest of his message ;-): > >>>(which reminds me that speeding up handling of optional arguments >>>to C functions would be an even better use of this energy) Until a few moments ago, I didn't know that str.startswith() had any optional arguments, so I missed the significance of that. In any case, I still find it surprising that this would make enough difference to outweigh a Python-level indexing and comparison... 
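A hedged way to probe Greg's question on your own build (the micro-benchmark below is mine; absolute numbers and even the winner vary by CPython version, since the argument processing for startswith()'s optional start/end parameters has been reworked since 2006):

```python
import timeit

setup = "s = 'http://example.com'; t = 'http'"

# Slice-and-compare pays for a small string copy; startswith() pays for
# C-level optional-argument parsing on every call.
t_slice = timeit.timeit("s[:len(t)] == t", setup=setup, number=200000)
t_starts = timeit.timeit("s.startswith(t)", setup=setup, number=200000)

print("slice+compare:", t_slice, "startswith:", t_starts)
```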
-- Greg From martin at v.loewis.de Fri Aug 25 03:49:55 2006 From: martin at v.loewis.de (martin at v.loewis.de) Date: Fri, 25 Aug 2006 03:49:55 +0200 Subject: [Python-3000] long/int unification Message-ID: <1156470595.44ee57436b03d@www.domainfactory-webmail.de> Here is a quick status of the int_unification branch, summarizing what I did at the Google sprint in NYC.

- the int type has been dropped; the builtins int and long now both refer to the long type
- all PyInt_* API is forwarded to the PyLong_* API. Only small changes to the C code are necessary; the most common offender is PyInt_AS_LONG((PyIntObject*)v) since I completely removed PyIntObject.
- Much of the test suite passes, although it still has a number of bugs.
- There are timing tests for allocation and for addition. On allocation, the current implementation is about a factor of 2 slower; the integer addition is about 1.5 times slower; the initial slowdown was by a factor of 3. The pystones dropped about 10% (pybench fails to run on p3yk).

A couple of interesting observations:

- bool was a subtype of int, and is now a subtype of long. In order to avoid knowing the internal representation of long, the bool type compares addresses against Py_True and Py_False, instead of looking at ob_ival.
- to add the small ints cache, an array of statically allocated longs is used, rather than heap-allocating them.
- after adding the small ints cache, a lot of things broke, e.g. for code like

py> x = 4
py> x = -4
py> x
-4
py> 4
-4

This happened because long methods just toggle the sign of the object they got, messing up the small ints cache.
- to further speed up the implementation, I added special casing for one-digit numbers. As they are always in range(-32767,32768), the arithmetic operations don't need overflow checking anymore (even multiplication won't overflow a 32-bit int).
- I found that in 2.x, long objects overallocate 2 bytes on a 32-bit machine, and 6 bytes on a 64-bit machine, because sizeof(PyLongObject) rounds up.
- pickle and marshal have been changed to deal with the loss of int; pickle generates INT codes even for longs now provided the value is in the range for the code. I'm not sure whether this performance change is acceptable; at this point, I'm running out of ideas how to further improve the performance. Using a plain 32-bit int as the representation could be another try, but I somewhat doubt it helps given that the supposedly-simpler single-digit case is so slow. Regards, Martin From fredrik at pythonware.com Fri Aug 25 07:50:58 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 25 Aug 2006 07:50:58 +0200 Subject: [Python-3000] long/int unification In-Reply-To: <1156470595.44ee57436b03d@www.domainfactory-webmail.de> References: <1156470595.44ee57436b03d@www.domainfactory-webmail.de> Message-ID: martin at v.loewis.de wrote: > I'm not sure whether this performance change is > acceptable; at this point, I'm running out of ideas > how to further improve the performance. without really digging into the patch, is it perhaps time to switch to unboxed integers for the CPython interpreter ? (support for implementation subtypes could also be nice; I agree that it would be nice if we had only one visible integer type, but I don't really see why the implementation has to be restricted to one type only. this applies to strings too, of course). From fredrik at pythonware.com Fri Aug 25 07:54:23 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 25 Aug 2006 07:54:23 +0200 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <1f7befae0608241801y3b285a12wc27cda5d25949fe0@mail.gmail.com> References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com> <20060823191222.1A76.JCARLSON@uci.edu> <44EE4326.1070604@canterbury.ac.nz> <1f7befae0608241801y3b285a12wc27cda5d25949fe0@mail.gmail.com> Message-ID:
> > Perhaps it has to do with the rest of his message ;-): > >>> (which reminds me that speeding up handling of optional arguments >>> to C functions would be an even better use of this energy) in my experience, the object allocator tends to be surprisingly fast, and the calling mechanism tends to be surprisingly slow. and this is true even if you take this into account. From jcarlson at uci.edu Fri Aug 25 08:39:22 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Thu, 24 Aug 2006 23:39:22 -0700 Subject: [Python-3000] long/int unification In-Reply-To: References: <1156470595.44ee57436b03d@www.domainfactory-webmail.de> Message-ID: <20060824232848.1A9F.JCARLSON@uci.edu> Fredrik Lundh wrote: > > martin at v.loewis.de wrote: > > > I'm not sure whether this performance change is > > acceptable; at this point, I'm running out of ideas > > how to further improve the performance. > > without really digging into the patch, is it perhaps time to switch to > unboxed integers for the CPython interpreter ? > > (support for implementation subtypes could also be nice; I agree that > it would be nice if we had only one visible integer type, but I don't > really see why the implementation has to restricted to one type only. > this applies to strings too, of course). In the integer case, it reminds me of James Knight's tagged integer patch to 2.3 [1]. If using long exclusively is 50% slower, why not try the improved speed approach? Also, depending on the objects, one may consider a few other tagged objects, like perhaps None, True, and False (they could all be special values with a single tag), or even just use 31/63 bits for the tagged integer value, with a 1 in the lowest bit signifying it as a tagged integer. 
- Josiah [1] http://mail.python.org/pipermail/python-dev/2004-July/046139.html From fredrik at pythonware.com Fri Aug 25 11:15:37 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 25 Aug 2006 11:15:37 +0200 Subject: [Python-3000] long/int unification References: <1156470595.44ee57436b03d@www.domainfactory-webmail.de> <20060824232848.1A9F.JCARLSON@uci.edu> Message-ID: Josiah Carlson wrote: > In the integer case, it reminds me of James Knight's tagged integer > patch to 2.3 [1]. If using long exclusively is 50% slower, why not try > the improved speed approach? looks like GvR was -1000 on this idea at the time, though... > Also, depending on the objects, one may consider a few other tagged > objects, like perhaps None, True, and False (they could all be special > values with a single tag), or even just use 31/63 bits for the tagged > integer value, with a 1 in the lowest bit signifying it as a tagged integer. iirc, my pytte1 experiment used tagged objects for integers and single-character strings, which resulted in considerable speedups for the (small set of) benchmarks I used. (on the other hand, the dominating speedups in pytte1 were "true" GC, and call-site caching combined with streamlined method lookup. if we really want to speed things up, we should probably start with call-site caching and (explicit?) method inlining). From ncoghlan at gmail.com Fri Aug 25 11:50:03 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 25 Aug 2006 19:50:03 +1000 Subject: [Python-3000] Droping find/rfind?
In-Reply-To: References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com> <20060823191222.1A76.JCARLSON@uci.edu> <44ED85E5.1000005@livinglogic.de> <44ED9206.1080306@gmail.com> Message-ID: <44EEC7CB.2090908@gmail.com> Fredrik Lundh wrote: > Nick Coghlan wrote: > >> With a variety of "view types", that work like the corresponding builtin type, >> but reference the original data structure instead of creating copies > > support for string views would require some serious interpreter surgery, though, > and probably break quite a few extensions... Why do you say that? I'm thinking about a type written in Python, intended to be used exactly the way I did in my strawman example - you accept a normal string, make a view of it, do your manipulations, then make sure that anything you return or yield is a normal string so other code doesn't get any nasty surprises. It would be strictly an optimisation technique to allow the normal string operations to be used without the performance penalties associated with slicing large strings. Otherwise you have to choose between "readable" and "scalable" which is an annoying choice to be forced to make. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From fredrik at pythonware.com Fri Aug 25 12:06:43 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 25 Aug 2006 12:06:43 +0200 Subject: [Python-3000] Droping find/rfind?
References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com> <20060823191222.1A76.JCARLSON@uci.edu> <44ED85E5.1000005@livinglogic.de> <44ED9206.1080306@gmail.com> <44EEC7CB.2090908@gmail.com> Message-ID: Nick Coghlan wrote: >> Nick Coghlan wrote: >> >>> With a variety of "view types", that work like the corresponding builtin type, >>> but reference the original data structure instead of creating copies >> >> support for string views would require some serious interpreter surgery, though, >> and probably break quite a few extensions... > > Why do you say that? because I happen to know a lot about how Python's string types are implemented ? > make a view of it so to make a view of a string, you make a view of it ? From ncoghlan at gmail.com Fri Aug 25 12:20:02 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 25 Aug 2006 20:20:02 +1000 Subject: [Python-3000] Removing 'old-style' ('simple') slices from Py3K. In-Reply-To: <9e804ac0608241746n7de7c161yd40f6bb4c3061ab6@mail.gmail.com> References: <9e804ac0608241746n7de7c161yd40f6bb4c3061ab6@mail.gmail.com> Message-ID: <44EECED2.2020206@gmail.com> Thomas Wouters wrote: > - There's no way to figure out the size of a Py_ssize_t from Python > code, now. test_support was using a simple-slice to figure it out. I'm > not sure if there's really a reason to do it -- I don't quite understand > the use of it. This isn't quite true, but I will admit that the only way I know how to do it is somewhat on the arcane side ;)

try:
    double_width = 2*(sys.maxint+1)**2-1
    slice(None).indices(double_width)
    pyssize_t_max = double_width   # ssize_t twice as wide as long
except OverflowError:
    pyssize_t_max = sys.maxint     # ssize_t same width as long

It might make more sense to just include a "sys.maxindex" to parallel sys.maxint (even though both are technically misnomers, leaving out the 'native' bit). Cheers, Nick.
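A less arcane route to the same numbers, as a sketch: it assumes ctypes, and ctypes.c_ssize_t arrived in the stdlib well after this discussion.

```python
import ctypes
import struct

# sizeof(Py_ssize_t) versus sizeof(long) on this platform.
ssize_t_bits = 8 * ctypes.sizeof(ctypes.c_ssize_t)
long_bits = 8 * struct.calcsize('l')  # native C long

# Largest legal index value, i.e. the proposed "sys.maxindex".
pyssize_t_max = 2 ** (ssize_t_bits - 1) - 1
```

On 64-bit Windows the two widths differ (32-bit long, 64-bit ssize_t); on most other platforms they agree.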
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Fri Aug 25 14:33:46 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 25 Aug 2006 22:33:46 +1000 Subject: [Python-3000] Droping find/rfind? In-Reply-To: References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com> <20060823191222.1A76.JCARLSON@uci.edu> <44ED85E5.1000005@livinglogic.de> <44ED9206.1080306@gmail.com> <44EEC7CB.2090908@gmail.com> Message-ID: <44EEEE2A.9080509@gmail.com> Fredrik Lundh wrote: > Nick Coghlan wrote: > >>> Nick Coghlan wrote: >>> >>>> With a variety of "view types", that work like the corresponding builtin type, >>>> but reference the original data structure instead of creating copies >>> support for string views would require some serious interpreter surgery, though, >>> and probably break quite a few extensions... >> Why do you say that? > > because I happen to know a lot about how Python's string types are > implemented ? I believe you're thinking about something far more sophisticated than what I'm suggesting. I'm just talking about a Python data type in a standard library module that trades off slower performance with smaller strings (due to extra method call overhead) against improved scalability (due to avoidance of copying strings around). >> make a view of it > > so to make a view of a string, you make a view of it ? Yep - by using all those "start" and "stop" optional arguments to builtin string methods to implement the methods of a string view in pure Python. By creating the string view all you would really be doing is a partial application of start and stop arguments on all of the relevant string methods. I've included an example below that just supports __len__, __str__ and partition(). 
partition(). The source object survives for as long as the view does - the idea is that the view should only last while you manipulate the string, with only real strings released outside the function via return statements or yield expressions. All that said, I think David Hopwood nailed the simplest answer to Walter's particular use case with:

def splitindex(s):
    pos = 0
    while True:
        try:
            posstart = s.index("{", pos)
            posarg = s.index(" ", posstart)
            posend = s.index("}", posarg)
        except ValueError:
            break
        prefix = s[pos:posstart]
        if prefix:
            yield (None, prefix)
        yield (s[posstart+1:posarg], s[posarg+1:posend])
        pos = posend+1
    rest = s[pos:]
    if rest:
        yield (None, rest)

>>> list(splitindex('foo{spam eggs}bar{foo bar}'))
[(None, 'foo'), ('spam', 'eggs'), (None, 'bar'), ('foo', 'bar')]

Cheers, Nick.

# Simple string view example
class strview(object):
    def __new__(cls, source, start=None, stop=None):
        self = object.__new__(cls)
        self.source = "%s" % source
        self.start = start if start is not None else 0
        self.stop = stop if stop is not None else len(source)
        return self
    def __str__(self):
        return self.source[self.start:self.stop]
    def __len__(self):
        return self.stop - self.start
    def partition(self, sep):
        _src = self.source
        try:
            startsep = _src.index(sep, self.start, self.stop)
        except ValueError:
            # Separator wasn't found!
            return self, _NULL_STR, _NULL_STR
        # Return new views of the three string parts
        endsep = startsep + len(sep)
        return (strview(_src, self.start, startsep),
                strview(_src, startsep, endsep),
                strview(_src, endsep, self.stop))

_NULL_STR = strview('')

def splitview(s):
    rest = strview(s)
    while 1:
        prefix, found, rest = rest.partition("{")
        if prefix:
            yield (None, str(prefix))
        if not found:
            break
        first, found, rest = rest.partition(" ")
        if not found:
            break
        second, found, rest = rest.partition("}")
        if not found:
            break
        yield (str(first), str(second))

>>> list(splitview('foo{spam eggs}bar{foo bar}'))
[(None, 'foo'), ('spam', 'eggs'), (None, 'bar'), ('foo', 'bar')]

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From fredrik at pythonware.com Fri Aug 25 15:06:13 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 25 Aug 2006 15:06:13 +0200 Subject: [Python-3000] Droping find/rfind? References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com> <20060823191222.1A76.JCARLSON@uci.edu> <44ED85E5.1000005@livinglogic.de> <44ED9206.1080306@gmail.com> <44EEC7CB.2090908@gmail.com> <44EEEE2A.9080509@gmail.com> Message-ID: Nick Coghlan wrote: > I believe you're thinking about something far more sophisticated than what I'm > suggesting. I'm just talking about a Python data type in a standard library > module that trades off slower performance with smaller strings (due to extra > method call overhead) against improved scalability (due to avoidance of > copying strings around). have you done any benchmarking on this ? From exarkun at divmod.com Fri Aug 25 15:14:51 2006 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Fri, 25 Aug 2006 09:14:51 -0400 Subject: [Python-3000] Droping find/rfind?
In-Reply-To: Message-ID: <20060825131452.1717.999901437.divmod.quotient.30940@ohm> On Fri, 25 Aug 2006 15:06:13 +0200, Fredrik Lundh wrote: >Nick Coghlan wrote: > >> I believe you're thinking about something far more sophisticated than what I'm >> suggesting. I'm just talking about a Python data type in a standard library >> module that trades off slower performance with smaller strings (due to extra >> method call overhead) against improved scalability (due to avoidance of >> copying strings around). > >have you done any benchmarking on this ? > I've benchmarked string copying via slicing against views implemented using buffer(). For certain use patterns, views are absolutely significantly faster. Jean-Paul From fredrik at pythonware.com Fri Aug 25 15:31:49 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 25 Aug 2006 15:31:49 +0200 Subject: [Python-3000] Droping find/rfind? References: <20060825131452.1717.999901437.divmod.quotient.30940@ohm> Message-ID: Jean-Paul Calderone wrote: >>> I believe you're thinking about something far more sophisticated than what I'm >>> suggesting. I'm just talking about a Python data type in a standard library >>> module that trades off slower performance with smaller strings (due to extra >>> method call overhead) against improved scalability (due to avoidance of >>> copying strings around). >> >>have you done any benchmarking on this ? > > I've benchmarked string copying via slicing against views implemented using > buffer(). For certain use patterns, views are absolutely significantly > faster. of course, but buffers don't support many string methods, so I'm not sure how that's applicable to this case. (and before anyone says "let's fix that, then", please read earlier messages). From jimjjewett at gmail.com Fri Aug 25 16:22:36 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 25 Aug 2006 10:22:36 -0400 Subject: [Python-3000] sort vs order (was: What should the focus for 2.6 be?) 
In-Reply-To: <20060824144524.cz3o2mv4iv40w40k@login.werra.lunarpages.com> References: <20060824144524.cz3o2mv4iv40w40k@login.werra.lunarpages.com> Message-ID: On 8/24/06, Michael Chermside wrote: > Jim Jewett writes: > > Given an arbitrary collection of objects, I want to be able to order > > them in a consistent manner, at least within a single interpreter > > session. > I think this meets your specifications: > >>> myList = [2.5, 17, object(), 3+4j, 'abc'] > >>> myList.sort(key=id) Yes; not nicely, but it does. I would prefer that it be the fallback after first trying a regular sort. Now I'm wondering if the right recipe is to try comparing the objects, then the types, then the id, or whether that would sometimes be inconsistent even for sane objects if only some classes know about each other. The end result is that even if I find a solution that works, I think it will be common (and bug-prone) enough that it really ought to be in the language, or at least the standard library -- as it is today for objects that don't go out of their way to prevent it. > Frankly, I don't know why you have an "arbitrary collection of objects" mostly for debugging and tests. > Of course, I doubt this is what you're doing because if you > REALLY had arbitrary objects (including uncomparable things like > complex numbers) More precisely, my code is buggy when faced with complex numbers or Numeric arrays -- but in practice, it isn't faced with those. It *is* faced with tuples, lists, strings, ints, floats, and instances of arbitrary program-specific classes. These all work fine today, because sort either special cases or falls back to using id *without throwing an exception*. -jJ From paul at prescod.net Fri Aug 25 17:39:47 2006 From: paul at prescod.net (Paul Prescod) Date: Fri, 25 Aug 2006 08:39:47 -0700 Subject: [Python-3000] Droping find/rfind? 
In-Reply-To: <44EE4F15.2070301@canterbury.ac.nz> References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com> <20060823191222.1A76.JCARLSON@uci.edu> <44EE4326.1070604@canterbury.ac.nz> <1f7befae0608241801y3b285a12wc27cda5d25949fe0@mail.gmail.com> <44EE4F15.2070301@canterbury.ac.nz> Message-ID: <1cb725390608250839s78cb4c46s378bb56313c1932a@mail.gmail.com> On 8/24/06, Greg Ewing wrote: > > Tim Peters wrote: > > > Perhaps it has to do with the rest of his message ;-): > > > >>>(which reminds me that speeding up handling of optional arguments > >>>to C functions would be an even better use of this energy) > > Until a few moments ago, I didn't know that str.startswith() > had any optional arguments, so I missed the significance of > that. I also didn't know about the optional arguments to startswith and wonder if they are much used or just cruft. Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060825/026a0124/attachment.html From jcarlson at uci.edu Fri Aug 25 17:47:25 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 25 Aug 2006 08:47:25 -0700 Subject: [Python-3000] Droping find/rfind? In-Reply-To: References: <20060825131452.1717.999901437.divmod.quotient.30940@ohm> Message-ID: <20060825080148.1AA8.JCARLSON@uci.edu> "Fredrik Lundh" wrote: > Jean-Paul Calderone wrote: > > >>> I believe you're thinking about something far more sophisticated than what I'm > >>> suggesting. I'm just talking about a Python data type in a standard library > >>> module that trades off slower performance with smaller strings (due to extra > >>> method call overhead) against improved scalability (due to avoidance of > >>> copying strings around). > >> > >>have you done any benchmarking on this ? > > > > I've benchmarked string copying via slicing against views implemented using > > buffer(). 
For certain use patterns, views are absolutely significantly > > faster. > > of course, but buffers don't support many string methods, so I'm not sure how > that's applicable to this case. > > (and before anyone says "let's fix that, then", please read earlier messages). Aside from the scheduled removal of buffer in 3.x, I see no particular issue with offering a bytes view and str view in 3.x via two specific bytes and str subtypes. With care, very few changes if any would be necessary in the str (unicode) implementation, and the bytesview consistency updating is already being done with current buffer objects. From there, the only question is when an operation on a bytes or str object should return such a view, and the answer would be never. Return views from view objects, the non-views from non-view objects. If you want views, wrap your original object with a view, and call its methods. If you need a non-view, call the standard bytes/str constructor. - Josiah From guido at python.org Fri Aug 25 17:48:34 2006 From: guido at python.org (Guido van Rossum) Date: Fri, 25 Aug 2006 08:48:34 -0700 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <20060825080148.1AA8.JCARLSON@uci.edu> References: <20060825131452.1717.999901437.divmod.quotient.30940@ohm> <20060825080148.1AA8.JCARLSON@uci.edu> Message-ID: On 8/25/06, Josiah Carlson wrote: > Aside from the scheduled removal of buffer in 3.x, I see no particular > issue with offering a bytes view and str view in 3.x via two specific > bytes and str subtypes. With care, very few changes if any would be > necessary in the str (unicode) implementation, and the bytesview > consistency updating is already being done with current buffer objects. > > >From there, the only question is when an operation on a bytes or str > object should return such a view, and the answer would be never. Return > views from view objects, the non-views from non-view objects.
If you > want views, wrap your original object with a view, and call its methods. > If you need a non-view, call the standard bytes/str constructor. For the record, I think this is a major case of YAGNI. You appear way too obsessed with performance of some microscopic aspect of the language. Please stop firing random proposals until you actually have working code and proof that it matters. Speeding up microbenchmarks is irrelevant. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From exarkun at divmod.com Fri Aug 25 18:29:50 2006 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Fri, 25 Aug 2006 12:29:50 -0400 Subject: [Python-3000] Droping find/rfind? In-Reply-To: Message-ID: <20060825162950.1717.331562078.divmod.quotient.31042@ohm> On Fri, 25 Aug 2006 08:48:34 -0700, Guido van Rossum wrote: >On 8/25/06, Josiah Carlson wrote: >> Aside from the scheduled removal of buffer in 3.x, I see no particular >> issue with offering a bytes view and str view in 3.x via two specific >> bytes and str subtypes. With care, very few changes if any would be >> necessary in the str (unicode) implementation, and the bytesview >> consistency updating is already being done with current buffer objects. >> >> >From there, the only question is when an operation on a bytes or str >> object should return such a view, and the answer would be never. Return >> views from view objects, the non-views from non-view objects.
This isn't a synthetic benchmark or a micro-optimization. I don't understand the resistance. Is it really so earth-shatteringly surprising that not copying memory unnecessarily is faster than copying memory unnecessarily? If the goal is to avoid speeding up Python programs because views are too complex or unpythonic or whatever, fine. But there isn't really any question as to whether or not this is a real optimization. Jean-Paul From jimjjewett at gmail.com Fri Aug 25 18:41:27 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 25 Aug 2006 12:41:27 -0400 Subject: [Python-3000] simplifying methods (was: Re: Droping find/rfind?) Message-ID: On 8/24/06, Greg Ewing wrote: > Until a few moments ago, I didn't know that str.startswith() > had any optional arguments I just looked them up, and they turn out to just be syntactic sugar for a slice. (Even to the extent of handling omitted arguments as None.) The stop argument in particular is (almost) silly. s.startswith(prefix, start, stop) === s[start:stop].startswith(prefix) Ignoring efficiency concerns, would dropping the optional arguments and requiring an explicit slice be a valid Py3K simplification? -jJ From jimjjewett at gmail.com Fri Aug 25 18:55:33 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 25 Aug 2006 12:55:33 -0400 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <20060825080148.1AA8.JCARLSON@uci.edu> References: <20060825131452.1717.999901437.divmod.quotient.30940@ohm> <20060825080148.1AA8.JCARLSON@uci.edu> Message-ID: On 8/25/06, Josiah Carlson wrote: > From there, the only quesion is when an operation on a bytes or str > object should return such a view, and the answer would be never. Return > views from view objects, the non-views from non-view objects. If you > want views, wrap your original object with a view, and call its methods. > If you need a non-view, call the standard bytes/str constructor. 
I do like the idea of permitting multiple string *implementations*, some of which might store their characters elsewhere, as lists and large tables do. But this needs to be an automatic implementation detail, like the distinction between int and long. If the choice must be explicit, then people who worry too much about speed will start wrapping all string references in view(). This is worse (and more tempting) than the default-argument len=len hack. -jJ From guido at python.org Fri Aug 25 19:37:44 2006 From: guido at python.org (Guido van Rossum) Date: Fri, 25 Aug 2006 10:37:44 -0700 Subject: [Python-3000] simplifying methods (was: Re: Droping find/rfind?) In-Reply-To: References: Message-ID: Then you would have to drop the same style of optional arguments from all string methods. There is a method to this madness: the slice arguments let you search through the string without actually making the slice copy. This matters rarely, but when it does, it can matter a lot -- imagine s being 100 MB long, and the specified slice being a large portion of that. (Yes, the string "views" that some folks would like to add could solve this in a different way. But IMO the views make everybody pay because basic usage of the string data type will be slower, and there are horrible worst-case scenarios (such as keeping one word from many 10-MB strings). We've gone over this many times without anybody ever showing a realistic bullet-proof implementation or performance figures other than micro-benchmarks. Perhaps someone should write a PEP so I can reject it. :-) --Guido On 8/25/06, Jim Jewett wrote: > On 8/24/06, Greg Ewing wrote: > > Until a few moments ago, I didn't know that str.startswith() > > had any optional arguments > > I just looked them up, and they turn out to just be syntactic sugar > for a slice. (Even to the extent of handling omitted arguments as > None.) The stop argument in particular is (almost) silly.
> > s.startswith(prefix, start, stop) === s[start:stop].startswith(prefix) > > Ignoring efficiency concerns, would dropping the optional arguments > and requiring an explicit slice be a valid Py3K simplification? > > -jJ > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Aug 25 19:53:15 2006 From: guido at python.org (Guido van Rossum) Date: Fri, 25 Aug 2006 10:53:15 -0700 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <20060825162950.1717.331562078.divmod.quotient.31042@ohm> References: <20060825162950.1717.331562078.divmod.quotient.31042@ohm> Message-ID: On 8/25/06, Jean-Paul Calderone wrote: > >For the record, I think this is a major case of YAGNI. You appear way > >to obsessed with performance of some microscopic aspect of the > >language. Please stop firing random proposals until you actually have > >working code and proof that it matters. Speeding up microbenchmarks is > >irrelevant. > > Twisted's core loop uses string views to avoid unnecessary copying. This > has proven to be a real-world speedup. This isn't a synthetic benchmark > or a micro-optimization. OK, that's the kind of data I was hoping for; if this was mentioned before I apologize. Did they implement this in C or in Python? Can you point us to the docs for their API? > I don't understand the resistance. Is it really so earth-shatteringly > surprising that not copying memory unnecessarily is faster than copying > memory unnecessarily? It depends on how much bookkeeping is needed to properly free the underlying buffer when it is no longer referenced, and whether the application repeatedly takes short long-lived slices of long otherwise short-lived buffers. 
Unless you have a heuristic for deciding to copy at some point, you may waste a lot of space. > If the goal is to avoid speeding up Python programs because views are too > complex or unpythonic or whatever, fine. But there isn't really any > question as to whether or not this is a real optimization. There are many ways to implement views. It has often been proposed to make views an automatic feature of the basic string object. There the optimization in one case has to be weighed against the pessimization in another case (like the bookkeeping overhead everywhere and the worst-case scenario I mentioned above). If views have to be explicitly requested that may not be a problem because the app author will (hopefully) understand the issues. But even if it was just a standard library module, I would worry that many inexperienced programmers would complicate their code by using the string views module without real benefits. Sort of the way some folks have knee-jerk habits to write def foo(x, None=None): if they use None anywhere in the body of the function. This should be done only as a last resort when real-life measurements have shown that foo() is a performance show-stopper. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From exarkun at divmod.com Fri Aug 25 20:49:02 2006 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Fri, 25 Aug 2006 14:49:02 -0400 Subject: [Python-3000] Droping find/rfind? In-Reply-To: Message-ID: <20060825184902.1717.697934511.divmod.quotient.31126@ohm> On Fri, 25 Aug 2006 10:53:15 -0700, Guido van Rossum wrote: >On 8/25/06, Jean-Paul Calderone wrote: >> >For the record, I think this is a major case of YAGNI. You appear way >> >to obsessed with performance of some microscopic aspect of the >> >language. Please stop firing random proposals until you actually have >> >working code and proof that it matters. Speeding up microbenchmarks is >> >irrelevant. >> >>Twisted's core loop uses string views to avoid unnecessary copying. 
This >>has proven to be a real-world speedup. This isn't a synthetic benchmark >>or a micro-optimization. > >OK, that's the kind of data I was hoping for; if this was mentioned >before I apologize. Did they implement this in C or in Python? Can you >point us to the docs for their API? One instance of this is an implementation detail which doesn't impact any application-level APIs: http://twistedmatrix.com/trac/browser/trunk/twisted/internet/abstract.py?r=17451#L88 Another instance of this is implemented in C++: http://twistedmatrix.com/trac/browser/sandbox/itamar/cppreactor/fusion but doesn't interact a lot with Python code. The C++ API uses char* with a length (a natural way to implement string views in C/C++). The Python API just uses strings, because Twisted has always used str here, and passing in a buffer would break everything expecting something with str methods. >>I don't understand the resistance. Is it really so earth-shatteringly >>surprising that not copying memory unnecessarily is faster than copying >>memory unnecessarily? > >It depends on how much bookkeeping is needed to properly free the >underlying buffer when it is no longer referenced, and whether the >application repeatedly takes short long-lived slices of long otherwise >short-lived buffers. Unless you have a heuristic for deciding to copy >at some point, you may waste a lot of space. Certainly. The first link above includes an example of such a heuristic. >>If the goal is to avoid speeding up Python programs because views are too >>complex or unpythonic or whatever, fine. But there isn't really any >>question as to whether or not this is a real optimization. > >There are many ways to implement views. It has often been proposed to >make views an automatic feature of the basic string object. There the >optimization in one case has to be weighed against the pessimization >in another case (like the bookkeeping overhead everywhere and the >worst-case scenario I mentioned above). 
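The kind of heuristic referred to above (keep an offset into the pending data instead of re-slicing on every send, and only pay for a copy once the dead prefix dominates) can be sketched in a few lines; the class name and threshold below are illustrative, not Twisted's actual code:

```python
class SendBuffer:
    # Sketch of the copy-avoidance heuristic under discussion: track how
    # much of the buffer has already been sent with an integer offset,
    # and only slice (copy) when the consumed prefix is both large and
    # the majority of the buffer, so the big backing string can be freed.
    COPY_THRESHOLD = 4096  # illustrative cut-off, not Twisted's value

    def __init__(self, data=b""):
        self.data = data
        self.offset = 0

    def pending(self):
        # memoryview gives a zero-copy window onto the unsent tail
        # (the 2.x buffer() calls in abstract.py play the same role).
        return memoryview(self.data)[self.offset:]

    def consume(self, nbytes):
        # Called after nbytes were successfully written to the socket.
        self.offset += nbytes
        if (self.offset > self.COPY_THRESHOLD
                and self.offset * 2 > len(self.data)):
            self.data = self.data[self.offset:]
            self.offset = 0
```

A write loop hands pending() to send() and feeds the byte count back into consume(); most iterations then cost an integer update instead of a copy of the whole remaining buffer.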
I'm happy to see things progress one step at a time. Having them _at all_ (buffer) was a good place to start. A view which has string methods is a nice incremental improvement. Maybe somewhere down the line there can be a single type which magically knows how to behave optimally for all programs, but I'm not asking for that yet. ;) >If views have to be explicitly >requested that may not be a problem because the app author will >(hopefully) understand the issues. But even if it was just a standard >library module, I would worry that many inexperienced programmers >would complicate their code by using the string views module without >real benefits. Sort of the way some folks have knee-jerk habits to >write > > def foo(x, None=None): > >if they use None anywhere in the body of the function. This should be >done only as a last resort when real-life measurements have shown that >foo() is a performance show-stopper. > I don't think we see people overusing buffer() in ways which damage readability now, and buffer is even a builtin. Tossing something off into a module somewhere shouldn't really be a problem. To most people who don't actually know what they're doing, the idea to optimize code by reducing memory copying usually just doesn't come up. Jean-Paul From rrr at ronadam.com Fri Aug 25 20:59:46 2006 From: rrr at ronadam.com (Ron Adam) Date: Fri, 25 Aug 2006 13:59:46 -0500 Subject: [Python-3000] Droping find/rfind? 
In-Reply-To: <44EEEE2A.9080509@gmail.com> References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com> <20060823191222.1A76.JCARLSON@uci.edu> <44ED85E5.1000005@livinglogic.de> <44ED9206.1080306@gmail.com> <44EEC7CB.2090908@gmail.com> <44EEEE2A.9080509@gmail.com> Message-ID: Nick Coghlan wrote: > Fredrik Lundh wrote: >> Nick Coghlan wrote: >> >>>> Nick Coghlan wrote: >>>> >>>>> With a variety of "view types", that work like the corresponding builtin type, >>>>> but reference the original data structure instead of creating copies >>>> support for string views would require some serious interpreter surgery, though, >>>> and probably break quite a few extensions... >>> Why do you say that? >> because I happen to know a lot about how Python's string types are >> implemented ? > > I believe you're thinking about something far more sophisticated than what I'm > suggesting. I'm just talking about a Python data type in a standard library > module that trades off slower performance with smaller strings (due to extra > method call overhead) against improved scalability (due to avoidance of > copying strings around). > >>> make a view of it >> so to make a view of a string, you make a view of it ? > > Yep - by using all those "start" and "stop" optional arguments to builtin > string methods to implement the methods of a string view in pure Python. By > creating the string view all you would really be doing is a partial > application of start and stop arguments on all of the relevant string methods. > > I've included an example below that just supports __len__, __str__ and > partition(). The source object survives for as long as the view does - the > idea is that the view should only last while you manipulate the string, with > only real strings released outside the function via return statements or yield > expressions. >>> self.source = "%s" % source I think this should be. 
self.source = source

Otherwise you are making copies of the source which is what you are trying to avoid. I'm not sure if Python would reuse the self.source string, but I wouldn't count on it. It might be nice if slice objects could be used in more ways in Python. That may work in most cases where you would want a string view. An example of a slice version of partition would be: (not tested)

def slice_partition(s, sep, sub_slice=None):
    if sub_slice is None:
        sub_slice = slice(len(s))
    found_slice = find_slice(s, sep, sub_slice)
    prefix_slice = slice(sub_slice.start, found_slice.start)
    rest_slice = slice(found_slice.stop, sub_slice.stop)
    return ( prefix_slice, found_slice, rest_slice )

# implementation of find_slice left to readers.
def find_slice(s, sub, sub_slice=None):
    ...
    return found_slice

Of course this isn't needed for short strings, but might be worthwhile when used with very long strings.

> # Simple string view example
> class strview(object):
>     def __new__(cls, source, start=None, stop=None):
>         self = object.__new__(cls)
>         self.source = "%s" % source
>         self.start = start if start is not None else 0
>         self.stop = stop if stop is not None else len(source)
>         return self
>     def __str__(self):
>         return self.source[self.start:self.stop]
>     def __len__(self):
>         return self.stop - self.start
>     def partition(self, sep):
>         _src = self.source
>         try:
>             startsep = _src.index(sep, self.start, self.stop)
>         except ValueError:
>             # Separator wasn't found!
>             return self, _NULL_STR, _NULL_STR
>         # Return new views of the three string parts
>         endsep = startsep + len(sep)
>         return (strview(_src, self.start, startsep),
>                 strview(_src, startsep, endsep),
>                 strview(_src, endsep, self.stop))
>
> _NULL_STR = strview('')
>
> def splitview(s):
>     rest = strview(s)
>     while 1:
>         prefix, found, rest = rest.partition("{")
>         if prefix:
>             yield (None, str(prefix))
>         if not found:
>             break
>         first, found, rest = rest.partition(" ")
>         if not found:
>             break
>         second, found, rest = rest.partition("}")
>         if not found:
>             break
>         yield (str(first), str(second))
>
> >>> list(splitview('foo{spam eggs}bar{foo bar}'))
> [(None, 'foo'), ('spam', 'eggs'), (None, 'bar'), ('foo', 'bar')]

From rrr at ronadam.com Fri Aug 25 21:08:50 2006 From: rrr at ronadam.com (Ron Adam) Date: Fri, 25 Aug 2006 14:08:50 -0500 Subject: [Python-3000] sort vs order (was: What should the focus for 2.6 be?) In-Reply-To: References: <20060824144524.cz3o2mv4iv40w40k@login.werra.lunarpages.com> Message-ID: Jim Jewett wrote: > The end result is that even if I find a solution that works, I think > it will be common (and bug-prone) enough that it really ought to be in > the language, or at least the standard library -- as it is today for > objects that don't go out of their way to prevent it. The usual way to handle this in databases is to generate a unique id_key when the data is entered. That also allows for duplicate entries such as people with the same name, or multiple items with the same part number. From guido at python.org Fri Aug 25 21:13:31 2006 From: guido at python.org (Guido van Rossum) Date: Fri, 25 Aug 2006 12:13:31 -0700 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <20060825184902.1717.697934511.divmod.quotient.31126@ohm> References: <20060825184902.1717.697934511.divmod.quotient.31126@ohm> Message-ID: On 8/25/06, Jean-Paul Calderone wrote: > >>Twisted's core loop uses string views to avoid unnecessary copying.
This > >>has proven to be a real-world speedup. This isn't a synthetic benchmark > >>or a micro-optimization. > > > >OK, that's the kind of data I was hoping for; if this was mentioned > >before I apologize. Did they implement this in C or in Python? Can you > >point us to the docs for their API? > > One instance of this is an implementation detail which doesn't impact any application-level APIs: > > http://twistedmatrix.com/trac/browser/trunk/twisted/internet/abstract.py?r=17451#L88 You are referring to the two calls to buffer(), right? It seems a pretty rare use case (though an important one). I wonder how often offset != 0 in practice. I'd like the new 3.0 I/O library to provide better support for writing part of a buffer, e.g. by adding an optional offset parameter to write(). > Another instance of this is implemented in C++: > > http://twistedmatrix.com/trac/browser/sandbox/itamar/cppreactor/fusion > > but doesn't interact a lot with Python code. The C++ API uses char* with a length (a natural way to implement string views in C/C++). The Python API just uses strings, because Twisted has always used str here, and passing in a buffer would break everything expecting something with str methods. This doesn't seem a particularly strong use case (but I can't say I understand the code or how it's used). > >>I don't understand the resistance. Is it really so earth-shatteringly > >>surprising that not copying memory unnecessarily is faster than copying > >>memory unnecessarily? > > > >It depends on how much bookkeeping is needed to properly free the > >underlying buffer when it is no longer referenced, and whether the > >application repeatedly takes short long-lived slices of long otherwise > >short-lived buffers. Unless you have a heuristic for deciding to copy > >at some point, you may waste a lot of space. > > Certainly. The first link above includes an example of such a heuristic.
Because the app is in control it is easy to avoid the worst-case behavior of the heuristic. > >>If the goal is to avoid speeding up Python programs because views are too > >>complex or unpythonic or whatever, fine. But there isn't really any > >>question as to whether or not this is a real optimization. > > > >There are many ways to implement views. It has often been proposed to > >make views an automatic feature of the basic string object. There the > >optimization in one case has to be weighed against the pessimization > >in another case (like the bookkeeping overhead everywhere and the > >worst-case scenario I mentioned above). > > I'm happy to see things progress one step at a time. Having them _at > all_ (buffer) was a good place to start. But buffer() is on the kick-list for Py3k right now. Perhaps the new > bytes object will make it possible to write the first example above > differently; bytes will be mutable which changes everything. > A view which has string methods > is a nice incremental improvement. Maybe somewhere down the line there > can be a single type which magically knows how to behave optimally for all > programs, but I'm not asking for that yet. ;) I still expect that a view with string methods will find more abuse than legitimate use. > >If views have to be explicitly > >requested that may not be a problem because the app author will > >(hopefully) understand the issues. But even if it was just a standard > >library module, I would worry that many inexperienced programmers > >would complicate their code by using the string views module without > >real benefits. Sort of the way some folks have knee-jerk habits to > >write > > > > def foo(x, None=None): > > > >if they use None anywhere in the body of the function. This should be > >done only as a last resort when real-life measurements have shown that > >foo() is a performance show-stopper.
> > I don't think we see people overusing buffer() in ways which damage > readability now, and buffer is even a builtin. But it has been riddled by problems in the past so most people know to steer clear of it. > Tossing something off > into a module somewhere shouldn't really be a problem. To most people > who don't actually know what they're doing, the idea to optimize code > by reducing memory copying usually just doesn't come up. That final remark is a matter of opinion. I've seen too much code that mindlessly copied idioms that were supposed to magically speed up certain things to believe it. Often, people who don't know what they are doing are more worried about speed than people who do, and they copy all the wrong examples... :-( -- --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Fri Aug 25 22:23:18 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 25 Aug 2006 22:23:18 +0200 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <20060825080148.1AA8.JCARLSON@uci.edu> References: <20060825131452.1717.999901437.divmod.quotient.30940@ohm> <20060825080148.1AA8.JCARLSON@uci.edu> Message-ID: Josiah Carlson wrote: > Aside from the scheduled removal of buffer in 3.x, I see no particular > issue with offering a bytes view and str view in 3.x via two specific > bytes and str subtypes. the fact that it's *impossible* to offer a view subtype that's com- patible with the current PyString C API might be an issue, though. From fredrik at pythonware.com Fri Aug 25 22:27:09 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 25 Aug 2006 22:27:09 +0200 Subject: [Python-3000] Droping find/rfind? In-Reply-To: References: <20060825184902.1717.697934511.divmod.quotient.31126@ohm> Message-ID: Guido van Rossum wrote: > That final remark is a matter of opinion. I've seen too much code that > mindlessly copied idioms that were supposed to magically speed up > certain things to believe it. 
Often, people who don't know what they > are doing are more worried about speed than people who do, and they > copy all the wrong examples... :-( +1. From krstic at solarsail.hcs.harvard.edu Fri Aug 25 22:47:29 2006 From: krstic at solarsail.hcs.harvard.edu (=?UTF-8?B?SXZhbiBLcnN0acSH?=) Date: Fri, 25 Aug 2006 16:47:29 -0400 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <20060825184902.1717.697934511.divmod.quotient.31126@ohm> References: <20060825184902.1717.697934511.divmod.quotient.31126@ohm> Message-ID: <44EF61E1.5050001@solarsail.hcs.harvard.edu> Jean-Paul Calderone wrote: > http://twistedmatrix.com/trac/browser/sandbox/itamar/cppreactor/fusion This is the same Itamar who, in the talk I linked a few days ago (http://ln-s.net/D+u) extolled buffer as a very real performance improvement in fast Python networking, and asked for broader and more complete support for buffers, rather than their removal. A bunch of people, myself included, want to use Python as a persistent network server. Proper support for reading into already-allocated memory, and non-copying strings are pretty indispensable for serious production use. -- Ivan Krstić | GPG: 0x147C722D From tjreedy at udel.edu Fri Aug 25 23:00:47 2006 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 25 Aug 2006 17:00:47 -0400 Subject: [Python-3000] Droping find/rfind? References: <20060825184902.1717.697934511.divmod.quotient.31126@ohm> Message-ID: "Guido van Rossum" wrote in message news:ca471dc20608251213v70f3a1b1y29df7affbf3f9522 at mail.gmail.com... > But buffer() is on the kick-list for Py3k right now. Perhaps the new > bytes object will make it possible to write the first example above > differently; bytes will be mutable which changes everything. I never learned about buffers and buffer() because in various ways they have been underdocumented, labeled problematical, and subject to revision or removal. > I still expect that a view with string methods will find more abuse > than legitimate use.
Perhaps views should first be written and released by advocates as 3rd-party modules (in C or Python), possibly in more than one competing version, to be tested by interested members of the community and subject to the usual criteria for inclusion in the standard library or even the core. Then we would have some performance and usage data to argue with ;-). Terry Jan Reedy From tjreedy at udel.edu Fri Aug 25 23:08:40 2006 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 25 Aug 2006 17:08:40 -0400 Subject: [Python-3000] sort vs order (was: What should the focus for 2.6be?) References: <20060824144524.cz3o2mv4iv40w40k@login.werra.lunarpages.com> Message-ID: "Ron Adam" wrote in message news:ecni1g$5i8$1 at sea.gmane.org... > Jim Jewett wrote: > >> The end result is that even if I find a solution that works, I think >> it will be common (and bug-prone) enough that it really ought to be in >> the language, or at least the standard library -- as it is today for >> objects that don't go out of their way to prevent it. id() *is* in builtins. Now that sort has a key parameter, I think an explicit 'key = id' qualifies enough as 'in the language' for something used not too often. > The usual way to handle this in databases is to generate a unique > id_key when the data is entered. Which is what Python does when objects are created. > That also allows for duplicate entries > such as people with the same name, or multiple items with the same part > number. Or multiple objects with the same value. Terry Jan Reedy From fredrik at pythonware.com Sat Aug 26 01:16:04 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 26 Aug 2006 01:16:04 +0200 Subject: [Python-3000] PyString C API Message-ID: > the fact that it's *impossible* to offer a view subtype that's com- > patible with the current PyString C API might be an issue, though. what's the current thinking wrt. the PyString C API, btw.
has any of the various bytes/wide string design proposals looked at the C API level ? From guido at python.org Sat Aug 26 01:32:48 2006 From: guido at python.org (Guido van Rossum) Date: Fri, 25 Aug 2006 16:32:48 -0700 Subject: [Python-3000] PyString C API In-Reply-To: References: Message-ID: On 8/25/06, Fredrik Lundh wrote: > > the fact that it's *impossible* to offer a view subtype that's com- > > patible with the current PyString C API might be an issue, though. > > what's the current thinking wrt. the PyString C API, btw. has any of the > various bytes/wide string design proposals looked at the C API level ? No... I was hoping to get to that but ended up spending unanticipated time on fixing comparisons. Maybe the first step ought to be similar to what was done for int/long unification -- keep both the PyString_ and PyUnicode_ APIs around but make the PyString_ APIs do whatever they do on Unicode objects instead. Each use of certain macros will still have to be patched, obviously; e.g. a common way to create a string is to call PyString_FromStringAndSize(NULL, nbytes) and then to call something like memcpy(PyString_AS_STRING(obj), source, nbytes) -- this won't work of course. There are a bunch of PyBytes_ APIs that can be used in those places where 8-bit strings are really used to hold binary data, not characters. These have been modeled on the PyString APIs (even with AS_STRING and GET_SIZE macros). See Include/bytesobject.h. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jackdied at jackdied.com Sat Aug 26 02:19:23 2006 From: jackdied at jackdied.com (Jack Diederich) Date: Fri, 25 Aug 2006 20:19:23 -0400 Subject: [Python-3000] cleaning up *path.py code duplication Message-ID: <20060826001923.GD24154@performancedrivers.com> While checking find() uses in the stdlib I noticed that the various path modules have duplicate code and docstrings for some generic path manipulations. 
Delightfully they even have different implementations and docstrings for identical functions. splitext() is a great bad example - os2emxpath.splitext() builds up strings by doing char-by-char concatenations where everyone else uses find() + slice. If there are no objections I'll move these into a module named genericpath.py and change the others to do from genericpath import func1, func2, funcN where applicable. So, any objections? Should it be a 2.6 backport too? -Jack From guido at python.org Sat Aug 26 02:35:33 2006 From: guido at python.org (Guido van Rossum) Date: Fri, 25 Aug 2006 17:35:33 -0700 Subject: [Python-3000] cleaning up *path.py code duplication In-Reply-To: <20060826001923.GD24154@performancedrivers.com> References: <20060826001923.GD24154@performancedrivers.com> Message-ID: Sounds like a great 2.6 project. Beware of things that are intentionally different between platforms of course! --Guido On 8/25/06, Jack Diederich wrote: > While checking find() uses in the stdlib I noticed that the various > path modules have duplicate code and docstrings for some generic path > manipulations. Delightfully they even have different implementations > and docstrings for identical functions. splitext() is a great bad > example - os2emxpath.splitext() builds up strings by doing char-by-char > concatenations where everyone else uses find() + slice. > > If there are no objections I'll move these into a module named > genericpath.py and change the others to do > > from genericpath import func1, func2, funcN > > where applicable. > > So, any objections? Should it be a 2.6 backport too? 
> > -Jack > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Sat Aug 26 03:32:10 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 26 Aug 2006 13:32:10 +1200 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <1cb725390608250839s78cb4c46s378bb56313c1932a@mail.gmail.com> References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com> <20060823191222.1A76.JCARLSON@uci.edu> <44EE4326.1070604@canterbury.ac.nz> <1f7befae0608241801y3b285a12wc27cda5d25949fe0@mail.gmail.com> <44EE4F15.2070301@canterbury.ac.nz> <1cb725390608250839s78cb4c46s378bb56313c1932a@mail.gmail.com> Message-ID: <44EFA49A.1060201@canterbury.ac.nz> Paul Prescod wrote: > I also didn't know about the optional arguments to startswith and wonder > if they are much used or just cruft. Looking through the string methods, it appears that only a few of them, seemingly chosen arbitrarily, have start and stop arguments. Seems to me a string-view object supporting all of the string methods would be a much better idea than this haphazard mixture, and would fit in nicely with the Py3k views philosophy. -- Greg From jackdied at jackdied.com Sat Aug 26 03:52:43 2006 From: jackdied at jackdied.com (Jack Diederich) Date: Fri, 25 Aug 2006 21:52:43 -0400 Subject: [Python-3000] cleaning up *path.py code duplication In-Reply-To: References: <20060826001923.GD24154@performancedrivers.com> Message-ID: <20060826015243.GE24154@performancedrivers.com> Ooph, there is some dissonance in the comments and the code. Cut-n-paste errors I suppose. -- ntpath.py -- def islink(path): """Test for symbolic link. 
On WindowsNT/95 always returns false""" return False # This follows symbolic links, so both islink() and isdir() can be true # for the same path. def isfile(path): """Test whether a path is a regular file""" -- end excerpt -- I'll try and keep a list so those in the know can do a post mortem on the comments. I'm only useful for vetting the *nix versions. -Jack On Fri, Aug 25, 2006 at 05:35:33PM -0700, Guido van Rossum wrote: > Sounds like a great 2.6 project. Beware of things that are > intentionally different between platforms of course! > > --Guido > > On 8/25/06, Jack Diederich wrote: > > While checking find() uses in the stdlib I noticed that the various > > path modules have duplicate code and docstrings for some generic path > > manipulations. Delightfully they even have different implementations > > and docstrings for identical functions. splitext() is a great bad > > example - os2emxpath.splitext() builds up strings by doing char-by-char
> > > > -Jack > > _______________________________________________ > > Python-3000 mailing list > > Python-3000 at python.org > > http://mail.python.org/mailman/listinfo/python-3000 > > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/jack%40performancedrivers.com > From ncoghlan at gmail.com Sat Aug 26 09:27:46 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 26 Aug 2006 17:27:46 +1000 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <44EF61E1.5050001@solarsail.hcs.harvard.edu> References: <20060825184902.1717.697934511.divmod.quotient.31126@ohm> <44EF61E1.5050001@solarsail.hcs.harvard.edu> Message-ID: <44EFF7F2.7070407@gmail.com> Ivan Krsti? wrote: > Jean-Paul Calderone wrote: >> http://twistedmatrix.com/trac/browser/sandbox/itamar/cppreactor/fusion > > This is the same Itamar who, in the talk I linked a few days ago > (http://ln-s.net/D+u) extolled buffer as a very real performance > improvement in fast python networking, and asked for broader and more > complete support for buffers, rather than their removal. > > A bunch of people, myself included, want to use Python as a persistent > network server. Proper support for reading into already-allocated > memory, and non-copying strings are pretty indispensable for serious > production use. A mutable bytes type with deque-like performance characteristics (i.e O(1) insert/pop at index 0 as well as at the end), as well as the appropriate mutating methods (like read_into()) should go a long way to meeting those needs. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Sat Aug 26 10:02:15 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 26 Aug 2006 18:02:15 +1000 Subject: [Python-3000] Droping find/rfind? In-Reply-To: References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com> <20060823191222.1A76.JCARLSON@uci.edu> <44ED85E5.1000005@livinglogic.de> <44ED9206.1080306@gmail.com> <44EEC7CB.2090908@gmail.com> <44EEEE2A.9080509@gmail.com> Message-ID: <44F00007.3050107@gmail.com> Ron Adam wrote: > Nick Coghlan wrote: >> Fredrik Lundh wrote: >>> Nick Coghlan wrote: >>> >>>>> Nick Coghlan wrote: >>>>> >>>>>> With a variety of "view types", that work like the corresponding builtin type, >>>>>> but reference the original data structure instead of creating copies >>>>> support for string views would require some serious interpreter surgery, though, >>>>> and probably break quite a few extensions... >>>> Why do you say that? >>> because I happen to know a lot about how Python's string types are >>> implemented ? >> I believe you're thinking about something far more sophisticated than what I'm >> suggesting. I'm just talking about a Python data type in a standard library >> module that trades off slower performance with smaller strings (due to extra >> method call overhead) against improved scalability (due to avoidance of >> copying strings around). >> >>>> make a view of it >>> so to make a view of a string, you make a view of it ? >> Yep - by using all those "start" and "stop" optional arguments to builtin >> string methods to implement the methods of a string view in pure Python. By >> creating the string view all you would really be doing is a partial >> application of start and stop arguments on all of the relevant string methods. 
>> >> I've included an example below that just supports __len__, __str__ and >> partition(). The source object survives for as long as the view does - the >> idea is that the view should only last while you manipulate the string, with >> only real strings released outside the function via return statements or yield >> expressions. > > > >>> self.source = "%s" % source > > I think this should be. > > self.source = source > > Other wise you are making copies of the source which is what you > are trying to avoid. I'm not sure if python would reuse the self.source > string, but I wouldn't count on it. CPython 2.5 certainly doesn't reuse the existing string object. Given that what I wrote is the way to ensure you have a builtin string type (str or unicode) without coercing actual unicode objects to str objects or vice-versa, it should probably be subjected to the same optimisation as the str() and unicode() constructors (i.e., simply increfing and returning the original builtin string). > It might be nice if slice objects could be used in more ways in python. > That may work in most cases where you would want a string view. That's quite an interesting idea. With that approach, rather than having to duplicate 'concrete sequence with copying semantics' and 'sequence view with non-copying semantics' everywhere, you could just provide methods on objects that returned the appropriate slice objects representing the location of relevant sections, rather than copies of the sections themselves. To make that work effectively, you'd need to implement __nonzero__ on slice objects as "((self.stop - self.start) // self.step) > 0" (Either that or implement __len__, which would contribute to making slice() look more and more like xrange(), as someone else noted recently). 
Using the same signature as partition: def partition_indices(self, sep, start=None, stop=None): if start is None: start = 0 if stop is None: stop = len(s) try: idxsep = self.index(sep, start, stop) except ValueError: return slice(start, stop), slice(0), slice(0) endsep = idxsep + len(sep) return slice(start, idxsep), slice(idxsep, endsep), slice(endsep, stop) Then partition() itself would be equivalent to: def partition(self, sep, start=None, stop=None): before, sep, after = self.partition_indices(sep, start, stop) return self[before], self[sep], self[after] Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From jcarlson at uci.edu Sat Aug 26 10:29:01 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Sat, 26 Aug 2006 01:29:01 -0700 Subject: [Python-3000] Droping find/rfind? In-Reply-To: <44EFF7F2.7070407@gmail.com> References: <44EF61E1.5050001@solarsail.hcs.harvard.edu> <44EFF7F2.7070407@gmail.com> Message-ID: <20060826012418.1ABA.JCARLSON@uci.edu> Nick Coghlan wrote: > Ivan Krsti?? wrote: > > Jean-Paul Calderone wrote: > >> http://twistedmatrix.com/trac/browser/sandbox/itamar/cppreactor/fusion > > > > This is the same Itamar who, in the talk I linked a few days ago > > (http://ln-s.net/D+u) extolled buffer as a very real performance > > improvement in fast python networking, and asked for broader and more > > complete support for buffers, rather than their removal. > > > > A bunch of people, myself included, want to use Python as a persistent > > network server. Proper support for reading into already-allocated > > memory, and non-copying strings are pretty indispensable for serious > > production use. > > A mutable bytes type with deque-like performance characteristics (i.e O(1) > insert/pop at index 0 as well as at the end), as well as the appropriate > mutating methods (like read_into()) should go a long way to meeting those needs. 
The implementation of deque and the idea behind bytes are not compatible. Everything I've heard about the proposal of bytes is that it is effectively a C unsigned char[] with some convenience methods, very similar to a Python array.array("B"), with different methods. There is also an implementation in the Py3k branch. Also, while I would have a use for bytes as currently implemented (with readinto() ), I would have approximately zero use for a deque-like bytes object (never mind that due to Python not allowing multi-segment buffers, etc., it would be functionally impossible to get equivalent time bounds). - Josiah From ncoghlan at iinet.net.au Sat Aug 26 11:12:27 2006 From: ncoghlan at iinet.net.au (Nick Coghlan) Date: Sat, 26 Aug 2006 19:12:27 +1000 Subject: [Python-3000] Making more effective use of slice objects in Py3k Message-ID: <44F0107B.20205@iinet.net.au> This idea is inspired by the find/rfind string discussion (particularly a couple of comments from Jim and Ron), but I think the applicability may prove to be wider than just string methods (e.g. I suspect it may prove useful for the bytes() type as well). Copy-on-slice semantics are by far the easiest semantics to deal with in most cases, as they result in the fewest nasty surprises. However, they have one obvious drawback: performance can suffer badly when dealing with large datasets (copying 10 MB chunks of memory around can take a while!). There are a couple of existing workarounds for this: buffer() objects, and the start/stop arguments to a variety of string methods. Neither of these is particular convenient to work with, and buffer() is slated to go away in Py3k. I think an enriched slicing model that allows sequence views to be expressed easily as "this slice of this sequence" would allow this to be dealt with cleanly, without requiring every sequence to provide a corresponding "sequence view" with non-copying semantics. 
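The buffer() workaround mentioned above is easiest to see by example. buffer() itself is 2.x-only, so this sketch uses memoryview, its closest modern descendant, to show the non-copying slice semantics under discussion (the sizes here are arbitrary):

```python
# A view records (object, offset, length) instead of copying the data.
data = b"x" * 10_000_000          # stand-in for a large network buffer

view = memoryview(data)[6:]       # O(1): no bytes are copied
assert view.obj is data           # still backed by the original buffer

chunk = bytes(view[:5])           # copying happens only on request
```

This is exactly the trade-off the thread is debating: the view is cheap to make, but it keeps the whole underlying buffer alive for as long as it exists.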
I think Guido's concern that people will reach for string views when they don't need them is also valid (as I believe that it is most often inexperience that leads to premature optimization that then leads to needless code complexity). The specific changes I suggest based on the find/rfind discussion are: 1. make range() (what used to be xrange()) a subclass of slice(), so that range objects can be used to index sequences. The only differences between range() and slice() would then be that start/stop/step will never be None for range instances, and range instances act like an immutable sequence while slice instances do not (i.e. range objects would grow an indices() method). 2. change range() and slice() to accept slice() instances as arguments so that range(range(0)) is equivalent to range(0). (range(x) may throw ValueError if x.stop is None). 3. change API's that currently accept start/stop arguments (like string methods) to accept a single slice() instance instead (possibly raising ValueError if step != 1). 4. 
provide an additional string method partition_indices() that returns 3 range() objects instead of 3 new strings The new method would have semantics like: def partition_indices(self, sep, limits=None): if limits is None: limits = range(0, len(self)) else: limits = limits.indices(len(self)) try: idxsep = self.index(sep, limits) except ValueError: return limits, range(0), range(0) endsep = idxsep + len(sep) return (range(limits.start, idxsep), range(idxsep, endsep), range(endsep, limits.stop)) With partition() itself being equivalent to: def partition(self, sep, subseq=None): before, sep, after = self.partition_indices(sep, subseq) return self[before], self[sep], self[after] Finally, an efficient partition based implementation of the example from Walter that started the whole discussion about views and the problem with excessive copying would look like: def splitpartition_indices(s): rest = range(len(s)) while 1: prefix, lbrace, rest = s.partition_indices("{", rest) first, space, rest = s.partition_indices(" ", rest) second, rbrace, rest = s.partition_indices("}", rest) if prefix: yield (None, s[prefix]) if not (lbrace and space and rbrace): break yield (s[first], s[second]) (I know the above misses a micro-optimization, in that it calls partition again on an empty subsequence, even if space or lbrace are False. I believe doing the three partition calls together makes it much easier to read, and searching an empty string is pretty quick). For comparison, here's the normal copying version that has problems scaling to large strings: def splitpartition(s): rest = s while 1: prefix, lbrace, rest = rest.partition_indices("{") first, space, rest = rest.partition_indices(" ") second, rbrace, rest = rest.partition_indices("}") if prefix: yield (None, prefix) if not (lbrace and space and rbrace): break yield (first, second) Should I make a Py3k PEP for this? Cheers, Nick. 
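The flavour of the proposal can be previewed in present-day Python, where slice objects (though not range objects) are already accepted as sequence indices. The function below is an illustrative approximation, not the proposed API: it returns slice objects where the proposal returns range objects, following the slice-based variant posted earlier in the thread:

```python
# Approximation of the proposed partition_indices(), returning slices
# (which str.__getitem__ accepts) instead of the proposed range objects.
def partition_indices(s, sep, start=0, stop=None):
    if stop is None:
        stop = len(s)
    try:
        idx = s.index(sep, start, stop)
    except ValueError:
        # Not found: the whole subrange, plus two empty slices.
        return slice(start, stop), slice(0, 0), slice(0, 0)
    endsep = idx + len(sep)
    return slice(start, idx), slice(idx, endsep), slice(endsep, stop)

s = "foo{spam eggs}bar"
before, found, after = partition_indices(s, "{")
# Only these final subscripts copy; the search itself allocated nothing.
assert (s[before], s[found], s[after]) == ("foo", "{", "spam eggs}bar")
```

The scaling argument in the post falls out directly: repeated partitioning touches only small slice triples until the caller actually subscripts.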
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Sat Aug 26 11:40:19 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 26 Aug 2006 19:40:19 +1000 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <44F0107B.20205@iinet.net.au> References: <44F0107B.20205@iinet.net.au> Message-ID: <44F01703.6070200@gmail.com> Nick Coghlan wrote: A couple of errors in the sample code. > The new method would have semantics like: > > def partition_indices(self, sep, limits=None): > if limits is None: > limits = range(0, len(self)) > else: > limits = limits.indices(len(self)) Either that line should be: limits = range(*limits.indices(len(self))) Or the definition of indices() would need to be changed to return a range() object instead of a 3-tuple. > For comparison, here's the normal copying version that has problems scaling to > large strings: > > def splitpartition(s): > rest = s > while 1: > prefix, lbrace, rest = rest.partition_indices("{") > first, space, rest = rest.partition_indices(" ") > second, rbrace, rest = rest.partition_indices("}") Those 3 lines should be: prefix, lbrace, rest = rest.partition("{") first, space, rest = rest.partition(" ") second, rbrace, rest = rest.partition("}") Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From rrr at ronadam.com Sat Aug 26 13:46:14 2006 From: rrr at ronadam.com (Ron Adam) Date: Sat, 26 Aug 2006 06:46:14 -0500 Subject: [Python-3000] Droping find/rfind? 
In-Reply-To: <44F00007.3050107@gmail.com> References: <2773CAC687FD5F4689F526998C7E4E5FF1E90C@au3010avexu1.global.avaya.com> <20060823191222.1A76.JCARLSON@uci.edu> <44ED85E5.1000005@livinglogic.de> <44ED9206.1080306@gmail.com> <44EEC7CB.2090908@gmail.com> <44EEEE2A.9080509@gmail.com> <44F00007.3050107@gmail.com> Message-ID: Nick Coghlan wrote: > Ron Adam wrote: >> Nick Coghlan wrote: [clipped] >> It might be nice if slice objects could be used in more ways in python. >> That may work in most cases where you would want a string view. > > That's quite an interesting idea. With that approach, rather than having to > duplicate 'concrete sequence with copying semantics' and 'sequence view with > non-copying semantics' everywhere, you could just provide methods on objects > that returned the appropriate slice objects representing the location of > relevant sections, rather than copies of the sections themselves. Yes, and possibly having more methods that accept slice objects could make that idea work in a way that would seem more natural. > To make that work effectively, you'd need to implement __nonzero__ on slice > objects as "((self.stop - self.start) // self.step) > 0" (Either that or > implement __len__, which would contribute to making slice() look more and more > like xrange(), as someone else noted recently). Since xrange() has the same signature, it might be nice to be able to use a slice object directly in xrange to get indices to a substring or list. For that to work, slice.indices would need to not return None, and/or xrange would need to accept None. They differ in how they handle negative indices as well. So I expect it may be too big of a change. 
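The mismatch described above can be shown directly: slice fields may be None or negative, while xrange (range in Py3k) needs concrete non-negative bounds, and slice.indices() is the bridge between the two. A small sketch in today's Python:

```python
# A slice may hold None and negative endpoints; indices() normalises them
# against a concrete length, yielding arguments that range() will accept.
s = "hello world"

sl = slice(None, -6)               # "everything up to the last six chars"
assert sl.indices(len(s)) == (0, 5, 1)

r = range(*sl.indices(len(s)))     # now usable as concrete indices
assert [s[i] for i in r] == list("hello")
```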
> Using the same signature as partition: > > def partition_indices(self, sep, start=None, stop=None): > if start is None: start = 0 > if stop is None: stop = len(self) > try: > idxsep = self.index(sep, start, stop) > except ValueError: > return slice(start, stop), slice(0), slice(0) > endsep = idxsep + len(sep) > return slice(start, idxsep), slice(idxsep, endsep), slice(endsep, stop) > > Then partition() itself would be equivalent to: > > def partition(self, sep, start=None, stop=None): > before, sep, after = self.partition_indices(sep, start, stop) > return self[before], self[sep], self[after] > > Cheers, > Nick. Just a little timing for the fun of it. ;-) 2.5c1 (r25c1:51305, Aug 17 2006, 10:41:11) [MSC v.1310 32 bit (Intel)] splitindex : 0.02866 splitview : 0.28021 splitpartition : 0.34991 splitslice : 0.07892 This may not be the best use case (if you can call it that). It does show that the slice "as a view" idea may have some potential. But underneath it's just using index, so a well written function with index will probably always be faster. Cheers, Ron """ Compare different index, string view, and partition methods. """ # -------- Split by str.index. def splitindex(s): pos = 0 while True: try: posstart = s.index("{", pos) posarg = s.index(" ", posstart) posend = s.index("}", posarg) except ValueError: break yield None, s[pos:posstart] yield s[posstart+1:posarg], s[posarg+1:posend] pos = posend+1 rest = s[pos:] if rest: yield None, rest # --------- Simple string view.
class strview(object): def __new__(cls, source, start=None, stop=None): self = object.__new__(cls) self.source = source #self.start = start if start is not None else 0 self.start = start != None and start or 0 #self.stop = stop if stop is not None else len(source) self.stop = stop != None and stop or len(source) return self def __str__(self): return self.source[self.start:self.stop] def __len__(self): return self.stop - self.start def partition(self, sep): _src = self.source try: startsep = _src.index(sep, self.start, self.stop) except ValueError: # Separator wasn't found! return self, _NULL_STR, _NULL_STR # Return new views of the three string parts endsep = startsep + len(sep) return (strview(_src, self.start, startsep), strview(_src, startsep, endsep), strview(_src, endsep, self.stop)) _NULL_STR = strview('') def splitview(s): rest = strview(s) while 1: prefix, found, rest = rest.partition("{") if prefix: yield (None, str(prefix)) if not found: break first, found, rest = rest.partition(" ") if not found: break second, found, rest = rest.partition("}") if not found: break yield (str(first), str(second)) # -------- Split by str.partition. def splitpartition(s): rest = s while 1: prefix, found, temp = rest.partition("{") first, found, temp = temp.partition(" ") second, found, temp = temp.partition("}") if not found: break yield None, prefix yield first, second rest = temp if rest != '': yield None, rest # -------- Split by partition slices. 
import sys def partslice(s, sep, sub_slice=slice(0, sys.maxint)): start, stop = sub_slice.start, sub_slice.stop try: found = s.index(sep, start, stop) except ValueError: return sub_slice, slice(stop,stop), slice(stop,stop) foundend = found + len(sep) return ( slice(start, found), slice(found, foundend), slice(foundend, stop) ) def splitslice(s): rest = slice(0, sys.maxint) while 1: prefix, found, temp = partslice(s, "{", rest) first, found, temp = partslice(s, " ", temp) second, found, temp = partslice(s, "}", temp) if found.start == found.stop: break yield None, s[prefix] yield s[first], s[second] rest = temp if rest.start != rest.stop: yield None, s[rest] # -------- Tests. import time print sys.version s = 'foo{spam eggs}bar{ham eggs}fob{beacon eggs}' * 2000 + 'xyz' r = list(splitindex(s)) functions = [splitindex, splitview, splitpartition, splitslice] for f in functions: start = time.clock() result = list(f(s)) print '%-16s: %7.5f' % (f.__name__, time.clock()-start) assert result == r From qrczak at knm.org.pl Sat Aug 26 14:41:57 2006 From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk) Date: Sat, 26 Aug 2006 14:41:57 +0200 Subject: [Python-3000] long/int unification In-Reply-To: <20060824232848.1A9F.JCARLSON@uci.edu> (Josiah Carlson's message of "Thu, 24 Aug 2006 23:39:22 -0700") References: <1156470595.44ee57436b03d@www.domainfactory-webmail.de> <20060824232848.1A9F.JCARLSON@uci.edu> Message-ID: <87u03z1xey.fsf@qrnik.zagroda> Josiah Carlson writes: > Also, depending on the objects, one may consider a few other tagged > objects, like perhaps None, True, and False I doubt that it's worth it: they are not dynamically computed anyway, so there is little gain (only avoiding manipulating their refcounts), and the loss is a greater number of special cases when accessing contents of every object. > or even just use 31/63 bits for the tagged integer value, with a 1 > in the lowest bit signifying it as a tagged integer. 
This is exactly what the compiler for my language does. -- __("< Marcin Kowalczyk \__/ qrczak at knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/ From tjreedy at udel.edu Sat Aug 26 15:26:04 2006 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 26 Aug 2006 09:26:04 -0400 Subject: [Python-3000] Making more effective use of slice objects in Py3k References: <44F0107B.20205@iinet.net.au> Message-ID: "Nick Coghlan" wrote in message news:44F0107B.20205 at iinet.net.au... > I think an enriched slicing model that allows sequence views to be > expressed > easily as "this slice of this sequence" would allow this to be dealt with > cleanly, without requiring every sequence to provide a corresponding > "sequence > view" with non-copying semantics. I think this is promising. I like the potential unification. > Should I make a Py3k PEP for this? I think so ;-) tjr From guido at python.org Sat Aug 26 18:26:48 2006 From: guido at python.org (Guido van Rossum) Date: Sat, 26 Aug 2006 09:26:48 -0700 Subject: [Python-3000] long/int unification In-Reply-To: References: <1156470595.44ee57436b03d@www.domainfactory-webmail.de> <20060824232848.1A9F.JCARLSON@uci.edu> Message-ID: On 8/25/06, Fredrik Lundh wrote: > Josiah Carlson wrote: > > > In the integer case, it reminds me of James Knight's tagged integer > > patch to 2.3 [1]. If using long exclusively is 50% slower, why not try > > the improved speed approach? > > looks like GvR was -1000 on this idea at the time, though... I still am, because it requires extra tests for every incref and decref and also for every use of an object's type pointer. I worry about the cost of these tests, but I worry much more about the bugs it will add when people don't test first. ABC used this approach and we kept finding bugs due to this problem.
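(For readers unfamiliar with the scheme under discussion, a sketch of the tagging arithmetic in Python rather than the C it would really live in; the function names here are made up for illustration. Because object pointers are at least 2-byte aligned, a genuine pointer never has its low bit set, so the low bit can mark an inline integer — at exactly the price described above: every use of the word must first test that bit.)

```python
# Hypothetical illustration of low-bit integer tagging; the real
# scheme operates on C pointer words, not Python objects.

def tag_int(n):
    # Shift the value up one bit and set the low "this is an int" bit.
    return (n << 1) | 1

def is_tagged_int(word):
    # A word with its low bit set is an inline integer, not a pointer.
    return word & 1 == 1

def untag_int(word):
    # Arithmetic right shift recovers the value, sign included.
    return word >> 1

assert is_tagged_int(tag_int(-42))
assert untag_int(tag_int(-42)) == -42
assert not is_tagged_int(0xDEADBEE0)  # an aligned "address": low bit clear
```

The extra `is_tagged_int`-style branch on every dereference, incref and decref is the cost being weighed against avoiding a heap allocation per small integer.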
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Aug 26 18:30:57 2006 From: guido at python.org (Guido van Rossum) Date: Sat, 26 Aug 2006 09:30:57 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <44F0107B.20205@iinet.net.au> References: <44F0107B.20205@iinet.net.au> Message-ID: Can you explain in a sentence or two how these changes would be *used*? Your code examples don't speak for themselves (maybe because It's Saturday morning :-). Short examples of something clumsy and/or slow that we'd have to write today compared to something fast and elegant that we could write after the change would be quite helpful. The exact inheritance relationship between slice and [x]range seems a fairly uninteresting detail in comparison. --Guido On 8/26/06, Nick Coghlan wrote: > This idea is inspired by the find/rfind string discussion (particularly a > couple of comments from Jim and Ron), but I think the applicability may prove > to be wider than just string methods (e.g. I suspect it may prove useful for > the bytes() type as well). > > Copy-on-slice semantics are by far the easiest semantics to deal with in most > cases, as they result in the fewest nasty surprises. However, they have one > obvious drawback: performance can suffer badly when dealing with large > datasets (copying 10 MB chunks of memory around can take a while!). > > There are a couple of existing workarounds for this: buffer() objects, and the > start/stop arguments to a variety of string methods. Neither of these is > particularly convenient to work with, and buffer() is slated to go away in Py3k. > > I think an enriched slicing model that allows sequence views to be expressed > easily as "this slice of this sequence" would allow this to be dealt with > cleanly, without requiring every sequence to provide a corresponding "sequence > view" with non-copying semantics.
I think Guido's concern that people will > reach for string views when they don't need them is also valid (as I believe > that it is most often inexperience that leads to premature optimization that > then leads to needless code complexity). > > The specific changes I suggest based on the find/rfind discussion are: > > 1. make range() (what used to be xrange()) a subclass of slice(), so that > range objects can be used to index sequences. The only differences between > range() and slice() would then be that start/stop/step will never be None for > range instances, and range instances act like an immutable sequence while > slice instances do not (i.e. range objects would grow an indices() method). > > 2. change range() and slice() to accept slice() instances as arguments so > that range(range(0)) is equivalent to range(0). (range(x) may throw ValueError > if x.stop is None). > > 3. change API's that currently accept start/stop arguments (like string > methods) to accept a single slice() instance instead (possibly raising > ValueError if step != 1). > > 4. 
provide an additional string method partition_indices() that returns 3 > range() objects instead of 3 new strings > > The new method would have semantics like: > > def partition_indices(self, sep, limits=None): > if limits is None: > limits = range(0, len(self)) > else: > limits = limits.indices(len(self)) > try: > idxsep = self.index(sep, limits) > except ValueError: > return limits, range(0), range(0) > endsep = idxsep + len(sep) > return (range(limits.start, idxsep), > range(idxsep, endsep), > range(endsep, limits.stop)) > > With partition() itself being equivalent to: > > def partition(self, sep, subseq=None): > before, sep, after = self.partition_indices(sep, subseq) > return self[before], self[sep], self[after] > > Finally, an efficient partition based implementation of the example from > Walter that started the whole discussion about views and the problem with > excessive copying would look like: > > def splitpartition_indices(s): > rest = range(len(s)) > while 1: > prefix, lbrace, rest = s.partition_indices("{", rest) > first, space, rest = s.partition_indices(" ", rest) > second, rbrace, rest = s.partition_indices("}", rest) > if prefix: > yield (None, s[prefix]) > if not (lbrace and space and rbrace): > break > yield (s[first], s[second]) > > (I know the above misses a micro-optimization, in that it calls partition > again on an empty subsequence, even if space or lbrace are False. I believe > doing the three partition calls together makes it much easier to read, and > searching an empty string is pretty quick). 
> > For comparison, here's the normal copying version that has problems scaling to > large strings: > > def splitpartition(s): > rest = s > while 1: > prefix, lbrace, rest = rest.partition_indices("{") > first, space, rest = rest.partition_indices(" ") > second, rbrace, rest = rest.partition_indices("}") > if prefix: > yield (None, prefix) > if not (lbrace and space and rbrace): > break > yield (first, second) > > Should I make a Py3k PEP for this? > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > --------------------------------------------------------------- > http://www.boredomandlaziness.org > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jcarlson at uci.edu Sat Aug 26 19:00:41 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Sat, 26 Aug 2006 10:00:41 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <44F0107B.20205@iinet.net.au> References: <44F0107B.20205@iinet.net.au> Message-ID: <20060826084138.1AC0.JCARLSON@uci.edu> Nick Coghlan wrote: > > This idea is inspired by the find/rfind string discussion (particularly a > couple of comments from Jim and Ron), but I think the applicability may prove > to be wider than just string methods (e.g. I suspect it may prove useful for > the bytes() type as well). A couple comments... I don't particularly like the idea of using lists (or really iter(list) ), range, or slice objects as defining what indices remain for a particular string operation. It just doesn't seem like the *right* thing to do. > There are a couple of existing workarounds for this: buffer() objects, and the > start/stop arguments to a variety of string methods. 
Neither of these is > particular convenient to work with, and buffer() is slated to go away in Py3k. Ahh, but string views offer a significantly more reasonable mechanism. string = stringview(string) Now, you can do things like parition(), slicing (with step=1), etc., and all can return further string views. Users don't need to learn a new semantic (pass the sequence of indices). We can toss all of the optional start, stop arguments to all string functions, and replace them with either of the following: result = stringview(string, start=None, stop=None).method(args) string = stringview(string) result = string[start:stop].method(args) Perhaps one of the reasons why I prefer string views over this indices mechanism is because I'm familliar with buffers, the idea of just having a pointer into another structure, etc. It just feels more natural from my 8 years of C and 6 years of Python. - Josiah From jackdied at jackdied.com Sun Aug 27 02:24:04 2006 From: jackdied at jackdied.com (Jack Diederich) Date: Sat, 26 Aug 2006 20:24:04 -0400 Subject: [Python-3000] find -> index patch In-Reply-To: References: <20060824054450.x8w46l05kz488004@login.werra.lunarpages.com> Message-ID: <20060827002404.GG24154@performancedrivers.com> On Thu, Aug 24, 2006 at 03:48:57PM +0200, Fredrik Lundh wrote: > Michael Chermside wrote: > > >> WOW, I love partition. In all the instances that weren't a simple "in" > >> test I ended up using [r]partition. In some cases one of the returned > >> strings gets thrown away but in those cases it is guaranteed to be small. > >> The new code is usually smaller than the old and generally clearer. > > > > Wow. That's just beautiful. This has now convinced me that dumping > > [r]find() (at least!) and pushing people toward using partition will > > result in pain in the short term (of course), and beautiful, readable > > code in the long term. > > note that partition provides an elegant solution to an important *subset* of all > problems addressed by find/index. 
> > just like lexical scoping vs. default arguments and map vs. list comprehensions, > it doesn't address all problems right out of the box, and shouldn't be advertised > as doing that. > After some benchmarking find() can't go away without really hurting readline() performance. partition performs as well as find for small lines but for large lines the extra copy to concat the newline separator is a killer (twice as slow for 50k char lines). index has the opposite problem as the overhead of setting up a try block makes 50 char lines twice as slow even when the except clause is never triggered. A version of partition that returned two arguments instead of three would solve the problem but that would just be adding more functions to remove the two finds or adding behavior flags to partition. Ick. Most uses of find are better off using partition but if this one case can't be beat there must be others too. -Jack From jimjjewett at gmail.com Sun Aug 27 03:59:25 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Sat, 26 Aug 2006 21:59:25 -0400 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <20060826084138.1AC0.JCARLSON@uci.edu> References: <44F0107B.20205@iinet.net.au> <20060826084138.1AC0.JCARLSON@uci.edu> Message-ID: On 8/26/06, Josiah Carlson wrote: > Nick Coghlan wrote: > > There are a couple of existing workarounds for > > this: buffer() objects, and the start/stop arguments > > to a variety of string methods. Neither of these is > > particularly convenient to work with, and buffer() is > > slated to go away in Py3k. > Ahh, but string views offer a significantly more > reasonable mechanism. As I understand it, Nick is suggesting that slice objects be used as a sequence (not just string) view. > string = stringview(string) > ...
We can toss all of the optional start, stop > arguments to all string functions, and replace them > with either of the following: > result = stringview(string, start=None, stop=None).method(args) > string = stringview(string) > result = string[start:stop].method(args) Under Nick's proposal, I believe we could replace it with just the final line. result = string[start:stop].method(args) though there is a chance that (when you want to avoid copying) he is suggesting explicit slice objects such as view=slice(start, stop) result = view(string).method(args) -jJ From jimjjewett at gmail.com Sun Aug 27 04:42:02 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Sat, 26 Aug 2006 22:42:02 -0400 Subject: [Python-3000] path in py3K Re: [Python-checkins] r51624 - in python/trunk/Lib: genericpath.py macpath.py ntpath.py os2emxpath.py posixpath.py test/test_genericpath.py Message-ID: In Py3K, is it still safe to assume that a list of paths will be (enough like) ordinary strings? I ask because of the various Path object discussions; it wasn't clear that a Path object should be a sequence of (normalized unicode?) characters (rather than path components), that the path would always be normalized or absolute, or even that it would implement the LE (or LT?) comparison operator. -jJ On 8/26/06, jack.diederich wrote: > Author: jack.diederich > Date: Sat Aug 26 20:42:06 2006 > New Revision: 51624 > Added: python/trunk/Lib/genericpath.py > +# Return the longest prefix of all list elements. 
> +def commonprefix(m): > + "Given a list of pathnames, returns the longest common leading component" > + if not m: return '' > + s1 = min(m) > + s2 = max(m) > + n = min(len(s1), len(s2)) > + for i in xrange(n): > + if s1[i] != s2[i]: > + return s1[:i] > + return s1[:n] From guido at python.org Sun Aug 27 04:51:03 2006 From: guido at python.org (Guido van Rossum) Date: Sat, 26 Aug 2006 19:51:03 -0700 Subject: [Python-3000] find -> index patch In-Reply-To: <20060827002404.GG24154@performancedrivers.com> References: <20060824054450.x8w46l05kz488004@login.werra.lunarpages.com> <20060827002404.GG24154@performancedrivers.com> Message-ID: On 8/26/06, Jack Diederich wrote: > After some benchmarking find() can't go away without really hurting readline() > performance. Can you elaborate? readline() is typically implemented in C so I'm not sure I follow. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun Aug 27 05:00:05 2006 From: guido at python.org (Guido van Rossum) Date: Sat, 26 Aug 2006 20:00:05 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <44F0107B.20205@iinet.net.au> <20060826084138.1AC0.JCARLSON@uci.edu> Message-ID: On 8/26/06, Jim Jewett wrote: > On 8/26/06, Josiah Carlson wrote: > > Nick Coghlan wrote: > > > > There are a couple of existing workarounds for > > > this: buffer() objects, and the start/stop arguments > > > to a variety of string methods. Neither of these is > > > particular convenient to work with, and buffer() is > > > slated to go away in Py3k. > > > Ahh, but string views offer a significantly more > > reasonable mechanism. > > As I understand it, Nick is suggesting that slice objects be used as a > sequence (not just string) view. I have a hard time parsing this sentence. A slice is an object with three immutable attributes -- start, stop, step. How does this double as a string view? > > string = stringview(string) > > ... 
We can toss all of the optional start, stop > > arguments to all string functions, and replace them > > with either of the following: > > result = stringview(string, start=None, stop=None).method(args) > > > string = stringview(string) > > result = string[start:stop].method(args) > > Under Nick's proposal, I believe we could replace it with just the final line. I still don't see the transformation of clumsy to elegant. Please give me a complete, specific example instead of a generic code snippet. (Also, please don't use 'string' as a variable name. There's a module by that name that I can't get out of my head.) Maybe the idea is that instead of pos = s.find(t, pos) we would write pos += stringview(s)[pos:].find(t) ??? And how is that easier on the eyes? (And note the need to use += because the sliced view renumbers the positions in the original string.) > result = string[start:stop].method(args) > > though there is a chance that (when you want to avoid copying) he is > suggesting explicit slice objects such as > > view=slice(start, stop) > result = view(string).method(args) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun Aug 27 05:01:27 2006 From: guido at python.org (Guido van Rossum) Date: Sat, 26 Aug 2006 20:01:27 -0700 Subject: [Python-3000] path in py3K Re: [Python-checkins] r51624 - in python/trunk/Lib: genericpath.py macpath.py ntpath.py os2emxpath.py posixpath.py test/test_genericpath.py In-Reply-To: References: Message-ID: It is not my intention to adopt the Path module in Py3k. On 8/26/06, Jim Jewett wrote: > In Py3K, is it still safe to assume that a list of paths will be > (enough like) ordinary strings? > > I ask because of the various Path object discussions; it wasn't clear > that a Path object should be a sequence of (normalized unicode?) > characters (rather than path components), that the path would always > be normalized or absolute, or even that it would implement the LE (or > LT?) 
comparison operator. > > -jJ > > On 8/26/06, jack.diederich wrote: > > Author: jack.diederich > > Date: Sat Aug 26 20:42:06 2006 > > New Revision: 51624 > > > Added: python/trunk/Lib/genericpath.py > > > +# Return the longest prefix of all list elements. > > +def commonprefix(m): > > + "Given a list of pathnames, returns the longest common leading component" > > + if not m: return '' > > + s1 = min(m) > > + s2 = max(m) > > + n = min(len(s1), len(s2)) > > + for i in xrange(n): > > + if s1[i] != s2[i]: > > + return s1[:i] > > + return s1[:n] > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jimjjewett at gmail.com Sun Aug 27 05:30:30 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Sat, 26 Aug 2006 23:30:30 -0400 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <44F0107B.20205@iinet.net.au> <20060826084138.1AC0.JCARLSON@uci.edu> Message-ID: On 8/26/06, Guido van Rossum wrote: > On 8/26/06, Jim Jewett wrote: > > On 8/26/06, Josiah Carlson wrote: > > > Nick Coghlan wrote: > > > > There are a couple of existing workarounds for > > > > this: buffer() objects, and the start/stop > > > > arguments to a variety of string methods. > > > Ahh, but string views offer a significantly more > > > reasonable mechanism. > > As I understand it, Nick is suggesting that slice > > objects be used as a sequence (not just string) > > view. > I have a hard time parsing this sentence. A slice is > an object with three immutable attributes -- start, > stop, step. How does this double as a string view? Poor wording on my part; it is (the application of a slice to a specific sequence) that could act as copyless view. 
For example, you wanted to keep the rarely used optional arguments to find because of efficiency. s.find(prefix, start, stop) does not copy. If slices were less eager at copying, this could be rewritten as view=slice(start, stop, 1) view(s).find(prefix) or perhaps even as s[start:stop].find(prefix) I'm not sure these look better, but they are less surprising, because they don't depend on optional arguments that most people have forgotten about. > Maybe the idea is that instead of > pos = s.find(t, pos) > we would write > pos += stringview(s)[pos:].find(t) > ??? With stringviews, you wouldn't need to be reindexing from the start of the original string. The idiom would instead be a generalization of "for line in file:" while data: chunk, sep, data = data.partition() but the partition call would not need to copy the entire string; it could simply return three views. Yes, this does risk keeping all of data alive because one chunk was saved. This might be a reasonable tradeoff to avoid the copying. If not, perhaps the gc system could be augmented to shrink bloated views during idle moments. -jJ From jackdied at jackdied.com Sun Aug 27 06:12:27 2006 From: jackdied at jackdied.com (Jack Diederich) Date: Sun, 27 Aug 2006 00:12:27 -0400 Subject: [Python-3000] find -> index patch In-Reply-To: References: <20060824054450.x8w46l05kz488004@login.werra.lunarpages.com> <20060827002404.GG24154@performancedrivers.com> Message-ID: <20060827041227.GJ24154@performancedrivers.com> On Sat, Aug 26, 2006 at 07:51:03PM -0700, Guido van Rossum wrote: > On 8/26/06, Jack Diederich wrote: > > After some benchmarking find() can't go away without really hurting readline() > > performance. > > Can you elaborate? readline() is typically implemented in C so I'm not > sure I follow. > A number of modules in Lib have readline() methods that currently use find(). 
StringIO, httplib, tarfile, and others sprat:~/src/python-head/Lib# grep 'def readline' *.py | wc -l 30 Mainly I wanted to point out that find() solves a class of problems that can't be solved equally well with partition() (bad for large strings that want to preserve the separator) or index() (bad for large numbers of small strings and for frequent misses). I wanted to reach the conclusion that find() could be yanked out but as Fredrik opined it is still useful for a subset of problems. -Jack From jcarlson at uci.edu Sun Aug 27 08:08:14 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Sat, 26 Aug 2006 23:08:14 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: Message-ID: <20060826230223.1AD6.JCARLSON@uci.edu> "Jim Jewett" wrote: > With stringviews, you wouldn't need to be reindexing from the start of > the original string. The idiom would instead be a generalization of > "for line in file:" > > while data: > chunk, sep, data = data.partition() > > but the partition call would not need to copy the entire string; it > could simply return three views. Also, with a little work, having string views be smart about concatenation (if two views are adjacent to each other, like chunk,sep or sep,data above, view1+view2 -> view3 on the original string), copies could further be minimized, and the earlier problem with readline, etc., can be avoided. - Josiah From jcarlson at uci.edu Sun Aug 27 08:23:38 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Sat, 26 Aug 2006 23:23:38 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <20060826084138.1AC0.JCARLSON@uci.edu> Message-ID: <20060826230846.1AD9.JCARLSON@uci.edu> "Jim Jewett" wrote: > > On 8/26/06, Josiah Carlson wrote: > > Nick Coghlan wrote: > > > > There are a couple of existing workarounds for > > > this: buffer() objects, and the start/stop arguments > > > to a variety of string methods.
Neither of these is > > > particularly convenient to work with, and buffer() is > > > slated to go away in Py3k. > > Ahh, but string views offer a significantly more > > reasonable mechanism. > As I understand it, Nick is suggesting that slice objects be used as a > sequence (not just string) view. I'm not sure there is a compelling use-case for offering views on general ordered sequences (lists). Unicode and bytes strings, sure, but I don't think I've ever really been hurting for faster/more memory efficient list slicing... Maybe I'm strange. - Josiah From ncoghlan at iinet.net.au Sun Aug 27 16:59:24 2006 From: ncoghlan at iinet.net.au (Nick Coghlan) Date: Mon, 28 Aug 2006 00:59:24 +1000 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <44F0107B.20205@iinet.net.au> <20060826084138.1AC0.JCARLSON@uci.edu> Message-ID: <44F1B34C.4020601@iinet.net.au> Jim Jewett wrote: > On 8/26/06, Guido van Rossum wrote: >> On 8/26/06, Jim Jewett wrote: >> > As I understand it, Nick is suggesting that slice >> > objects be used as a sequence (not just string) >> > view. > >> I have a hard time parsing this sentence. A slice is >> an object with three immutable attributes -- start, >> stop, step. How does this double as a string view? > > Poor wording on my part; it is (the application of a slice to a > specific sequence) that could act as copyless view. > > For example, you wanted to keep the rarely used optional arguments to > find because of efficiency. > > s.find(prefix, start, stop) > > does not copy. If slices were less eager at copying, this could be > rewritten as > > view=slice(start, stop, 1) > view(s).find(prefix) > > or perhaps even as > > s[start:stop].find(prefix) > > I'm not sure these look better, but they are less surprising, because > they don't depend on optional arguments that most people have > forgotten about.
Actually, string views have nothing to do with what I'm suggesting (although my comments about them in the find/rfind thread were one of the things that fed into this message). I'm actually proposing an *alternative* to string views, because they have a nasty problem with non-local effects. It is easy to pass or return a string view instead of an actual string, and you get something that runs with subtly different semantics from what you expect, but that isn't likely to trigger an obvious error. It also breaks the persistent idiom that "seq[:]" makes a copy (which is true throughout the standard library, even if it isn't true for external number-crunching libraries like NumPy). You also potentially end up with *every* sequence type ending up with an "x-view" counterpart, which is horrible. OTOH, if we make the standard library more consistent in always using a slice or range object anytime it wants to pass or return (start, stop, step) information, it provides a foundation for someone to do their own non-copying versions. So with my musings, the non-copying index operation in a subsection would still use an optional second argument: s.find(prefix, slice(start, stop)) Now, the ultimate extension of this idea would be to permit slice literals in places other than sequence indexing (similar to how Py3k is likely to permit Ellipsis literals outside of subscript expressions). Naturally, parentheses may be needed in order to disambiguate colons: s.find(prefix, (start:stop)) Contrast this with the copying version: s[start:stop].find(prefix) If (start:stop:step) is equivalent to slice(start, stop, step), then slice notation can be used to create ranges: range(start:stop:step) The idea of making slice objects callable, with the result being a view of the original sequence is Jim's, not mine, and I'm not that keen on it (my reservations about string views apply to the more general idea of sequence views, too). Cheers, Nick. P.S.
I *will* be doing a PEP to bring this discussion together, but be warned that it may be a week or two before I get to it. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Sun Aug 27 17:28:14 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 28 Aug 2006 01:28:14 +1000 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <44F0107B.20205@iinet.net.au> Message-ID: <44F1BA0E.3040203@gmail.com> Guido van Rossum wrote: > Can you explain in a sentence or two how these changes would be > *used*? Your code examples don't speak for themselves (maybe because > It's Saturday morning :-). Short examples of something clumsy and/or > slow that we'd have to write today compared to something fast and > elegant that we could write after the change woulde be quite helpful. > The exact inheritance relationship between slice and [x]range seems a > fairly uninteresting details in comparison. A more unified model for representing sequence slices makes it practical to offer a non-copying string partitioning method like the version of partition_indices() in my initial message. 
With the current mixed model (sometimes using xrange(), sometimes using slice(), sometimes using a 3-tuple, sometimes using separate start & stop values), there is no point in offering such a method, as it would be terribly inconvenient to work with regardless of what kind of objects it returned to indicate the 3 portions of the original string:

- 3-tuples and xrange() objects can't be used to slice a sequence
- 3-tuples and slice() objects can't be usefully tested for truth
- none of them can be passed as optional string method arguments

I believe the current mixed model is actually an artifact of the transition from simple slicing to extended slicing, albeit one that is significantly less obvious than the deprecated __*slice__ family of special methods. Old style slicing and string methods use separate start and stop values. Extended slicing uses slice objects with start,stop,step attributes (which can be anything, including None). The indices() method of slice objects uses a start,stop,step 3-tuple. Iteration uses either a list of indices (from range()) or xrange objects with start,stop,step attributes (which must be integers).

The basic proposal I am making is to reduce this to exactly two concepts:

- slice objects, which have arbitrary start, stop, step attributes
- range objects, which have indices as start, stop, step attributes, behave like an immutable sequence, and are a subclass of slice

All other instances in the core and standard library which use a different representation of a sequence slice (like the optional arguments to string methods, or the result of the indices() method) would change to use one of those two types. The methods of the types would be driven by the needs of the standard library.

In addition to reducing the number of concepts to be dealt with from 4 to 2, I believe this would make it much easier to write memory efficient code without having to duplicate entire objects with non-copying versions.

Cheers, Nick.
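[Editorial note: the mixed model described above is easy to see at the interpreter. This snippet, in present-day Python syntax, shows the 3-tuple representation that slice.indices() produces and why that tuple cannot slice anything directly.]

```python
# Two of the four representations listed above, side by side.
# slice.indices(length) clamps a slice to a sequence of the given
# length and hands back a plain (start, stop, step) 3-tuple.
s = slice(2, 20, 3)
print(s.indices(10))            # (2, 10, 3): stop clamped to the length

# The tuple cannot be applied to a sequence as-is; it must be
# unpacked and rebuilt into subscript notation first:
data = "abcdefghij"
start, stop, step = s.indices(len(data))
print(data[start:stop:step])    # cfi
```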
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Sun Aug 27 17:37:59 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 28 Aug 2006 01:37:59 +1000 Subject: [Python-3000] find -> index patch In-Reply-To: <20060827041227.GJ24154@performancedrivers.com> References: <20060824054450.x8w46l05kz488004@login.werra.lunarpages.com> <20060827002404.GG24154@performancedrivers.com> <20060827041227.GJ24154@performancedrivers.com> Message-ID: <44F1BC57.7090004@gmail.com> Jack Diederich wrote: > On Sat, Aug 26, 2006 at 07:51:03PM -0700, Guido van Rossum wrote: >> On 8/26/06, Jack Diederich wrote: >>> After some benchmarking find() can't go away without really hurting readline() >>> performance. >> Can you elaborate? readline() is typically implemented in C so I'm not >> sure I follow. >> > > A number of modules in Lib have readline() methods that currently use find(). > StringIO, httplib, tarfile, and others > > sprat:~/src/python-head/Lib# grep 'def readline' *.py | wc -l > 30 > > Mainly I wanted to point out that find() solves a class of problems that > can't be solved equally well with partition() (bad for large strings that > want to preserve the seperator) or index() (bad for large numbers of small > strings and for frequent misses). I wanted to reach the conclusion that > find() could be yanked out but as Fredrik opined it is still useful for a > subset of problems. What about a version of partition that returned a 3-tuple of xrange objects indicating the indices of the partitions, instead of copies of the partitions? That would allow you to use the cleaner idiom without having to suffer the copying performance penalty. Something like: line, newline, rest = s.partition_indices('\n', rest.start, rest.stop) if newline: yield s[line.start:newline.stop] Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From jcarlson at uci.edu Sun Aug 27 17:45:30 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Sun, 27 Aug 2006 08:45:30 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <44F1B34C.4020601@iinet.net.au> References: <44F1B34C.4020601@iinet.net.au> Message-ID: <20060827081644.1ADC.JCARLSON@uci.edu> Nick Coghlan wrote: [snip] > that isn't likely to trigger an obvious error. It also breaks the persistent > idiom that "seq[:]" makes a copy (which is true throughout the standard > library, even if it isn't true for external number-crunching libraries like > NumPy). The copying is easily fixed. I'm also not terribly concerned with the persistence of views, as I expect that most people who bother to use them (and/or care about the efficiency of str.partition, etc.) will know what they are getting themselves into. If they don't, then they will post on python-[list|dev], and we can give them a link to the string view documentation, which will explain what views are and how they can release the references to the original object: ref = str(ref). > You also potentially end up with *every* sequence type ending up with a > "x-view" counterpart, which is horrible. OTOH, if we make the standard library > more consistent in always using a slice or range object anytime it wants to > pass or return (start, stop, step) information, it provides a foundation for > someone to do their own non-copying versions. I'm not sure your slippery-slope argument holds. So far there are only a few objects for which views have been proposed with any substance: dictionaries, text and byte strings.
The removal of buffer from 3.0 does leave an opening for other structures for which views (or even the original buffers) would make sense, like array and mmap, but those each have implementations that could effectively mirror the (mutable) byte string view. As for using slices to define a mechanism for returning view-like objects (it is effectively a different spelling), I don't particularly care for passing around slice/xrange objects. I would also like to mention that there exist external libraries that offer non-copying "views" to their underlying structures, the 'array interface' that was proposed in the last few months being a primary example of a desired standardization of such. > So with my musings, the non-copying index operation in a subsection would > still use an optional second argument: > > s.find(prefix, slice(start, stop)) This reduces the number of optional arguments by 1, and requires the somewhat explicit spelling out of the slice creation (which you attempt to remove via various syntax changes). I'm not sure these are actual improvements to either the string (or otherwise) API, or to the general sequence API. > If (start:stop:step) is equivalent to slice(start, stop, step), then slice > notation can be used to create ranges: range(start:stop:step) That looks like the integer slicing PEP that was rejected. Also, no one has been severely restricted by syntax; one could easily write a specialized object so that "for i in range[start:stop:step]" 'does the right thing'. - Josiah From guido at python.org Sun Aug 27 17:50:39 2006 From: guido at python.org (Guido van Rossum) Date: Sun, 27 Aug 2006 08:50:39 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <44F0107B.20205@iinet.net.au> <20060826084138.1AC0.JCARLSON@uci.edu> Message-ID: On 8/26/06, Jim Jewett wrote: > > > As I understand it, Nick is suggesting that slice > > > objects be used as a sequence (not just string) > > > view.
> > > I have a hard time parsing this sentence. A slice is > > an object with three immutable attributes -- start, > > stop, step. How does this double as a string view? > > Poor wording on my part; it is (the application of a slice to a > specific sequence) that could act as copyless view. > > For example, you wanted to keep the rarely used optional arguments to > find because of efficiency. I don't believe they are rarely used. They are (currently) essential for code that searches a long string for a short substring repeatedly. If you believe that is a rare use case, why bother coming up with a whole new language feature to support it? > s.find(prefix, start, stop) > > does not copy. That's still really poor wording. If you want to make your case you should take more time explaining it right. > If slices were less eager at copying, this could be > rewritten as > > view=slice(start, stop, 1) > view(s).find(prefix) Now you're postulating that calling a slice will take a slice of an object? Any object? And how is that supposed to work for arbitrary objects? I would think that it ought to be a method on the string object -- surely a view on a string will have to be a different type of object than a view on a list and that ought to be different again from a view on a unicode string. Also you're postulating that the slice object somehow has the same methods as the thing it slices? How are you expecting to implement that? (Don't tell me that you haven't thought about implementation yet. Without a planned implementation there is no feature.) > or perhaps even as > > s[start:stop].find(prefix) That will never fly. NumPy may get away with non-copying slices, but for built-in objects this would be too big of a departure from current practice. (If you don't stop about this I'll have to add it to PEP 3099. :-) > I'm not sure these look better, but they are less surprising, because > they don't depend on optional arguments that most people have > forgotten about.
Because they're not that important except to the few people who really need the optimization. Also they're easily looked up. > > Maybe the idea is that instead of > > > pos = s.find(t, pos) > > > we would write > > > pos += stringview(s)[pos:].find(t) > > > ??? > > With stringviews, you wouldn't need to be reindexing from the start of > the original string. The idiom would instead be a generalization of > "for line in file:" > > while data: > chunk, sep, data = data.partition() > > but the partition call would not need to copy the entire string; it > could simply return three views. That depends. I can imagine situations where the indices are needed regardless of how you code it. > Yes, this does risk keeping all of data alive because one chunk was > saved. This might be a reasonable tradeoff to avoid the copying. If > not, perhaps the gc system could be augmented to shrink bloated views > during idle moments. Keep dreaming on. It really seems you have no clue about implementation issues; you just keep postulating random solutions whenever you're faced with an objection. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun Aug 27 17:55:05 2006 From: guido at python.org (Guido van Rossum) Date: Sun, 27 Aug 2006 08:55:05 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <20060826230223.1AD6.JCARLSON@uci.edu> References: <20060826230223.1AD6.JCARLSON@uci.edu> Message-ID: On 8/26/06, Josiah Carlson wrote: > > "Jim Jewett" wrote: > > With stringviews, you wouldn't need to be reindexing from the start of > > the original string. The idiom would instead be a generalization of > > "for line in file:" > > > > while data: > > chunk, sep, data = data.partition() > > > > but the partition call would not need to copy the entire string; it > > could simply return three views.
> > Also, with a little work, having string views be smart about > concatenation (if two views are adjacent to each other, like chunk,sep > or sep,data above, view1+view2 -> view3 on the original string), copies > could further be minimized, and the earlier problem with readline, etc., > can be avoided. But this assumes that string views are 99.999% indiscernible from regular strings -- if operations can return a copy or a view depending on how things happen to be laid out in memory, It should be trivial to write code that doesn't care whether it gets a string or a view. This works for strings (which are immutable) but these semantics are unacceptable for mutable objects -- another reason to doubt that it makes sense to generalize the idea of views to all sequences, or to involve a change to the slice object in the design. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun Aug 27 18:08:09 2006 From: guido at python.org (Guido van Rossum) Date: Sun, 27 Aug 2006 09:08:09 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <44F1BA0E.3040203@gmail.com> References: <44F0107B.20205@iinet.net.au> <44F1BA0E.3040203@gmail.com> Message-ID: On 8/27/06, Nick Coghlan wrote: > Guido van Rossum wrote: > > Can you explain in a sentence or two how these changes would be > > *used*? Your code examples don't speak for themselves (maybe because > > It's Saturday morning :-). Short examples of something clumsy and/or > > slow that we'd have to write today compared to something fast and > > elegant that we could write after the change woulde be quite helpful. > > The exact inheritance relationship between slice and [x]range seems a > > fairly uninteresting details in comparison. > > A more unified model for representing sequence slices makes it practical to > offer a non-copying string partitioning method like the version of > partition_indices() in my initial message. Which I still don't understand. 
(Because you give code but no docstring or rationale, and are assuming some unspecified changes to other things as well.) > With the current mixed model > (sometimes using xrange(), sometimes using slice(), sometimes using a 3-tuple, > sometimes using separate start & stop values), I don't recall xrange() being used anywhere except in for-loops. I don't know of any use of 3-tuples, though the re API uses 2-tuples consistently. > there is no point in offering > such a method, as it would be terribly inconvenient to work with regardless of > what kind of objects it returned to indicate the 3 portions of the original > string: > > - 3-tuples and xrange() objects can't be used to slice a sequence > - 3-tuples and slice() objects can't be usefully tested for truth > - none of them can be passed as optional string method arguments > > I believe the current mixed model is actually an artifact of the transition > from simple slicing to extended slicing, Really? Extended slicing mostly meant adding a third "step" option to the slice syntax, which is useful for NumPy but completely pointless for string searches as we're discussing here. The slice() object was invented as an API hack so that we didn't have to add new special methods. > albeit one that is significantly less > obvious than the deprecated __*slice__ family of special methods. Old style > slicing and string methods use separate start and stop values. Extended > slicing uses slice objects with start,stop,step attributes (which can be > anything, including None). The indices() method of slice objects uses a > start,stop,step 3-tuple. Iteration uses either a list of indices (from > range()) or xrange objects with start,stop,step attributes (which must be > integers). It was always my intention to keep slice objects limited to NumPy apps and the rare application of extended slicing in regular Python. 
> The basic proposal I am making is to reduce this to exactly two concepts: > - slice objects, which have arbitrary start, stop, step attributes > - range objects, which have indices as start, stop, step attributes, behave > like an immutable sequence, and are a subclass of slice And you still haven't explained how this is going to make life easier. I keep asking for concrete examples and you keep answering in generalities. This is an annoying disconnect. > All other instances in the core and standard library which use a different > representation of a sequence slice (like the optional arguments to string > methods, or the result of the indices() method) would change to use one of > those two types. The methods of the types would be driven by the needs of the > standard library. What's the indices() method? In many cases it doesn't seem to make a lot of sense to return a slice object, since it doesn't convey more information than a single index (given that the string being searched for is known -- we're not searching regular expressions here but literal substrings). > In addition to reducing the number of concepts to be dealt with from 4 to 2, I > believe this would make it much easier to write memory efficient code without > having to duplicate entire objects with non-copying versions. Write the PEP and make sure it has plenty of examples of old and new ways of doing common string operations. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jcarlson at uci.edu Sun Aug 27 18:52:50 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Sun, 27 Aug 2006 09:52:50 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <20060826230223.1AD6.JCARLSON@uci.edu> Message-ID: <20060827091000.1ADF.JCARLSON@uci.edu> "Guido van Rossum" wrote: > > On 8/26/06, Josiah Carlson wrote: > > > > "Jim Jewett" wrote: > > > With stringviews, you wouldn't need to be reindexing from the start of
the original string. The idiom would instead be a generalization of > > > "for line in file:" > > > > > > while data: > > > chunk, sep, data = data.partition() > > > > > > but the partition call would not need to copy the entire string; it > > > could simply return three views. > > > > Also, with a little work, having string views be smart about > > concatenation (if two views are adjacent to each other, like chunk,sep > > or sep,data above, view1+view2 -> view3 on the original string), copies > > could further be minimized, and the earlier problem with readline, etc., > > can be avoided. > > But this assumes that string views are 99.999% indiscernible from > regular strings -- if operations can return a copy or a view depending > on how things happen to be laid out in memory, it should be trivial to > write code that doesn't care whether it gets a string or a view. That's what I'm working towards. Let us say for a moment that the only view that was on the table was the string view: view = stringview(st[, start[, stop]]) If st is a string, it produces a view on that string. If st is a stringview already, it references the original string (removing tree persistence[1]). After a view is created, it can be treated like a string for (effectively) everything because it has a Py_UNICODE* that has already been adjusted to handle the offset argument. Its implementation would require copying the PyUnicodeObject struct, adding one more field: PyUnicodeObject* orig_object; This would point to the original object for the later Py_DECREF (when the view is destroyed), view creation (again, we don't want tree persistence), etc. We can easily discover the 'start' offset again by comparing the view->str and the orig_object->str pointers. Optimizations like 'adding properly ordered adjacent string views returns a new view', 'views over fewer than X bytes are string copies', etc., could be added later with (hopefully) little trouble.
> This works for strings (which are immutable) but these semantics are > unacceptable for mutable objects -- another reason to doubt that it > makes sense to generalize the idea of views to all sequences, or to > involve a change to the slice object in the design. I think the whole slice object thing is complete nonsense. On the other hand, I think that just like buffers are verifying the object that they are buffering every time they are accessed, mutable byte string, array, and mmap views could do the same. After they are verified, they can generally be used the same, but it may take some discussion as to whether certain operations are allowed, and/or what their semantics are. Things like:

    view = arrayview(arr, 1, -1)
    del view[1:-1]

A convenient semantic (from the Python side of things) is to do as buffer does now and only allow them to be read-only. I'm also not terribly convinced about general sequence views, but for objects in which buffer(obj) returns something useful, I can see specialized views for them making at least some sense. I am cautious about pushing for all of them because implementing views for all would be a pain. Choosing one (like bytes) would take some effort, but could easily be pushed back to 3.1 or 3.2 and be done by someone who really wants them.

- Josiah

[1] When I say "tree persistence", I mean those cases like a -> b -> c, where view b persists because view a persists, even though b doesn't have a reference otherwise. Making both views a and b reference c directly allows for b to be freed when it is no longer used.
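[Editorial note: the C design Josiah sketches can be mocked up at the Python level. This toy class is hypothetical (no such type exists in any Python release); it illustrates the offset bookkeeping, the re-referencing that avoids tree persistence, and the adjacent-view concatenation optimization. Negative indices and steps are ignored to keep the sketch short.]

```python
class strview:
    # A view stores the base string plus absolute (start, stop)
    # offsets. A view of a view re-references the base string
    # directly, so no chain of intermediate views is kept alive
    # (the "tree persistence" point in the footnote above).
    def __init__(self, s, start=0, stop=None):
        if stop is None:
            stop = len(s)
        if isinstance(s, strview):
            base, offset = s.base, s.start
        else:
            base, offset = s, 0
        self.base = base
        self.start = offset + start
        self.stop = offset + stop

    def __len__(self):
        return self.stop - self.start

    def __str__(self):
        # Materializing copies the text; "ref = str(ref)" releases
        # the reference to the base string, as suggested earlier.
        return self.base[self.start:self.stop]

    def __add__(self, other):
        # The adjacency optimization: view1 + view2 -> view3 over the
        # original string when the views abut, avoiding any copy.
        if (isinstance(other, strview) and other.base is self.base
                and other.start == self.stop):
            return strview(self.base, self.start, other.stop)
        return str(self) + str(other)
```

A non-adjacent addition simply falls back to copying both operands into a new plain string.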
From jack at psynchronous.com Sun Aug 27 19:05:50 2006 From: jack at psynchronous.com (Jack Diederich) Date: Sun, 27 Aug 2006 13:05:50 -0400 Subject: [Python-3000] find -> index patch In-Reply-To: <44F1BC57.7090004@gmail.com> References: <20060824054450.x8w46l05kz488004@login.werra.lunarpages.com> <20060827002404.GG24154@performancedrivers.com> <20060827041227.GJ24154@performancedrivers.com> <44F1BC57.7090004@gmail.com> Message-ID: <20060827170550.GK24154@performancedrivers.com> On Mon, Aug 28, 2006 at 01:37:59AM +1000, Nick Coghlan wrote: > Jack Diederich wrote: > > On Sat, Aug 26, 2006 at 07:51:03PM -0700, Guido van Rossum wrote: > >> On 8/26/06, Jack Diederich wrote: > >>> After some benchmarking find() can't go away without really hurting readline() > >>> performance. > >> Can you elaborate? readline() is typically implemented in C so I'm not > >> sure I follow. > >> > > > > A number of modules in Lib have readline() methods that currently use find(). > > StringIO, httplib, tarfile, and others > > > > sprat:~/src/python-head/Lib# grep 'def readline' *.py | wc -l > > 30 > > > > Mainly I wanted to point out that find() solves a class of problems that > > can't be solved equally well with partition() (bad for large strings that > > want to preserve the seperator) or index() (bad for large numbers of small > > strings and for frequent misses). I wanted to reach the conclusion that > > find() could be yanked out but as Fredrik opined it is still useful for a > > subset of problems. > > What about a version of partition that returned a 3-tuple of xrange objects > indicating the indices of the partitions, instead of copies of the partitions? > That would allow you to use the cleaner idiom without having to suffer the > copying performance penalty. > > Something like: > > line, newline, rest = s.partition_indices('\n', rest.start, rest.stop) > if newline: > yield s[line.start:newline.stop] > What is with the sudden rush to solve all problems by using slice objects? 
I've never used a slice object and I don't care to start now. The above code reads just fine as

    i = s.find('\n', start, stop)
    if i >= 0:
        yield s[:i]

-Jack From guido at python.org Sun Aug 27 23:17:12 2006 From: guido at python.org (Guido van Rossum) Date: Sun, 27 Aug 2006 14:17:12 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <20060827091000.1ADF.JCARLSON@uci.edu> References: <20060826230223.1AD6.JCARLSON@uci.edu> <20060827091000.1ADF.JCARLSON@uci.edu> Message-ID: On 8/27/06, Josiah Carlson wrote: > [1] When I say "tree persistence", I mean those cases like a -> b -> c, > where view b persists because view a persists, even though b doesn't have > a reference otherwise. Making both views a and b reference c directly > allows for b to be freed when it is no longer used. Yeah, but you're still keeping c alive, which is the real memory waste. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun Aug 27 23:18:13 2006 From: guido at python.org (Guido van Rossum) Date: Sun, 27 Aug 2006 14:18:13 -0700 Subject: [Python-3000] find -> index patch In-Reply-To: <20060827170550.GK24154@performancedrivers.com> References: <20060824054450.x8w46l05kz488004@login.werra.lunarpages.com> <20060827002404.GG24154@performancedrivers.com> <20060827041227.GJ24154@performancedrivers.com> <44F1BC57.7090004@gmail.com> <20060827170550.GK24154@performancedrivers.com> Message-ID: On 8/27/06, Jack Diederich wrote: > What is with the sudden rush to solve all problems by using slice objects? > I've never used a slice object and I don't care to start now. The above code > reads just fine as > > i = s.find('\n', start, stop) > if i >= 0: > yield s[:i] Hear, hear.
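[Editorial note: the performance case Jack is defending shows up in the usual readline-style loop. This sketch (a hypothetical helper, not stdlib code) uses find() with a moving start offset so the unsearched tail is never copied, where a partition() loop would copy the remainder on every iteration.]

```python
def iter_lines(data):
    # Walk a large buffer yielding one line at a time; only the lines
    # themselves are ever sliced out of the buffer.
    pos = 0
    while True:
        i = data.find('\n', pos)
        if i < 0:
            if pos < len(data):
                yield data[pos:]       # trailing partial line
            return
        yield data[pos:i + 1]          # keep the separator, as readline() does
        pos = i + 1
```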
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From jimjjewett at gmail.com Mon Aug 28 01:38:08 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 27 Aug 2006 19:38:08 -0400 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <44F0107B.20205@iinet.net.au> <20060826084138.1AC0.JCARLSON@uci.edu> Message-ID: On 8/27/06, Guido van Rossum wrote: > On 8/26/06, Jim Jewett wrote: > > For example, you wanted to keep the rarely used optional arguments to > > find because of efficiency. > I don't believe they are rarely used. They are (currently) essential > for code that searches a long string for a short substring repeatedly. > If you believe that is a rare use case, why bother coming up with a > whole new language feature to support it? I believe that a fair amount of code already does the copying inline; supporting it in the runtime means that copying code becomes more efficient, and shortcutting code becomes less unusual. > > If slices were less eager at copying, this could be > > rewritten as > > view=slice(start, stop, 1) > > view(s).find(prefix) > Now you're postulating that calling a slice will take a slice of an > object? Yes. > Any object? And how is that supposed to work for arbitrary > objects? For non-iterables, it will raise a TypeError. > I would think that it ought to be a method on the string > object Restricting it to a few types including string might make sense. > Also you're postulating that the slice object somehow has the > same methods as the thing it slices? Rather, the value returned by calling the slice on a specific string. (I tend to think of this as a "slice of" the string, but as you've pointed out, "slice object" technically refers to the object specifying how/where to cut.) > How are you expecting to implement that? I had expected to implement it as a (string) view, which is why I don't quite understand the distinction Nick and Josiah are making.
> But this assumes that string views are 99.999% indiscernible from > regular strings Yes; instead of assuming that a string's data starts n bytes after the object's own pointer, it will instead be located at a (possibly zero) offset. No visible difference to python code; the difference between -> and . for C code. (And this indirection is already used by unicode objects.) > That will never fly. NumPy may get away with non-copying slices, but > for built-in objects this would be too big of a departure from current > practice. (If you don't stop about this I'll have to add it to PEP > 3099. :-) That's unfortunate, but if you're sure, maybe it should go in PEP 3099. > > Yes, this does risk keeping all of data alive because one chunk was > > saved. This might be a reasonable tradeoff to avoid the copying. If > > not, perhaps the gc system could be augmented to shrink bloated views > > during idle moments. > Keep dreaming on. It really seems you have no clue about > implementation issues; you just keep postulating random solutions > whenever you're faced with an objection. I had thought the problem was more about whether or not it was a good idea; the tradeoff might be OK, or at least less bad than the complication of fixing it. As one implementation of fixing it, in today's garbage collection, http://svn.python.org/view/python/trunk/Modules/gcmodule.c?rev=46244&view=markup function collect, surviving objects are moved to the next generation with gc_list_merge(young, old); before merging, the young list could be traversed, and any object whose type has a __condense__ method would get it called. The strview type's __condense__ method would be the C equivalent of

    if len(self.src) <= 200:
        return  # Src object too small to be worth recovering
    if (len(self) * refcounts(self.src)) >= len(self.src):
        return  # Src object used enough to be worth keeping
    self.src = str(self.src)  # Create a new data buffer, with no extra chars.
(Sent in Python because the commented C was several times as long, even before checking with a compiler.) As to whether a __condense__ method is a good idea, whether it should really be tied that closely to garbage collection, whether it should be limited to C implementations ... that I'm not so sure of. -jJ From tdelaney at avaya.com Mon Aug 28 01:52:08 2006 From: tdelaney at avaya.com (Delaney, Timothy (Tim)) Date: Mon, 28 Aug 2006 09:52:08 +1000 Subject: [Python-3000] Making more effective use of slice objects in Py3k Message-ID: <2773CAC687FD5F4689F526998C7E4E5F0743D0@au3010avexu1.global.avaya.com> Jim Jewett wrote: > s[start:stop].find(prefix) No matter what, I really think the obj[start:stop:step] syntax needs to be consistent in its behaviour - either returning a copy or a view - and that that behaviour be to return a copy. I'm not at all in favour of sometimes getting a copy, and sometimes getting a view. As a bit of an out-there and very premature suggestion ... For when/*if* views ever become considered to be a good thing for builtin classes, etc, may I suggest that the following syntax be reserved for view creation: obj{start:stop:step} mapping to something like: def __view__(self, slice) So if you really want a string view, use: s{1:2} instead of: s[1:2] I don't *think* the syntax is currently legal, and I don't think it could ever be ambiguous - anyone think of a case where it could be?
Tim Delaney From jimjjewett at gmail.com Mon Aug 28 02:00:08 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 27 Aug 2006 20:00:08 -0400 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <2773CAC687FD5F4689F526998C7E4E5F0743D0@au3010avexu1.global.avaya.com> References: <2773CAC687FD5F4689F526998C7E4E5F0743D0@au3010avexu1.global.avaya.com> Message-ID: On 8/27/06, Delaney, Timothy (Tim) wrote: > Jim Jewett wrote: > > s[start:stop].find(prefix) > No matter what, I really think the obj[start:stop:step] > syntax needs to be consistent in its behaviour - either > returning a copy or a view - Does it still matter if we're looking only at immutable sequences, such as text? -jJ From tdelaney at avaya.com Mon Aug 28 02:24:41 2006 From: tdelaney at avaya.com (Delaney, Timothy (Tim)) Date: Mon, 28 Aug 2006 10:24:41 +1000 Subject: [Python-3000] Making more effective use of slice objects in Py3k Message-ID: <2773CAC687FD5F4689F526998C7E4E5F0743D1@au3010avexu1.global.avaya.com> Jim Jewett wrote: > On 8/27/06, Delaney, Timothy (Tim) wrote: >> Jim Jewett wrote: > >>> s[start:stop].find(prefix) > >> No matter what, I really think the obj[start:stop:step] >> syntax needs to be consistent in its behaviour - either >> returning a copy or a view - > > Does it still matter if we're looking only at immutable sequences, > such as text? Actually, yes. I think it should be an explicit operation to say "I'm taking a small view of this large string, which will result in the large string existing until the view goes away". Currently the way to do that is to have a method. I'm simply proposing that we reserve syntax that is currently not used to prevent it from being used for another, less appropriate usage. It may never be used at all. 
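[Editorial note: pending any such syntax, the semantics Tim sketches can be prototyped with an ordinary function. __view__ here is the hypothetical protocol method he proposes, not an existing special method; types that do not supply one fall back to an ordinary (copying) slice, keeping view creation an explicit operation as he argues it should be.]

```python
def view(obj, start=None, stop=None, step=None):
    # Function-call stand-in for the proposed obj{start:stop:step}
    # syntax: dispatch to a hypothetical __view__ method if the type
    # defines one, otherwise return an ordinary copying slice.
    sl = slice(start, stop, step)
    hook = getattr(type(obj), "__view__", None)
    if hook is not None:
        return hook(obj, sl)
    return obj[sl]
```

With this stand-in, lists and strings silently get copies; only a class that opts in by defining __view__ would hand back a real view object.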
Tim Delaney From guido at python.org Mon Aug 28 03:58:52 2006 From: guido at python.org (Guido van Rossum) Date: Sun, 27 Aug 2006 18:58:52 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <44F0107B.20205@iinet.net.au> <20060826084138.1AC0.JCARLSON@uci.edu> Message-ID: On 8/27/06, Jim Jewett wrote: > On 8/27/06, Guido van Rossum wrote: > > On 8/26/06, Jim Jewett wrote: > > > > For example, you wanted to keep the rarely used optional arguments to > > > find because of efficiency. > > > I don't believe they are rarely used. They are (currently) essential > > for code that searches a long string for a short substring repeatedly. > > If you believe that is a rare use case, why bother coming up with a > > whole new language feature to support it? > > I believe that a fair amount of code already does the copying inline; > supporting it in the runtime means that copying code becomes more > efficient, and shortcutting code becomes less unusual. We're not making progress here. Your beliefs against my beliefs aren't helpful. Do you have proof that there is code out there that's inefficient and for which it would *matter* if it became faster? > > > If slices were less eager at copying, this could be > > > rewritten as > > > > view=slice(start, stop, 1) > > > view(s).find(prefix) > > > Now you're postulating that calling a slice will take a slice of an > > object? > > Yes. I'd rather see an explicit method call. Using "call" as an operation means no other operation can use the same syntax (on the same objects, of course); you have to be very sure that there won't be another use of "call" that would be more useful. > > Any object? And how is that supposed to work for arbitrary > > objects? > > For non-iterables, it will raise a TypeError. Duh. I meant for other iterables, like tuples and lists.
I'm asking if you expect that asking for a view on a previously unknown sequence should return a view on that sequence that behaves just like the underlying object, and how you are thinking of pulling off that feat. My claim is that you can't. You need full cooperation of the underlying object to support views. You could attempt to automatically provide wrappers for all methods, but since you don't know which of the parameters or return values represent indices and which don't, you can't do anything useful. Suppose I have a list [1, 2, 3, 1, 2, 3]. Suppose you don't have built-in knowledge of a list (otherwise I'll substitute some other object that you don't have built-in knowledge of). Now suppose you have a view v on the last half of that list, and you ask for v.count(1). This of course should return 1. But how to do this unless you know how the count() method is implemented on the underlying object type? > > I would think that it ought to be a method on the string > > object > > Restricting it to a few types including string might make sense. Yes please. Without that your proposal is dead in the water. (With it likely too, but for different reasons.) > > Also you're postulating that the slice object somehow has the > > same methods as the thing it slices? > > Rather, the value returned by calling the slice on a specific string. > (I tend to think of this as a "slice of" the string, but as you've > pointed out, "slice object" technically refers to the object > specifying how/where to cut.) And remember, calling buffer() on a unicode object is not a useful operation unless you're interested in the underlying bytes. > > How are you expecting to implement that? > > I had expected to implement it as a (string) view, which is why I > don't quite understand the distinction Nick and Josiah are making. Well maybe you don't quite understand your own proposal either.
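Guido's count() example can be made concrete with a deliberately naive wrapper (NaiveView is a hypothetical name, used only to show why blind method forwarding fails):

```python
lst = [1, 2, 3, 1, 2, 3]

class NaiveView:
    """A generic 'view' that forwards unknown attributes to its base."""
    def __init__(self, base, start, stop):
        self.base, self.start, self.stop = base, start, stop

    def __getattr__(self, name):
        # Blind forwarding: the base object's method ignores our bounds.
        return getattr(self.base, name)

v = NaiveView(lst, 3, 6)          # intended: a view on the last half
assert v.count(1) == 2            # wrong: the view should only see one 1
assert lst[3:6].count(1) == 1     # cooperation (or copying) is required
```

This is the sense in which a generic wrapper "can't do anything useful": without knowing that count() scans the whole sequence, the wrapper has no way to apply the view's bounds.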
:-) > > But this assumes that string views are 99.999% indiscernible from > > regular strings > > Yes; instead of assuming that a string's data starts n bytes after the > object's own pointer, it will instead be located at a (possibly zero) > offset. No visible difference to python code; the difference between > -> and . for C code. (And this indirection is already used by unicode > objects.) Only because their original draft design had a kind of views. I expect they had good reasons to rip out that part... > > That will never fly. NumPy may get away with non-copying slices, but > > for built-in objects this would be too big of a departure from current > > practice. (If you don't stop about this I'll have to add it to PEP > > 3099. :-) > > That's unfortunate, but if you're sure, maybe it should go in PEP 3099. Ask any Python developer. Slices of mutable objects make copies except in NumPy. > > > Yes, this does risk keeping all of the data alive because one chunk was > > > saved. This might be a reasonable tradeoff to avoid the copying. If > > > not, perhaps the gc system could be augmented to shrink bloated views > > > during idle moments. > > > Keep dreaming on. It really seems you have no clue about > > implementation issues; you just keep postulating random solutions > > whenever you're faced with an objection. > > I had thought the problem was more about whether or not it was a good > idea; the tradeoff might be OK, or at least less bad than the > complication of fixing it. It's only a good idea if it works. Details matter. > As one implementation of fixing it, in today's garbage collection, > http://svn.python.org/view/python/trunk/Modules/gcmodule.c?rev=46244&view=markup > function collect, surviving objects are moved to the next generation > with gc_list_merge(young, old); before merging, the young list could > be traversed, and any object whose type has a __condense__ method > would get it called.
The strview type's __condense__ method would be > the C equivalent of > > if len(self.src) <= 200: > return # Src object too small to be worth recovering > if (len(self) * refcounts(self.src)) >= len(self.src): > return # Src object used enough to be worth keeping > self.src = str(self.src) # Create a new data buffer, with no extra chars. > > (Sent in python because the commented C was several times as long, > even before checking with a compiler.) As to whether a __condense__ > method is a good idea, whether it should really be tied that closely > to garbage collection, whether it should be limited to C > implementations ... that I'm not so sure of. It's up to you to show that this doesn't completely kill performance. It would take a lot of measurements. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jcarlson at uci.edu Mon Aug 28 04:20:42 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Sun, 27 Aug 2006 19:20:42 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <2773CAC687FD5F4689F526998C7E4E5F0743D1@au3010avexu1.global.avaya.com> References: <2773CAC687FD5F4689F526998C7E4E5F0743D1@au3010avexu1.global.avaya.com> Message-ID: <20060827191547.1AEB.JCARLSON@uci.edu> "Delaney, Timothy (Tim)" wrote: > > Jim Jewett wrote: > > > On 8/27/06, Delaney, Timothy (Tim) wrote: > >> Jim Jewett wrote: > > > >>> s[start:stop].find(prefix) > > > >> No matter what, I really think the obj[start:stop:step] > >> syntax needs to be consistent in its behaviour - either > >> returning a copy or a view - > > > > Does it still matter if we're looking only at immutable sequences, > > such as text? > > Actually, yes. I think it should be an explicit operation to say "I'm > taking a small view of this large string, which will result in the large > string existing until the view goes away". > > Currently the way to do that is to have a method.
I'm simply proposing that we reserve syntax that is currently not used to prevent it from being used for another, less appropriate usage. It may never be used at all. In what I have been attempting to propose, no text methods would ever return a view. If one wants a view of text, one needs to manually construct the view via 'view = textview(st, start, stop)' or some equivalent spelling. After that, any operations on a view return views (with a few exceptions, like steps != 1). The seemingly proposed textobj(start:stop) returning a view is not terribly intuitive, as () and [] aren't so different from each other as to not confuse someone initially. Never mind that it would be a syntax addition for the equivalent of a small subset of operations on currently existing objects. - Josiah From greg.ewing at canterbury.ac.nz Mon Aug 28 04:20:18 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 28 Aug 2006 14:20:18 +1200 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <44F0107B.20205@iinet.net.au> <20060826084138.1AC0.JCARLSON@uci.edu> Message-ID: <44F252E2.4080700@canterbury.ac.nz> Jim Jewett wrote: > On 8/27/06, Guido van Rossum wrote: > > Any object? And how is that supposed to work for arbitrary > > objects? > > For non-iterables, it will raise a TypeError. I think the question was what benefit would there be in a general slice-view object which knew nothing about the internal structure of the thing it's viewing. The benefits of the string views we're talking about hinge on the fact that they're special-purpose and know how to get directly at the bytes of the underlying string.
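A minimal sketch of the kind of special-purpose, explicitly constructed view being discussed here (textview is an illustrative name, not a proposed API; slicing a view yields another view that shares the one underlying string):

```python
class textview:
    """Non-copying window onto a str; views of views share one base."""
    def __init__(self, base, start=0, stop=None):
        if isinstance(base, textview):
            # Collapse view-of-view so only the original string is kept
            # alive, avoiding the a -> b -> c persistence chain.
            start += base.start
            stop = base.stop if stop is None else base.start + stop
            base = base.base
        self.base = base
        self.start, self.stop, _ = slice(start, stop).indices(len(base))

    def __getitem__(self, sl):
        start, stop, _ = sl.indices(self.stop - self.start)
        return textview(self, start, stop)   # slicing returns a view

    def __str__(self):
        return self.base[self.start:self.stop]

msg = "Subject: views\r\n\r\nbody"
header = textview(msg, 0, 14)
assert str(header[slice(9, 14)]) == "views"
assert header[slice(9, 14)].base is msg     # the big string, never copied
```

Because the view knows it wraps a str, it can go straight at the underlying characters, which is exactly the cooperation a generic wrapper lacks.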
-- Greg From tdelaney at avaya.com Mon Aug 28 04:26:40 2006 From: tdelaney at avaya.com (Delaney, Timothy (Tim)) Date: Mon, 28 Aug 2006 12:26:40 +1000 Subject: [Python-3000] Making more effective use of slice objects in Py3k Message-ID: <2773CAC687FD5F4689F526998C7E4E5FF1E921@au3010avexu1.global.avaya.com> Josiah Carlson wrote: > The seemingly proposed textobj(start:stop) returning a view is not > terribly intuitive, as () and [] aren't so terribly different from > each other to not confuse someone initially. Nor {} as I proposed for that matter ;) Tim Delaney From greg.ewing at canterbury.ac.nz Mon Aug 28 04:32:54 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 28 Aug 2006 14:32:54 +1200 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <20060827191547.1AEB.JCARLSON@uci.edu> References: <2773CAC687FD5F4689F526998C7E4E5F0743D1@au3010avexu1.global.avaya.com> <20060827191547.1AEB.JCARLSON@uci.edu> Message-ID: <44F255D6.2060002@canterbury.ac.nz> Josiah Carlson wrote: > If one wants a view of text, one needs to manually > construct the view via 'view = textview(st, start, stop)' or some > equivalent spelling. After that, any operations on a view returns views Given Guido's sensitivity about potential misuses of views, it might be better if operations on views *didn't* return views, so that you would have to be explicit about creating views at all stages. -- Greg From jcarlson at uci.edu Mon Aug 28 04:43:36 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Sun, 27 Aug 2006 19:43:36 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <20060827091000.1ADF.JCARLSON@uci.edu> Message-ID: <20060827184941.1AE8.JCARLSON@uci.edu> "Guido van Rossum" wrote: > > On 8/27/06, Josiah Carlson wrote: > > [1] When I say "tree persistance", I mean those cases like a -> b -> c, > > where view b persist because view a persists, even though b doesn't have > > a reference otherwise. 
Making both views a and b reference c directly > > allows for b to be freed when it is no longer used. > > Yeah, but you're still keeping c alive, which is the real memory waste. It depends on the application. 1. Let us say I was parsing XML. Rather than allocating a bunch of small strings for the various tags, attributes, and data, I could instead allocate a bunch of string views with pointers into the one larger XML string. Because all of the views are the same size, we can use a free list and optimize allocation, deallocation, etc. Small strings, on the other hand, can't have such optimizations, and we would end up fragmenting memory over a long series of XML parsings (possibly leading to an eventual MemoryError). Even better, if the underlying parsing mechanism expects to receive a string, and we pass it a string view instead, then with the proper string+view implementation, it wouldn't ever need to know that it is working on views, it would just work, and we would receive the parsing with views instead of sliced strings. 2. Another example is the parsing of email or any other [header, blank line, body] structured data (and even mime-like headers). Say you have read in a single email, you can have a view (or views) of the various headers, with the multipart body, etc., and wouldn't need to copy anything. Never mind that one could easily handle the insertion of headers, body portions, etc., all without slicing the original (possibly large) email, allowing for the easy manipulation of data with little memory overhead. Heck, one could even read in an entire mbox-formatted file, pull out all of the original emails, rearrange them (resort folder by sent date/received time), and write them back to disk, again without ever slicing up the original mailbox file, resulting in roughly 1/2 the memory overhead of an equivalent operation using string slicing. 3.
In the 2.x byte string case (str not unicode), we have seen with the various str.find() to str.partition() that chopping up data isn't uncommon, and that generally most pieces are used, meaning that the equivalent memory use of the original string is going to persist in memory anyways. Also, I would just like to state that I am not advocating the automatic creation of views depending on string operations, one should always construct the views explicitly, with something like view = stringview(st). Then the operations on the view should return further views and perhaps occasionally strings, but operations on strings should never return views. --- Speaking of the 2.x byte strings and using str.partition() in 3.x, if 2.x strings are going away in 3.x, shouldn't we be either transitioning everything to using bytes or unicode? Initial translation of the standard library to use partition/index seems like a huge time investment, unless it is planned on being backported to the trunk for 2.6 . Which reminds me, on August 28, 2005, Raymond sent me an initial patch for a find -> partition patch for the full 2.5 standard library at the time. I can provide everyone with that patch along with my comments, which may or may not be enough to transition most of the standard library today. - Josiah From jcarlson at uci.edu Mon Aug 28 04:45:25 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Sun, 27 Aug 2006 19:45:25 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <2773CAC687FD5F4689F526998C7E4E5FF1E921@au3010avexu1.global.avaya.com> References: <2773CAC687FD5F4689F526998C7E4E5FF1E921@au3010avexu1.global.avaya.com> Message-ID: <20060827194428.1AEE.JCARLSON@uci.edu> "Delaney, Timothy (Tim)" wrote: > > Josiah Carlson wrote: > > > The seemingly proposed textobj(start:stop) returning a view is not > > terribly intuitive, as () and [] aren't so terribly different from > > each other to not confuse someone initially. 
> > Nor {} as I proposed for that matter ;) I can't really see the difference between () and {} when they are on their own with the font I'm using for email. Yeah, that's not good either. - Josiah From jcarlson at uci.edu Mon Aug 28 10:00:55 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Mon, 28 Aug 2006 01:00:55 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <44F255D6.2060002@canterbury.ac.nz> References: <20060827191547.1AEB.JCARLSON@uci.edu> <44F255D6.2060002@canterbury.ac.nz> Message-ID: <20060827214348.1AF4.JCARLSON@uci.edu> Greg Ewing wrote: > Josiah Carlson wrote: > > If one wants a view of text, one needs to manually > > construct the view via 'view = textview(st, start, stop)' or some > > equivalent spelling. After that, any operations on a view returns views > > Given Guido's sensitivity about potential misuses of > views, it might be better if operations on views > *didn't* return views, so that you would have to be > explicit about creating views at all stages. If every operation on a view returned a string copy, then what would be the point of the view in the first place? An alias for Python 2.x buffer()? No, that would be silly. As I see it, the point of string/text views is: 1. Remove all start, stop optional arguments from all string methods, replacing them with view slicing; resulting in generally improved call performance by the second or third operation on the original string. 2. Reduce memory use and fragmentation of common operations (like... while rest: prev, found, rest = rest.partition(sep) ) by performing those operations on views. 3. Reduce execution time of slicing or slicing-like operations by performing them on views (prev, found, rest = rest.partition(sep)). Note that with 2 and 3, it doesn't matter how much or little you 'slice' from the view, the slicing and/or creation of new views referencing the original string is a constant time operation every time. 
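The linear-vs-quadratic claim for the partition loop can be illustrated directly. The offset-tracking version below does by hand what a view would do implicitly; both function names are made up for this sketch:

```python
def split_copying(s, sep):
    # Each pass copies the remaining tail, so total work is O(n**2).
    out, rest = [], s
    while rest:
        first, found, rest = rest.partition(sep)
        out.append(first)
    return out

def split_offsets(s, sep):
    # Track a start offset into the original string: no tail copies, O(n).
    out, pos = [], 0
    while True:
        i = s.find(sep, pos)        # the start argument avoids any slicing
        if i < 0:
            out.append(s[pos:])
            return out
        out.append(s[pos:i])
        pos = i + len(sep)

data = "a,b,,c"
assert split_copying(data, ",") == ["a", "b", "", "c"]
assert split_offsets(data, ",") == ["a", "b", "", "c"]
```

(The two differ on a trailing separator, where the copying loop's `while rest` test drops the final empty field; the point here is only the copying behavior, not full equivalence.)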
By making view.oper() always return strings instead of views, it makes #1 the only reason for views, even though #2 and #3 are also important and valid motivators. I would also like to point out that it would make the oft-cited partition example "while rest: first, found, rest = rest.partition(sep)" run in linear rather than quadratic time, where users will be pleasantly surprised about improvement in speed (or the lack of a reduction in speed). - Josiah From p.f.moore at gmail.com Mon Aug 28 11:08:31 2006 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 28 Aug 2006 10:08:31 +0100 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <2773CAC687FD5F4689F526998C7E4E5F0743D0@au3010avexu1.global.avaya.com> References: <2773CAC687FD5F4689F526998C7E4E5F0743D0@au3010avexu1.global.avaya.com> Message-ID: <79990c6b0608280208l41c2ae9bm1c76ee3bf06c99a7@mail.gmail.com> On 8/28/06, Delaney, Timothy (Tim) wrote: > For when/*if* views ever become considered to be a good thing for > builtin classes, etc, may I suggest that the following syntax be > reserved for view creation: > > obj{start:stop:step} > > mapping to something like: > > def __view__(self, slice) > > So if you really want a string view, use: > > s{1:2} > > instead of: > > s[1:2] > > I don't *think* the syntax is currently legal, and I don't think it > could ever be ambiguous - anyone think of a case where it could be? OTOH, it is very subtle. I had to lean closer to the monitor before I could even see the distinction you were making! (OK, some of that is due to less-than-ideal fonts plus failing eyesight, but the point remains...) Paul. From brian at sweetapp.com Mon Aug 28 11:35:39 2006 From: brian at sweetapp.com (Brian Quinlan) Date: Mon, 28 Aug 2006 11:35:39 +0200 Subject: [Python-3000] Warning about future-unsafe usage patterns in Python 2.x e.g. 
dict.keys().sort() In-Reply-To: <20060827214348.1AF4.JCARLSON@uci.edu> References: <20060827191547.1AEB.JCARLSON@uci.edu> <44F255D6.2060002@canterbury.ac.nz> <20060827214348.1AF4.JCARLSON@uci.edu> Message-ID: <44F2B8EB.6040704@sweetapp.com> It is my understanding that, in Python 3000, certain functions and methods that currently return lists will return some sort of view type (e.g. dict.values()) or an iterator (e.g. zip). So certain usage patterns will no longer be supported e.g. d.keys().sort(). The attached patch, which is a diff against the subversion "trunk" of Python 2.x, tries to warn the user about these kinds of future-unsafe usage patterns. It works by storing the type that the list will become in the future, at creation time, and checking to see if called list functions will be supported by that type in the future. Currently the patch is very incomplete and the idea itself may be flawed. But I thought it was interesting to run against my own code to see what potential problems it has. Example: ... Type "help", "copyright", "credits" or "license" for more information. >>> d = {"apple" : "sweet", "orange" : "tangy"} >>> "juicy" in d.values() False >>> d.keys().sort() __main__:1: DeprecationWarning: dictionary view will not support sort >>> "a" in zip([1,2,3,4], "abcd") __main__:1: DeprecationWarning: iterator will not support contains False Cheers, Brian -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: warn_list_usage.diff Url: http://mail.python.org/pipermail/python-3000/attachments/20060828/8f08a2a7/attachment-0001.diff From g.brandl at gmx.net Mon Aug 28 12:22:11 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 28 Aug 2006 12:22:11 +0200 Subject: [Python-3000] Set literals Message-ID: At python.org/sf/1547796, there is a preliminary patch for Py3k set literals as specified in PEP 3100. Set comprehensions are not implemented.
have fun, Georg From rrr at ronadam.com Mon Aug 28 13:14:14 2006 From: rrr at ronadam.com (Ron Adam) Date: Mon, 28 Aug 2006 06:14:14 -0500 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <44F0107B.20205@iinet.net.au> References: <44F0107B.20205@iinet.net.au> Message-ID: Nick Coghlan wrote: > This idea is inspired by the find/rfind string discussion (particularly a > couple of comments from Jim and Ron), but I think the applicability may prove > to be wider than just string methods (e.g. I suspect it may prove useful for > the bytes() type as well). If I'm following the ideas here which was based (only in part) on my suggestion. It's not a major feature request, but instead a combination of various small changes in which each may have some benefits of their own. The proposal is more in line with cleaning up things so they can (if one desires) get them to work together easier. But that needn't be the main reason for doing it. I also recognize that python has many very specific functions and modules, many of which are highly optimized. Most of the major problems have already been solved in that way, so it is really hard to find things that make a big difference. But I don't think that means we shouldn't work on making small improvements to things where they are possible, even if it's only to make it a bit easier to remember and/or learn. > I think an enriched slicing model that allows sequence views to be expressed > easily as "this slice of this sequence" would allow this to be dealt with > cleanly, without requiring every sequence to provide a corresponding "sequence > view" with non-copying semantics. I think Guido's concern that people will > reach for string views when they don't need them is also valid (as I believe > that it is most often inexperience that leads to premature optimization that > then leads to needless code complexity). 
I agree with both of these, but maybe we should concentrate on the individual changes and not a big picture to justify a group of changes. The individual changes or enhancements need to stand on their own. So in that light, the following individual *separate* items are what I would focus on for now. (Not string views or slice partition functions. Let those come later if they prove useful.) > The specific changes I suggest based on the find/rfind discussion are: > > 1. make range() (what used to be xrange()) a subclass of slice(), so that > range objects can be used to index sequences. The only differences between > range() and slice() would then be that start/stop/step will never be None for > range instances, and range instances act like an immutable sequence while > slice instances do not (i.e. range objects would grow an indices() method). 1. Remove None stored as indices in slice objects. Depending on the step value, any Nones can be converted to 0 or -1 immediately; the step should never be None or zero. Once the slice is created the Nones are not needed; valid index values can be determined. This moves the checks forward to slice object creation time from slice object use time. If a slice object is reused, then there might be some (micro) performance benefits if it is defined outside a loop and then used multiple times inside a loop. Also the indices can be read and used directly via slice.start, etc... without having to check for None or invalid indexes if someone wants to do that. > 2. change range() and slice() to accept slice() instances as arguments so > that range(range(0)) is equivalent to range(0). (range(x) may throw ValueError > if x.stop is None). 2. Enable slices and ranges to be converted back and forth. This works now. >>> xrange(*slice(1,-1,1).indices(10)) xrange(1, 9) There is no way to get the indices from an xrange object.
They are not available via attributes or methods (that I know of), but they can be gotten by parsing the __repr__ string. So this doesn't work. slice(*xrange(1,10,1).indices()) # no indices method While I don't have any real specific use case for this item, it may have some educational or introspective value. i.e., something to teach the relationships of each. An xrange() object can also be defined outside a loop and then used multiple times in an inner loop. 3. Continue to make xrange() and slice() a bit more alike in how they work and the values they return, but keep them separate and don't subclass range from slice. Each has a definitely different purpose; although they are related in some ways, they shouldn't try to 'be' the other, I think. The following examples show some inconsistencies in how they work or where they could be more alike. For example, viewing xrange vs slice objects returns differing representations depending on what the values of the indices are. These are just minor (barely) annoyances, and there isn't anything actually wrong, but they could be improved a bit I think. # slice always shows all three values if viewed. (This is ok) >>> slice(10) slice(None, 10, None) # None stored as indices. >>> slice(0, 10, 1) slice(0, 10, 1) # - xrange only shows values different from the defaults. >>> xrange(10) xrange(10) >>> xrange(1, 10) xrange(1, 10) >>> xrange(0, 10, 1) xrange(10) # hides 0 and 1 # - The xrange stop value is always an even increment of # the step value + start. >>> xrange(1, 10, 2) xrange(1, 11, 2) # 11! why not 10 here? >>> xrange(0, 10, 3) xrange(0, 12, 3) # and why 12 instead of 10 here? # slice accepts anything! >>> slice(1, 10, 0) # zero for step slice(1, 10, 0) >>> slice(list, int, dict) slice(<type 'list'>, <type 'int'>, <type 'dict'>) # xrange rejects any invalid indexes. >>> xrange(None, 10, None) # None not an integer. Traceback (most recent call last): File "<stdin>", line 1, in ?
TypeError: an integer is required >>> xrange(1, 10, 0) Traceback (most recent call last): File "<stdin>", line 1, in ? ValueError: xrange() arg 3 must not be zero 4. Allow slice objects to be sub-classed. That will allow for experimentation and/or for programmers to modify slice in ways they may find useful for their "own" applications. Most likely it would be a way to group methods together that all use the same start, stop and/or step indices. And then might it be possible to apply those via the slice operation at once? 5. Find a way to avoid slice wrap-arounds. These happen when iterating past zero in either direction. It usually requires a different approach and/or check to avoid going past the zero/-1 boundary. One thought I've had on this is to allow only positive integers along with a symbol to indicate an index is to be counted from the far end. Then an exception could be raised if a negative index is used. Possibly something like: [i:\j] # '\' indicates j is to be counted from the far end. The line continuation back slash could be special cased for use with slices I think. But some other symbol might be better. I think this group of separate items taken together will do what the title in this thread suggests. But each of these is a separate item in itself as well and has its own reasons why it could be helpful. Regarding the other items... The above changes possibly make some (or most) of the other suggestions possible and/or easier to implement. So then a programmer can roll their own string views or slice partition functions in a clean way if they want to. That's the point of "Making more effective use of slice objects". It's not a specific idea, but a generality that may come about by doing these other smaller things first. And doing them as a group is probably a good way to address these things. I hope this clarifies at least my viewpoint, if not Nick's. But I'll keep an open mind and see what he has to offer in his PEP.
Cheers, Ron From ncoghlan at gmail.com Mon Aug 28 13:40:41 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 28 Aug 2006 21:40:41 +1000 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <44F0107B.20205@iinet.net.au> <44F1BA0E.3040203@gmail.com> Message-ID: <44F2D639.1080808@gmail.com> Guido van Rossum wrote: > On 8/27/06, Nick Coghlan wrote: >> I believe the current mixed model is actually an artifact of the >> transition >> from simple slicing to extended slicing, > > Really? Extended slicing mostly meant adding a third "step" option to > the slice syntax, which is useful for NumPy but completely pointless > for string searches as we're discussing here. The slice() object was > invented as an API hack so that we didn't have to add new special > methods. This is exactly what I'm talking about - I believe the reason you don't see it as an oddity, is because you were used to the "start+stop" idiom from before slice() was added. For me, only starting to seriously use Python after the __*slice__ family of methods had already been deprecated, slice() objects are the basic idiom, with any occurrences of "start+stop" being artifacts of the old slicing model. For someone picking up the language after slice() has been added, it's like "we've gone to all the effort of defining a type just for sequence slices, but we're only going to use it in this one little corner of the language". >> All other instances in the core and standard library which use a >> different >> representation of a sequence slice (like the optional arguments to string >> methods, or the result of the indices() method) would change to use >> one of >> those two types. The methods of the types would be driven by the needs >> of the >> standard library. > > What's the indices() method? An existing method on slice objects that accepts a sequence length and returns the appropriate (start, stop, step) 3-tuple. 
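To illustrate that use of indices(), here is a toy sequence type (Evens is a made-up example, written in modern Python) whose __getitem__ leans on it to handle None, negative, and reversed slices:

```python
class Evens:
    """The first n even numbers, as a sequence supporting slicing."""
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, item):
        if isinstance(item, slice):
            # indices() clamps start/stop/step against our length,
            # resolving None and negative values in one call.
            start, stop, step = item.indices(self.n)
            return [2 * i for i in range(start, stop, step)]
        if not -self.n <= item < self.n:
            raise IndexError(item)
        return 2 * (item % self.n)

e = Evens(5)                      # 0, 2, 4, 6, 8
assert e[1:] == [2, 4, 6, 8]
assert e[::-1] == [8, 6, 4, 2, 0]
assert e[-1] == 8
```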
Very handy for implementing __getitem__ methods properly. > Write the PEP and make sure it is plentiful of examples of old and new > ways of doing common string operations. Indeed! Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From edcjones at comcast.net Mon Aug 28 16:21:10 2006 From: edcjones at comcast.net (Edward C. Jones) Date: Mon, 28 Aug 2006 10:21:10 -0400 Subject: [Python-3000] Warning about future-unsafe usage patterns in Python 2.x e.g. dict.keys().sort() In-Reply-To: References: Message-ID: <44F2FBD6.6040205@comcast.net> Brian Quinlan said: > It is my understanding that, in Python 3000, certain functions and > methods that currently return lists will return some sort of view type > (e.g. dict.values()) or an iterator (e.g. zip). So certain usage > patterns will no longer be supported e.g. d.keys().sort(). I use this idiom fairly often: d = dict() ... thekeys = d.keys() thekeys.sort() for key in thekeys: ... What should I use in Python 3.0? From fdrake at acm.org Mon Aug 28 16:45:23 2006 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 28 Aug 2006 10:45:23 -0400 Subject: [Python-3000] Warning about future-unsafe usage patterns in Python 2.x e.g. dict.keys().sort() In-Reply-To: <44F2FBD6.6040205@comcast.net> References: <44F2FBD6.6040205@comcast.net> Message-ID: <200608281045.24215.fdrake@acm.org> On Monday 28 August 2006 10:21, Edward C. Jones wrote: > d = dict() > ... > thekeys = d.keys() > thekeys.sort() > for key in thekeys: > ... > > What should I use in Python 3.0? d = dict() ... for key in sorted(d.keys()): ... -Fred -- Fred L. Drake, Jr. From ronaldoussoren at mac.com Mon Aug 28 16:46:53 2006 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Mon, 28 Aug 2006 16:46:53 +0200 Subject: [Python-3000] Warning about future-unsafe usage patterns in Python 2.x e.g. 
dict.keys().sort() In-Reply-To: <44F2FBD6.6040205@comcast.net> References: <44F2FBD6.6040205@comcast.net> Message-ID: <6667A80E-E767-4408-8B24-AF9AF3F2DAB0@mac.com> On 28-aug-2006, at 16:21, Edward C. Jones wrote: > > Brian Quinlan said: >> It is my understanding that, in Python 3000, certain functions and >> methods that currently return lists will return some sort of view >> type >> (e.g. dict.values()) or an iterator (e.g. zip). So certain usage >> patterns will no longer be supported e.g. d.keys().sort(). > > I use this idiom fairly often: > > d = dict() > ... > thekeys = d.keys() > thekeys.sort() > for key in thekeys: > ... > > What should I use in Python 3.0? for key in sorted(d.keys()): ... This works in python 2.4 as well. Ronald From david.nospam.hopwood at blueyonder.co.uk Mon Aug 28 17:33:31 2006 From: david.nospam.hopwood at blueyonder.co.uk (David Hopwood) Date: Mon, 28 Aug 2006 16:33:31 +0100 Subject: [Python-3000] Warning about future-unsafe usage patterns in Python 2.x e.g. dict.keys().sort() In-Reply-To: <44F2B8EB.6040704@sweetapp.com> References: <20060827191547.1AEB.JCARLSON@uci.edu> <44F255D6.2060002@canterbury.ac.nz> <20060827214348.1AF4.JCARLSON@uci.edu> <44F2B8EB.6040704@sweetapp.com> Message-ID: <44F30CCB.8080705@blueyonder.co.uk> Brian Quinlan wrote: > It is my understanding that, in Python 3000, certain functions and > methods that currently return lists will return some sort of view type > (e.g. dict.values()) or an iterator (e.g. zip). So certain usage > patterns will no longer be supported e.g. d.keys().sort(). > > The attached patch, which is a diff against the subversion "trunk" of > Python 2.x, tries to warn the user about these kind of future-unsafe > usage patterns. It works by storing the type that the list will become > in the future, at creation time, and checking to see if called list > functions will be supported by that type in the future. +1 on the idea of the patch. 
Some nitpicking: > +#define PY_REMAIN_LIST 0x01 /* List will remain a list in Py2K */ "in Py3K". > + /* XXX This should be PyExc_PendingDeprecationWarning */ > + if (PyErr_WarnEx(PyExc_DeprecationWarning, message, 1) < 0) > + return -1; Why isn't it PyExc_PendingDeprecationWarning? > +#define WARN_LIST_USAGE(self, supported_types, operation) \ > + if (warn_future_usage((PyListObject *) self, \ > + supported_types, operation) < 0) \ > + return NULL; > + > +#define WARN_LIST_USAGE_INT(self, supported_types, operation) \ > + if (warn_future_usage((PyListObject *) self, \ > + supported_types, operation) < 0) \ > + return -1; These are macros that hide control flow. In this case I don't think that the difference in verbosity between, say, if (warn_future_usage(a, PY_REMAIN_LIST | PY_BECOME_DICTVIEW, "len") < 0) return -1; and WARN_LIST_USAGE_INT(a, PY_REMAIN_LIST | PY_BECOME_DICTVIEW, "len"); is sufficient to justify hiding the return in a macro. (The cast to PyListObject * is not needed: you have the same cast within warn_future_usage, so its 'self' argument could just as well be declared as PyObject *.) The 'operation' string is sometimes a gerund ("slicing", etc.) and sometimes the name of a method. This should be more consistent. > + WARN_LIST_USAGE(a, PY_REMAIN_LIST, "repitition"); "repetition" -- David Hopwood From guido at python.org Mon Aug 28 18:22:52 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Aug 2006 09:22:52 -0700 Subject: [Python-3000] Warning about future-unsafe usage patterns in Python 2.x e.g. dict.keys().sort() In-Reply-To: <44F2B8EB.6040704@sweetapp.com> References: <20060827191547.1AEB.JCARLSON@uci.edu> <44F255D6.2060002@canterbury.ac.nz> <20060827214348.1AF4.JCARLSON@uci.edu> <44F2B8EB.6040704@sweetapp.com> Message-ID: Not much time to review the patch, but +1 on this -- I've described this a few times in my Py3k talk, glad that some code is forthcoming now! 
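The idea behind the patch can be sketched in pure Python (the class and warning text here are made up for illustration; the real patch works at the C level inside listobject.c):

```python
import warnings

class FutureKeysList(list):
    """Hypothetical sketch: a list that knows it will become a
    dict view in Py3k, and warns about operations views won't have."""
    def sort(self, *args, **kwargs):
        warnings.warn("dictionary view will not support sort",
                      DeprecationWarning, stacklevel=2)
        return list.sort(self, *args, **kwargs)

d = {"apple": "sweet", "orange": "tangy"}
keys = FutureKeysList(d.keys())
keys.sort()  # emits the DeprecationWarning, then sorts as usual
```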
--Guido On 8/28/06, Brian Quinlan wrote: > It is my understanding that, in Python 3000, certain functions and > methods that currently return lists will return some sort of view type > (e.g. dict.values()) or an iterator (e.g. zip). So certain usage > patterns will no longer be supported e.g. d.keys().sort(). > > The attached patch, which is a diff against the subversion "trunk" of > Python 2.x, tries to warn the user about these kind of future-unsafe > usage patterns. It works by storing the type that the list will become > in the future, at creation time, and checking to see if called list > functions will be supported by that type in the future. > > Currently the patch if very incomplete and the idea itself may be > flawed. But I thought it was interesting to run against my own code to > see what potential problems it has. Example: > > ... > Type "help", "copyright", "credits" or "license" for more information. > >>> d = {"apple" : "sweet", "orange" : "tangy"} > >>> "juicy" in d.values() > False > >>> d.keys().sort() > __main__:1: DeprecationWarning: dictionary view will not support sort > >>> "a" in zip([1,2,3,4], "abcd") > __main__:1: DeprecationWarning: iterator will not support contains > False > > Cheers, > Brian > > > Index: Python/bltinmodule.c > =================================================================== > --- Python/bltinmodule.c (revision 51629) > +++ Python/bltinmodule.c (working copy) > @@ -1570,7 +1570,7 @@ > goto Fail; > } > > - v = PyList_New(n); > + v = PyList_NewFutureType(n, PY_BECOME_ITER); > if (v == NULL) > goto Fail; > > @@ -1678,7 +1678,7 @@ > "range() result has too many items"); > return NULL; > } > - v = PyList_New(n); > + v = PyList_NewFutureType(n, PY_BECOME_ITER); > if (v == NULL) > return NULL; > for (i = 0; i < n; i++) { > @@ -2120,7 +2120,7 @@ > Py_ssize_t len; /* guess at result length */ > > if (itemsize == 0) > - return PyList_New(0); > + return PyList_NewFutureType(0, PY_BECOME_ITER); > > /* args must be a tuple */ > 
assert(PyTuple_Check(args)); > @@ -2148,7 +2148,7 @@ > /* allocate result list */ > if (len < 0) > len = 10; /* arbitrary */ > - if ((ret = PyList_New(len)) == NULL) > + if ((ret = PyList_NewFutureType(len, PY_BECOME_ITER)) == NULL) > return NULL; > > /* obtain iterators */ > Index: Include/listobject.h > =================================================================== > --- Include/listobject.h (revision 51629) > +++ Include/listobject.h (working copy) > @@ -19,6 +19,12 @@ > extern "C" { > #endif > > +/* Constants representing the types that may be used instead of a list > + in Python 3000 */ > +#define PY_REMAIN_LIST 0x01 /* List will remain a list in Py2K */ > +#define PY_BECOME_DICTVIEW 0x02 /* List will become a "view" on a dict */ > +#define PY_BECOME_ITER 0x04 /* List will become an iterator */ > + > typedef struct { > PyObject_VAR_HEAD > /* Vector of pointers to list elements. list[0] is ob_item[0], etc. */ > @@ -36,6 +42,7 @@ > * the list is not yet visible outside the function that builds it. > */ > Py_ssize_t allocated; > + int future_type; /* The type the object will have in Py3K */ > } PyListObject; > > PyAPI_DATA(PyTypeObject) PyList_Type; > @@ -44,6 +51,7 @@ > #define PyList_CheckExact(op) ((op)->ob_type == &PyList_Type) > > PyAPI_FUNC(PyObject *) PyList_New(Py_ssize_t size); > +PyAPI_FUNC(PyObject *) PyList_NewFutureType(Py_ssize_t size, int future_type); > PyAPI_FUNC(Py_ssize_t) PyList_Size(PyObject *); > PyAPI_FUNC(PyObject *) PyList_GetItem(PyObject *, Py_ssize_t); > PyAPI_FUNC(int) PyList_SetItem(PyObject *, Py_ssize_t, PyObject *); > @@ -57,6 +65,9 @@ > PyAPI_FUNC(PyObject *) _PyList_Extend(PyListObject *, PyObject *); > > /* Macro, trading safety for speed */ > +/* XXX These functions do not (yet) trigger future usage warnings. > + So e.g. 
range(100)[0] will slip though > +*/ > #define PyList_GET_ITEM(op, i) (((PyListObject *)(op))->ob_item[i]) > #define PyList_SET_ITEM(op, i, v) (((PyListObject *)(op))->ob_item[i] = (v)) > #define PyList_GET_SIZE(op) (((PyListObject *)(op))->ob_size) > Index: Objects/dictobject.c > =================================================================== > --- Objects/dictobject.c (revision 51629) > +++ Objects/dictobject.c (working copy) > @@ -1003,7 +1003,7 @@ > > again: > n = mp->ma_used; > - v = PyList_New(n); > + v = PyList_NewFutureType(n, PY_BECOME_DICTVIEW); > if (v == NULL) > return NULL; > if (n != mp->ma_used) { > @@ -1037,7 +1037,7 @@ > > again: > n = mp->ma_used; > - v = PyList_New(n); > + v = PyList_NewFutureType(n, PY_BECOME_DICTVIEW); > if (v == NULL) > return NULL; > if (n != mp->ma_used) { > @@ -1076,7 +1076,7 @@ > */ > again: > n = mp->ma_used; > - v = PyList_New(n); > + v = PyList_NewFutureType(n, PY_BECOME_DICTVIEW); > if (v == NULL) > return NULL; > for (i = 0; i < n; i++) { > Index: Objects/listobject.c > =================================================================== > --- Objects/listobject.c (revision 51629) > +++ Objects/listobject.c (working copy) > @@ -8,6 +8,49 @@ > #include /* For size_t */ > #endif > > +static int warn_future_usage(PyListObject *self, > + int supported_types, char *operation) > +{ > + char message[256]; > + > + if ((((PyListObject *) self)->future_type & supported_types) == 0) > + { > + switch (self->future_type) { > + case PY_BECOME_DICTVIEW: > + PyOS_snprintf(message, sizeof(message), > + "dictionary view will not support %s", > + operation); > + break; > + case PY_BECOME_ITER: > + PyOS_snprintf(message, sizeof(message), > + "iterator will not support %s", > + operation); > + break; > + default: /* This shouldn't happen */ > + PyErr_BadInternalCall(); > + return -1; > + } > + > + /* XXX This should be PyExc_PendingDeprecationWarning */ > + if (PyErr_WarnEx(PyExc_DeprecationWarning, message, 1) < 0) > + return -1; > + 
} > + > + return 0; > +} > + > +#define WARN_LIST_USAGE(self, supported_types, operation) \ > + if (warn_future_usage((PyListObject *) self, \ > + supported_types, operation) < 0) \ > + return NULL; > + > +#define WARN_LIST_USAGE_INT(self, supported_types, operation) \ > + if (warn_future_usage((PyListObject *) self, \ > + supported_types, operation) < 0) \ > + return -1; > + > +#define PyList_Check(op) PyObject_TypeCheck(op, &PyList_Type) > + > /* Ensure ob_item has room for at least newsize elements, and set > * ob_size to newsize. If newsize > ob_size on entry, the content > * of the new slots at exit is undefined heap trash; it's the caller's > @@ -116,10 +159,29 @@ > } > op->ob_size = size; > op->allocated = size; > + op->future_type = PY_REMAIN_LIST; > _PyObject_GC_TRACK(op); > return (PyObject *) op; > } > > +PyObject * > +PyList_NewFutureType(Py_ssize_t size, int future_type) > +{ > + PyListObject *op = (PyListObject *) PyList_New(size); > + if (op == NULL) > + return NULL; > + else { > + if (future_type == 0) > + { > + Py_DECREF(op); > + PyErr_BadInternalCall(); > + return NULL; > + } > + op->future_type = future_type; > + return (PyObject *) op; > + } > +} > + > Py_ssize_t > PyList_Size(PyObject *op) > { > @@ -369,6 +431,7 @@ > static Py_ssize_t > list_length(PyListObject *a) > { > + WARN_LIST_USAGE_INT(a, PY_REMAIN_LIST | PY_BECOME_DICTVIEW, "len"); > return a->ob_size; > } > > @@ -378,6 +441,7 @@ > Py_ssize_t i; > int cmp; > > + WARN_LIST_USAGE_INT(a, PY_REMAIN_LIST | PY_BECOME_DICTVIEW, "contains"); > for (i = 0, cmp = 0 ; cmp == 0 && i < a->ob_size; ++i) > cmp = PyObject_RichCompareBool(el, PyList_GET_ITEM(a, i), > Py_EQ); > @@ -387,6 +451,7 @@ > static PyObject * > list_item(PyListObject *a, Py_ssize_t i) > { > + WARN_LIST_USAGE(a, PY_REMAIN_LIST, "item indexing"); > if (i < 0 || i >= a->ob_size) { > if (indexerr == NULL) > indexerr = PyString_FromString( > @@ -404,6 +469,8 @@ > PyListObject *np; > PyObject **src, **dest; > Py_ssize_t i, len; > + > + 
WARN_LIST_USAGE(a, PY_REMAIN_LIST, "slicing"); > if (ilow < 0) > ilow = 0; > else if (ilow > a->ob_size) > @@ -444,6 +511,9 @@ > Py_ssize_t i; > PyObject **src, **dest; > PyListObject *np; > + > + WARN_LIST_USAGE(a, PY_REMAIN_LIST, "concatenation"); > + > if (!PyList_Check(bb)) { > PyErr_Format(PyExc_TypeError, > "can only concatenate list (not \"%.200s\") to list", > @@ -484,6 +554,8 @@ > PyListObject *np; > PyObject **p, **items; > PyObject *elem; > + > + WARN_LIST_USAGE(a, PY_REMAIN_LIST, "repitition"); > if (n < 0) > n = 0; > size = a->ob_size * n; > @@ -521,6 +593,8 @@ > { > Py_ssize_t i; > PyObject **item = a->ob_item; > + > + WARN_LIST_USAGE_INT(a, PY_REMAIN_LIST, "clear"); > if (item != NULL) { > /* Because XDECREF can recursively invoke operations on > this list, we make it empty first. */ > @@ -565,6 +639,9 @@ > Py_ssize_t k; > size_t s; > int result = -1; /* guilty until proved innocent */ > + > + WARN_LIST_USAGE_INT(a, PY_REMAIN_LIST, "slicing"); > + > #define b ((PyListObject *)v) > if (v == NULL) > n = 0; > @@ -658,9 +735,9 @@ > { > PyObject **items; > Py_ssize_t size, i, j, p; > + size = PyList_GET_SIZE(self); > > - > - size = PyList_GET_SIZE(self); > + WARN_LIST_USAGE(self, PY_REMAIN_LIST, "repeat"); > if (size == 0) { > Py_INCREF(self); > return (PyObject *)self; > @@ -692,6 +769,8 @@ > list_ass_item(PyListObject *a, Py_ssize_t i, PyObject *v) > { > PyObject *old_value; > + > + WARN_LIST_USAGE_INT(a, PY_REMAIN_LIST, "item assignment"); > if (i < 0 || i >= a->ob_size) { > PyErr_SetString(PyExc_IndexError, > "list assignment index out of range"); > @@ -711,6 +790,8 @@ > { > Py_ssize_t i; > PyObject *v; > + > + WARN_LIST_USAGE(self, PY_REMAIN_LIST, "insert"); > if (!PyArg_ParseTuple(args, "nO:insert", &i, &v)) > return NULL; > if (ins1(self, i, v) == 0) > @@ -721,6 +802,7 @@ > static PyObject * > listappend(PyListObject *self, PyObject *v) > { > + WARN_LIST_USAGE(self, PY_REMAIN_LIST, "append"); > if (app1(self, v) == 0) > Py_RETURN_NONE; > return 
NULL; > @@ -736,6 +818,7 @@ > Py_ssize_t i; > PyObject *(*iternext)(PyObject *); > > + WARN_LIST_USAGE(self, PY_REMAIN_LIST, "extend"); > /* Special cases: > 1) lists and tuples which can use PySequence_Fast ops > 2) extending self to self requires making a copy first > @@ -851,6 +934,7 @@ > { > PyObject *result; > > + WARN_LIST_USAGE(self, PY_REMAIN_LIST, "concatentation"); > result = listextend(self, other); > if (result == NULL) > return result; > @@ -866,6 +950,7 @@ > PyObject *v, *arg = NULL; > int status; > > + WARN_LIST_USAGE(self, PY_REMAIN_LIST, "pop"); > if (!PyArg_UnpackTuple(args, "pop", 0, 1, &arg)) > return NULL; > if (arg != NULL) { > @@ -1995,6 +2080,8 @@ > PyObject *key, *value, *kvpair; > static char *kwlist[] = {"cmp", "key", "reverse", 0}; > > + WARN_LIST_USAGE(self, PY_REMAIN_LIST, "sort"); > + > assert(self != NULL); > assert (PyList_Check(self)); > if (args != NULL) { > @@ -2163,6 +2250,7 @@ > static PyObject * > listreverse(PyListObject *self) > { > + WARN_LIST_USAGE(self, PY_REMAIN_LIST, "reverse"); > if (self->ob_size > 1) > reverse_slice(self->ob_item, self->ob_item + self->ob_size); > Py_RETURN_NONE; > @@ -2213,6 +2301,7 @@ > Py_ssize_t i, start=0, stop=self->ob_size; > PyObject *v; > > + WARN_LIST_USAGE(self, PY_REMAIN_LIST, "index"); > if (!PyArg_ParseTuple(args, "O|O&O&:index", &v, > _PyEval_SliceIndex, &start, > _PyEval_SliceIndex, &stop)) > @@ -2244,6 +2333,7 @@ > Py_ssize_t count = 0; > Py_ssize_t i; > > + WARN_LIST_USAGE(self, PY_REMAIN_LIST, "count"); > for (i = 0; i < self->ob_size; i++) { > int cmp = PyObject_RichCompareBool(self->ob_item[i], v, Py_EQ); > if (cmp > 0) > @@ -2259,6 +2349,7 @@ > { > Py_ssize_t i; > > + WARN_LIST_USAGE(self, PY_REMAIN_LIST, "remove"); > for (i = 0; i < self->ob_size; i++) { > int cmp = PyObject_RichCompareBool(self->ob_item[i], v, Py_EQ); > if (cmp > 0) { > @@ -2372,6 +2463,7 @@ > self->allocated == 0 || self->allocated == -1); > > /* Empty previous contents */ > + self->future_type = 
PY_REMAIN_LIST; > if (self->ob_item != NULL) { > (void)list_clear(self); > } > @@ -2456,6 +2548,8 @@ > static PyObject * > list_subscript(PyListObject* self, PyObject* item) > { > + WARN_LIST_USAGE(self, PY_REMAIN_LIST, "__getitem__"); > + > if (PyIndex_Check(item)) { > Py_ssize_t i; > i = PyNumber_AsSsize_t(item, PyExc_IndexError); > @@ -2505,6 +2599,7 @@ > static int > list_ass_subscript(PyListObject* self, PyObject* item, PyObject* value) > { > + WARN_LIST_USAGE_INT(self, PY_REMAIN_LIST, "item assignment"); > if (PyIndex_Check(item)) { > Py_ssize_t i = PyNumber_AsSsize_t(item, PyExc_IndexError); > if (i == -1 && PyErr_Occurred()) > @@ -2874,6 +2969,7 @@ > { > listreviterobject *it; > > + WARN_LIST_USAGE(seq, PY_REMAIN_LIST, "reversed"); > it = PyObject_GC_New(listreviterobject, &PyListRevIter_Type); > if (it == NULL) > return NULL; > > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Aug 28 18:42:06 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Aug 2006 09:42:06 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <20060827184941.1AE8.JCARLSON@uci.edu> References: <20060827091000.1ADF.JCARLSON@uci.edu> <20060827184941.1AE8.JCARLSON@uci.edu> Message-ID: Josiah (and other supporters of string views), You seem to be utterly convinced of the superior performance of your proposal without having done any measurements. You appear to have a rather naive view on what makes code execute fast or slow (e.g. you don't seem to appreciate the savings due to a string object header and its data being consecutive in memory). 
Unless you have serious benchmark data (for realistic Python code) I can't continue to participate in this discussion, where you have said nothing new in many posts. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Mon Aug 28 18:48:52 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 28 Aug 2006 18:48:52 +0200 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <20060827091000.1ADF.JCARLSON@uci.edu> <20060827184941.1AE8.JCARLSON@uci.edu> Message-ID: Guido van Rossum wrote: > (e.g. you don't seem to appreciate the savings due to a string > object header and its data being consecutive in memory). footnote: note that the Unicode string type still doesn't do that (my original implementation *did* support string views, and nobody's ever gotten around to fully ripping it out), so if anyone wants to benchmark things related to this specific feature, comparing unicode strings with 8-bit strings could be somewhat useful. From guido at python.org Mon Aug 28 18:52:23 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Aug 2006 09:52:23 -0700 Subject: [Python-3000] Set literals In-Reply-To: References: Message-ID: On 8/28/06, Georg Brandl wrote: > At python.org/sf/1547796, there is a preliminary patch for Py3k set literals > as specified in PEP 3100. Very cool! This is now checked in. Georg, can you do something about repr() of an empty set? This currently produces "{}" while it should produce "set()". > Set comprehensions are not implemented. ETA?
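For the record, the behavior being requested: since there is no empty-set literal, {} stays the empty dict and the empty set must fall back to the constructor form, so that its repr round-trips:

```python
assert repr(set()) == "set()"      # not "{}", which is the empty dict
assert repr({}) == "{}"
assert eval(repr(set())) == set()  # the repr round-trips through eval
```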
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From g.brandl at gmx.net Mon Aug 28 19:44:52 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 28 Aug 2006 19:44:52 +0200 Subject: [Python-3000] Set literals In-Reply-To: References: Message-ID: Guido van Rossum wrote: > On 8/28/06, Georg Brandl wrote: >> At python.org/sf/1547796, there is a preliminary patch for Py3k set literals >> as specified in PEP 3100. > > Very cool! This is now checked in. Wow, that's fast... > Georg, can you do something about repr() of an empty set? This > currently produces "{}" while it should produce "set()". Right, forgot about that case. I'll correct that now. (Grr, I even mindlessly changed the unittest that would have caught it) In the meantime, I played around with the peepholer and tried to copy the "for x in tuple_or_list" optimization for sets. Results are in SF patch #1548082. >> Set comprehensions are not implemented. > > ETA? There are some points I'd like to have clarified first: * would it be wise to have some general listcomp <-> genexp cleanup first? This starts with the grammar, which currently is slightly different (see Grammar:79), and it looks like there's quite a lot of (almost) duplicated code in ast.c and compile.c too. * list comprehensions are special-cased because of the LIST_APPEND opcode. If there isn't going to be a special-cased SET_ADD, it's probably the easiest thing to transform {x for x in a} into set(x for x in a) in the AST step, with "set" of course always being the builtin set. Georg From guido at python.org Mon Aug 28 20:55:30 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Aug 2006 11:55:30 -0700 Subject: [Python-3000] Set literals In-Reply-To: References: Message-ID: On 8/28/06, Georg Brandl wrote: > Guido van Rossum wrote: > > On 8/28/06, Georg Brandl wrote: > >> At python.org/sf/1547796, there is a preliminary patch for Py3k set literals > >> as specified in PEP 3100. > > > > Very cool! 
This is now checked in. > > Wow, that's fast... Well it passed all unit tests and the rules for the py3k branch are a bit looser than for the head... :) > > Georg, can you do something about repr() of an empty set? This > > currently produces "{}" while it should produce "set()". > > Right, forgot about that case. I'll correct that now. > (Grr, I even mindlessly changed the unittest that would have caught it) Checkin? > In the meantime, I played around with the peepholer and tried to copy > the "for x in tuple_or_list" optimization for sets. Results are in SF > patch #1548082. > > >> Set comprehensions are not implemented. > > > > ETA? > > There are some points I'd like to have clarified first: > > * would it be wise to have some general listcomp <-> genexp > cleanup first? This starts with the grammar, which currently is slightly > different (see Grammar:79), and it looks like there's quite a lot of > (almost) duplicated code in ast.c and compile.c too. I expect this cleanup to be quite a bit of work since the semantics are seriously different. ([...] uses the surrounding scope for the loop control variables.) However you might be able to just cleanup the grammar so they are identical, that would be simpler I suspect. > * list comprehensions are special-cased because of the LIST_APPEND opcode. > If there isn't going to be a special-cased SET_ADD, it's probably the > easiest thing to transform {x for x in a} into set(x for x in a) in the > AST step, with "set" of course always being the builtin set. Right. That might actually become a prototype for how to do the list translation as well.
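The equivalence behind the proposed AST transform — a set comprehension as sugar for a set() call over a generator expression, with "set" always being the builtin — is easy to sanity-check:

```python
a = [1, 2, 2, 3, 3, 3]

# What the transform would produce for {x * x for x in a}:
via_genexp = set(x * x for x in a)
assert via_genexp == {1, 4, 9}  # duplicates collapse either way
```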
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From jcarlson at uci.edu Mon Aug 28 21:49:39 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Mon, 28 Aug 2006 12:49:39 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <20060827184941.1AE8.JCARLSON@uci.edu> Message-ID: <20060828120741.1AF7.JCARLSON@uci.edu> "Guido van Rossum" wrote: > > Josiah (and other supporters of string views), > > You seem to be utterly convinced of the superior performance of your > proposal without having done any measurements. > > You appear to have a rather naive view on what makes code execute fast > or slow (e.g. you don't seem to appreciate the savings due to a string > object header and its data being consecutive in memory). > > Unless you have serious benchmark data (for realistic Python code) I > can't continue to participate in this discussion, where you have said > nothing new in many posts. Put up or shut up, eh? I have written a simple extension module using Pyrex (my manual C extension writing is awful). Here are some sample interactions showing that string views are indeed quite fast. In all of these examples, a naive implementation using only stringview.partition() was able to beat Python 2.5 str.partition, str.split, and re.finditer. Attached you will find the implementation of stringview I used, along with sufficient build scripts to get it working using Python 2.3 and Pyrex 0.9.3 . Aside from replacing int usage with Py_ssize_t for 2.5, and *nix users performing a dos2unix call, it should work without change with the most recent Python and Pyrex versions. - Josiah Using 2.3 : >>> x = stringview(40000*' ') >>> if 1: ... t = time.time() ... while x: ... _1, _2, x = x.partition(' ') ... print time.time()-t ... 0.18700003624 >>> Compared with Python 2.5 beta 2 >>> x = 40000*' ' >>> if 1: ... t = time.time() ... while x: ... _1, _2, x = x.partition(' ') ... print time.time()-t ... 
0.625 >>> But that's about as bad for Python 2.5 as it can get. What about something else? Like a mail file? In my 21.5 meg archive of py3k, which contains 3456 messages, I wanted to discover all messages. Python 2.3.5 (#62, Feb 8 2005, 16:23:02) [MSC v.1200 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> from stringview import * >>> rest = stringview(open('mail', 'rb').read()) >>> import time >>> if 1: ... x = [] ... t = time.time() ... while rest: ... cur, found, rest = rest.partition('\r\n.\r\n') ... x.append(cur) ... print time.time()-t, len(x) ... 0.0780000686646 3456 >>> What about Python 2.5 using split? That should be fast... Python 2.5b2 (r25b2:50512, Jul 11 2006, 10:16:14) [MSC v.1310 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> rest = open('mail', 'rb').read() >>> import time >>> if 1: ... t = time.time() ... x = rest.split('\r\n.\r\n') ... print time.time()-t, len(x) ... 0.109999895096 3457 >>> Hrm...what about using re? >>> import re >>> pat = re.compile('\r\n\.\r\n') >>> rest = open('mail', 'rb').read() >>> import time >>> if 1: ... x = [] ... t = time.time() ... for i in pat.finditer(rest): ... x.append(i) ... print time.time()-t, len(x) ... 0.125 3456 >>> Even that's not as good as Python 2.3 + string views. -------------- next part -------------- A non-text attachment was scrubbed... Name: stringview_build.py Type: application/octet-stream Size: 654 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20060828/916e8238/attachment.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: stringview.pyx Type: application/octet-stream Size: 2639 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20060828/916e8238/attachment-0001.obj -------------- next part -------------- A non-text attachment was scrubbed... 
Name: stringview_helper.h Type: application/octet-stream Size: 1656 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20060828/916e8238/attachment-0002.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: _setup.py Type: application/octet-stream Size: 255 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20060828/916e8238/attachment-0003.obj From g.brandl at gmx.net Mon Aug 28 21:52:53 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 28 Aug 2006 21:52:53 +0200 Subject: [Python-3000] Set literals In-Reply-To: References: Message-ID: Guido van Rossum wrote: >> > Georg, can you do something about repr() of an empty set? This >> > currently produces "{}" while it should produce "set()". >> >> Right, forgot about that case. I'll correct that now. >> (Grr, I even mindlessly changed the unittest that would have caught it) > > Checkin? Done. It now also renders repr(frozenset()) as "frozenset()", which should cause no problems though. >> In the meantime, I played around with the peepholer and tried to copy >> the "for x in tuple_or_list" optimization for sets. Results are in SF >> patch #1548082. >> >> >> Set comprehensions are not implemented. >> > >> > ETA? >> >> There are some points I'd like to have clarified first: >> >> * would it be wise to have some general listcomp <-> genexp >> cleanup first? This starts with the grammar, which currently is slightly >> different (see Grammar:79), and it looks like there's quite a lot of >> (almost) duplicated code in ast.c and compile.c too. > > I expect this cleanup to be quite a bit of work since the semantics are > seriously different. ([...] uses the surrounding scope for the loop > control variables.) I didn't say that I wanted to champion that cleanup ;) > However you might be able to just cleanup the grammar so they are > identical, that would be simpler I suspect.
Looking at the grammar, there's only testlist_safe left to kill, in favor of or_test like in generator expressions. The old_ rules are still needed. Hm. Is the precedence in x = lambda: 1 if 0 else 2 really obvious? >> * list comprehensions are special-cased because of the LIST_APPEND opcode. >> If there isn't going to be a special-cased SET_ADD, it's probably the >> easiest thing to transform {x for x in a} into set(x for x in a) in the >> AST step, with "set" of course always being the builtin set. > Right. That might actually become a prototype for how to do the list > translation as well. Would this need a new opcode, or should generators be special-cased by BUILD_SET? Which doesn't seem like a good idea because it means that {(x for x in iterable)} == {x for x in iterable} Georg From guido at python.org Mon Aug 28 22:07:55 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Aug 2006 13:07:55 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <20060828120741.1AF7.JCARLSON@uci.edu> References: <20060827184941.1AE8.JCARLSON@uci.edu> <20060828120741.1AF7.JCARLSON@uci.edu> Message-ID: Those are all microbenchmarks. It's easy to prove the superiority of an approach that way. But what about realistic applications? What if your views don't end up saving memory or time for an application, but still cost in terms of added complexity in all string operations? Anyway, let me begin with your microbenchmark. The first one pits a linear algorithm against a quadratic algorithm with the expected result. The second one is more interesting; your version doesn't copy while the split() version copies, and that gives your version the expected speedup. I never doubted this. But your code has a worst-case problem: if you take a single short view of a really long string and then drop the long string, the view keeps it around. Something like this: rest = ... # your mailbox file results = [] for i in range(1000): x = rest + "."
# Just to force a copy results.append(x.partition("\r\n.\r\n")[0]) Save the *first* message over and over Now watch the memory growth with your version vs. with standard partition. Now fix this in your code and re-run your benchmark. Then I come with another worst-case scenario, etc. Then I ask you to make it so that string views are 99.999% indistinguishable from strings -- they have all the same methods, are usable everywhere else, etc. --Guido On 8/28/06, Josiah Carlson wrote: > > "Guido van Rossum" wrote: > > > > Josiah (and other supporters of string views), > > > > You seem to be utterly convinced of the superior performance of your > > proposal without having done any measurements. > > > > You appear to have a rather naive view on what makes code execute fast > > or slow (e.g. you don't seem to appreciate the savings due to a string > > object header and its data being consecutive in memory). > > > > Unless you have serious benchmark data (for realistic Python code) I > > can't continue to participate in this discussion, where you have said > > nothing new in many posts. > > Put up or shut up, eh? > > I have written a simple extension module using Pyrex (my manual C > extension writing is awful). Here are some sample interactions showing > that string views are indeed quite fast. In all of these examples, a > naive implementation using only stringview.partition() was able to beat > Python 2.5 str.partition, str.split, and re.finditer. > > Attached you will find the implementation of stringview I used, along > with sufficient build scripts to get it working using Python 2.3 and > Pyrex 0.9.3 . Aside from replacing int usage with Py_ssize_t for 2.5, > and *nix users performing a dos2unix call, it should work without change > with the most recent Python and Pyrex versions. > > - Josiah > > > Using 2.3 : > >>> x = stringview(40000*' ') > >>> if 1: > ... t = time.time() > ... while x: > ... _1, _2, x = x.partition(' ') > ... print time.time()-t > ... 
> 0.18700003624 > >>> > > Compared with Python 2.5 beta 2 > >>> x = 40000*' ' > >>> if 1: > ... t = time.time() > ... while x: > ... _1, _2, x = x.partition(' ') > ... print time.time()-t > ... > 0.625 > >>> > > But that's about as bad for Python 2.5 as it can get. What about > something else? Like a mail file? In my 21.5 meg archive of py3k, > which contains 3456 messages, I wanted to discover all messages. > > Python 2.3.5 (#62, Feb 8 2005, 16:23:02) [MSC v.1200 32 bit (Intel)] on win32 > Type "help", "copyright", "credits" or "license" for more information. > >>> from stringview import * > >>> rest = stringview(open('mail', 'rb').read()) > >>> import time > >>> if 1: > ... x = [] > ... t = time.time() > ... while rest: > ... cur, found, rest = rest.partition('\r\n.\r\n') > ... x.append(cur) > ... print time.time()-t, len(x) > ... > 0.0780000686646 3456 > >>> > > What about Python 2.5 using split? That should be fast... > > Python 2.5b2 (r25b2:50512, Jul 11 2006, 10:16:14) [MSC v.1310 32 bit (Intel)] on > win32 > Type "help", "copyright", "credits" or "license" for more information. > >>> rest = open('mail', 'rb').read() > >>> import time > >>> if 1: > ... t = time.time() > ... x = rest.split('\r\n.\r\n') > ... print time.time()-t, len(x) > ... > 0.109999895096 3457 > >>> > > Hrm...what about using re? > >>> import re > >>> pat = re.compile('\r\n\.\r\n') > >>> rest = open('mail', 'rb').read() > >>> import time > >>> if 1: > ... x = [] > ... t = time.time() > ... for i in pat.finditer(rest): > ... x.append(i) > ... print time.time()-t, len(x) > ... > 0.125 3456 > >>> > > Even that's not as good as Python 2.3 + string views. 
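The zero-copy partition() behind these timings can be sketched in pure Python (a hypothetical class for illustration; the posted implementation was a Pyrex extension, and this sketch ignores its buffer-interface handling). The trick is that str.find's start/stop arguments let each step search in place, so no substring is copied until str() is called:

```python
class stringview:
    """Hypothetical pure-Python sketch of a zero-copy string view."""
    def __init__(self, s, start=0, stop=None):
        self.s = s
        self.start = start
        self.stop = len(s) if stop is None else stop

    def __bool__(self):
        # A view is "empty" when its window has zero width.
        return self.stop > self.start

    def __str__(self):
        # Copying happens only when explicitly requested.
        return self.s[self.start:self.stop]

    def partition(self, sep):
        # str.find's start/stop arguments search without slicing out a copy.
        i = self.s.find(sep, self.start, self.stop)
        if i < 0:
            empty = stringview(self.s, self.stop, self.stop)
            return self, empty, empty
        return (stringview(self.s, self.start, i),
                stringview(self.s, i, i + len(sep)),
                stringview(self.s, i + len(sep), self.stop))

# The shape of the benchmark loop, against this sketch:
parts = []
x = stringview("a b c")
while x:
    head, _, x = x.partition(" ")
    parts.append(str(head))
```

Each iteration creates three small view objects instead of copying the remainder of the string, which is where the quadratic-to-linear win comes from; note that, as Guido points out, every such view also keeps the whole underlying string alive.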
> > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rhettinger at ewtllc.com Mon Aug 28 22:08:48 2006 From: rhettinger at ewtllc.com (Raymond Hettinger) Date: Mon, 28 Aug 2006 13:08:48 -0700 Subject: [Python-3000] Set literals In-Reply-To: References: Message-ID: <44F34D50.2080805@ewtllc.com> Georg Brandl wrote: >In the meantime, I played around with the peepholer and tried to copy >the "for x in tuple_or_list" optimization for sets. Results are in SF >patch #1548082. > > > Did you mean "if x in tuple_or_list"? IIRC, there was some reason that mutable lists were not supposed to be made into constants in for-loops. >* list comprehensions are special-cased because of the LIST_APPEND opcode. > If there isn't going to be a special-cased SET_ADD, it's probably the > easiest thing to transform {x for x in a} into set(x for x in a) in the > AST step, with "set" of course always being the builtin set. > > > Set comprehensions and list comprehensions are fundamentally the same and therefore should have identical implementations. While transformation to a generator expression may seem like a good idea now, I expect that you'll observe a two-fold performance hit and end-up abandoning that approach in favor of the current LIST_APPEND approach. So it would probably be best to start by teaching the compiler to hide the loop variable in a LIST_APPEND approach to list comprehensions and then duplicate that approach for set comprehensions. Raymond From guido at python.org Mon Aug 28 22:14:17 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Aug 2006 13:14:17 -0700 Subject: [Python-3000] Set literals In-Reply-To: References: Message-ID: On 8/28/06, Georg Brandl wrote: > Guido van Rossum wrote: > > >> > Georg, can you do something about repr() of an empty set? This > >> > currently produces "{}" while it should produce "set()". > >> > >> Right, forgot about that case. I'll correct that now. 
> >> (Grr, I even mindlessly changed the unittest that would have caught it) > > > > Checkin? > > Done. It now also renders repr(frozenset()) as "frozenset()", which should > cause no problems though. Thanks -- looks good! > >> In the meantime, I played around with the peepholer and tried to copy > >> the "for x in tuple_or_list" optimization for sets. Results are in SF > >> patch #1548082. > >> > >> >> Set comprehensions are not implemented. > >> > > >> > ETA? > >> > >> There are some points I'd like to have clarified first: > >> > >> * would it be wise to have some general listcomp <-> genexp > >> cleanup first? This starts with the grammar, which currently is slightly > >> different (see Grammar:79), and it looks like there's quite a lot of > >> (almost) duplicated code in ast.c and compile.c too. > > > > I expect this cleanup to be quite a bit of work since the semantics are > > seriously different. ([...] uses the surrounding scope for the loop > > control variables.) > > I didn't say that I wanted to champion that cleanup ;) That's fine! > > However you might be able to just cleanup the grammar so they are > > identical, that would be simpler I suspect. > > Looking at the grammar, there's only testlist_safe left to kill, in > favor of or_test like in generator expressions. The old_ rules are still > needed. Hm, it's been so long... Why? > Hm. Is the precedence in > > x = lambda: 1 if 0 else 2 > > really obvious? Yes if you think about how you would use it. Conditionally returning a lambda or something else is kind of rare. A lambda using a condition is kind of useful. :-) > >> * list comprehensions are special-cased because of the LIST_APPEND opcode. > >> If there isn't going to be a special-cased SET_ADD, it's probably the > >> easiest thing to transform {x for x in a} into set(x for x in a) in the > >> AST step, with "set" of course always being the builtin set. > > > > Right.
That might actually become a prototype for how to do the list > > translation as well. > > Would this need a new opcode, or should generators be special-cased by > BUILD_SET? Can't remember what BUILD_SET is. > Which doesn't seem like a good idea because it means that > {(x for x in iterable)} == {x for x in iterable} That should definitely not happen! -- --Guido van Rossum (home page: http://www.python.org/~guido/) From g.brandl at gmx.net Mon Aug 28 22:32:53 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 28 Aug 2006 22:32:53 +0200 Subject: [Python-3000] Set literals In-Reply-To: References: Message-ID: Guido van Rossum wrote: >> > However you might be able to just cleanup the grammar so they are >> > identical, that would be simpler I suspect. >> >> Looking at the grammar, there's only testlist_safe left to kill, in >> favor of or_test like in generator expressions. The old_ rules are still >> needed. > > Hm, it's been so long... Why? In listcomps/genexps, old_test and old_lambdef do not allow conditional expressions in order to avoid confusion with the loop's "if". >> Hm. Is the precedence in >> >> x = lambda: 1 if 0 else 2 >> >> really obvious? > > Yes if you think about how you would use it. Conditionally returning a > lambda or something else is kind of rare. A lambda using a condition > is kind of useful. :-) Okay, that makes sense. >> >> * list comprehensions are special-cased because of the LIST_APPEND opcode. >> >> If there isn't going to be a special-cased SET_ADD, it's probably the >> >> easiest thing to transform {x for x in a} into set(x for x in a) in the >> >> AST step, with "set" of course always being the builtin set. >> > >> > Right. That might actually become a prototype for how to do the list >> > translation as well. >> >> Would this need a new opcode, or should generators be special-cased by >> BUILD_SET? > > Can't remember what BUILD_SET is. Sorry... it's the newly introduced opcode that creates a new set.
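The distinction at stake here is observable in the syntax that was eventually adopted: a set display whose single element is a parenthesized generator expression builds a one-element set containing the generator object itself, while a set comprehension consumes the iterable:

```python
comp = {x for x in [1, 2, 3]}        # set comprehension: iterates
literal = {(x for x in [1, 2, 3])}   # set literal: one generator element

assert comp == {1, 2, 3}
assert len(literal) == 1             # the generator was never consumed
assert comp != literal
```

So compiling a set comprehension by special-casing a generator argument to BUILD_SET really would have conflated two different expressions.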
Georg From g.brandl at gmx.net Mon Aug 28 22:42:33 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 28 Aug 2006 22:42:33 +0200 Subject: [Python-3000] Set literals In-Reply-To: <44F34D50.2080805@ewtllc.com> References: <44F34D50.2080805@ewtllc.com> Message-ID: Raymond Hettinger wrote: > Georg Brandl wrote: > >>In the meantime, I played around with the peepholer and tried to copy >>the "for x in tuple_or_list" optimization for sets. Results are in SF >>patch #1548082. >> > Did you mean "if x in tuple_or_list"? IIRC, there was some reason that > mutable lists were not supposed to be made into constants in for-loops. Yep, I meant the "if" case. >>* list comprehensions are special-cased because of the LIST_APPEND opcode. >> If there isn't going to be a special-cased SET_ADD, it's probably the >> easiest thing to transform {x for x in a} into set(x for x in a) in the >> AST step, with "set" of course always being the builtin set. > > Set comprehensions and list comprehensions are fundamentally the same > and therefore should have identical implementations. > > While transformation to a generator expression may seem like a good idea > now, I expect that you'll observe a two-fold performance hit and end-up > abandoning that approach in favor of the current LIST_APPEND approach. Of course, the LIST_APPEND approach mustn't be thrown out. > So it would probably be best to start by teaching the compiler to hide > the loop variable in a LIST_APPEND approach to list comprehensions and > then duplicate that approach for set comprehensions. Okay, I'll look into that direction. But first I'll try to remove duplication in ast.c, which should be possible since the syntax of listcomps, genexps and setcomps will be the same in Py3k. 
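The LIST_APPEND-style approach Raymond describes amounts to the compiler expanding the comprehension into an accumulation loop whose loop variable is hidden from the surrounding scope. As a rough Python-level sketch (conceptual only; the real expansion happens at the bytecode level, and a SET_ADD opcode did not exist yet):

```python
def setcomp_sketch(iterable):
    # Conceptual expansion of "{x * 2 for x in iterable}": build a fresh
    # set and add to it in a loop. Wrapping it in a function models the
    # compiler hiding the loop variable from the enclosing scope.
    result = set()
    for x in iterable:
        result.add(x * 2)  # the job a SET_ADD opcode would perform
    return result
```

Here setcomp_sketch(range(3)) yields {0, 2, 4}, and the name x is not visible to the caller afterwards.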
Georg From greg.ewing at canterbury.ac.nz Tue Aug 29 03:10:07 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 29 Aug 2006 13:10:07 +1200 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <20060827214348.1AF4.JCARLSON@uci.edu> References: <20060827191547.1AEB.JCARLSON@uci.edu> <44F255D6.2060002@canterbury.ac.nz> <20060827214348.1AF4.JCARLSON@uci.edu> Message-ID: <44F393EF.6070304@canterbury.ac.nz> Josiah Carlson wrote: > If every operation on a view returned a string copy, then what would be > the point of the view in the first place? String views would have all the same methods as a real string, so you could find(), index(), etc. while operating efficiently on the original data. To my mind this is preferable to having little-used optional arguments on an easily-forgotten subset of the string methods: you only have to remember one thing (how to create a view) rather than a bunch of random things. For some things, such as partition(), it might be worth having a variant that returned views instead of new strings. But it would be named differently, so you'd still know whether you were getting a view or not. On the other hand, this would introduce another random set of things to remember, i.e. which methods have view-returning variants. Although maybe it would be easier to remember them, being different methods rather than optional arguments to existing methods. Their existence would show up more clearly under introspection, for example. I'm not personally advocating one approach or the other here -- just pointing out an alternative that might be more acceptable to the BDFL. 
-- Greg From greg.ewing at canterbury.ac.nz Tue Aug 29 03:20:58 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 29 Aug 2006 13:20:58 +1200 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <44F0107B.20205@iinet.net.au> Message-ID: <44F3967A.7010504@canterbury.ac.nz> Ron Adam wrote: > 1. Remove None stored as indices in slice objects. Depending on the step > value, Any Nones can be converted to 0 or -1 immediately, But None isn't the same as -1 in a slice. None means the end of the sequence, whereas -1 means one less than the end. I'm also not all that happy about forcing slice indices to be ints. Traditionally they are, but someone might want to define a class that uses them in a more general way. > Once the slice is created the Nones are not needed, valid index values > can be determined. I don't understand what you mean here. Slice objects themselves know nothing about what object they're going to be used to slice, so there's no way they can determine "valid index values" (or even *types* -- see above). -- Greg From greg.ewing at canterbury.ac.nz Tue Aug 29 03:29:38 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 29 Aug 2006 13:29:38 +1200 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <20060827091000.1ADF.JCARLSON@uci.edu> <20060827184941.1AE8.JCARLSON@uci.edu> Message-ID: <44F39882.7090501@canterbury.ac.nz> Guido van Rossum wrote: > You seem to be utterly convinced of the superior performance of your > proposal without having done any measurements. For my part, superior performance isn't the main reason for considering string views. Rather it's the simplification that would result from replacing the current ad-hoc set of optional start-stop arguments with a single easy-to-remember idiom. What are your thoughts on that aspect? 
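Greg's "single easy-to-remember idiom" argument can be made concrete. The copy-free spelling relies on per-method start/stop parameters that only some str methods grew, whereas a view centralizes that bookkeeping once, at construction time. A hypothetical strview with just find() illustrates the contrast:

```python
class strview:
    """Hypothetical read-only window over a string (illustration only)."""
    def __init__(self, s, start=0, stop=None):
        self.s = s
        self.start = start
        self.stop = len(s) if stop is None else stop

    def find(self, sub):
        # Delegate to str.find's start/stop arguments; no substring copy.
        i = self.s.find(sub, self.start, self.stop)
        return -1 if i < 0 else i - self.start

text = "hello world"
# Ad-hoc form: you must remember which methods accept start/stop.
assert text.find("o", 6, 11) == 7
# View form: restrict once, then use the ordinary method surface.
assert strview(text, 6).find("o") == 1
```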
-- Greg From barry at python.org Tue Aug 29 04:10:14 2006 From: barry at python.org (Barry Warsaw) Date: Mon, 28 Aug 2006 22:10:14 -0400 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <44F3967A.7010504@canterbury.ac.nz> References: <44F0107B.20205@iinet.net.au> <44F3967A.7010504@canterbury.ac.nz> Message-ID: <0B0F9D04-7A68-454C-91F1-E011B862F92A@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 28, 2006, at 9:20 PM, Greg Ewing wrote: > I'm also not all that happy about forcing slice indices to > be ints. Traditionally they are, but someone might want to > define a class that uses them in a more general way. In fact, we do. Our application is simulated execution of source code, so there are cases where we have multiple values due to indeterminate conditionals. For example, we might know that the variable "x" has a value between 1 and 5, and we might know that "z" is a string with the value "hello there world". We want to be able to index z with Range(1,5) or slice it with say Range(1,5):Range (3,7). Our "z" value is represented by a string-like object that presents much of the standard Python string API, so it knows what to do with wacky slices with non-integer indices. We'd definitely want to preserve the ability to do that. 
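Because a slice object stores whatever expressions it was given, a class along these lines can implement Barry's indeterminate indexing today. This is a loose, hypothetical sketch using range in place of their in-house Range type:

```python
class IndeterminateString:
    """Sketch: slice endpoints may be ranges of possible values."""
    def __init__(self, s):
        self.s = s

    def __getitem__(self, key):
        if isinstance(key, slice):
            # slice() stores start/stop verbatim; nothing forces them to
            # be ints until something like slice.indices() is called.
            starts = key.start if isinstance(key.start, range) else [key.start]
            stops = key.stop if isinstance(key.stop, range) else [key.stop]
            # Return every substring consistent with the uncertainty.
            return {self.s[i:j] for i in starts for j in stops}
        return self.s[key]

z = IndeterminateString("hello there world")
# "a start index known only to lie in [1, 3)" written directly in a slice:
assert z[range(1, 3):5] == {"ello", "llo"}
```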
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRPOiDnEjvBPtnXfVAQJAWwQAnna3MD7qKDY0SFYyTmN/Dnoy3nBrsP/l kemAn8Rqdj/3EL/iJuesI8N81BtH6CUp3BR0XzCUpKnsTCcyZxjo9M9d96aF18Jm A8K/QKfRfRRNUe0FuSOwiizRjw8m1yP9k8GNqkOI5IO2B5qt6R8dvyvmAdigWIsg tVFftyC+1Dw= =HZRO -----END PGP SIGNATURE----- From guido at python.org Tue Aug 29 04:24:59 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Aug 2006 19:24:59 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <44F39882.7090501@canterbury.ac.nz> References: <20060827091000.1ADF.JCARLSON@uci.edu> <20060827184941.1AE8.JCARLSON@uci.edu> <44F39882.7090501@canterbury.ac.nz> Message-ID: On 8/28/06, Greg Ewing wrote: > Guido van Rossum wrote: > > > You seem to be utterly convinced of the superior performance of your > > proposal without having done any measurements. > > For my part, superior performance isn't the main > reason for considering string views. Rather it's > the simplification that would result from replacing > the current ad-hoc set of optional start-stop > arguments with a single easy-to-remember idiom. > > What are your thoughts on that aspect? A few days ago I posted a bit of code using start-stop arguments and the same code written using string views. I didn't think the latter looked better. The start-stop arguments are far from arbitrary. They are only ad-hoc in the sense that they haven't been added to every API -- only where they're needed occasionally for performance. I still fear that a meme will develop that will encourage the use of views in many cases where they aren't needed; newbies are more prone to premature optimization than experienced developers, for whom this feature is intended, and newbies will more likely copy sections of code without understanding when/why various complexifications are necessary. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From jcarlson at uci.edu Tue Aug 29 07:17:11 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Mon, 28 Aug 2006 22:17:11 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <20060828120741.1AF7.JCARLSON@uci.edu> Message-ID: <20060828132232.1AFD.JCARLSON@uci.edu> "Guido van Rossum" wrote: > Those are all microbenchmarks. It's easy to prove the superiority of > an approach that way. But what about realistic applications? What if > your views don't end up saving memory or time for an application, but > still cost in terms of added complexity in all string operations? At no point has anyone claimed that every operation on views will always be faster than on strings. Nor has anyone claimed that it will always reduce memory consumption. However, for a not insignificant number of operations, views can be faster, offer better memory use, etc. I agree with Jean-Paul Calderone: "If the goal is to avoid speeding up Python programs because views are too complex or unpythonic or whatever, fine. But there isn't really any question as to whether or not this is a real optimization." "I don't think we see people overusing buffer() in ways which damage readability now, and buffer is even a builtin. Tossing something off into a module somewhere shouldn't really be a problem. To most people who don't actually know what they're doing, the idea to optimize code by reducing memory copying usually just doesn't come up." While there are examples where views can be slower, this is no different than the cases where deque is slower than list; sometimes some data structures are more applicable to the problem than others. As we have given users the choice to use a structure that has been optimized for certain behaviors (set and deque being primary examples), this is just another structure that offers improved performance for some operations. 
> Then I ask you to make it so that string views are 99.999% > indistinguishable from strings -- they have all the same methods, are > usable everywhere else, etc. For reference, I'm about 2 hours into it (including re-reading the documentation for Pyrex), and I've got [r]partition, [r]find, [r]index, [r|l]strip. I don't see significant difficulty implementing all other methods on views. Astute readers of the original implementation will note that I never check that the argument being passed in is a string; I use the buffer interface, so anything offering the buffer interface can be seen as a read-only view with string methods attached. Expect a full implementation later this week. - Josiah From jcarlson at uci.edu Tue Aug 29 07:31:37 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Mon, 28 Aug 2006 22:31:37 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <44F393EF.6070304@canterbury.ac.nz> References: <20060827214348.1AF4.JCARLSON@uci.edu> <44F393EF.6070304@canterbury.ac.nz> Message-ID: <20060828213428.1B00.JCARLSON@uci.edu> Greg Ewing wrote: > Josiah Carlson wrote: > > > If every operation on a view returned a string copy, then what would be > > the point of the view in the first place? > > String views would have all the same methods as a real > string, so you could find(), index(), etc. while operating > efficiently on the original data. To my mind this is > preferable to having little-used optional arguments on > an easily-forgotten subset of the string methods: you > only have to remember one thing (how to create a view) > rather than a bunch of random things. Indeed, and all of those are preserved if views always returned views, strings always returned strings, and one used the standard constructors for both to convert between them; eg. str(view) -> str and view(str) -> view. 
If one ever wanted a string from a view, rather than guessing which would be the correct one to return (during the implementation of views), always return a view when operating on views; it's a constant-time operation per view returned, and if the user really wanted a string, they can always call str on the returned values. > For some things, such as partition(), it might be worth > having a variant that returned views instead of new strings. > But it would be named differently, so you'd still know > whether you were getting a view or not. But wouldn't it be confusing if some methods on views returned views, while others returned strings? Wouldn't it make more sense if methods on an object, generally, returned instances of the same type (when it made sense)? This seems to be the case with almost every other object available in the Python standard library, with the notable exceptions of buffer and mmap. The slicing operations on mmaps make sense, as only recently did mmaps gain the ability to map partial files not starting from the beginning, but I'm not sure how well operating system would handle overlapping mmaps in the same process (especially during a larger mmap free; that could bork the heap address space). For buffer? I don't know. Buffer lacks basically every operation that I use on a string, so I have had little use for it except as a way of virtually slicing mmaps (for operations where I don't want to pass an offset argument) and handling socket writing of large blocks of data that it doesn't make sense to pre-slice*. > I'm not personally advocating one approach or the other > here -- just pointing out an alternative that might be > more acceptable to the BDFL. 
Thank you for the input (and thank you for Pyrex, it's making writing the view object quite easy), - Josiah * Arguably it never makes sense to pre-slice; connection speeds can vary so significantly that choosing a slice too small results in poor speeds and high numbers of system calls, and slices that are too large results in further slicing. Buffers or their equivalents win by a large margin. One trick is to slice the buffer (turning it into a string) when over half of the original string has been written. This results in using at most 2x the minimum amount of memory necessary, while also guaranteeing that you will only ever slice as much as the minimum pre-slicing operation would necessitate. From rrr at ronadam.com Tue Aug 29 07:47:24 2006 From: rrr at ronadam.com (Ron Adam) Date: Tue, 29 Aug 2006 00:47:24 -0500 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <44F3967A.7010504@canterbury.ac.nz> References: <44F0107B.20205@iinet.net.au> <44F3967A.7010504@canterbury.ac.nz> Message-ID: Greg Ewing wrote: > Ron Adam wrote: > >> 1. Remove None stored as indices in slice objects. Depending on the step >> value, Any Nones can be converted to 0 or -1 immediately, > > But None isn't the same as -1 in a slice. None means the end > of the sequence, whereas -1 means one less than the end. Yes, you are correct, thats one of those things I get caught on when I haven't had enough sleep. ;-) >>> 'abcdefg'[-1] 'g' >>> 'abcdefg'[0:-1] 'abcdef' And in addition to that... 0 is not the beginning if the step is -1. >>> 'abcdefg'[-1:0:-1] 'gfedcb' So None for the start index can be 0 or -1. But for the end index it can't be determined. In the first case above, the stop index would need to be one greater than -1 which is 0, and that causes a problem. In the second case above, the stop index would need to be one less than 0, then that would again cause a problem. > I'm also not all that happy about forcing slice indices to > be ints. 
Traditionally they are, but someone might want to > define a class that uses them in a more general way. Hmmmm, thanks for pointing this out. It sounds interesting and is something I hadn't thought about. In most cases I've seen only integers and None are ever used. And I'm used to seeing an exception if anything else is used. >>> 'abc'[1.0] Traceback (most recent call last): File "", line 1, in ? TypeError: string indices must be integers That is a string method that is generating the exception then and not the slice object? But then what about the slice.indices() method? It does generate exceptions. >>> slc = slice(1.0) >>> slc.indices(10) Traceback (most recent call last): File "", line 1, in ? TypeError: slice indices must be integers >> Once the slice is created the Nones are not needed, valid index values >> can be determined. > > I don't understand what you mean here. Slice objects themselves > know nothing about what object they're going to be used to > slice, so there's no way they can determine "valid index > values" (or even *types* -- see above). Ok, I hadn't considered the possibility of methods being defined to read the slice object. Do you know where I could find an example of that? Cheers, Ron From g.brandl at gmx.net Tue Aug 29 10:35:43 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 29 Aug 2006 10:35:43 +0200 Subject: [Python-3000] Set literals In-Reply-To: References: Message-ID: Guido van Rossum wrote: > On 8/28/06, Georg Brandl wrote: >> At python.org/sf/1547796, there is a preliminary patch for Py3k set literals >> as specified in PEP 3100. > > Very cool! This is now checked in. > > Georg, can you do something about repr() of an empty set? This > currently produces "{}" while it should produce "set()". > >> Set comprehensions are not implemented. > > ETA? See patch #1548388. 
Cheers, Georg From greg.ewing at canterbury.ac.nz Tue Aug 29 11:14:18 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 29 Aug 2006 21:14:18 +1200 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <44F0107B.20205@iinet.net.au> <44F3967A.7010504@canterbury.ac.nz> Message-ID: <44F4056A.6000009@canterbury.ac.nz> Ron Adam wrote: > And in addition to that... 0 is not the beginning if the step is -1. Negative steps are downright confusing however you think about them. :-) > In most cases I've seen only integers > and None are ever used. Numeric uses various strange things as array indexes, such as Ellipsis and NewAxis. I don't think it uses them as parts of slices, but I wouldn't be surprised if they came up with some such usage one day. > >>> 'abc'[1.0] > Traceback (most recent call last): > File "", line 1, in ? > TypeError: string indices must be integers > > That is a string method that is generating the exception then and not > the slice object? Yes, I expect so. From experimenting, it seems you can pass anything you want to slice(): Python 2.3 (#1, Aug 5 2003, 15:52:30) [GCC 3.1 20020420 (prerelease)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> slice(42.3, "banana", {}) slice(42.299999999999997, 'banana', {}) > But then what about the slice.indices() method? It does generate > exceptions. > > >>> slc = slice(1.0) > >>> slc.indices(10) > Traceback (most recent call last): > File "", line 1, in ? > TypeError: slice indices must be integers That particular method seems to require ints, yes. But a slice-using object can extract the start, stop and step and do whatever it wants with them. > Ok, I hadn't considered the possibility of methods being defined to read > the slice object. Do you know where I could find an example of that? 
-- Greg From rrr at ronadam.com Tue Aug 29 13:42:16 2006 From: rrr at ronadam.com (Ron Adam) Date: Tue, 29 Aug 2006 06:42:16 -0500 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <44F4056A.6000009@canterbury.ac.nz> References: <44F0107B.20205@iinet.net.au> <44F3967A.7010504@canterbury.ac.nz> <44F4056A.6000009@canterbury.ac.nz> Message-ID: Greg Ewing wrote: > Ron Adam wrote: > >> And in addition to that... 0 is not the beginning if the step is -1. > > Negative steps are downright confusing however you > think about them. :-) Yes, and it seems to me it could be easier. Of course that would mean changing something, and any solutions so far is in some way not perfect, depending on how you look at it. >> In most cases I've seen only integers >> and None are ever used. > > Numeric uses various strange things as array indexes, such > as Ellipsis and NewAxis. I don't think it uses them as parts > of slices, but I wouldn't be surprised if they came up with > some such usage one day. > >> >>> 'abc'[1.0] >> Traceback (most recent call last): >> File "", line 1, in ? >> TypeError: string indices must be integers >> >> That is a string method that is generating the exception then and not >> the slice object? > > Yes, I expect so. From experimenting, it seems you can > pass anything you want to slice(): Hmm..., after playing around with it, list and string methods probably call the slices indices() method from within __getitem__. So it's the slices indices() method that is producing the exceptions in both cases. If other objects allow something besides strings then they are probably accessing the stop, start, and step indices directly and are not going though slice.indices() to get at them. The way it seems to work is approximately ... s[i:j:k] -> s.__getitem__(x = slice(i,j,k)) # via the SLICE byte code i, j, k = x.indices(len(self)) # by s.__getitem__() The indices method does a type check and fixes the values depending on what length is. 
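Ron's reconstruction can be checked from Python directly: subscription hands __getitem__ a slice object with the Nones intact, and it is slice.indices() that resolves them against a concrete length. The class below is illustrative, not how list is actually implemented:

```python
class ShowSlice:
    """Demonstrates the slice protocol described above."""
    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        if isinstance(key, slice):
            # s[i:j:k] arrives here as slice(i, j, k), Nones included.
            start, stop, step = key.indices(len(self.data))
            return [self.data[i] for i in range(start, stop, step)]
        return self.data[key]

s = ShowSlice("abcdefg")
assert s[1:4] == ["b", "c", "d"]
assert s[::-1] == list("gfedcba")
# indices() is where the Nones get resolved -- note stop == -1 for step -1:
assert slice(None, None, -1).indices(7) == (6, -1, -1)
```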
>> But then what about the slice.indices() method? It does generate >> exceptions. >> >> >>> slc = slice(1.0) >> >>> slc.indices(10) >> Traceback (most recent call last): >> File "", line 1, in ? >> TypeError: slice indices must be integers > > That particular method seems to require ints, yes. But > a slice-using object can extract the start, stop and step > and do whatever it wants with them. If you could sub class slice, then it would be possible to replace the indices method and turn off the int check and/or put in your own value check. but you wouldn't be able to use the i:j:k syntax. (maybe a good thing) sequence[myslice_object] # would work. You would still need to produce int like values if you use it with builtin types. But not with any of your own objects if you have supplied your own __getitem__ method. Cheers, Ron From guido at python.org Tue Aug 29 17:42:18 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Aug 2006 08:42:18 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <20060828132232.1AFD.JCARLSON@uci.edu> References: <20060828120741.1AF7.JCARLSON@uci.edu> <20060828132232.1AFD.JCARLSON@uci.edu> Message-ID: On 8/28/06, Josiah Carlson wrote: > > "Guido van Rossum" wrote: > > Those are all microbenchmarks. It's easy to prove the superiority of > > an approach that way. But what about realistic applications? What if > > your views don't end up saving memory or time for an application, but > > still cost in terms of added complexity in all string operations? > > At no point has anyone claimed that every operation on views will always > be faster than on strings. Nor has anyone claimed that it will always > reduce memory consumption. However, for a not insignificant number of > operations, views can be faster, offer better memory use, etc. 
> > > I agree with Jean-Paul Calderone: > > "If the goal is to avoid speeding up Python programs because views are > too complex or unpythonic or whatever, fine. But there isn't really any > question as to whether or not this is a real optimization." And without qualification that is as false as anything you've said. > "I don't think we see people overusing buffer() in ways which damage > readability now, and buffer is even a builtin. Tossing something off > into a module somewhere shouldn't really be a problem. To most people > who don't actually know what they're doing, the idea to optimize code > by reducing memory copying usually just doesn't come up." Another "yes they do -- no they don't" argument. As I've said repeatedly before, optimizations are likely to be copied without being understood by newbies. The buffer() built-in has such a poor reputation and API that it doesn't get much play; but a new "views" feature that will magically make all your string processing go faster surely will. > While there are examples where views can be slower, this is no different > than the cases where deque is slower than list; sometimes some data > structures are more applicable to the problem than others. As we have > given users the choice to use a structure that has been optimized for > certain behaviors (set and deque being primary examples), this is just > another structure that offers improved performance for some operations. As long as it is very carefully presented as such I have much less of a problem with it. Earlier proposals were implying that all string ops should return views whenever possible. That, I believe, is never going to fly, and that's where my main objection lies. Having views in a library module alleviates many of my objections. While I still worry that it will be overused, deque doesn't seem to be overused, so perhaps I should relax.
> > Then I ask you to make it so that string views are 99.999% > > indistinguishable from strings -- they have all the same methods, are > > usable everywhere else, etc. > > For reference, I'm about 2 hours into it (including re-reading the > documentation for Pyrex), and I've got [r]partition, [r]find, [r]index, > [r|l]strip. I don't see significant difficulty implementing all other > methods on views. > > Astute readers of the original implementation will note that I never > check that the argument being passed in is a string; I use the buffer > interface, so anything offering the buffer interface can be seen as a > read-only view with string methods attached. Expect a full > implementation later this week. Good luck! -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jcarlson at uci.edu Tue Aug 29 18:24:29 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Tue, 29 Aug 2006 09:24:29 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <20060828132232.1AFD.JCARLSON@uci.edu> Message-ID: <20060829091403.1B09.JCARLSON@uci.edu> "Guido van Rossum" wrote: > On 8/28/06, Josiah Carlson wrote: > > While there are examples where views can be slower, this is no different > > than the cases where deque is slower than list; sometimes some data > > structures are more applicable to the problem than others. As we have > > given users the choice to use a structure that has been optimized for > > certain behaviors (set and deque being primary examples), this is just > > another structure that offers improved performance for some operations. > > As long as it is very carefully presented as such I have much less of > a problem with it. > > Earlier proposals were implying that all string ops should return > views whenever possibe. That, I believe, is never going to fly, and > that's where my main objection lies. String operations always returning views would be arguably insane. 
I hope no one was recommending it (I certainly wasn't, but if my words were confusing on that part, I apologize); strings are strings, and views should only be constructed explicitly. After you have a view, I'm of the opinion that view operations should return views, except in the case where you explicitly ask for a string via str(view). > Having views in a library module alleviates many of my objections. > While I still worry that it will be overused, deque doesn't seem to be > overused, so perhaps I should relax. While it would be interesting (as a social experiment) for views to be in the __builtins__ module (to test abuse theories), it is probably much better for it to sit in the collections module. > > > Then I ask you to make it so that string views are 99.999% > > > indistinguishable from strings -- they have all the same methods, are > > > usable everywhere else, etc. > > > > For reference, I'm about 2 hours into it (including re-reading the > > documentation for Pyrex), and I've got [r]partition, [r]find, [r]index, > > [r|l]strip. I don't see significant difficulty implementing all other > > methods on views. > > > > Astute readers of the original implementation will note that I never > > check that the argument being passed in is a string; I use the buffer > > interface, so anything offering the buffer interface can be seen as a > > read-only view with string methods attached. Expect a full > > implementation later this week. > > Good luck! Thank you! - Josiah From fredrik at pythonware.com Tue Aug 29 18:32:59 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 29 Aug 2006 18:32:59 +0200 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <20060827184941.1AE8.JCARLSON@uci.edu> References: <20060827091000.1ADF.JCARLSON@uci.edu> <20060827184941.1AE8.JCARLSON@uci.edu> Message-ID: Josiah Carlson wrote: > 1. Let us say I was parsing XML. 
Rather than allocating a bunch of small > strings for the various tags, attributes, and data, I could instead > allocate a bunch of string views with pointers into the one larger XML > string. when did you last write an XML parser ? From jcarlson at uci.edu Tue Aug 29 19:30:59 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Tue, 29 Aug 2006 10:30:59 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <20060827184941.1AE8.JCARLSON@uci.edu> Message-ID: <20060829102307.1B0F.JCARLSON@uci.edu> Fredrik Lundh wrote: > Josiah Carlson wrote: > > > 1. Let us say I was parsing XML. Rather than allocating a bunch of small > > strings for the various tags, attributes, and data, I could instead > > allocate a bunch of string views with pointers into the one larger XML > > string. > > when did you last write an XML parser ? Comparing what I have written as an XML parser to xml.dom, xml.sax, ElementTree, or others, is a bit like comparing a go-kart with an automobile. That is to say, it's been a few years, and it was to scratch an itch for a particular application, and no other xml parser existed at the time for my particular application, that I knew of. Presumably by your question, you think that the particular example I've offered is bollocks. Sounds reasonable, I withdraw it.
- Josiah From guido at python.org Tue Aug 29 19:31:49 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Aug 2006 10:31:49 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <20060829091403.1B09.JCARLSON@uci.edu> References: <20060828132232.1AFD.JCARLSON@uci.edu> <20060829091403.1B09.JCARLSON@uci.edu> Message-ID: On 8/29/06, Josiah Carlson wrote: > > "Guido van Rossum" wrote: > > On 8/28/06, Josiah Carlson wrote: > > > While there are examples where views can be slower, this is no different > > > than the cases where deque is slower than list; sometimes some data > > > structures are more applicable to the problem than others. As we have > > > given users the choice to use a structure that has been optimized for > > > certain behaviors (set and deque being primary examples), this is just > > > another structure that offers improved performance for some operations. > > > > As long as it is very carefully presented as such I have much less of > > a problem with it. > > > > Earlier proposals were implying that all string ops should return > > views whenever possibe. That, I believe, is never going to fly, and > > that's where my main objection lies. > > String operations always returning views would be arguably insane. I > hope no one was recommending it (I certainly wasn't, but if my words > were confusing on that part, I apologize); strings are strings, and > views should only be constructed explicitly. I don't know about you, but others have definitely been arguing for that passionately in the past. > After you have a view, I'm of the opinion that view operations should > return views, except in the case where you explicitly ask for a string > via str(view). I think it's a mixed bag, and depends on the semantics of the operation. For operations that are guaranteed to return a substring (like slicing or partition() -- are there even others?) 
I think views should return views (on the original buffer, never views on views). For operations that may be forced to return a new string (e.g. concatenation) I think the return value should always be a new string, even if it could be optimized. So for example if v is a view and s is a string, v+s should always return a new string, even if s is empty. BTW beware that in py3k, strings (which will always be unicode strings) won't support the buffer API -- bytes objects will. Would you want views on strings or on bytes or on both? > > Having views in a library module alleviates many of my objections. > > While I still worry that it will be overused, deque doesn't seem to be > > overused, so perhaps I should relax. > > While it would be interesting (as a social experiment) for views to be > in the __builtins__ module (to test abuse theories), it is probably much > better for it to sit in the collections module. I'm still very strong on having only a small number of data types truly built-in; too much choice is much more likely to encourage the wrong choice, or reduced maintainability. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Tue Aug 29 19:44:27 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 29 Aug 2006 19:44:27 +0200 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <20060829102307.1B0F.JCARLSON@uci.edu> References: <20060827184941.1AE8.JCARLSON@uci.edu> <20060829102307.1B0F.JCARLSON@uci.edu> Message-ID: Josiah Carlson wrote: >> when did you last write an XML parser ? > > Comparing what I have written as an XML parser to xml.dom, xml.sax, > ElementTree, or others, is a bit like comparing a go-kart with an > automobile. That is to say, it's been a few years, and it was to > scratch an itch for a particular application, and no other xml parser > existed at the time for my particular application, that I knew of.
> > Presumably by your question, you think that the particular example I've > offered is bollocks. not necessarily, but there are lots of issues involved when doing high-performance XML stuff, and I'm not sure views would help quite as much as one might think. (writing and tuning cET was a great way to learn that not everything that you think you know about C performance applies to C code running inside the Python interpreter...) From jcarlson at uci.edu Tue Aug 29 21:04:35 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Tue, 29 Aug 2006 12:04:35 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <20060829091403.1B09.JCARLSON@uci.edu> Message-ID: <20060829111904.1B12.JCARLSON@uci.edu> "Guido van Rossum" wrote: > On 8/29/06, Josiah Carlson wrote: > > String operations always returning views would be arguably insane. I > > hope no one was recommending it (I certainly wasn't, but if my words > > were confusing on that part, I apologize); strings are strings, and > > views should only be constructed explicitly. > > I don't know about you, but others have definitely been arguing for > that passionately in the past. > > > After you have a view, I'm of the opinion that view operations should > > return views, except in the case where you explicitly ask for a string > > via str(view). > > I think it's a mixed bag, and depends on the semantics of the operation. > > For operations that are guaranteed to return a substring (like slicing > or partition() -- are there even others?) I think views should return > views (on the original buffer, never views on views). I agree. > For operations that may be forced to return a new string (e.g. > concatenation) I think the return value should always be a new string, > even if it could be optimized. So for example if v is a view and s is > a string, v+s should always return a new string, even if s is empty. I'm on the fence about this. 
On the one hand, I understand the desirability of being able to get the underlying string object without difficulty. On the other hand, its performance characteristics could be confusing to users of Python who may have come to expect that "st+''" is a constant time operation, regardless of the length of st. The non-null string addition case, I agree that it could make some sense to return the string (considering you will need to copy it anyways), but if one returned a view on that string, it would be more consistent with other methods, and getting the string back via str(view) would offer equivalent functionality. It would also require the user to be explicit about what they really want; though there is the argument that if I'm passing a string as an operand to addition with a view, I actually want a string, so give me one. I'm going to implement it as returning a view, but leave commented sections for some of them to return a string. > BTW beware that in py3k, strings (which will always be unicode > strings) won't support the buffer API -- bytes objects will. Would you > want views on strings or on bytes or on both? That's tricky. Views on bytes will come for free, like array, mmap, and anything else that supports the buffer protocol. It requires the removal of the __hash__ method for mutables, but that is certainly expected. Right now, a large portion of standard library code uses strings and string methods to handle parsing, etc. Removing immutable byte strings from 3.x seems likely to result in a huge amount of rewriting necessary to utilize either bytes or text (something I have mentioned before). I believe that with views on bytes (and/or sufficient bytes methods), the vast majority would likely result in the use of bytes.
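[The view type under discussion can be sketched in pure Python. The code below is an editorial illustration only -- not Josiah's actual Pyrex implementation: a view records (buffer, start, stop), string methods work against the offsets, and only str() copies data out of the underlying buffer.]

```python
# Rough pure-Python sketch of a read-only string view (illustrative).

class strview:
    def __init__(self, buf, start=0, stop=None):
        self.buf = buf
        self.start = start
        self.stop = len(buf) if stop is None else stop

    def __len__(self):
        return self.stop - self.start

    def __str__(self):
        # the one operation that materializes a new string
        return self.buf[self.start:self.stop]

    def __getitem__(self, item):
        if isinstance(item, slice):
            # slicing a view returns another view on the same buffer
            start, stop, step = item.indices(len(self))
            if step != 1:
                raise ValueError("views only support step 1")
            return strview(self.buf, self.start + start, self.start + stop)
        if item < 0:
            item += len(self)
        if not 0 <= item < len(self):
            raise IndexError("view index out of range")
        return self.buf[self.start + item]

    def find(self, sub):
        # delegate to the buffer's find, constrained to our window
        i = self.buf.find(sub, self.start, self.stop)
        return i - self.start if i != -1 else -1

    def partition(self, sep):
        # three views, no copying
        i = self.find(sep)
        if i == -1:
            empty = strview(self.buf, self.stop, self.stop)
            return self[:], empty, empty
        return self[:i], self[i:i + len(sep)], self[i + len(sep):]
```

[With this sketch, `k, sep, v = strview("key=value").partition("=")` yields three views into the single buffer, and `str(k)` copies out "key".]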
Having a text view for such situations that works with the same kinds of semantics as the bytes view would be nice from a purity/convenience standpoint, and only needing to handle a single data type (text) could make its implementation easier. I don't have any short-term plans of writing text views, but it may be somewhat easier to do after I'm done with string/byte views. - Josiah From tomerfiliba at gmail.com Tue Aug 29 21:43:57 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Tue, 29 Aug 2006 21:43:57 +0200 Subject: [Python-3000] regex literals? Message-ID: <1d85506f0608291243g2cdfd6f6reb0eb82a5c73fab@mail.gmail.com> i can't say i'm too fond of this, but i thought of bringing this up. most scripting languages (perl, ruby, and boo, to name some) have regular expressions as language literals. since such languages are heavily used for string manipulation, it might seem like a good idea to add them at the syntax level: e"[A-Za-z_][A-Za-z_0-9]*" i thought of prefixing "e" for "regular *e*xpression". could also be "p" for pattern. it's very simple -- regex literal strings are just passed to re.compile(), upon creation, i.e.:

    a = e"[A-Z]"

is the same as

    a = re.compile("[A-Z]")

what is it good for?

    if e"[A-Z]".match("Q"):
        print "success"

since strings (as well as regex strings) are immutable, the compiler can re.compile them at compile time, as an optimization. again, i can't say i like regex literals, and i don't think it would be a productivity boost (although you would no longer need to import re and re.compile() your patterns)... but i wanted to bring it to your consideration. -tomer From guido at python.org Tue Aug 29 21:46:09 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Aug 2006 12:46:09 -0700 Subject: [Python-3000] regex literals?
In-Reply-To: <1d85506f0608291243g2cdfd6f6reb0eb82a5c73fab@mail.gmail.com> References: <1d85506f0608291243g2cdfd6f6reb0eb82a5c73fab@mail.gmail.com> Message-ID: Do I even have to say -1? Regular expressions shouldn't become the front and center of Python's text processing tools. --Guido On 8/29/06, tomer filiba wrote: > i can't say i'm too fond of this, but i thought of bringing this up. most > scripting > languages (perl, ruby, and boo, to name some) have regular expressions as > language literals. since such languages are heavily used for string > manipulation, it might seem like a good idea to add them at the syntax > level: > > e"[A-Za-z_][A-Za-z_0-9]*" > > i thought of prefixing "e" for "regular *e*xpression". could also be "p" for > pattern. > it's very simple -- regex literal strings are just passed to re.compile(), > upon > creation, i.e.: > a = e"[A-Z]" > > is the same as > a = re.compile("[A-Z]") > > what is it good for? > > if e"[A-Z]".match("Q"): > print "success" > > since strings (as well as regex strings) are immutable, the compiler can > re.compile them at compile time, as an optimization. > > again, i can't say i like regex literals, and i don't think it would be a > productivity boost (although you would no longer need to import re and > re.compile() your patterns)... but i wanted to bring it to your > consideration.
> > > -tomer > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Aug 29 21:55:21 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Aug 2006 12:55:21 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <20060829111904.1B12.JCARLSON@uci.edu> References: <20060829091403.1B09.JCARLSON@uci.edu> <20060829111904.1B12.JCARLSON@uci.edu> Message-ID: On 8/29/06, Josiah Carlson wrote: > "Guido van Rossum" wrote: > > For operations that may be forced to return a new string (e.g. > > concatenation) I think the return value should always be a new string, > > even if it could be optimized. So for example if v is a view and s is > > a string, v+s should always return a new string, even if s is empty. > > I'm on the fence about this. On the one hand, I understand the > desirability of being able to get the underlying string object without > difficulty. On the other hand, its performance characteristics could be > confusing to users of Python who may have come to expect that "st+''" is > a constant time operation, regardless of the length of st. Well views aren't strings. And s+t (for s and t strings) normally takes O(len(s)+len(t)) time. The type consistency and predictability is more important to me. I didn't mean to recommend v+"" as the best way to turn a view v into a string; that would be str(v). > The non-null string addition case, I agree that it could make some sense > to return the string (considering you will need to copy it anyways), but > if one returned a view on that string, it would be more consistent with > other methods, and getting the string back via str(view) would offer > equivalent functionality.
It would also require the user to be explicit > about what they really want; though there is the argument that if I'm > passing a string as an operand to addition with a view, I actually want > a string, so give me one. I strongly believe you're mistaken here. I don't think users will have any trouble with the concept "operations that don't (necessarily) return a substring will return a new string." > I'm going to implement it as returning a view, but leave commented > sections for some of them to return a string. > > > BTW beware that in py3k, strings (which will always be unicode > > > strings) won't support the buffer API -- bytes objects will. Would you > > > want views on strings or on bytes or on both? > > > > That's tricky. Views on bytes will come for free, like array, mmap, and > > anything else that supports the buffer protocol. It requires the removal > > of the __hash__ method for mutables, but that is certainly expected. > > The question is, how useful is the buffer protocol going to be? We > don't know yet. > Right now, a large portion of standard library code uses strings and > string methods to handle parsing, etc. Removing immutable byte strings > from 3.x seems likely to result in a huge amount of rewriting necessary > to utilize either bytes or text (something I have mentioned before). I > believe that with views on bytes (and/or sufficient bytes methods), the > vast majority would likely result in the use of bytes. Um, unless you consider decoding a GIF file "parsing", parsing would seem to naturally fall in the realm of text (characters), not bytes. > Having a text view for such situations that works with the same kinds of > semantics as the bytes view would be nice from a purity/convenience > standpoint, and only needing to handle a single data type (text) could > make its implementation easier. I don't have any short-term plans of > writing text views, but it may be somewhat easier to do after I'm done > with string/byte views.
Unifying the semantics between byte views and text views will be difficult since bytes are mutable. I recommend that you have a good look at the bytes implementation in the p3yk branch. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jimjjewett at gmail.com Tue Aug 29 23:01:08 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 29 Aug 2006 17:01:08 -0400 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <20060829091403.1B09.JCARLSON@uci.edu> <20060829111904.1B12.JCARLSON@uci.edu> Message-ID: On 8/29/06, Guido van Rossum wrote: > On 8/29/06, Josiah Carlson wrote: > > "Guido van Rossum" wrote: > The type consistency and predictability is more important to me. Why is it essential that string views be a different type, rather than an internal implementation detail, like long vs int? Today's strings can already return a new object or an existing one which happens to be equal. Is this just a matter of efficiency, or are you making a fundamental distinction? -jJ From jcarlson at uci.edu Tue Aug 29 23:27:19 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Tue, 29 Aug 2006 14:27:19 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <20060829111904.1B12.JCARLSON@uci.edu> Message-ID: <20060829132924.1B15.JCARLSON@uci.edu> "Guido van Rossum" wrote: > On 8/29/06, Josiah Carlson wrote: > > "Guido van Rossum" wrote: > > > For operations that may be forced to return a new string (e.g. > > > concatenation) I think the return value should always be a new string, > > > even if it could be optimized. So for example if v is a view and s is > > > a string, v+s should always return a new string, even if s is empty. > > > > I'm on the fence about this. On the one hand, I understand the > > desirability of being able to get the underlying string object without > > difficulty.
On the other hand, its performance characteristics could be > > confusing to users of Python who may have come to expect that "st+''" is > > a constant time operation, regardless of the length of st. > > Well views aren't strings. And s+t (for s and t strings) normally > takes O(len(s)+len(t)) time. Right, but my hope is for users who want to use views to start using them and be able to not be surprised by what they get back. You have previously stated that changing return types based on a flag variable is a horrible idea. I agree, as providing a flag variable to change return types is surprising. This is changing return types based on variable type, which could be argued as an implicit flag variable, and perhaps subject to the same surprising behavior == bad criteria that has stopped other such suggestions in the past. > The type consistency and predictability is more important to me. Is view + -> view not consistent or predictable? > > The non-null string addition case, I agree that it could make some sense > > to return the string (considering you will need to copy it anyways), but > > if one returned a view on that string, it would be more consistent with > > other methods, and getting the string back via str(view) would offer > > equivalent functionality. It would also require the user to be explicit > > about what they really want; though there is the argument that if I'm > > passing a string as an operand to addition with a view, I actually want > > a string, so give me one. > > I strongly believe you're mistaken here. I don't think users will have > any trouble with the concept "operations that don't (necessarily) > return a substring will return a new string." I could certainly be, but offering both isn't difficult. > > I'm going to implement it as returning a view, but leave commented > > sections for some of them to return a string.
> > > > > BTW beware that in py3k, strings (which will always be unicode > > > strings) won't support the buffer API -- bytes objects will. Would you > > > want views on strings or on bytes or on both? > > > > That's tricky. Views on bytes will come for free, like array, mmap, and > > anything else that supports the buffer protocol. It requires the removal > > of the __hash__ method for mutables, but that is certainly expected. > > The question is, how useful is the buffer protocol going to be? We > don't know yet. Pretty useful, apparently: bytes supports decoding to unicode through its own buffer interface -- or really, it uses the decode machinery that takes a char* and length. On the other hand, CharBuffer (as opposed to ReadBuffer and WriteBuffer[1]) isn't really usable, as the reader has no idea about the *size* and *type* of the characters it is getting back (8, 16, or 32 bit integers or characters, even 16, 32, or 64 bit floats, etc.). Maybe fixing CharBuffer, or creating a different interface (deprecating CharBuffer) would make sense, and would offer the numarray folks their 'array interface'. > > Right now, a large portion of standard library code uses strings and > > string methods to handle parsing, etc. Removing immutable byte strings > > from 3.x seems likely to result in a huge amount of rewriting necessary > > to utilize either bytes or text (something I have mentioned before). I > > believe that with views on bytes (and/or sufficient bytes methods), the > > vast majority would likely result in the use of bytes. > > Um, unless you consider decoding a GIF file "parsing", parsing would > seem to naturally fall in the realm of text (characters), not bytes. I'm using my own definition of parsing again, I apologize. What I meant by parsing is anything that currently performs processing of Python 2.x strings to determine what it is supposed to do.
From http header processing (sending and receiving), email processing, socket protocols in smtplib, poplib, asynchat, etc. All currently use Python 2.x strings. They will need to be transitioned to 3.x if 2.x byte strings are removed, and that transition will be quite a bit of work, regardless of whether bytes get some string methods, or we wrap bytes to provide string methods, but significantly more if neither is done. > > Having a text view for such situations that works with the same kinds of > > semantics as the bytes view would be nice from a purity/convenience > > standpoint, and only needing to handle a single data type (text) could > > make its implementation easier. I don't have any short-term plans of > > writing text views, but it may be somewhat easier to do after I'm done > > with string/byte views. > > Unifying the semantics between byte views and text views will be > difficult since bytes are mutable. The only significant nit is that the location of the underlying buffer pointer changes with byte views. This is already handled in a generally satisfactory way in 2.x buffers. > I recommend that you have a good look at the bytes implementation in > the p3yk branch. It is implemented the way I would have expected. - Josiah [1] http://www.python.org/doc/current/api/abstract-buffer.html From barry at python.org Tue Aug 29 23:37:16 2006 From: barry at python.org (Barry Warsaw) Date: Tue, 29 Aug 2006 17:37:16 -0400 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <20060829091403.1B09.JCARLSON@uci.edu> <20060829111904.1B12.JCARLSON@uci.edu> Message-ID: On Aug 29, 2006, at 5:01 PM, Jim Jewett wrote: > Why is it essential that string views be a different type, rather than > an internal implementation detail, like long vs int? Today's strings > can already return a new object or an existing one which happens to be > equal.
> > Is this just a matter of efficiency, or are you making a fundamental > distinction? This is a good question. I haven't been following this thread in detail, but ISTM that users shouldn't care and that the object itself should do whatever makes the most sense for the most general audience. I'm eager to never have to worry about 8-bit strings vs. unicode strings, how they mix and match, and all the nasty corners when they interact. I'd hate to trade that for the worry about whether I have a string or a string-view. -Barry From guido at python.org Tue Aug 29 23:42:43 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Aug 2006 14:42:43 -0700 Subject: [Python-3000] Small Py3k task: fix modulefinder.py Message-ID: Is anyone familiar enough with modulefinder.py to fix its breakage in Py3k? It chokes in a nasty way (exceeding the recursion limit) on the relative import syntax. I suspect this is also a problem for 2.5, when people use that syntax; hence the cross-post. There's no unittest for modulefinder.py, but I believe py2exe depends on it (and of course freeze.py, but who uses that still?) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Aug 29 23:51:17 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Aug 2006 14:51:17 -0700 Subject: [Python-3000] Premature optimization and all that Message-ID: Over lunch with Neal we came upon the topic of optimization and Python 3000. It is our strong opinion that in this stage of the Py3k project we should focus on getting the new language spec and implementation feature-complete, without worrying much about optimizations.
We're doing major feature-level surgery, e.g. int/long unification, str/unicode unification, range/xrange unification, keys() views, and many others. Keeping everything working is hard work in and of itself; having to keep it as fast as it was through all the transformations just makes it that much harder. If Python 3.0 alpha 1 is twice as slow as 2.5, that's okay with me; we will have another year to do performance measurements and add new optimizations in the ramp-up for 3.0 final. Even if 3.0 final is a bit slower than 2.5 it doesn't bother me too much; we can continue the tweaks during the 3.1 and 3.2 development cycle. Note: I'm not advocating wholesale proactive *removal* of optimizations. However, I'm allowing new features to slow down performance temporarily while we get all the features in place. I expect that the optimization possibilities and needs will be different than for 2.x, since some of the fundamental data types will be so different. In particular, I hope that Martin's int/long unification code can land ASAP; it's much better to have this feature landed in the p3yk branch, where everyone can bang on it easily, and learn how this affects user code, even if it makes everything twice as slow. This seems much preferable over having it languish in a separate branch until it's perfect. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jcarlson at uci.edu Tue Aug 29 23:58:41 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Tue, 29 Aug 2006 14:58:41 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: Message-ID: <20060829145412.1B18.JCARLSON@uci.edu> Barry Warsaw wrote: > On Aug 29, 2006, at 5:01 PM, Jim Jewett wrote: > > > Why is it essential that string views be a different type, rather than > > an internal implementation detail, like long vs int?
Today's strings > > can already return a new object or an existing one which happens to be > > equal. > > > > Is this just a matter of efficiency, or are you making a fundamental > > distinction? > > This is a good question. I haven't been following this thread in > detail, but ISTM that users shouldn't care and that the object itself > should do whatever makes the most sense for the most general > audience. I'm eager to never have to worry about 8-bit strings vs. > unicode strings, how they mix and match, and all the nasty corners > when they interact. I'd hate to trade that for the worry about > whether I have a string or a string-view. If views are not automatically returned for methods on strings, then you won't have to worry about views unless you explicitly construct them. Also, you won't ever have a string-view in py3k, it will be a bytes-view, and if you want to do something like bts.[find|index|partition](sub), you are going to need the bytes-view, as bytes don't offer those methods natively. - Josiah From guido at python.org Wed Aug 30 00:04:09 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Aug 2006 15:04:09 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <20060829091403.1B09.JCARLSON@uci.edu> <20060829111904.1B12.JCARLSON@uci.edu> Message-ID: On 8/29/06, Jim Jewett wrote: > On 8/29/06, Guido van Rossum wrote: > > On 8/29/06, Josiah Carlson wrote: > > > "Guido van Rossum" wrote: > > > The type consistency and predictability is more important to me. > > Why is it essential that string views be a different type, rather than > an internal implementation detail, like long vs int? Today's strings > can already return a new object or an existing one which happens to be > equal. > > Is this just a matter of efficiency, or are you making a fundamental > distinction? Sigh. Josiah just said he wouldn't dream of proposing that all string ops should return string views. 
You're not helping by questioning even that. The short answer is, if you don't have control over when a view on an existing string is returned and when a copy, there are easy-to-see worst-case behaviors that are worse than the problem they are trying to fix. For example, you'd get a whole series of problems like this one:

res = []
for i in range(1000):
    s = " "*1000000      # a new 1MB string
    res.append(s[:1])    # a one-character string that is a view on s and hence keeps s alive

If s[:1] were to return a view on s unconditionally the above loop would accumulate roughly 1 GB in wasted space. To fix this you'll have to add heuristics and all sorts of other things and that will complicate the string implementation and hence slow it down. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Wed Aug 30 02:35:17 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 30 Aug 2006 12:35:17 +1200 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <20060828120741.1AF7.JCARLSON@uci.edu> <20060828132232.1AFD.JCARLSON@uci.edu> Message-ID: <44F4DD45.6060809@canterbury.ac.nz> Guido van Rossum wrote: > Having views in a library module alleviates many of my objections. > While I still worry that it will be overused, deque doesn't seem to be > overused, so perhaps I should relax. Another thought is that there will already be ways in which Py3k views could lead to inefficiencies if they're not used carefully. A keys() view of a dict, for example, will keep the values of the dict alive as well as the keys, unlike the existing keys() method.
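[Editor's note: both retention effects described above can be reproduced in today's Python. str never grew views, but memoryview over bytes behaves exactly as Guido's sketch predicts, and dict.keys() became a view as Greg anticipates.]

```python
# Sketch of the retention hazards discussed above, using memoryview as a
# stand-in for the proposed string views.
big = bytes(1_000_000)          # a 1 MB buffer
tiny = memoryview(big)[:1]      # a one-byte slice that is a view
del big
assert len(tiny.obj) == 1_000_000   # the view still pins the whole 1 MB

# Greg's keys() example: a py3k-style dict view keeps the dict --
# values included -- alive, unlike the old list-returning keys().
d = {"a": [1, 2, 3]}
keys = d.keys()
del d
assert list(keys) == ["a"]      # the dict is still reachable via the view
```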
-- Greg From guido at python.org Wed Aug 30 02:59:00 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Aug 2006 17:59:00 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <44F4DD45.6060809@canterbury.ac.nz> References: <20060828120741.1AF7.JCARLSON@uci.edu> <20060828132232.1AFD.JCARLSON@uci.edu> <44F4DD45.6060809@canterbury.ac.nz> Message-ID: On 8/29/06, Greg Ewing wrote: > Guido van Rossum wrote: > > > Having views in a library module alleviates many of my objections. > > While I still worry that it will be overused, deque doesn't seem to be > > overused, so perhaps I should relax. > > Another thought is that there will already be ways > in which Py3k views could lead to inefficiencies if > they're not used carefully. A keys() view of a dict, > for example, will keep the values of the dict alive > as well as the keys, unlike the existing keys() > method. Right; but I don't expect that such a keys() view will typically have a lifetime longer than the dict. For substrings OTOH that's quite common (parsing etc.). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Wed Aug 30 03:37:06 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 30 Aug 2006 13:37:06 +1200 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <20060829145412.1B18.JCARLSON@uci.edu> References: <20060829145412.1B18.JCARLSON@uci.edu> Message-ID: <44F4EBC2.8020401@canterbury.ac.nz> Josiah Carlson wrote: > If views are not automatically returned for methods on strings, then you > won't have to worry about views unless you explicitly construct them. Although you might have to worry about someone else handing you a view when you weren't expecting it. Minimising the chance of that is a reason for operations on views not to return further views by default. 
-- Greg From greg.ewing at canterbury.ac.nz Wed Aug 30 03:44:57 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 30 Aug 2006 13:44:57 +1200 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <20060829132924.1B15.JCARLSON@uci.edu> References: <20060829111904.1B12.JCARLSON@uci.edu> <20060829132924.1B15.JCARLSON@uci.edu> Message-ID: <44F4ED99.2060408@canterbury.ac.nz> Josiah Carlson wrote: > This is changing return types based on variable type, How do you make that out? It seems the opposite to me -- Guido is saying that the return type of s+t should *not* depend on whether s or t happens to be a view rather than a real string. -- Greg From greg.ewing at canterbury.ac.nz Wed Aug 30 03:45:26 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 30 Aug 2006 13:45:26 +1200 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <20060829091403.1B09.JCARLSON@uci.edu> <20060829111904.1B12.JCARLSON@uci.edu> Message-ID: <44F4EDB6.1000303@canterbury.ac.nz> Jim Jewett wrote: > Why is it essential that string views be a different type, rather than > an internal implementation detail, like long vs int? We're talking about a more abstract notion of "type" here. Strings and views are different things with different performance characteristics, so it's important to know which one you're getting, whether they're implemented as different type()s or not. 
-- Greg From greg.ewing at canterbury.ac.nz Wed Aug 30 03:46:17 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 30 Aug 2006 13:46:17 +1200 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <20060829111904.1B12.JCARLSON@uci.edu> References: <20060829091403.1B09.JCARLSON@uci.edu> <20060829111904.1B12.JCARLSON@uci.edu> Message-ID: <44F4EDE9.1060700@canterbury.ac.nz> Josiah Carlson wrote: > On the other hand, its performance characteristics could be > confusing to users of Python who may have come to expect that "st+''" is > a constant time operation, regardless of the length of st. Even if that's always true, I'm not sure it's really a useful thing to know. How often do you write a string concatenation expecting that one of the operands will almost always be empty? I can count the number of times I've done that on the fingers of one elbow. -- Greg From aahz at pythoncraft.com Wed Aug 30 04:16:25 2006 From: aahz at pythoncraft.com (Aahz) Date: Tue, 29 Aug 2006 19:16:25 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <20060828120741.1AF7.JCARLSON@uci.edu> <20060828132232.1AFD.JCARLSON@uci.edu> <44F4DD45.6060809@canterbury.ac.nz> Message-ID: <20060830021625.GA19157@panix.com> On Tue, Aug 29, 2006, Guido van Rossum wrote: > On 8/29/06, Greg Ewing wrote: >> Guido van Rossum wrote: >>> >>> Having views in a library module alleviates many of my objections. >>> While I still worry that it will be overused, deque doesn't seem to >>> be overused, so perhaps I should relax. >> >> Another thought is that there will already be ways in which Py3k >> views could lead to inefficiencies if they're not used carefully. A >> keys() view of a dict, for example, will keep the values of the dict >> alive as well as the keys, unlike the existing keys() method. > > Right; but I don't expect that such a keys() view will typically have > a lifetime longer than the dict. 
That's true only for newer code that correctly uses sets instead of dicts -- but we've had this argument before. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." --Brian W. Kernighan From guido at python.org Wed Aug 30 05:16:26 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Aug 2006 20:16:26 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <44F4EBC2.8020401@canterbury.ac.nz> References: <20060829145412.1B18.JCARLSON@uci.edu> <44F4EBC2.8020401@canterbury.ac.nz> Message-ID: On 8/29/06, Greg Ewing wrote: > Josiah Carlson wrote: > > > If views are not automatically returned for methods on strings, then you > > won't have to worry about views unless you explicitly construct them. > > Although you might have to worry about someone else > handing you a view when you weren't expecting it. Minimising > the chance of that is a reason for operations on views > not to return further views by default. In support of Josiah here: I think that's the caller's responsibility then. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Aug 30 05:18:06 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Aug 2006 20:18:06 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <44F4ED99.2060408@canterbury.ac.nz> References: <20060829111904.1B12.JCARLSON@uci.edu> <20060829132924.1B15.JCARLSON@uci.edu> <44F4ED99.2060408@canterbury.ac.nz> Message-ID: On 8/29/06, Greg Ewing wrote: > Josiah Carlson wrote: > > This is changing return types based on variable type, > > How do you make that out? 
It seems the opposite to me -- > Guido is saying that the return type of s+t should *not* > depend on whether s or t happens to be a view rather than > a real string. No, I never meant to say that. There's nothing wrong with the type of x+y depending on the types of x and y. I meant that s+v, v+s and v+w (s being a string, v and w being views) should all return strings because -- in general -- they cannot always be views, and I don't want the return type to depend on the *value* of the inputs. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From talin at acm.org Wed Aug 30 06:00:49 2006 From: talin at acm.org (Talin) Date: Tue, 29 Aug 2006 21:00:49 -0700 Subject: [Python-3000] Comment on iostack library Message-ID: <44F50D71.5030402@acm.org> I've been thinking more about the iostack proposal. Right now, a typical file handle consists of 3 "layers" - one representing the backing store (file, memory, network, etc.), one for adding buffering, and one representing the program-level API for reading strings, bytes, decoded text, etc. I wonder if it wouldn't be better to cut that down to two. Specifically, I would like to suggest eliminating the buffering layer. My reasoning is fairly straightforward: Most file system handles, network handles and other operating system handles already support buffering, and they do a far better job of it than we can. The handles that don't support buffering are memory streams - which don't need buffering anyway. Of course, it would make sense for Python to provide its own buffering implementation if we were going to always use the lowest-level i/o API provided by the operating system, but I can't see why we would want to do that. The OS knows how to allocate an optimal buffer, using information such as the block size of the filesystem, whereas trying to achieve this same level of functionality in the Python standard library would be needlessly complex IMHO. 
-- Talin From guido at python.org Wed Aug 30 06:24:02 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Aug 2006 21:24:02 -0700 Subject: [Python-3000] Comment on iostack library In-Reply-To: <44F50D71.5030402@acm.org> References: <44F50D71.5030402@acm.org> Message-ID: On 8/29/06, Talin wrote: > I've been thinking more about the iostack proposal. Right now, a typical > file handle consists of 3 "layers" - one representing the backing store > (file, memory, network, etc.), one for adding buffering, and one > representing the program-level API for reading strings, bytes, decoded > text, etc. > > I wonder if it wouldn't be better to cut that down to two. Specifically, > I would like to suggest eliminating the buffering layer. > > My reasoning is fairly straightforward: Most file system handles, > network handles and other operating system handles already support > buffering, and they do a far better job of it than we can. The handles > that don't support buffering are memory streams - which don't need > buffering anyway. > > Of course, it would make sense for Python to provide its own buffering > implementation if we were going to always use the lowest-level i/o API > provided by the operating system, but I can't see why we would want to > do that. The OS knows how to allocate an optimal buffer, using > information such as the block size of the filesystem, whereas trying to > achieve this same level of functionality in the Python standard library > would be needlessly complex IMHO. I'm not sure I follow. We *definitely* don't want to use stdio -- it's not part of the OS anyway, and has some annoying quirks like not giving you any insight in how it is using the buffer, nor changing the buffer size on the fly, and crashing when you switch read and write calls. So given that, how would you implement readline()? Reading one byte at a time until you've got the \n is definitely way too slow given the constant overhead of system calls. 
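[Editor's note: a minimal sketch of what that buffering layer buys, using the io-module names that eventually shipped with py3k (they postdate this thread): readline() on a buffered reader costs chunked reads from the raw object, never one low-level read per byte.]

```python
import io

# io.BytesIO stands in for an unbuffered raw byte source such as FileIO;
# BufferedReader pulls data in large chunks (8 KiB by default in CPython),
# so readline() never goes back to the raw object byte-by-byte.
raw = io.BytesIO(b"first line\nsecond line\n")
buffered = io.BufferedReader(raw)
assert buffered.readline() == b"first line\n"
# The raw stream was consumed in one chunk, not one byte at a time:
assert raw.tell() == len(b"first line\nsecond line\n")
```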
Regarding optimal buffer size, I've never seen a program for which 8K wasn't optimal. Larger buffers simply don't pay off. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From talin at acm.org Wed Aug 30 07:26:59 2006 From: talin at acm.org (Talin) Date: Tue, 29 Aug 2006 22:26:59 -0700 Subject: [Python-3000] Comment on iostack library In-Reply-To: References: <44F50D71.5030402@acm.org> Message-ID: <44F521A3.1040304@acm.org> Guido van Rossum wrote: > On 8/29/06, Talin wrote: >> I've been thinking more about the iostack proposal. Right now, a typical >> file handle consists of 3 "layers" - one representing the backing store >> (file, memory, network, etc.), one for adding buffering, and one >> representing the program-level API for reading strings, bytes, decoded >> text, etc. >> >> I wonder if it wouldn't be better to cut that down to two. Specifically, >> I would like to suggest eliminating the buffering layer. >> >> My reasoning is fairly straightforward: Most file system handles, >> network handles and other operating system handles already support >> buffering, and they do a far better job of it than we can. The handles >> that don't support buffering are memory streams - which don't need >> buffering anyway. >> >> Of course, it would make sense for Python to provide its own buffering >> implementation if we were going to always use the lowest-level i/o API >> provided by the operating system, but I can't see why we would want to >> do that. The OS knows how to allocate an optimal buffer, using >> information such as the block size of the filesystem, whereas trying to >> achieve this same level of functionality in the Python standard library >> would be needlessly complex IMHO. > > I'm not sure I follow. 
> > We *definitely* don't want to use stdio -- it's not part of the OS > anyway, and has some annoying quirks like not giving you any insight > in how it is using the buffer, nor changing the buffer size on the > fly, and crashing when you switch read and write calls. > > So given that, how would you implement readline()? Reading one byte at > a time until you've got the \n is definitely way too slow given the > constant overhead of system calls. > > Regarding optimal buffer size, I've never seen a program for which 8K > wasn't optimal. Larger buffers simply don't pay off. Well, as far as readline goes: In order to split the text into lines, you have to decode the text first anyway, which is a layer 3 operation. You can't just read bytes until you get a \n, because the file you are reading might be encoded in UCS2 or something. So for example, in a big-endian UCS2 encoding, newline would be encoded as 0x00 0x0a, whereas in a little-endian UCS2 encoding, it would be 0x0A 0x00. Merely stopping at the 0x0A byte is incorrect, you've only read half the character. You're correct that reading by line does require a buffer if you want to do it efficiently. However, in a world of character encodings, the readline buffer has to be implemented at a higher level in the IO stack, at the same level which understands text encodings. There may be a different set of buffers at the lower level to minimize the number of disk i/o operations, but they can't really be the same buffer -- either that, or the text encoding layer will need to have fairly incestuous knowledge of what's going on at the lower layers so that it can peek inside its buffers. It seems to me that no matter how you slice it, you can't have an abstract "buffering" layer that is independent of both the layer beneath and the layer above. Both the text decoding layer and the disk i/o layer need to have fairly intimate knowledge of their buffers if you want maximum efficiency. 
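[Editor's note: Talin's UCS2 point is easy to check with today's codecs, where UTF-16 code units play the role UCS2 does in his example.]

```python
# A newline is two bytes in UTF-16, in an order that depends on the byte
# order -- so scanning the raw bytes for a lone 0x0A is not enough.
assert "\n".encode("utf-16-be") == b"\x00\x0a"
assert "\n".encode("utf-16-le") == b"\x0a\x00"
# Worse, 0x0A can be half of an entirely unrelated character:
assert "\u0a42".encode("utf-16-be") == b"\x0a\x42"
```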
(I'm not opposed to a custom implementation of buffering in the level 1 file object itself, although I suspect in most cases you'd be better off using what the OS or its standard libs provide.) As far as stdio not giving you hints as to how it is using the buffer, I am not sure what you mean...what kind of information would a custom buffer implementation give you that stdio would not? If it's early detection of \n you are thinking of, I've already shown that won't work unless you are assuming an 8-bit encoding. -- Talin From ronaldoussoren at mac.com Wed Aug 30 07:47:16 2006 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Wed, 30 Aug 2006 07:47:16 +0200 Subject: [Python-3000] Comment on iostack library In-Reply-To: <44F521A3.1040304@acm.org> References: <44F50D71.5030402@acm.org> <44F521A3.1040304@acm.org> Message-ID: <79F38D6C-F609-4B58-9C43-6FF0C2BEECE5@mac.com> On 30-aug-2006, at 7:26, Talin wrote: > Guido van Rossum wrote: >> >> Regarding optimal buffer size, I've never seen a program for which 8K >> wasn't optimal. Larger buffers simply don't pay off. Larger buffers can be useful when doing binary I/O through stdio (at least on linux). I've recently had a program that had significant speedup when I used a 128K buffer. > > Well, as far as readline goes: In order to split the text into lines, > you have to decode the text first anyway, which is a layer 3 > operation. And buffering is a layer 2 operation. Function calls are significantly cheaper than system calls. You don't want to do a system call for every character read, but might get away with doing a function call per character.
Ronald From fredrik at pythonware.com Wed Aug 30 10:38:25 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 30 Aug 2006 10:38:25 +0200 Subject: [Python-3000] Making more effective use of slice objects in Py3k References: <20060827184941.1AE8.JCARLSON@uci.edu> <20060829102307.1B0F.JCARLSON@uci.edu> Message-ID: Fredrik Lundh wrote: > not necessarily, but there are lots of issues involved when doing > high-performance XML stuff, and I'm not sure views would help quite as > much as one might think. > > (writing and tuning cET was a great way to learn that not everything > that you think you know about C performance applies to C code running > inside the Python interpreter...) and also based on the cET (and NFS) experiences, it wouldn't surprise me if a naive 32-bit text string implementation will, on average, slow things down *more* than any string view implementation can speed things up again... (in other words, I'm convinced that we need a polymorphic string type. I'm not so sure we need views, but if we have the former, we can use that mechanism to support the latter) From qrczak at knm.org.pl Wed Aug 30 11:20:56 2006 From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk) Date: Wed, 30 Aug 2006 11:20:56 +0200 Subject: [Python-3000] Comment on iostack library In-Reply-To: <44F521A3.1040304@acm.org> (talin@acm.org's message of "Tue, 29 Aug 2006 22:26:59 -0700") References: <44F50D71.5030402@acm.org> <44F521A3.1040304@acm.org> Message-ID: <87wt8q1sw7.fsf@qrnik.zagroda> Talin writes: > It seems to me that no matter how you slice it, you can't have an > abstract "buffering" layer that is independent of both the layer > beneath and the layer above. I think buffering makes sense as the topmost layer, and typically only there. Encoding conversion and newline conversion should be performed a block at a time, below buffering, so not only I/O syscalls, but also invocations of the recoding machinery are amortized by buffering. 
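[Editor's note: block-at-a-time recoding is what later shipped as Python's incremental codecs; a sketch of Marcin's point using the codecs module. The decoder buffers any partial character that straddles a block boundary, so the layer above never sees half a character.]

```python
import codecs

# Decode a UTF-8 stream one block at a time; the incremental decoder
# holds back the incomplete multi-byte sequence at a block boundary.
decoder = codecs.getincrementaldecoder("utf-8")()
data = "caf\u00e9 latte".encode("utf-8")
block1, block2 = data[:4], data[4:]        # split inside the 2-byte '\u00e9'
text = decoder.decode(block1) + decoder.decode(block2, final=True)
assert text == "caf\u00e9 latte"
```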
Buffering comes in separate byte and character flavors. Placing buffering below that makes sense only in cases we want to decode as few bytes as possible at a time (accepting the slowdown of decoding one character at a time, but avoiding a syscall per character). I'm not sure whether this is ever necessary. Finding the end of HTTP headers can be done before conversion to text. -- __("< Marcin Kowalczyk \__/ qrczak at knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/ From ncoghlan at gmail.com Wed Aug 30 11:46:57 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 30 Aug 2006 19:46:57 +1000 Subject: [Python-3000] Premature optimization and all that In-Reply-To: References: Message-ID: <44F55E91.4020000@gmail.com> Guido van Rossum wrote: > Over lunch with Neal we came upon the topic of optimization and Python 3000. > > It is our strong opinion that in this stage of the Py3k project we > should focus on getting the new language spec and implementation > feature-complete, without worrying much about optimizations. +1 here - this sounds like an excellent plan to me. Step 1: Make it work Step 2: Make it work fast I've made life difficult for myself a few times by trying to do step 2 without doing step 1 first :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Wed Aug 30 12:06:48 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 30 Aug 2006 20:06:48 +1000 Subject: [Python-3000] Comment on iostack library In-Reply-To: <44F521A3.1040304@acm.org> References: <44F50D71.5030402@acm.org> <44F521A3.1040304@acm.org> Message-ID: <44F56338.5070802@gmail.com> Talin wrote: > It seems to me that no matter how you slice it, you can't have an > abstract "buffering" layer that is independent of both the layer beneath > and the layer above.
Both the text decoding layer and the disk i/o layer > need to have fairly intimate knowledge of their buffers if you want > maximum efficiency. (I'm not opposed to a custom implementation of > buffering in the level 1 file object itself, although I suspect in most > cases you'd be better off using what the OS or its standard libs provide.) You'd insert a buffering layer at the appropriate point for whatever you're trying to do. The advantage of pulling the buffering out into a separate layer is that it can be reused with different byte sources & sinks by supplying the appropriate configuration parameters, instead of having to reimplement it for each different source/sink. Applications generally won't be expected to construct these IO stacks manually. File IO stacks, for example, will most likely still be created by a call to the open() builtin (although the default mode may change to be binary if no text encoding is specified). Here's a list of the IO stacks I believe will be commonly used: Unbuffered byte IO stack: - byte stream API - byte source/sink Block buffered byte IO stack: - byte stream API - block buffering layer - byte source/sink Character buffered text IO stack: - text stream API - text codec layer - byte source/sink (effectively unbuffered for single byte encodings like ASCII) Block buffered text IO stack: - text stream API - text codec layer - block buffering - byte source/sink Line buffered text IO stack: - text stream API - line buffering - text codec layer - block buffering - byte source/sink Cheers, Nick. 
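[Editor's note: Nick's layered stacks map closely onto what eventually shipped as the py3k io module; a sketch of his "block buffered text IO stack" using those eventual names.]

```python
import io

# byte source/sink -> block buffering -> text codec + text stream API
raw = io.BytesIO("caf\u00e9\n".encode("utf-8"))       # byte source/sink
buffered = io.BufferedReader(raw)                     # block buffering layer
text = io.TextIOWrapper(buffered, encoding="utf-8")   # codec + text stream API
assert text.readline() == "caf\u00e9\n"
```

Applications still get the stack pre-assembled: the eventual open() builtin constructs exactly this chain when given a text mode and an encoding.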
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From guido at python.org Wed Aug 30 16:22:11 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 30 Aug 2006 07:22:11 -0700 Subject: [Python-3000] Comment on iostack library In-Reply-To: <44F521A3.1040304@acm.org> References: <44F50D71.5030402@acm.org> <44F521A3.1040304@acm.org> Message-ID: On 8/29/06, Talin wrote: > Guido van Rossum wrote: > > I'm not sure I follow. > > > > We *definitely* don't want to use stdio -- it's not part of the OS > > anyway, and has some annoying quirks like not giving you any insight > > in how it is using the buffer, nor changing the buffer size on the > > fly, and crashing when you switch read and write calls. > > > > So given that, how would you implement readline()? Reading one byte at > > a time until you've got the \n is definitely way too slow given the > > constant overhead of system calls. > > > > Regarding optimal buffer size, I've never seen a program for which 8K > > wasn't optimal. Larger buffers simply don't pay off. > > Well, as far as readline goes: In order to split the text into lines, > you have to decode the text first anyway, which is a layer 3 operation. OK, I see some of your point. This may explain why in Java the buffering layer seems to be sitting on top of the encoding/decoding. Still, for binary file I/O, we'll need a buffering layer on top of the raw I/O operations. Lots of file formats are read/written in small chunks but it would be very expensive to turn each small chunk into a system call. > As far as stdio not giving you hints as to how it is using the buffer, I > am not sure what you mean...what kind of information would a custom > buffer implementation give you that stdio would not? The specific problem with stdio is that you can't tell if anything is in the buffer or not. 
This can make it difficult to do non-blocking I/O on a socket through stdio (e.g. when using the makefile() option of Python sockets). Another is that a read after a write is undefined in the C std and can give segfaults on some platforms, so Python has to keep track of the "state" of the I/O buffer. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From paul at prescod.net Wed Aug 30 17:16:39 2006 From: paul at prescod.net (Paul Prescod) Date: Wed, 30 Aug 2006 08:16:39 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <20060829111904.1B12.JCARLSON@uci.edu> <20060829132924.1B15.JCARLSON@uci.edu> <44F4ED99.2060408@canterbury.ac.nz> Message-ID: <1cb725390608300816h2400b0f6s9e5a71656d38673e@mail.gmail.com> I don't understand. If the difference between a string and a string view is a difference of VALUES, not TYPES, then the return type is varying based upon the difference of input types (which you say is okay). Conversely, if the strings and string views only vary in their values (share a type) then the return code is only varying in its value (which EVERYBODY thinks is okay). Or maybe we're dealing with a third (new?) situation in which the performance characteristics of a return value is being dictated by the performance characteristics of the inputs rather than being predictable on the basis of the types or values. On 8/29/06, Guido van Rossum wrote: > > On 8/29/06, Greg Ewing wrote: > > Josiah Carlson wrote: > > > This is changing return types based on variable type, > > > > How do you make that out? It seems the opposite to me -- > > Guido is saying that the return type of s+t should *not* > > depend on whether s or t happens to be a view rather than > > a real string. > > No, I never meant to say that. There's nothing wrong with the type of > x+y depending on the types of x and y. 
I meant that s+v, v+s and v+w > (s being a string, v and w being views) should all return strings > because -- in general -- they cannot always be views, and I don't want > the return type to depend on the *value* of the inputs. > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/paul%40prescod.net > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060830/5916c974/attachment.htm From guido at python.org Wed Aug 30 17:31:07 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 30 Aug 2006 08:31:07 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <1cb725390608300816h2400b0f6s9e5a71656d38673e@mail.gmail.com> References: <20060829111904.1B12.JCARLSON@uci.edu> <20060829132924.1B15.JCARLSON@uci.edu> <44F4ED99.2060408@canterbury.ac.nz> <1cb725390608300816h2400b0f6s9e5a71656d38673e@mail.gmail.com> Message-ID: The difference between a string and a view is one of TYPE. (Because they can have such different performance and memory usage characteristics, it's not right to treat them as the same type.) You seem to be misunderstanding what I said. I want the return type only to depend on the input types. This means that all string and view concatenations must return strings, not views, because we can always create a new string, but we cannot always create a new view representing the concatenation (unless views were to support disjoint sections, which leads to insanity and the complexity and slowness of ABC's B-tree string implementation). 
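[Editor's note: Guido's rule can be sketched with a hypothetical StringView class -- the name and API here are illustrative, not something proposed verbatim in the thread. Concatenation always materializes a real str, whatever the operand types.]

```python
class StringView:
    """Illustrative view over a slice of a string (hypothetical API)."""

    def __init__(self, s, start, stop):
        self._s, self._start, self._stop = s, start, stop

    def __str__(self):                 # materialize a real string
        return self._s[self._start:self._stop]

    def __add__(self, other):          # view + (view | str) -> str, always
        return str(self) + str(other)

    def __radd__(self, other):         # str + view -> str, always
        return str(other) + str(self)

v = StringView("hello world", 0, 5)    # "hello"
w = StringView("hello world", 6, 11)   # "world"
assert v + w == "helloworld" and type(v + w) is str
assert "say " + v == "say hello"       # return type depends only on the types
```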
Assuming v and w are views: Just like v.lower() must sometimes create a new string, which implies it must always return a string, v+w must sometimes create a new string, so it must always return a string. (It's okay to return an existing string if one with the appropriate value happens to be lying around nearby; but it's not okay to return one of the input views, because they're not strings.) Hope this clarifies things, --Guido On 8/30/06, Paul Prescod wrote: > I don't understand. If the difference between a string and a string view is > a difference of VALUES, not TYPES, then the return type is varying based > upon the difference of input types (which you say is okay). Conversely, if > the strings and string views only vary in their values (share a type) then > the return code is only varying in its value (which EVERYBODY thinks is > okay). > > Or maybe we're dealing with a third (new?) situation in which the > performance characteristics of a return value is being dictated by the > performance characteristics of the inputs rather than being predictable on > the basis of the types or values. > > > On 8/29/06, Guido van Rossum wrote: > > > On 8/29/06, Greg Ewing wrote: > > Josiah Carlson wrote: > > > This is changing return types based on variable type, > > > > How do you make that out? It seems the opposite to me -- > > Guido is saying that the return type of s+t should *not* > > depend on whether s or t happens to be a view rather than > > a real string. > > No, I never meant to say that. There's nothing wrong with the type of > x+y depending on the types of x and y. I meant that s+v, v+s and v+w > (s being a string, v and w being views) should all return strings > because -- in general -- they cannot always be views, and I don't want > the return type to depend on the *value* of the inputs. 
> > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > _______________________________________________ > > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/paul%40prescod.net > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From paul at prescod.net Wed Aug 30 18:04:47 2006 From: paul at prescod.net (Paul Prescod) Date: Wed, 30 Aug 2006 09:04:47 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <20060829111904.1B12.JCARLSON@uci.edu> <20060829132924.1B15.JCARLSON@uci.edu> <44F4ED99.2060408@canterbury.ac.nz> <1cb725390608300816h2400b0f6s9e5a71656d38673e@mail.gmail.com> Message-ID: <1cb725390608300904i735df3fcu73d86a1cba83263f@mail.gmail.com> Yes, thanks for the clarification. From a type theory point of view there is nothing stopping string + view returning a view always (even if it is a view of a new string) but that would have very poor performance characteristics. On 8/30/06, Guido van Rossum wrote: > > The difference between a string and a view is one of TYPE. (Because > they can have such different performance and memory usage > characteristics, it's not right to treat them as the same type.) > > You seem to be misunderstanding what I said. I want the return type > only to depend on the input types. This means that all string and view > concatenations must return strings, not views, because we can always > create a new string, but we cannot always create a new view > representing the concatenation (unless views were to support disjoint > sections, which leads to insanity and the complexity and slowness of > ABC's B-tree string implementation). 
> > Assuming v and w are views: Just like v.lower() must sometimes create > a new string, which implies it must always return a string, v+w must > sometimes create a new string, so it must always return a string. > (It's okay to return an existing string if one with the appropriate > value happens to be lying around nearby; but it's not okay to return > one of the input views, because they're not strings.) > > Hope this clarifies things, > > --Guido > > On 8/30/06, Paul Prescod wrote: > > I don't understand. If the difference between a string and a string view > is > > a difference of VALUES, not TYPES, then the return type is varying based > > upon the difference of input types (which you say is okay). Conversely, > if > > the strings and string views only vary in their values (share a type) > then > > the return code is only varying in its value (which EVERYBODY thinks is > > okay). > > > > Or maybe we're dealing with a third (new?) situation in which the > > performance characteristics of a return value is being dictated by the > > performance characteristics of the inputs rather than being predictable > on > > the basis of the types or values. > > > > > > On 8/29/06, Guido van Rossum wrote: > > > > > On 8/29/06, Greg Ewing wrote: > > > Josiah Carlson wrote: > > > > This is changing return types based on variable type, > > > > > > How do you make that out? It seems the opposite to me -- > > > Guido is saying that the return type of s+t should *not* > > > depend on whether s or t happens to be a view rather than > > > a real string. > > > > No, I never meant to say that. There's nothing wrong with the type of > > x+y depending on the types of x and y. I meant that s+v, v+s and v+w > > (s being a string, v and w being views) should all return strings > > because -- in general -- they cannot always be views, and I don't want > > the return type to depend on the *value* of the inputs. 
> > > > -- > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > _______________________________________________ > > > > Python-3000 mailing list > > Python-3000 at python.org > > http://mail.python.org/mailman/listinfo/python-3000 > > Unsubscribe: > > http://mail.python.org/mailman/options/python-3000/paul%40prescod.net > > > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Aug 30 18:54:38 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 30 Aug 2006 09:54:38 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <1cb725390608300904i735df3fcu73d86a1cba83263f@mail.gmail.com> References: <20060829111904.1B12.JCARLSON@uci.edu> <20060829132924.1B15.JCARLSON@uci.edu> <44F4ED99.2060408@canterbury.ac.nz> <1cb725390608300816h2400b0f6s9e5a71656d38673e@mail.gmail.com> <1cb725390608300904i735df3fcu73d86a1cba83263f@mail.gmail.com> Message-ID: I'd phrase it differently -- that would be plain silly. :-) On 8/30/06, Paul Prescod wrote: > Yes, thanks for the clarification. From a type theory point of view there is > nothing stopping string + view returning a view always (even if it is a view > of a new string) but that would have very poor performance characteristics. > > > On 8/30/06, Guido van Rossum wrote: > > The difference between a string and a view is one of TYPE. (Because > > they can have such different performance and memory usage > > characteristics, it's not right to treat them as the same type.) > > > > You seem to be misunderstanding what I said. I want the return type > > only to depend on the input types.
This means that all string and view > > concatenations must return strings, not views, because we can always > > create a new string, but we cannot always create a new view > > representing the concatenation (unless views were to support disjoint > > sections, which leads to insanity and the complexity and slowness of > > ABC's B-tree string implementation). > > > > Assuming v and w are views: Just like v.lower() must sometimes create > > a new string, which implies it must always return a string, v+w must > > sometimes create a new string, so it must always return a string. > > (It's okay to return an existing string if one with the appropriate > > value happens to be lying around nearby; but it's not okay to return > > one of the input views, because they're not strings.) > > > > Hope this clarifies things, > > > > --Guido > > > > On 8/30/06, Paul Prescod wrote: > > > I don't understand. If the difference between a string and a string view > is > > > a difference of VALUES, not TYPES, then the return type is varying based > > > upon the difference of input types (which you say is okay). Conversely, > if > > > the strings and string views only vary in their values (share a type) > then > > > the return code is only varying in its value (which EVERYBODY thinks is > > > okay). > > > > > > Or maybe we're dealing with a third (new?) situation in which the > > > performance characteristics of a return value is being dictated by the > > > performance characteristics of the inputs rather than being predictable > on > > > the basis of the types or values. > > > > > > > > > On 8/29/06, Guido van Rossum wrote: > > > > > > > On 8/29/06, Greg Ewing wrote: > > > > Josiah Carlson wrote: > > > > > This is changing return types based on variable type, > > > > > > > > How do you make that out? It seems the opposite to me -- > > > > Guido is saying that the return type of s+t should *not* > > > > depend on whether s or t happens to be a view rather than > > > > a real string. 
> > > > > > No, I never meant to say that. There's nothing wrong with the type of > > > x+y depending on the types of x and y. I meant that s+v, v+s and v+w > > > (s being a string, v and w being views) should all return strings > > > because -- in general -- they cannot always be views, and I don't want > > > the return type to depend on the *value* of the inputs. > > > > > > -- > > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > _______________________________________________ > > > > > > Python-3000 mailing list > > > Python-3000 at python.org > > > http://mail.python.org/mailman/listinfo/python-3000 > > > Unsubscribe: > > > > http://mail.python.org/mailman/options/python-3000/paul%40prescod.net > > > > > > > > > > > > -- > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jcarlson at uci.edu Wed Aug 30 20:25:58 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 30 Aug 2006 11:25:58 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <1cb725390608300904i735df3fcu73d86a1cba83263f@mail.gmail.com> References: <1cb725390608300904i735df3fcu73d86a1cba83263f@mail.gmail.com> Message-ID: <20060830091620.1B30.JCARLSON@uci.edu> "Paul Prescod" wrote: > Yes, thanks for the clarification. From a type theory point of view there is > nothing stopping string + view returning a view always (even if it is a view > of a new string) but that would have very poor performance characteristics. It depends. Assume single-segment views (that's what I've been implementing). If you have two non-adjacent views, or a view+string (for non-empty strings), etc., you need to take the time to construct the new string, that's a given. But once you have a string, you could return either the string, or you could return a full view of the string. The performance differences are fairly insignificant (I was not able to measure any). 
Up until this morning I was planning on writing everything such that constructive manipulation (upper(), __add__, etc.) returned views of strings. While I still feel it would be more consistent to always return views, returning strings does let the user know that "this operation may take a while" by virtue of returning a string. - Josiah From steven.bethard at gmail.com Wed Aug 30 23:40:55 2006 From: steven.bethard at gmail.com (Steven Bethard) Date: Wed, 30 Aug 2006 15:40:55 -0600 Subject: [Python-3000] have zip() raise exception for sequences of different lengths Message-ID: A couple Python-3000 threads [1] [2] have indicated that the most natural use of zip() is with sequences of the same length. I feel the same way, and run into this all the time. Because the error would otherwise pass silently, I usually end up adding checks before each use of zip() to raise an exception if I accidentally pass in sequences of different lengths. Any chance that zip() in Python 3000 could automatically raise an exception if the sequence lengths are different? If there's really a need for a zip that just truncates, maybe that could be moved to itertools? I think the equal-length scenario is dramatically more common, and keeping that error from passing silently would be a good thing IMHO. [1] http://mail.python.org/pipermail/python-3000/2006-March/000160.html [2] http://mail.python.org/pipermail/python-3000/2006-August/003094.html Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity.
--- Bucky Katt, Get Fuzzy From rhettinger at ewtllc.com Wed Aug 30 23:52:54 2006 From: rhettinger at ewtllc.com (Raymond Hettinger) Date: Wed, 30 Aug 2006 14:52:54 -0700 Subject: [Python-3000] have zip() raise exception for sequences of different lengths In-Reply-To: References: Message-ID: <44F608B6.5010209@ewtllc.com> Steven Bethard wrote: >A couple Python-3000 threads [1] [2] have indicated that the most >natural use of zip() is with sequences of the same lengths. I feel >the same way, and run into this all the time. Because the error would >otherwise pass silently, I usually end up adding checks before each >use of zip() to raise an exception if I accidentally pass in sequences >of different lengths. > >Any chance that zip() in Python 3000 could automatically raise an >exception if the sequence lengths are different? If there's really a >need for a zip that just truncates, maybe that could be moved to >itertools? I think the equal-length scenario is dramatically more >common, and keeping that error from passing silently would be a good >thing IMHO. > > > -1 I think this would cause much more harm than good and wreck an otherwise easy-to-understand tool. Raymond From guido at python.org Wed Aug 30 23:57:48 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 30 Aug 2006 14:57:48 -0700 Subject: [Python-3000] have zip() raise exception for sequences of different lengths In-Reply-To: <44F608B6.5010209@ewtllc.com> References: <44F608B6.5010209@ewtllc.com> Message-ID: > Steven Bethard wrote: > >A couple Python-3000 threads [1] [2] have indicated that the most > >natural use of zip() is with sequences of the same lengths. I feel > >the same way, and run into this all the time. Because the error would > >otherwise pass silently, I usually end up adding checks before each > >use of zip() to raise an exception if I accidentally pass in sequences > >of different lengths. 
> > > >Any chance that zip() in Python 3000 could automatically raise an > >exception if the sequence lengths are different? If there's really a > >need for a zip that just truncates, maybe that could be moved to > >itertools? I think the equal-length scenario is dramatically more > >common, and keeping that error from passing silently would be a good > >thing IMHO. [Raymond] > -1 > I think this would cause much more harm than good and wreck an > otherwise easy-to-understand tool. Perhaps a compromise could be to add a keyword parameter to request such an exception? (We could even add three options: truncate, pad, error, with truncate being the default, and pad being the old map() and filter() behavior.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at python.org Thu Aug 31 00:21:34 2006 From: barry at python.org (Barry Warsaw) Date: Wed, 30 Aug 2006 18:21:34 -0400 Subject: [Python-3000] have zip() raise exception for sequences of different lengths In-Reply-To: References: <44F608B6.5010209@ewtllc.com> Message-ID: <305688A8-0CFA-4F80-80EA-E3D2343D7226@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 30, 2006, at 5:57 PM, Guido van Rossum wrote: > Perhaps a compromise could be to add a keyword parameter to request > such an exception? (We could even add three options: truncate, pad, > error, with truncate being the default, and pad being the old map() > and filter() behavior.) Caveat: I don't even know if /I/ like this, but I'll spit it out anyway in case it spurs an actual good idea from someone else. :) What about a keyword argument called 'filler' which can be an n-sized sequence or a callable. If it's a sequence, then when zip arguments are exhausted, you pull values for that item from the appropriate element of the sequence. If it's a callable, you call it with the items you have and None's for the exhausted ones. Whatever filler() returns, zip returns. filler() could then splice in whatever values it wants. 
Yeah 'None' for the missing ones can be ambiguous but oh well. You raise a ValueError if filler is a sequence of size that doesn't match the number of zip arguments or if filler() doesn't return an appropriately sized sequence. yeah-okay-dumb-5-minute-idea-ly y'rs, - -Barry P.S. OTOH, zip's current semantics never bothered me much in practice. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRPYPc3EjvBPtnXfVAQL5RQQAh93Sr84HaLP0Zo4hr3JBuWkhipryIx+A eCnGKXxXa8fTvBuRcaHFAryPnXxrnrhs1pmpQsf3/scJTHcwXstX8OMJvHrFRqcV KHF8qRazP271RnbDQuDBTJwcwsTFpjHtDVyNbApYxQDreiy77q4ZDyuraICKlkqo rT8hfF3Mab8= =+9HZ -----END PGP SIGNATURE----- From rhettinger at ewtllc.com Thu Aug 31 00:41:08 2006 From: rhettinger at ewtllc.com (Raymond Hettinger) Date: Wed, 30 Aug 2006 15:41:08 -0700 Subject: [Python-3000] have zip() raise exception for sequences of different lengths In-Reply-To: References: <44F608B6.5010209@ewtllc.com> Message-ID: <44F61404.8010002@ewtllc.com> > > Perhaps a compromise could be to add a keyword parameter to request > such an exception? (We could even add three options: truncate, pad, > error, with truncate being the default, and pad being the old map() > and filter() behavior.) FWIW, I intend to add an itertool called izip_longest() which allows a pad value to be specified. In deciding to accept that feature request, I put a great deal of thought and research into the idea. Along the way, I looked at other languages and found both truncating and padding versions of zip but did not find any version that raised an exception. IMO, such a provision would foul the waters and complicate the use of an otherwise simple function. Until now, there have been zero requests for zip() to have exception raising behavior. For Python 3k, I recommend: * simply replacing zip() with itertools.izip() * keeping the zip_longest() in a separate module * punting on an exception raising version The first covers 99% of use cases. 
The second covers a handful of situations that are otherwise difficult to deal with. The third is a YAGNI. Raymond From steven.bethard at gmail.com Thu Aug 31 01:33:27 2006 From: steven.bethard at gmail.com (Steven Bethard) Date: Wed, 30 Aug 2006 17:33:27 -0600 Subject: [Python-3000] have zip() raise exception for sequences of different lengths In-Reply-To: <44F61404.8010002@ewtllc.com> References: <44F608B6.5010209@ewtllc.com> <44F61404.8010002@ewtllc.com> Message-ID: On 8/30/06, Raymond Hettinger wrote: > Until now, there have been zero requests for zip() to have exception > raising behavior. > > For Python 3k, I recommend: > * simply replacing zip() with itertools.izip() > * keeping the zip_longest() in a separate module > * punting on an exception raising version > > The first covers 99% of use cases. I guess it depends what you mean by "covers". If you mean "produces the correct output for correct input" then yes, it does, but so would the exception raising one. I contend that it often does the wrong thing for incorrect input by silently truncating. 
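[Editor's note: the behaviour Steven is arguing for, a zip that raises instead of silently truncating, can be sketched as a generator. The izip_exact name and this implementation are hypothetical, written with present-day built-ins rather than the 2.x itertools C code.]

```python
_SENTINEL = object()

def izip_exact(*iterables):
    """Like zip(), but raise ValueError if the iterables have
    different lengths instead of silently truncating."""
    iterators = [iter(it) for it in iterables]
    while True:
        row = [next(it, _SENTINEL) for it in iterators]
        if all(item is _SENTINEL for item in row):
            return  # every iterator exhausted at the same time
        if any(item is _SENTINEL for item in row):
            raise ValueError("iterables have different lengths")
        yield tuple(row)

assert list(izip_exact([1, 2], "ab")) == [(1, "a"), (2, "b")]
try:
    list(izip_exact([1, 2, 3], "ab"))
except ValueError:
    pass  # the mismatch is reported instead of being hidden
else:
    raise AssertionError("expected ValueError")
```

The extra bookkeeping per row is the price of the check; a truncating zip can simply stop at the first exhausted iterator.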
To try to give a fair evaluation of this contention, I looked at some stdlib examples and tried to classify them: Examples where different lengths should be an error: compiler/pycodegen.py: for i, for_ in zip(range(len(node.quals)), node.quals): dis.py: for byte_incr, line_incr in zip(byte_increments, line_increments): email/Header.py: return zip(chunks, [charset]*len(chunks)) filecmp.py: a = dict(izip(imap(os.path.normcase, self.left_list), self.left_list)) idlelib/keybindingDialog.py: for modifier, variable in zip(self.modifiers, self.modifier_vars): Examples where truncation is needed: csv.py: d = dict(zip(self.fieldnames, row)) idlelib/EditorWindow.py: for i, file in zip(count(), rf_list): A couple of the examples (pycodegen.py, EditorWindow.py) are really just performing a poor-man's enumerate(), but with a cursory glance it still looks to me like there are more cases in the stdlib where it is a programming error to have lists of different sizes. If changing zip()'s behavior to match the most common use case is totally out, the stdlib code at least argues for adding something like itertools.izip_exact(). Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From steven.bethard at gmail.com Thu Aug 31 01:56:32 2006 From: steven.bethard at gmail.com (Steven Bethard) Date: Wed, 30 Aug 2006 17:56:32 -0600 Subject: [Python-3000] have zip() raise exception for sequences of different lengths In-Reply-To: <44F608B6.5010209@ewtllc.com> References: <44F608B6.5010209@ewtllc.com> Message-ID: On 8/30/06, Raymond Hettinger wrote: > Steven Bethard wrote: > > >A couple Python-3000 threads [1] [2] have indicated that the most > >natural use of zip() is with sequences of the same lengths. I feel > >the same way, and run into this all the time. 
Because the error would > >otherwise pass silently, I usually end up adding checks before each > >use of zip() to raise an exception if I accidentally pass in sequences > >of different lengths. > > > >Any chance that zip() in Python 3000 could automatically raise an > >exception if the sequence lengths are different? If there's really a > >need for a zip that just truncates, maybe that could be moved to > >itertools? I think the equal-length scenario is dramatically more > >common, and keeping that error from passing silently would be a good > >thing IMHO. > > -1 > I think this would cause much more harm than good and wreck an > otherwise easy-to-understand tool. Current documentation: zip( [iterable, ...]) This function returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The returned list is truncated in length to the length of the shortest argument sequence... Proposed change: zip( [iterable, ...]) This function returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. It is an error if the argument sequences are of different lengths... That seems pretty comparable in complexity to me. Could you explain how this makes zip() harder to understand? Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From rhettinger at ewtllc.com Thu Aug 31 01:58:04 2006 From: rhettinger at ewtllc.com (Raymond Hettinger) Date: Wed, 30 Aug 2006 16:58:04 -0700 Subject: [Python-3000] have zip() raise exception for sequences of different lengths In-Reply-To: References: <44F608B6.5010209@ewtllc.com> <44F61404.8010002@ewtllc.com> Message-ID: <44F6260C.1040502@ewtllc.com> >If changing zip()'s behavior to match the most common use case is >totally out, the stdlib code at least argues for adding something like >itertools.izip_exact(). > > I open to that. 
For this time being, let's do this. Add itertools.izip_longest() in Py2.5 and include a recipe for izip_exact() and see if anyone cares enough to ever use it. The new any() and all() functions started out as recipes and graduated when their popularity was shown. If izip_exact() proves its worth, then I would be happy to add it as a tool. Raymond From greg.ewing at canterbury.ac.nz Thu Aug 31 01:59:07 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 31 Aug 2006 11:59:07 +1200 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <20060830091620.1B30.JCARLSON@uci.edu> References: <1cb725390608300904i735df3fcu73d86a1cba83263f@mail.gmail.com> <20060830091620.1B30.JCARLSON@uci.edu> Message-ID: <44F6264B.4000005@canterbury.ac.nz> Josiah Carlson wrote: > Up until this morning I was planning on writing everything such that > constructive manipulation (upper(), __add__, etc.) returned views of > strings. I was about to say that this would be completely pointless, when I realised the point is so that further operations on these results would return views of them. In Josiah's views-always-return-views world, that would actually make sense -- but only if we really wanted such a world. To my mind, the use of views is to temporarily call out a part of a string for the purpose of applying some other operation to it. Views will therefore be short-lived objects that you won't want to keep and pass around. I suspect that, if views are the default result of anything done to a view, one will almost always be doing a str() on the result to turn it back into a non-view. If that's the case, then returning views would be the wrong default. 
-- Greg From rhettinger at ewtllc.com Thu Aug 31 02:03:17 2006 From: rhettinger at ewtllc.com (Raymond Hettinger) Date: Wed, 30 Aug 2006 17:03:17 -0700 Subject: [Python-3000] have zip() raise exception for sequences of different lengths In-Reply-To: References: <44F608B6.5010209@ewtllc.com> Message-ID: <44F62745.60006@ewtllc.com> >Proposed change: > >zip( [iterable, ...]) > This function returns a list of tuples, where the i-th tuple >contains the i-th element from each of the argument sequences or >iterables. It is an error if the argument sequences are of different >lengths... > >That seems pretty comparable in complexity to me. Could you explain >how this makes zip() harder to understand? > > It's a PITA because it precludes all of the use cases where the inputs ARE intentionally of different length (like when one argument supplies an infinite iterator): for lineno, ts, line in zip(count(1), timestamp(), sys.stdin): print 'Line %d, Time %s: %s' % (lineno, ts, line) Raymond From greg.ewing at canterbury.ac.nz Thu Aug 31 02:06:56 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 31 Aug 2006 12:06:56 +1200 Subject: [Python-3000] have zip() raise exception for sequences of different lengths In-Reply-To: References: <44F608B6.5010209@ewtllc.com> Message-ID: <44F62820.3090206@canterbury.ac.nz> Guido van Rossum wrote: > Perhaps a compromise could be to add a keyword parameter to request > such an exception? But who is going to bother using such a keyword, when it's not necessary for correct operation of the program in the absence of bugs? > (We could even add three options: truncate, pad, > error, with truncate being the default, and pad being the old map() > and filter() behavior.) This seems to fall foul of the no-constant-parameters guideline.
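[Editor's note: Raymond's infinite-iterator example can be made runnable to show what an exception-raising zip would forbid; his timestamp() is hypothetical, and a small list stands in for sys.stdin here.]

```python
from itertools import count

# With truncating semantics, zip stops as soon as the shortest input
# (here, the three lines) is exhausted, so pairing with the infinite
# counter count(1) is safe: a handy line-numbering idiom.
lines = ["first", "second", "third"]  # stands in for sys.stdin
numbered = list(zip(count(1), lines))
assert numbered == [(1, "first"), (2, "second"), (3, "third")]

# An exception-raising zip could never accept count(1): verifying that
# all inputs have equal length would mean exhausting an infinite iterator.
```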
-- Greg From rrr at ronadam.com Thu Aug 31 03:26:55 2006 From: rrr at ronadam.com (Ron Adam) Date: Wed, 30 Aug 2006 20:26:55 -0500 Subject: [Python-3000] have zip() raise exception for sequences of different lengths In-Reply-To: References: <44F608B6.5010209@ewtllc.com> Message-ID: Guido van Rossum wrote: > Perhaps a compromise could be to add a keyword parameter to request > such an exception? (We could even add three options: truncate, pad, > error, with truncate being the default, and pad being the old map() > and filter() behavior.) Maybe it can be done with just two optional keywords. If 'match' is True, raise an error if iterables are mismatched. if a 'pad' is specified then pad, else truncate. The current truncating behavior would be the default. Ron From jcarlson at uci.edu Thu Aug 31 04:20:06 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 30 Aug 2006 19:20:06 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <44F6264B.4000005@canterbury.ac.nz> References: <20060830091620.1B30.JCARLSON@uci.edu> <44F6264B.4000005@canterbury.ac.nz> Message-ID: <20060830185158.1B3F.JCARLSON@uci.edu> Greg Ewing wrote: > Josiah Carlson wrote: > > > Up until this morning I was planning on writing everything such that > > constructive manipulation (upper(), __add__, etc.) returned views of > > strings. > > I was about to say that this would be completely pointless, > when I realised the point is so that further operations on > these results would return views of them. In Josiah's > views-always-return-views world, that would actually make > sense -- but only if we really wanted such a world. Code wise, it could easily be a keyword argument on construction. > To my mind, the use of views is to temporarily call out > a part of a string for the purpose of applying some > other operation to it. Views will therefore be > short-lived objects that you won't want to keep and > pass around. 
I suspect that, if views are the default > result of anything done to a view, one will almost > always be doing a str() on the result to turn it back > into a non-view. If that's the case, then returning > views would be the wrong default. If views are always returned, then we can perform some optimizations (adjacent view concatenation, etc.), which may reduce running time, memory use, etc. If the user *needs* a string to be returned, they can always perform str(view). But remember, since 2.x strings are going away in 3.x, then it would really be bytes(view). I've looked through the methods available to them, and I'm happy that views are gaining traction, if only so that I can get view(bytes).partition() . If we always return strings (or bytes in 3.x), then all of those optimizations are lost. I'm writing them with optimizations, but they can certainly be removed later. Oh, and I've only got about 15 methods of the 60+ left to implement. - Josiah From talin at acm.org Thu Aug 31 04:35:48 2006 From: talin at acm.org (Talin) Date: Wed, 30 Aug 2006 19:35:48 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <20060830185158.1B3F.JCARLSON@uci.edu> References: <20060830091620.1B30.JCARLSON@uci.edu> <44F6264B.4000005@canterbury.ac.nz> <20060830185158.1B3F.JCARLSON@uci.edu> Message-ID: <44F64B04.9080200@acm.org> Josiah Carlson wrote: > If views are always returned, then we can perform some optimizations > (adjacent view concatenation, etc.), which may reduce running time, > memory use, etc. If the user *needs* a string to be returned, they can > always perform str(view). But remember, since 2.x strings are going > away in 3.x, then it would really be bytes(view). I've looked through > the methods available to them, and I'm happy that views are gaining > traction, if only so that I can get view(bytes).partition() . 
I know this was shot down before, but I would still like to see a "characters" type - that is, a mutable sequence of wide characters, much like the Java StringBuffer class - to go along with "bytes". From my perspective, it makes perfect sense to have an "array of character" type as well as an "array of byte" type, and since the "array of byte" is simply called "bytes", then by extension the "array of character" type would be called "characters". Of course, both the 'array' and 'list' types already give you that, but "characters" would have additional string-like methods. (However since it is mutable, it would not be capable of producing views.) The 'characters' data type would be particularly optimized for character-at-a-time operations, i.e. building up a string one character at a time. An example use would be processing escape sequences in strings, where you are transforming the escaped string into its non-escaped equivalent. -- Talin From guido at python.org Thu Aug 31 05:01:04 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 30 Aug 2006 20:01:04 -0700 Subject: [Python-3000] have zip() raise exception for sequences of different lengths In-Reply-To: References: <44F608B6.5010209@ewtllc.com> Message-ID: Actually given Raymond's preferences I take it back On 8/30/06, Ron Adam wrote: > Guido van Rossum wrote: > > > Perhaps a compromise could be to add a keyword parameter to request > > such an exception? (We could even add three options: truncate, pad, > > error, with truncate being the default, and pad being the old map() > > and filter() behavior.) > > Maybe it can be done with just two optional keywords. > > > If 'match' is True, raise an error if iterables are mismatched. > > if a 'pad' is specified then pad, else truncate. > > The current truncating behavior would be the default. 
> > > Ron > > > > > > > > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Aug 31 05:05:26 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 30 Aug 2006 20:05:26 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <44F64B04.9080200@acm.org> References: <20060830091620.1B30.JCARLSON@uci.edu> <44F6264B.4000005@canterbury.ac.nz> <20060830185158.1B3F.JCARLSON@uci.edu> <44F64B04.9080200@acm.org> Message-ID: On 8/30/06, Talin wrote: > I know this was shot down before, but I would still like to see a > "characters" type - that is, a mutable sequence of wide characters, much > like the Java StringBuffer class - to go along with "bytes". From my > perspective, it makes perfect sense to have an "array of character" type > as well as an "array of byte" type, and since the "array of byte" is > simply called "bytes", then by extension the "array of character" type > would be called "characters". > > Of course, both the 'array' and 'list' types already give you that, but > "characters" would have additional string-like methods. (However since > it is mutable, it would not be capable of producing views.) > > The 'characters' data type would be particularly optimized for > character-at-a-time operations, i.e. building up a string one character > at a time. An example use would be processing escape sequences in > strings, where you are transforming the escaped string into its > non-escaped equivalent. The array module was always usable for this purpose (even for Unicode characters) but it doesn't seem to have gotten any traction. So it sounds like a YAGNI to me. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From steven.bethard at gmail.com Thu Aug 31 05:32:14 2006 From: steven.bethard at gmail.com (Steven Bethard) Date: Wed, 30 Aug 2006 21:32:14 -0600 Subject: [Python-3000] have zip() raise exception for sequences of different lengths In-Reply-To: <44F6260C.1040502@ewtllc.com> References: <44F608B6.5010209@ewtllc.com> <44F61404.8010002@ewtllc.com> <44F6260C.1040502@ewtllc.com> Message-ID: On 8/30/06, Raymond Hettinger wrote: > >If changing zip()'s behavior to match the most common use case is > >totally out, the stdlib code at least argues for adding something like > >itertools.izip_exact(). > > I open to that. > > For this time being, let's do this. Add itertools.izip_longest() in > Py2.5 and include a recipe for izip_exact() and see if anyone cares > enough to ever use it. The new any() and all() functions started out as > recipes and graduated when their popularity was shown. If izip_exact() > proves its worth, then I would be happy to add it as a tool. Fair enough. Michael Chermside provided a recipe here: http://mail.python.org/pipermail/python-3000/2006-March/000160.html Maybe there's a cleaner way to write this, but I couldn't spot one off-hand. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From jcarlson at uci.edu Thu Aug 31 05:41:24 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 30 Aug 2006 20:41:24 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <44F64B04.9080200@acm.org> References: <20060830185158.1B3F.JCARLSON@uci.edu> <44F64B04.9080200@acm.org> Message-ID: <20060830203044.1B42.JCARLSON@uci.edu> Talin wrote: > I know this was shot down before, but I would still like to see a > "characters" type - that is, a mutable sequence of wide characters, much > like the Java StringBuffer class - to go along with "bytes". 
From my > perspective, it makes perfect sense to have an "array of character" type > as well as an "array of byte" type, and since the "array of byte" is > simply called "bytes", then by extension the "array of character" type > would be called "characters". If the buffer API offered information about the size of each element, similar to the way the proposed 'array API' is offering, this would just be one of the supportable cases. Views could offer the ability to specify the size of each element during construction (8, 16, or 32 bits), but variant methods for handling everything would need to be constructed. > Of course, both the 'array' and 'list' types already give you that, but > "characters" would have additional string-like methods. (However since > it is mutable, it would not be capable of producing views.) The view object I have now supports mutable and resizable objects (like bytes and array). > The 'characters' data type would be particularly optimized for > character-at-a-time operations, i.e. building up a string one character > at a time. An example use would be processing escape sequences in > strings, where you are transforming the escaped string into its > non-escaped equivalent. That is already possible with array.array('H', ...) or array.array('L', ...), depending on the unicode width of your platform. Array performs a more conservative reallocation strategy (1/16 rather than 1/8), but it seems to work well enough. Combine array with wide character support in views, and we could very well have the functionality that you desire. 
- Josiah From bob at redivi.com Thu Aug 31 05:56:03 2006 From: bob at redivi.com (Bob Ippolito) Date: Wed, 30 Aug 2006 20:56:03 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <20060827184941.1AE8.JCARLSON@uci.edu> <20060829102307.1B0F.JCARLSON@uci.edu> Message-ID: <6a36e7290608302056v4b0e68abrfe0c5b1fc927ff@mail.gmail.com> On 8/30/06, Fredrik Lundh wrote: > Fredrik Lundh wrote: > > > not necessarily, but there are lots of issues involved when doing > > high-performance XML stuff, and I'm not sure views would help quite as > > much as one might think. > > > > (writing and tuning cET was a great way to learn that not everything > > that you think you know about C performance applies to C code running > > inside the Python interpreter...) > > and also based on the cET (and NFS) experiences, it wouldn't surprise me > if a naive 32-bit text string implementation will, on average, slow things down > *more* than any string view implementation can speed things up again... > > (in other words, I'm convinced that we need a polymorphic string type. I'm not > so sure we need views, but if we have the former, we can use that mechanism to > support the latter) +1 for polymorphic strings. This would give us the best of both worlds: compact representations for ASCII and Latin-1, full 32-bit text when needed, and the possibility to implement further optimizations when necessary. It could add a bit of complexity and/or a massive speed penalty (depending on how naive the implementation is) around character operations though. 
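Bob's "compact representations" point can be made concrete with a small sketch. The function below is hypothetical — it is not CPython's or CoreFoundation's implementation — and simply picks the narrowest fixed-width storage that can hold every character of a text:

```python
# Hypothetical sketch of the storage-width choice behind a "polymorphic"
# string: store text in the narrowest fixed-width representation that
# can hold every character.  Illustrative only -- not CPython's or
# CoreFoundation's actual implementation.

def narrowest_width(text):
    """Return the bytes-per-character a fixed-width encoding would need."""
    highest = max(ord(ch) for ch in text) if text else 0
    if highest < 0x100:
        return 1   # fits in Latin-1 (covers all of ASCII)
    if highest < 0x10000:
        return 2   # fits in the Basic Multilingual Plane (UCS-2)
    return 4       # needs full 32-bit code points (UTF-32)

assert narrowest_width("spam") == 1
assert narrowest_width("na\u00efve") == 1      # Latin-1 is still enough
assert narrowest_width("\u0394\u03b5") == 2    # Greek needs two bytes
assert narrowest_width("\U0001d11e") == 4      # beyond the BMP
```

A real implementation would also have to widen a string's storage in place when a wider character is concatenated, which is where the "complexity and/or speed penalty" Bob mentions comes in.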
For implementation ideas, Apple's CoreFoundation has a mature implementation of polymorphic strings in C (which is the basis for their NSString type in Objective-C), and there's a cross-platform subset of it available as CF-Lite: http://developer.apple.com/opensource/cflite.html -bob From jackdied at jackdied.com Thu Aug 31 06:00:41 2006 From: jackdied at jackdied.com (Jack Diederich) Date: Thu, 31 Aug 2006 00:00:41 -0400 Subject: [Python-3000] have zip() raise exception for sequences of different lengths In-Reply-To: References: <44F608B6.5010209@ewtllc.com> Message-ID: <20060831040041.GF6257@performancedrivers.com> No need to take it back, as a long time python-* list reader I only took your initial post as thinking out loud. List readers can spot similar threads in the future by looking for these three indicators: 1) Behavioral function arguments are discouraged and mostly on your say-so. 2) You didn't top post, so it wasn't a pronouncement. 3) Long time readers were sure enough of #1 and #2 that no one added a "GOOD GOD NO" reply top-posting-ly, -Jack On Wed, Aug 30, 2006 at 08:01:04PM -0700, Guido van Rossum wrote: > Actually given Raymond's preferences I take it back > > On 8/30/06, Ron Adam wrote: > > Guido van Rossum wrote: > > > > > Perhaps a compromise could be to add a keyword parameter to request > > > such an exception? (We could even add three options: truncate, pad, > > > error, with truncate being the default, and pad being the old map() > > > and filter() behavior.) > > > > Maybe it can be done with just two optional keywords. > > > > > > If 'match' is True, raise an error if iterables are mismatched. > > > > if a 'pad' is specified then pad, else truncate. > > > > The current truncating behavior would be the default. 
> >
> > Ron
> >
> >
> >
> > _______________________________________________
> > Python-3000 mailing list
> > Python-3000 at python.org
> > http://mail.python.org/mailman/listinfo/python-3000
> > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
> >
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/jack%40performancedrivers.com
>

From rrr at ronadam.com Thu Aug 31 06:27:27 2006
From: rrr at ronadam.com (Ron Adam)
Date: Wed, 30 Aug 2006 23:27:27 -0500
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060830185158.1B3F.JCARLSON@uci.edu>
References: <20060830091620.1B30.JCARLSON@uci.edu> <44F6264B.4000005@canterbury.ac.nz> <20060830185158.1B3F.JCARLSON@uci.edu>
Message-ID:

Josiah Carlson wrote:

> If views are always returned, then we can perform some optimizations
> (adjacent view concatenation, etc.), which may reduce running time,
> memory use, etc.

Given an empty string and a view to it, how much memory do you think a view object will take in comparison to the string object?

Wouldn't there be a minimum size of a string where it would be better to just copy the string?
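For readers following the thread, the "view" under discussion can be pictured as a small constant-size object holding a reference to the source string plus two offsets. The class below is a hypothetical Python-level sketch, not Josiah Carlson's actual C implementation:

```python
# Hypothetical Python-level sketch of a string "view": a small,
# fixed-size object (source reference + two offsets) instead of a copy.
# Not the actual C implementation being discussed on the list.

class StrView(object):
    __slots__ = ('source', 'start', 'stop')   # fixed, small footprint

    def __init__(self, source, start=0, stop=None):
        self.source = source
        self.start = start
        self.stop = len(source) if stop is None else stop

    def __len__(self):
        return self.stop - self.start

    def __str__(self):
        # Copying happens only when a real string is demanded.
        return self.source[self.start:self.stop]

s = "hello world"
v = StrView(s, 6)
assert len(v) == 5
assert str(v) == "world"

# The view's size is constant no matter how long the slice is -- which
# is exactly why, as Ron asks, very short strings may be cheaper to
# copy outright than to wrap in a view.
big = StrView("x" * 10000, 0, 5000)
assert len(big) == 5000
```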
From jack at psynchronous.com Thu Aug 31 06:43:54 2006 From: jack at psynchronous.com (Jack Diederich) Date: Thu, 31 Aug 2006 00:43:54 -0400 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <6a36e7290608302056v4b0e68abrfe0c5b1fc927ff@mail.gmail.com> References: <20060827184941.1AE8.JCARLSON@uci.edu> <20060829102307.1B0F.JCARLSON@uci.edu> <6a36e7290608302056v4b0e68abrfe0c5b1fc927ff@mail.gmail.com> Message-ID: <20060831044354.GH6257@performancedrivers.com> On Wed, Aug 30, 2006 at 08:56:03PM -0700, Bob Ippolito wrote: > On 8/30/06, Fredrik Lundh wrote: > > Fredrik Lundh wrote: > > > > > not necessarily, but there are lots of issues involved when doing > > > high-performance XML stuff, and I'm not sure views would help quite as > > > much as one might think. > > > > > > (writing and tuning cET was a great way to learn that not everything > > > that you think you know about C performance applies to C code running > > > inside the Python interpreter...) > > > > and also based on the cET (and NFS) experiences, it wouldn't surprise me > > if a naive 32-bit text string implementation will, on average, slow things down > > *more* than any string view implementation can speed things up again... > > > > (in other words, I'm convinced that we need a polymorphic string type. I'm not > > so sure we need views, but if we have the former, we can use that mechanism to > > support the latter) > > +1 for polymorphic strings. > > This would give us the best of both worlds: compact representations > for ASCII and Latin-1, full 32-bit text when needed, and the > possibility to implement further optimizations when necessary. It > could add a bit of complexity and/or a massive speed penalty > (depending on how naive the implementation is) around character > operations though. 
> > For implementation ideas, Apple's CoreFoundation has a mature > implementation of polymorphic strings in C (which is the basis for > their NSString type in Objective-C), and there's a cross-platform > subset of it available as CF-Lite: > http://developer.apple.com/opensource/cflite.html > Having watched Fredrik casually double the speed of many str and unicode operations in a week I'm easily +1 on whatever he says. Bob's support makes that a +2, he struck me as quite sane too. That said can you guys expand on what polymorphic[1] means here in particular? Python wise I can only think of the str/unicode/buffer split. If the fraternity of strings doesn't include views (which I haven't needed either) what are you considering for the other kinds? -Jack [1] My ten pound Webster's says "An organism having more that one adult form, as the different castes in social ants" which is close enough to what I think the comp sci definition is. From jcarlson at uci.edu Thu Aug 31 07:23:05 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 30 Aug 2006 22:23:05 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: References: <20060830185158.1B3F.JCARLSON@uci.edu> Message-ID: <20060830220511.1B45.JCARLSON@uci.edu> Ron Adam wrote: > > Josiah Carlson wrote: > > > If views are always returned, then we can perform some optimizations > > (adjacent view concatenation, etc.), which may reduce running time, > > memory use, etc. d > > Given a empty string and a view to it, how much memory do you think a > view object will take in comparison to the string object? On 32 bit platforms, the current implementation uses 8 more bytes than a Python 2.4 buffer, or 44 bytes rather than 36. The base string object takes up at least 24 bytes (for strings of length 2-4, all length 1 and 0 strings are interned). > Wouldn't there be a minimum size of a string where it would be better to > just copy the string? What do you mean by "better"? 
If your question is: at what size would returning a Python 2.x string be more space efficient than the current view implementation, that would be a string of up to 24 bytes long.

However, as I said before, with views we can do adjacent view concatenation...

    x,y,z = view.partition(a)
    left_with_sep = x+y
    right_with_sep = y+z

If we returned views from view addition, then both of the additions above would be constant time operations. But if we returned strings from view additions, the above two additions would run in O(n) time together.

If we were really crazy, we could even handle non-adjacent view concatenation by checking the readonly flag, and examining data to the right of the current view. But even I'm not that crazy.

 - Josiah

From talin at acm.org Thu Aug 31 07:36:43 2006
From: talin at acm.org (Talin)
Date: Wed, 30 Aug 2006 22:36:43 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060830203044.1B42.JCARLSON@uci.edu>
References: <20060830185158.1B3F.JCARLSON@uci.edu> <44F64B04.9080200@acm.org> <20060830203044.1B42.JCARLSON@uci.edu>
Message-ID: <44F6756B.2080606@acm.org>

Josiah Carlson wrote:
> Talin wrote:
>> The 'characters' data type would be particularly optimized for
>> character-at-a-time operations, i.e. building up a string one character
>> at a time. An example use would be processing escape sequences in
>> strings, where you are transforming the escaped string into its
>> non-escaped equivalent.
>
> That is already possible with array.array('H', ...) or array.array('L', ...),
> depending on the unicode width of your platform. Array performs a more
> conservative reallocation strategy (1/16 rather than 1/8), but it seems
> to work well enough. Combine array with wide character support in views,
> and we could very well have the functionality that you desire.
Well, one of the things I wanted to be able to do is:

    'characters += str'

Or more precisely:

    token_buf = characters()
    token_buf += "example"
    token_buf += "\n"
    print token_buf
    >>> "example\n"

Now, an ordinary list would concatenate the string *object* onto the end of the list; whereas the character array would concatenate the string characters to the end of the character array.

Also note that the __str__ method of the character array returns a vanilla string object of its contents.

(What I am describing here is exactly the behavior of Java StringBuffer.)

-- Talin

From paul at prescod.net Thu Aug 31 10:05:18 2006
From: paul at prescod.net (Paul Prescod)
Date: Thu, 31 Aug 2006 01:05:18 -0700
Subject: [Python-3000] Making more effective use of slice objects in Py3k
In-Reply-To: <20060831044354.GH6257@performancedrivers.com>
References: <20060827184941.1AE8.JCARLSON@uci.edu> <20060829102307.1B0F.JCARLSON@uci.edu> <6a36e7290608302056v4b0e68abrfe0c5b1fc927ff@mail.gmail.com> <20060831044354.GH6257@performancedrivers.com>
Message-ID: <1cb725390608310105j2f8ee298p3a44d91fc91140ad@mail.gmail.com>

On 8/30/06, Jack Diederich wrote:
>
> On Wed, Aug 30, 2006 at 08:56:03PM -0700, Bob Ippolito wrote:
> > > and also based on the cET (and NFS) experiences, it wouldn't surprise
> me
> > > if a naive 32-bit text string implementation will, on average, slow
> things down
> > > *more* than any string view implementation can speed things up
> again...
> > >
> > > (in other words, I'm convinced that we need a polymorphic string
> type. I'm not
> > > so sure we need views, but if we have the former, we can use that
> mechanism to
> > > support the latter)
> >
> > +1 for polymorphic strings.
> >
> > This would give us the best of both worlds: compact representations
> > for ASCII and Latin-1, full 32-bit text when needed, and the
> > possibility to implement further optimizations when necessary.
It > > could add a bit of complexity and/or a massive speed penalty > > (depending on how naive the implementation is) around character > > operations though. > > Having watched Fredrik casually double the speed of many str and unicode > operations in a week I'm easily +1 on whatever he says. Bob's support > makes that a +2, he struck me as quite sane too. > > That said can you guys expand on what polymorphic[1] means here in > particular? I think that Bob alluded to it. They are talking about a string that uses 1 byte-per-character for ASCII text, perhaps two bytes-per-character for a mix of Greek and Russian text and four bytes-per-character for certain Chinese or Japanese strings. From the Python programmers' point of view it should be an invisible optimization. Paul Prescod -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060831/0f61bc29/attachment-0001.htm From fredrik at pythonware.com Thu Aug 31 10:21:00 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 31 Aug 2006 10:21:00 +0200 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <20060831044354.GH6257@performancedrivers.com> References: <20060827184941.1AE8.JCARLSON@uci.edu> <20060829102307.1B0F.JCARLSON@uci.edu> <6a36e7290608302056v4b0e68abrfe0c5b1fc927ff@mail.gmail.com> <20060831044354.GH6257@performancedrivers.com> Message-ID: Jack Diederich wrote: > That said can you guys expand on what polymorphic[1] means here in particular? > Python wise I can only think of the str/unicode/buffer split. If the > fraternity of strings doesn't include views (which I haven't needed either) > what are you considering for the other kinds? the idea is to allow a given string object to use different kinds of storage depending on what data it contains, and how it's being used. 
off the top of my head, I'd imagine using at least:

    wide unicode (32-bit)
    8-bit ascii/iso-8859-1
    utf-8

and possibly also one or more of

    narrow unicode (16-bit)
    8-bit encoded (arbitrary 8-bit encodings)
    utf-16
    selected asian encodings

all these look and behave the same at the Python level, as well as when using "high-level" C API:s. ob_type may differ (also during an object's lifetime), but type(s) is always the same.

this approach gives you lots of advantages:

- lots of operations can be carried out without having to convert the data (all the formats listed above support forward iteration, and most text-level operations).

- you'll save tons of memory in applications that use text mostly in a few character sets (and less memory means more speed).

- adding (or removing) specific string implementations becomes trivial, both for the core developers and extension writers.

etc.

the main disadvantage is that it becomes a bit more difficult to deal with strings at the C level (but properly dealing with both 8-bit and Unicode strings is already a pain in the ass, and I'm not sure this has to be any harder. just slightly different).

for some details on apple's implementation (thanks bob!), see:
https://developer.apple.com/documentation/CoreFoundation/Conceptual/CFStrings/Concepts/StringStorage.html

From jimjjewett at gmail.com Thu Aug 31 17:38:59 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Thu, 31 Aug 2006 11:38:59 -0400
Subject: [Python-3000] have zip() raise exception for sequences of different lengths
In-Reply-To: <305688A8-0CFA-4F80-80EA-E3D2343D7226@python.org>
References: <44F608B6.5010209@ewtllc.com> <305688A8-0CFA-4F80-80EA-E3D2343D7226@python.org>
Message-ID:

On 8/30/06, Barry Warsaw wrote:
> On Aug 30, 2006, at 5:57 PM, Guido van Rossum wrote:
> > Perhaps a compromise could be to add a keyword parameter to request
> > such an exception? (We could even add three options: truncate, pad,
> > error, with truncate being the default, and pad being the old map()
> > and filter() behavior.)

> What about a keyword argument called 'filler' which can be an n-sized
> sequence or a callable.

How about a keyword-only argument called finish which is a callable to deal with the problem? When any sequence is exhausted, its position is filled with StopIteration, and then finish(result) is returned.

For example,

    >>> g=zip("abc", (1,2))

The third call to g.next() will return the result of finish('c', StopIteration)

    def finish_truncate(*args):
        # The default, like today
        raise StopIteration

    def finish_error(*args):
        if all(v is StopIteration for v in args):
            raise StopIteration
        raise ValueError("Mismatched sequence length %s" % args)

    def finish_padNone(*args):
        if all(v is StopIteration for v in args):
            raise StopIteration
        return tuple((v if v is not StopIteration else None)
                     for v in args)

-jJ

From barry at python.org Thu Aug 31 17:44:53 2006
From: barry at python.org (Barry Warsaw)
Date: Thu, 31 Aug 2006 11:44:53 -0400
Subject: [Python-3000] have zip() raise exception for sequences of different lengths
In-Reply-To:
References: <44F608B6.5010209@ewtllc.com> <305688A8-0CFA-4F80-80EA-E3D2343D7226@python.org>
Message-ID:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 31, 2006, at 11:38 AM, Jim Jewett wrote:
> On 8/30/06, Barry Warsaw wrote:
>
>> What about a keyword argument called 'filler' which can be an n-sized
>> sequence or a callable.
>
> How about a keyword-only argument called finish which is a callable to
> deal with the problem? When any sequence is exhausted, its position
> is filled with StopIteration, and then finish(result) is returned.

Nice!
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRPcD+3EjvBPtnXfVAQKviQP/fEcBu7t2iXEfBom3flvDgcoauJp+/XSS s2zdIivkQAZgs8kmbtYpk0R4KPyIUhyjHahzcxvUKKXGakfpIl73FBGSK+XfG/iq IqQ33dW4Gl6YBt9HpOLVd0NP1RWUGl+QNegLP2ihgLoRFi0QK8fBj0FPoxHdHrfu rIGXwJe6Qlg= =0PRM -----END PGP SIGNATURE----- From rhettinger at ewtllc.com Thu Aug 31 18:12:44 2006 From: rhettinger at ewtllc.com (Raymond Hettinger) Date: Thu, 31 Aug 2006 09:12:44 -0700 Subject: [Python-3000] have zip() raise exception for sequences of different lengths In-Reply-To: References: <44F608B6.5010209@ewtllc.com> <305688A8-0CFA-4F80-80EA-E3D2343D7226@python.org> Message-ID: <44F70A7C.602@ewtllc.com> >How about a keyword-only argument called finish which is a callable to >deal with the problem? When any sequence is exhausted, its position >is filled with StopIteration, and then finish(result) is returned. > > > How about we resist the urge to complicate the snot out of a basic looping construct. Hypergeneralization is more of a sin than premature optimization. It is important that zip() be left as dirt simple as possible. In the tutorial (section 5.6), we're able to use short, simple examples to teach all of the fundamental looping techniques to total beginners in a way that lets them save their brain power for learning exceptions, classes, generators, packages, and whatnot. Creative talent is being wasted here just to solve a non-problem. Please keep Py3k on track for cruft removal. We're seeing way too much discussion on random, screwball proposals rather that focusing on what really matters: Keeping the tried and true while removing stuff we've always wanted to take away. 
Raymond From guido at python.org Thu Aug 31 18:29:32 2006 From: guido at python.org (Guido van Rossum) Date: Thu, 31 Aug 2006 09:29:32 -0700 Subject: [Python-3000] have zip() raise exception for sequences of different lengths In-Reply-To: <44F70A7C.602@ewtllc.com> References: <44F608B6.5010209@ewtllc.com> <305688A8-0CFA-4F80-80EA-E3D2343D7226@python.org> <44F70A7C.602@ewtllc.com> Message-ID: On 8/31/06, Raymond Hettinger wrote: > > >How about a keyword-only argument called finish which is a callable to > >deal with the problem? When any sequence is exhausted, its position > >is filled with StopIteration, and then finish(result) is returned. > > How about we resist the urge to complicate the snot out of a basic > looping construct. Hypergeneralization is more of a sin than premature > optimization. Hear, hear! Hypergeneralization adds features you can never get rid of even though they may only be useful for <1% of the populations. At least unnecessary optimizations can be rolled back safely. > It is important that zip() be left as dirt simple as possible. In the > tutorial (section 5.6), we're able to use short, simple examples to > teach all of the fundamental looping techniques to total beginners in a > way that lets them save their brain power for learning exceptions, > classes, generators, packages, and whatnot. > > Creative talent is being wasted here just to solve a non-problem. > Please keep Py3k on track for cruft removal. We're seeing way too much > discussion on random, screwball proposals rather that focusing on what > really matters: Keeping the tried and true while removing stuff we've > always wanted to take away. Amen. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From g.brandl at gmx.net Thu Aug 31 19:34:59 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 31 Aug 2006 19:34:59 +0200 Subject: [Python-3000] have zip() raise exception for sequences of different lengths In-Reply-To: <44F70A7C.602@ewtllc.com> References: <44F608B6.5010209@ewtllc.com> <305688A8-0CFA-4F80-80EA-E3D2343D7226@python.org> <44F70A7C.602@ewtllc.com> Message-ID: Raymond Hettinger wrote: >>How about a keyword-only argument called finish which is a callable to >>deal with the problem? When any sequence is exhausted, its position >>is filled with StopIteration, and then finish(result) is returned. >> >> >> > > How about we resist the urge to complicate the snot out of a basic > looping construct. Hypergeneralization is more of a sin than premature > optimization. > > It is important that zip() be left as dirt simple as possible. Added to PEP 3099. Georg From ironfroggy at gmail.com Thu Aug 31 19:42:57 2006 From: ironfroggy at gmail.com (Calvin Spealman) Date: Thu, 31 Aug 2006 13:42:57 -0400 Subject: [Python-3000] Exception Expressions Message-ID: <76fd5acf0608311042k231fb36w1bf5d1e7e4eebe0c@mail.gmail.com> I thought I felt in the mood for some abuse today, so I'm proposing something sure to give me plenty of crap, but maybe someone will enjoy the idea, anyway. This is a step beyond the recently added conditional expressions. I actually made this up as a joke, explaining at which point we would have gone too far with branching logic in an expression. After making the joke, I was sad to realize I didn't mind the idea and thought I'd see if anyone else doesn't mind it either. 
    expr1 except expr2 if exc_type

For example, given a list, letters, of ['a', 'b', 'c'], we would be able to do the following:

    print letters[7] except "N/A" if IndexError

This would translate to something along the lines of:

    try:
        _tmp = letters[7]
    except IndexError:
        _tmp = "N/A"
    print _tmp

Obviously, the except in an expression has to take precedence over if expressions, otherwise it would evaluate '"N/A" if IndexError' first. The syntax can be extended in some ways, to allow for handling multiple exception types for one result or different results for different exception types:

    foo() except "Bar or Baz!?" if BarError, BazError
    foo() except "Bar!" if BarError, "Baz!" if BazError

Other example use cases:

    # Fallback on an alternative path
    open(filename) except open(filename2) if IOError

    # Handle divide-by-zero
    while expr != "quit":
        print eval(expr) except "Can not divide by zero!" if ZeroDivisionError
        expr = raw_input()

    # Use a cache when an external resource times out
    db.get(key) except cache.get(key) if TimeoutError

Only very basic exception handling would be useful with this syntax, so nothing would ever get out of hand, unless someone wasn't caring about their code looking good and keeping good line lengths, so their code probably wouldn't look great to begin with.

If there is any positive response I'll write up a PEP.

From brett at python.org Thu Aug 31 20:20:20 2006
From: brett at python.org (Brett Cannon)
Date: Thu, 31 Aug 2006 11:20:20 -0700
Subject: [Python-3000] Exception Expressions
In-Reply-To: <76fd5acf0608311042k231fb36w1bf5d1e7e4eebe0c@mail.gmail.com>
References: <76fd5acf0608311042k231fb36w1bf5d1e7e4eebe0c@mail.gmail.com>
Message-ID:

On 8/31/06, Calvin Spealman wrote:
>
> I thought I felt in the mood for some abuse today, so I'm proposing
> something sure to give me plenty of crap, but maybe someone will enjoy
> the idea, anyway.

Never hurts too much to try, huh? =) Plus it gives me a break from my work.
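For reference, the semantics Calvin proposes can be emulated with today's syntax by a small helper function. The sketch below is hypothetical (the name `trying` is made up for illustration); it just makes the proposed translation concrete:

```python
# Hypothetical helper emulating the proposed
# "expr1 except expr2 if exc_type" expression with current syntax.

def trying(thunk, default, exc_types):
    """Evaluate thunk(); if it raises one of exc_types, return default."""
    try:
        return thunk()
    except exc_types:
        return default

letters = ['a', 'b', 'c']
# Equivalent of: letters[7] except "N/A" if IndexError
assert trying(lambda: letters[7], "N/A", IndexError) == "N/A"
assert trying(lambda: letters[1], "N/A", IndexError) == 'b'

# Multiple exception types, as in: foo() except "Bar or Baz!?" if BarError, BazError
assert trying(lambda: 1 // 0, "Can not divide by zero!",
              (ZeroDivisionError, ArithmeticError)) == "Can not divide by zero!"
```

The lambda is needed because the expression must be evaluated *inside* the try block, which is also the main ergonomic argument for dedicated syntax.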
This is a step beyond the recently added conditional > expressions. I actually made this up as a joke, explaining at which > point we would have gone too far with branching logic in an > expression. After making the joke, I was sad to realize I didn't mind > the idea and thought I'd see if anyone else doesn't mind it either. > > expr1 except expr2 if exc_type > > For example, given a list, letters, of ['a', 'b', 'c'], we would be > able to do the following: > > print letters[7] except "N/A" if IndexError So this feels like the Perl idiom of using die: ``open(file) or die`` (or something like that; I have never been a Perl guy so I could be off). This would translate to something along the lines of: > > try: > _tmp = letters[7] > except IndexError: > _tmp = "N/A" > print _tmp > > Obviously, the except in an expression has to take precedence over if > expressions, otherwise it would evaluate '"N/A" if IndexError" first. > The syntax can be extended in some ways, to allow for handling > multiple exception types for one result or different results for > different exception types: > > foo() except "Bar or Baz!?" if BarError, BazError > foo() except "Bar!" if BarError, "Baz!" if BazError > > Other example use cases: > > # Fallback on an alternative path > open(filename) except open(filename2) if IOError > > # Handle divide-by-zero > while expr != "quit": > print eval(expr) except "Can not divide by zero!" if > ZeroDivisionError > expr = raw_input() > > # Use a cache when an external resource timesout > db.get(key) except cache.get(key) if TimeoutError > > Only very basic exception handling would be useful with this syntax, > so nothing would ever get out of hand, unless someone wasn't caring > about their code looking good and keeping good line lengths, so their > code probably wouldn't look great to begin with. The problem I have with this whole proposal is that catching exceptions should be very obvious in the source code. This proposal does not help with that ideal. 
So I am -1 on the whole idea. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060831/602811a4/attachment.html From jimjjewett at gmail.com Thu Aug 31 20:21:32 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu, 31 Aug 2006 14:21:32 -0400 Subject: [Python-3000] Exception Expressions In-Reply-To: <76fd5acf0608311042k231fb36w1bf5d1e7e4eebe0c@mail.gmail.com> References: <76fd5acf0608311042k231fb36w1bf5d1e7e4eebe0c@mail.gmail.com> Message-ID: > expr1 except expr2 if exc_type ... > print letters[7] except "N/A" if IndexError I sort of like it, though I'm more worried than you about ugly code. There have been many times when I wanted it so that I could use a list comprehension (or generator comprehension) instead of a function or block. The bad news is that I seem to be an anti-channeller, so my interest is perhaps not a *good* sign. -jJ From barry at python.org Thu Aug 31 20:28:14 2006 From: barry at python.org (Barry Warsaw) Date: Thu, 31 Aug 2006 14:28:14 -0400 Subject: [Python-3000] have zip() raise exception for sequences of different lengths In-Reply-To: <44F70A7C.602@ewtllc.com> References: <44F608B6.5010209@ewtllc.com> <305688A8-0CFA-4F80-80EA-E3D2343D7226@python.org> <44F70A7C.602@ewtllc.com> Message-ID: <01A13D75-12A7-4590-A4A1-F0488D4C105C@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 31, 2006, at 12:12 PM, Raymond Hettinger wrote: > It is important that zip() be left as dirt simple as possible. In > the tutorial (section 5.6), we're able to use short, simple > examples to teach all of the fundamental looping techniques to > total beginners in a way that lets them save their brain power for > learning exceptions, classes, generators, packages, and whatnot. 
Without addressing zip() in particular (as I said before, its current API is just fine to me), and while agreeing with the general principle of keeping things as simple as they can be, I don't believe you have to teach all the ins-and-outs of a particular function, class, or module as soon as it's introduced in the tutorial. It's perfectly fine to keep the intro examples short and sweet with a footnote saying "go here for more advanced usage". There's a ton of stuff in Python that total beginners just don't need to know right away. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRPcqQ3EjvBPtnXfVAQJxSAP/Yk2Dqh88iHThSKoqHHr9rURGbO2UWPvt R4xAFr4QMy4L8GtzLaG3l/RyeG59UwELgZCzRefw/aDuMotLrjrx4KvSb+FIgWmA r/lwWnF34xWH+oSwD459WotkRIJxVnwCAUOJtiCGYqSKfSEf0z5OwDJfGCRCb6Iv 8RRqoeBlVVQ= =iT7K -----END PGP SIGNATURE----- From talin at acm.org Thu Aug 31 20:46:13 2006 From: talin at acm.org (Talin) Date: Thu, 31 Aug 2006 11:46:13 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <20060831044354.GH6257@performancedrivers.com> References: <20060827184941.1AE8.JCARLSON@uci.edu> <20060829102307.1B0F.JCARLSON@uci.edu> <6a36e7290608302056v4b0e68abrfe0c5b1fc927ff@mail.gmail.com> <20060831044354.GH6257@performancedrivers.com> Message-ID: <44F72E75.2050204@acm.org> Jack Diederich wrote: >>> (in other words, I'm convinced that we need a polymorphic string type. I'm not >>> so sure we need views, but if we have the former, we can use that mechanism to >>> support the latter) >> +1 for polymorphic strings. >> >> This would give us the best of both worlds: compact representations >> for ASCII and Latin-1, full 32-bit text when needed, and the >> possibility to implement further optimizations when necessary. It >> could add a bit of complexity and/or a massive speed penalty >> (depending on how naive the implementation is) around character >> operations though. 
>> >> For implementation ideas, Apple's CoreFoundation has a mature >> implementation of polymorphic strings in C (which is the basis for >> their NSString type in Objective-C), and there's a cross-platform >> subset of it available as CF-Lite: >> http://developer.apple.com/opensource/cflite.html >> > > Having watched Fredrik casually double the speed of many str and unicode > operations in a week I'm easily +1 on whatever he says. Bob's support > makes that a +2, he struck me as quite sane too. One way to handle this efficiently would be to only support the encodings which have a constant character size: ASCII, Latin-1, UCS-2 and UTF-32. In other words, if the content of your text is plain ASCII, use an 8-bit-per-character string; if the content is limited to the Unicode BMP (Basic Multilingual Plane) use UCS-2; and if you are using Unicode supplementary characters, use UTF-32. (The difference between UCS-2 and UTF-16 is that UCS-2 is always 2 bytes per character, and doesn't support the supplemental characters above 0xffff, whereas UTF-16 characters can be either 2 or 4 bytes.) By avoiding UTF-8, UTF-16 and other variable-character-length formats, you can always ensure that character index operations are done in constant time. Index operations would simply require scaling the index by the character size, rather than having to scan through the string and count characters. The drawback of this method is that you may be forced to transform the entire string into a wider encoding if you add a single character that won't fit into the current encoding. (Another option is to simply make all strings UTF-32 -- which is not that unreasonable, considering that text strings normally make up only a small fraction of a program's memory footprint. I am sure that there are applications that don't conform to this generalization, however.
) -- Talin From guido at python.org Thu Aug 31 20:55:15 2006 From: guido at python.org (Guido van Rossum) Date: Thu, 31 Aug 2006 11:55:15 -0700 Subject: [Python-3000] Making more effective use of slice objects in Py3k In-Reply-To: <44F72E75.2050204@acm.org> References: <20060827184941.1AE8.JCARLSON@uci.edu> <20060829102307.1B0F.JCARLSON@uci.edu> <6a36e7290608302056v4b0e68abrfe0c5b1fc927ff@mail.gmail.com> <20060831044354.GH6257@performancedrivers.com> <44F72E75.2050204@acm.org> Message-ID: On 8/31/06, Talin wrote: > One way to handle this efficiently would be to only support the > encodings which have a constant character size: ASCII, Latin-1, UCS-2 > and UTF-32. In other words, if the content of your text is plain ASCII, > use an 8-bit-per-character string; if the content is limited to the > Unicode BMP (Basic Multilingual Plane) use UCS-2; and if you are using > Unicode supplementary characters, use UTF-32. > > (The difference between UCS-2 and UTF-16 is that UCS-2 is always 2 bytes > per character, and doesn't support the supplemental characters above > 0xffff, whereas UTF-16 characters can be either 2 or 4 bytes.) I think we should also support UTF-16, since Java and .NET (and Win32?) appear to be using it effectively; making surrogate handling an application issue doesn't seem *too* big of a burden for many apps. > By avoiding UTF-8, UTF-16 and other variable-character-length formats, > you can always ensure that character index operations are done in > constant time. Index operations would simply require scaling the index > by the character size, rather than having to scan through the string and > count characters. > > The drawback of this method is that you may be forced to transform the > entire string into a wider encoding if you add a single character that > won't fit into the current encoding. A way to handle UTF-8 strings and other variable-length encodings would be to maintain a small cache of index positions with the string object.
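Guido's cache-of-index-positions idea can be sketched concretely. The class below is my own illustration (not anything proposed in the thread): it stores a UTF-8 byte string plus a small dictionary of known character-index-to-byte-offset checkpoints, so repeated indexing does not rescan from the start every time. The name `UTF8String` and the checkpoint spacing are invented for the example.

```python
class UTF8String:
    """Illustrative UTF-8 string with an index-position cache."""
    CHECKPOINT_EVERY = 64  # characters between cached offsets (arbitrary)

    def __init__(self, data):
        self.data = data          # UTF-8 encoded bytes
        self._offsets = {0: 0}    # char index -> byte offset cache

    def _char_len(self, first_byte):
        # length of a UTF-8 sequence, from its lead byte
        if first_byte < 0x80: return 1
        if first_byte < 0xE0: return 2
        if first_byte < 0xF0: return 3
        return 4

    def __getitem__(self, index):
        # start from the nearest cached checkpoint at or before `index`
        start = max(k for k in self._offsets if k <= index)
        char, byte = start, self._offsets[start]
        while char < index:
            byte += self._char_len(self.data[byte])
            char += 1
            if char % self.CHECKPOINT_EVERY == 0:
                self._offsets[char] = byte  # remember this position
        end = byte + self._char_len(self.data[byte])
        return self.data[byte:end].decode("utf-8")

s = UTF8String("naïve 中文 text".encode("utf-8"))
print(s[3])  # -> v
print(s[6])  # -> 中
```

The trade-off is exactly the one discussed above: indexing becomes amortized rather than strictly constant time, in exchange for keeping the compact variable-width representation.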
> (Another option is to simply make all strings UTF-32 -- which is not > that unreasonable, considering that text strings normally make up only a > small fraction of a program's memory footprint. I am sure that there are > applications that don't conform to this generalization, however. ) Here you are effectively voting against polymorphic strings. I believe Fredrik has good reasons to doubt this assertion. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From tjreedy at udel.edu Thu Aug 31 22:58:55 2006 From: tjreedy at udel.edu (tjreedy) Date: Thu, 31 Aug 2006 22:58:55 +0200 Subject: [Python-3000] Making more effective use of slice objects in Py3k References: <20060827184941.1AE8.JCARLSON@uci.edu> <20060829102307.1B0F.JCARLSON@uci.edu> <6a36e7290608302056v4b0e68abrfe0c5b1fc927ff@mail.gmail.com> Message-ID: "Bob Ippolito" wrote in message news:6a36e7290608302056v4b0e68abrfe0c5b1fc927ff at mail.gmail.com... > +1 for polymorphic strings. A strong +1 here also. > > This would give us the best of both worlds: compact representations > for ASCII and Latin-1, full 32 bit text when needed, and the > possibility to implement further optimizations when necessary. As I understand current plans, Python 3 will have a polymorphic integer type that handles details of switching between the two current implementations, one for efficiency, and one for generality, behind the scenes. I think it would be a great selling point for people to adopt Python 3 if it also handled the even worse nastiness of text forms behind the scenes, and kept the efficiency of special case uses (as in all ascii chars) while making the transition to generality more seamless than it is now. These two similar features would be enough, to me, to make Py3 more than just 2.x with cruft removed. Terry J. 
Reedy From rhettinger at ewtllc.com Thu Aug 31 23:29:36 2006 From: rhettinger at ewtllc.com (Raymond Hettinger) Date: Thu, 31 Aug 2006 14:29:36 -0700 Subject: [Python-3000] Exception Expressions In-Reply-To: References: <76fd5acf0608311042k231fb36w1bf5d1e7e4eebe0c@mail.gmail.com> Message-ID: <44F754C0.8080404@ewtllc.com> >The bad news is that I seem to be an anti-channeller, so my interest >is perhaps not a *good* sign. > > > QOTW From tomerfiliba at gmail.com Thu Aug 31 23:43:44 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Thu, 31 Aug 2006 23:43:44 +0200 Subject: [Python-3000] Comment on iostack library Message-ID: <1d85506f0608311443s108822c1n31682ba765b2f3e0@mail.gmail.com> i haven't been online for the last couple of days, so i'll unify my replies into one post. [Talin] > Right now, a typical > file handle consists of 3 "layers" - one representing the backing store > (file, memory, network, etc.), one for adding buffering, and one > representing the program-level API for reading strings, bytes, decoded > text, etc. yes, and it's also good you noted *typical*. the design is to allow a virtually unlimited number of such layers, stacked one after the other, giving you a very fine level of control without having to write a single line of "procedural" or tailored code. you just mix in what you want. [Talin] > I wonder if it wouldn't be better to cut that down to two. Specifically, > I would like to suggest eliminating the buffering layer. > My reasoning is fairly straightforward: Most file system handles, > network handles and other operating system handles already support > buffering, and they do a far better job of it than we can. indeed, but as guido said (and i believe it also says so at my wiki page), stdio cannot be trusted, let alone the way different OSes implement things. buffering, for one, is a horrible issue.
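The stacked-layer design tomer describes can be sketched in a few lines. This is my own minimal illustration, not iostack's actual API: each layer wraps the layer below and talks to it only through `read()`, so a buffering layer or a text layer can be mixed onto any byte source.

```python
import io

class Layer:
    """Base: forwards reads to the layer below."""
    def __init__(self, below):
        self.below = below
    def read(self, n):
        return self.below.read(n)

class BufferingLayer(Layer):
    """Reads from below in large chunks, serves small reads from a buffer."""
    def __init__(self, below, chunk=4096):
        super().__init__(below)
        self.chunk = chunk
        self.buf = b""
    def read(self, n):
        while len(self.buf) < n:
            data = self.below.read(self.chunk)
            if not data:
                break  # EOF below
            self.buf += data
        out, self.buf = self.buf[:n], self.buf[n:]
        return out

class TextLayer(Layer):
    """Decodes bytes from below into text."""
    def __init__(self, below, encoding="utf-8"):
        super().__init__(below)
        self.encoding = encoding
    def read(self, n):
        return self.below.read(n).decode(self.encoding)

# stack: backing store -> buffering -> text interface
stream = TextLayer(BufferingLayer(io.BytesIO(b"hello world")), "utf-8")
print(stream.read(5))  # -> hello
```

Because `io.BytesIO` could be swapped for a file or socket wrapper with the same `read()` signature, the buffering logic is written once, which is the reuse argument made in the thread.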
i remember an old C program i wrote that worked fine on windows, but not on linux, because i didn't print a newline and stdout was line-buffered... i couldn't see the output, and it was a nightmare to debug. [Talin] > Well, as far as readline goes: In order to split the text into lines, > you have to decode the text first anyway, which is a layer 3 operation. > You can't just read bytes until you get a \n, because the file you are > reading might be encoded in UCS2 or something. well, the LineBufferedLayer can be "configured" to split on any "marker", i.e.:

    LineBufferedLayer(stream, marker = "\x00\x0a")

and of course layer 3, which creates layer 2, can set this marker to any byte sequence. note it's a *byte* sequence, not chars, since this passes down to layer 1 transparently. i.e.:

    delimiters = {"utf8" : "\x0a", "utf16" : "\x00\x0a"}

    def textfile(filename, mode, encoding = None):
        f = FileStream(filename, mode)
        f = LineBufferingLayer(f, delimiters[encoding])
        f = TextInterface(f, encoding)
        return f

[Talin] > It seems to me that no matter how you slice it, you can't have an > abstract "buffering" layer that is independent of both the layer beneath > and the layer above. but that's the whole idea! buffering is a complicated task that must *not* be rewritten for every type of underlying storage. if one wanted to write or read lines over a socket, one shouldn't need to reimplement file-like line buffering, as done by socket.py. i want to be able to read lines directly from any stream: socket, file, or memory. how i choose to implement my HTTP parser is my only concern, i don't want to be limited by the kind of stream my parser would work over. [Nick] > You'd insert a buffering layer at the appropriate point for whatever you're > trying to do.
> The advantage of pulling the buffering out into a separate layer > is that it can be reused with different byte sources & sinks by supplying the > appropriate configuration parameters, instead of having to reimplement it for > each different source/sink. indeed [Marcin] > I think buffering makes sense as the topmost layer, and typically only > there. > Encoding conversion and newline conversion should be performed a block > at a time, below buffering, so not only I/O syscalls, but also > invocations of the recoding machinery are amortized by buffering. you have a good point, which i also stumbled upon when implementing the TextInterface. but how would you suggest to solve it? write()ing is always simpler, because you already have the entire buffer, which you can encode as a chunk. when read()ing, you can decode() the entire pre-read buffer first, but then you have a "tail" of undecodable data (an incomplete character or record), which would be quite nasty to handle. besides, encoding suffers from many issues. suppose you have a damaged UTF8 file, which you read char-by-char. when we reach the damaged part, you'll never be able to "skip" it, as we'll just keep read()ing bytes, hoping to make a character out of it, until we reach EOF, i.e.:

    def read_char(self):
        buf = ""
        while not self._stream.eof:
            buf += self._stream.read(1)
            try:
                return buf.decode("utf8")
            except ValueError:
                pass

which leads me to the following thought: maybe we should have an "enhanced" encoding library for py3k, which would report *incomplete* data differently from *invalid* data. today it's just a ValueError: suppose decode() would raise IncompleteDataError when the given data is not sufficient to be decoded successfully, and ValueError when the data is just corrupted. that could aid iostack greatly.
-tomer From ironfroggy at gmail.com Thu Aug 31 23:50:02 2006 From: ironfroggy at gmail.com (Calvin Spealman) Date: Thu, 31 Aug 2006 17:50:02 -0400 Subject: [Python-3000] Exception Expressions In-Reply-To: References: <76fd5acf0608311042k231fb36w1bf5d1e7e4eebe0c@mail.gmail.com> Message-ID: <76fd5acf0608311450r6fbddd44n28ab6f83741b8699@mail.gmail.com> On 8/31/06, Brett Cannon wrote: > So this feels like the Perl idiom of using die: ``open(file) or die`` (or > something like that; I have never been a Perl guy so I could be off). > > > ... > > The problem I have with this whole proposal is that catching exceptions > should be very obvious in the source code. This proposal does not help with > that ideal. So I am -1 on the whole idea. > > -Brett "Ouch" on associating my idea with Perl! Although I agree that it is good to be obvious about exceptions, there are some cases when they are simply less than exceptional. For example, you can do d.get(key, default) if you know something is a dictionary, but for general mappings you can't rely on that, and may often use exceptions as a kind of logic control. No, that doesn't sync with the purity of exceptions, but sometimes practicality and real-world usage trumps theory. Only allowing a single expression, it shouldn't be able to get ugly. Also, maybe I hate to admit it but it could allow 'expr1 except expr2' as something pretty much like the 'or die' paradigm.
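Calvin's mapping example, spelled out: dict.get(key, default) works for dicts, but the portable fallback for an arbitrary mapping-like object is exception-based control flow -- exactly the pattern the proposed expression would compress into one line. The helper name `lookup` below is invented for illustration:

```python
def lookup(mapping, key, default=None):
    """Portable d.get(): works on anything supporting [] lookup."""
    try:
        return mapping[key]
    except (KeyError, IndexError):  # mapping miss or sequence out-of-range
        return default

print(lookup({"a": 1}, "a"))          # -> 1
print(lookup({"a": 1}, "b", "N/A"))   # -> N/A
print(lookup(["x", "y"], 5, "N/A"))   # -> N/A (sequences too)
```

Under the proposal, the call sites would instead read something like ``mapping[key] except default if KeyError``, with no helper function needed.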