From jimjjewett at gmail.com Thu Feb 1 00:06:26 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 31 Jan 2007 18:06:26 -0500 Subject: [Python-3000] Poll: Lazy Unicode Strings For Py3k In-Reply-To: <20070131133737.5A61.JCARLSON@uci.edu> References: <20070131122426.5A5C.JCARLSON@uci.edu> <20070131133737.5A61.JCARLSON@uci.edu> Message-ID: On 1/31/07, Josiah Carlson wrote: > > "Jim Jewett" wrote: > > On 1/31/07, Josiah Carlson wrote: > > > Do you remember my "string view" post from last September/October or so? > > > It implemented almost all of the string API exactly as the string API > > > did, except that rather than returning strings, it returned views. > > So there would be places where you couldn't safely use it, even though > > it had all the required functionality. > Almost certainly, but the point is that you could get back to what you > wanted via str(obj), unicode(obj), etc., which would incur (in the worst > case) the overhead you saved before, or raise a MemoryError exception > (unless it's Linux, in which case you will likely segfault). > > How would you feel if it also > > (1) Claimed to be a subclass of str (though it might not actually > > inherit anything) > > (2) Implemented the rest of the methods by delegation. (Call str on > > itself, switch its "real" object to the new string, and delegate to > > that.) > I'm not terribly concerned about the implementation details of an object > I don't need to use. As long as it works, it is fine. I am concerned > about the implementation details of objects I will use. The reason to ask for these is that then you could use it anywhere a str could be used (unless they explicitly did CheckExact). Since the object itself would be in charge of creating a "normal" str when needed, you wouldn't have to do it pre-emptively before passing it to a library. > I believe the base type included with Python should allocate the memory > on creation. Why?
Because the implementation is simple, and I believe > that a base type implementation should be as simple as possible. Do you think it should happen to do that as an implementation detail, or that it should *promise* to do so, and bind all string-alikes to the same promise? -jJ From jcarlson at uci.edu Thu Feb 1 01:18:17 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 31 Jan 2007 16:18:17 -0800 Subject: [Python-3000] Poll: Lazy Unicode Strings For Py3k In-Reply-To: References: <20070131133737.5A61.JCARLSON@uci.edu> Message-ID: <20070131160422.5A6D.JCARLSON@uci.edu> "Jim Jewett" wrote: > > On 1/31/07, Josiah Carlson wrote: > > > > "Jim Jewett" wrote: > > > On 1/31/07, Josiah Carlson wrote: > > > > > Do you remember my "string view" post from last September/October or so? > > > > It implemented almost all of the string API exactly as the string API > > > > did, except that rather than returning strings, it returned views. > > > > So there would be places where you couldn't safely use it, even though > > > it had all the required functionality. > > > Almost certainly, but the point is that you could get back to what you > > wanted via str(obj), unicode(obj), etc., which would incur (in the worst > > case) the overhead you saved before, or raise a MemoryError exception > > (unless it's Linux, in which case you will likely segfault). > > > > How would you feel if it also > > > > (1) Claimed to be a subclass of str (though it might not actually > > > inherit anything) > > > (2) Implemented the rest of the methods by delegation. (Call str on > > > itself, switch its "real" object to the new string, and delegate to > > > that.) > > > I'm not terribly concerned about the implementation details of an object > > I don't need to use. As long as it works, it is fine. I am concerned > > about the implementation details of objects I will use.
> > The reason to ask for these is that then you could use it anywhere a > str could be used (unless they explicitly did CheckExact). Since the > object itself would be in charge of creating a "normal" str when > needed, you wouldn't have to do it pre-emptively before passing it to > a library. Certainly, but a well-behaved C extension or library should be using the single segment buffer interface anyways, to allow for the passing of array.array, numpy.array, buffer(...), etc. For views, this works great, you just return a pointer and length into the original object. For concatenation objects, one needs to render the string, but that is expected. For reference, I have written quite a bit of code that expects string-like things to be passed to C extensions, and by using the buffer interface, I have been able to use str, array and mmap instances interchangeably depending on what I want as a result, or what I'm using as temporary memory. > > I believe the base type included with Python should allocate the memory > > on creation. Why? Because the implementation is simple, and I believe > > that a base type implementation should be as simple as possible. > > Do you think it should happen to do that as an implementation detail, > or that it should *promise* to do so, and bind all string-alikes to > the same promise? What subtypes do are their own business. I only have an opinion for the str and unicode types allocating memory on creation. Aside from being simpler, it keeps the base types from delaying a MemoryError in low memory conditions. An implementation of string views (or concatenations) that is a subclass of the string type, which defers creation until necessary (or never), is perfectly reasonable. Whether it is in the stdlib or 3rd party, I don't care. 
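To make Jim's delegation idea and the lazy-concatenation case concrete, here is a minimal Python-level sketch that assumes a rope-like tree of pieces. `LazyConcat` is a hypothetical name; this is not Larry Hastings's C patch. The sketch deliberately does not subclass `str`: a Python-level `str` subclass has to hand its final value to `str.__new__` at construction time, which is the same eager allocation Josiah wants the base type to keep.

```python
class LazyConcat:
    """Hypothetical rope-style string: '+' is O(1), rendering is deferred."""

    def __init__(self, left, right):
        # Children are either str objects or other LazyConcat nodes.
        self._left = left
        self._right = right
        self._rendered = None  # cached str once materialized

    def __add__(self, other):
        # No copying here: just build a new tree node.
        return LazyConcat(self, other)

    def __str__(self):
        if self._rendered is None:
            pieces, stack = [], [self]
            while stack:  # iterative left-to-right walk avoids deep recursion
                node = stack.pop()
                if isinstance(node, LazyConcat):
                    if node._rendered is not None:
                        pieces.append(node._rendered)
                    else:
                        stack.append(node._right)
                        stack.append(node._left)
                else:
                    pieces.append(node)
            self._rendered = "".join(pieces)  # materialize the string once
            self._left = self._right = None   # release the tree
        return self._rendered

    def __len__(self):
        return len(str(self))

    def __getattr__(self, name):
        # Anything not defined here is delegated to the materialized str,
        # roughly the "call str on itself and delegate" idea.
        return getattr(str(self), name)


s = LazyConcat("a", "b")
for piece in ["c", "d", "e"]:
    s = s + piece
print(str(s))     # abcde
print(s.upper())  # ABCDE
```

Rendering happens once, on first use, and the tree is then released. The cost is that a `MemoryError` can surface later than the concatenation that caused it, which is the trade-off Josiah raises for the base type.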
(I swear, this is at least the second or third time I've said this) - Josiah From ntoronto at cs.byu.edu Thu Feb 1 03:53:49 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Wed, 31 Jan 2007 19:53:49 -0700 Subject: [Python-3000] Poll: Lazy Unicode Strings For Py3k In-Reply-To: <1cb725390701311318n21f57a7et95aef7d41dd8f130@mail.gmail.com> References: <45C07234.8070808@hastings.org> <1cb725390701311318n21f57a7et95aef7d41dd8f130@mail.gmail.com> Message-ID: <45C1563D.4070902@cs.byu.edu> Paul Prescod wrote: > String concatenation is a known issue in Python programming and > workarounds for it are common obfuscations in a language otherwise > famous for being clean. So I vote +1 on it. I abstain on slicing. > Seconded: +1 on concatenation, no opinion on the rest. It'd be great to retire the ''.join(my_big_list_of_strings) idiom. Neil From aahz at pythoncraft.com Thu Feb 1 04:02:20 2007 From: aahz at pythoncraft.com (Aahz) Date: Wed, 31 Jan 2007 19:02:20 -0800 Subject: [Python-3000] Poll: Lazy Unicode Strings For Py3k In-Reply-To: <45C07234.8070808@hastings.org> References: <45C07234.8070808@hastings.org> Message-ID: <20070201030220.GA9206@panix.com> On Wed, Jan 31, 2007, Larry Hastings wrote: > > I'd like to start a (hopefully final) round of discussion on the "lazy > strings" series of patches. What follows is a summary on the current > state of the patches, followed by five poll questions. While I don't have an opinion about the patch itself, I do have an opinion about other people's opinions. ;-) That is, my opinion is that unless you get a +1 from at least one of Fredrik, MvL, or MAL (and no -1 from any of them), this patch should be abandoned. (The exact set of developers doesn't matter, though you should be focused on people with commits in unicodeobject.c, and I'd recommend that Fredrik or MvL be on that list regardless.) -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "I disrespectfully agree." 
--SJM From larry at hastings.org Thu Feb 1 10:51:01 2007 From: larry at hastings.org (Larry Hastings) Date: Thu, 01 Feb 2007 01:51:01 -0800 Subject: [Python-3000] Poll: Lazy Unicode Strings For Py3k In-Reply-To: <20070201030220.GA9206@panix.com> References: <45C07234.8070808@hastings.org> <20070201030220.GA9206@panix.com> Message-ID: <45C1B805.5030306@hastings.org> Aahz wrote: > While I don't have an opinion about the patch itself, I do have an > opinion about other people's opinions. ;-) That is, my opinion is that > unless you get a +1 from at least one of Fredrik, MvL, or MAL (and no -1 > from any of them), this patch should be abandoned. (The exact set of > developers doesn't matter, though you should be focused on people with > commits in unicodeobject.c, and I'd recommend that Fredrik or MvL be on > that list regardless.) > I should focus how? With offers of cash rewards? I'm happy to field questions from anybody, on the list or via email. I'm sure all those folks are as aware of this thread as they need to be. Beyond that I don't see how I can affect if or when they render a vote. Not-that-cash-rewards-are-out-of-the-question-ly, /larry/ From tomerfiliba at gmail.com Thu Feb 1 12:43:03 2007 From: tomerfiliba at gmail.com (tomer filiba) Date: Thu, 1 Feb 2007 13:43:03 +0200 Subject: [Python-3000] the types module Message-ID: <1d85506f0702010343j26ddb0eeub63dafad8a83cf78@mail.gmail.com> i've had some difficulty with code that attempts to locate a type by its __module__ and __name__, something like: getattr(sys.modules[t.__module__], t.__name__) the trouble is, all builtin types claim to belong to the __builtin__ module. for example: >>> types.FunctionType >>> types.FunctionType.__name__ 'function' >>> types.FunctionType.__module__ '__builtin__' but -- >>> __builtin__.function Traceback (most recent call last): File "", line 1, in AttributeError: 'module' object has no attribute 'function' most, but not all, of the types are exposed in __builtin__...
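The lookup idiom tomer quotes, together with the kind of fallback mapping he describes, can be sketched in present-day Python 3. `__builtin__` has since been renamed `builtins`, but the gap is unchanged: the function type reports `__module__ == 'builtins'` and `__name__ == 'function'`, yet `builtins` has no `function` attribute. `find_type` is a hypothetical helper and is not part of any library.

```python
import builtins
import sys
import types


def find_type(t):
    """Hypothetical helper: return a dotted name that resolves back to t, or None."""
    # The idiom from the post: look the type up by (__module__, __name__).
    module = sys.modules.get(t.__module__)
    if getattr(module, t.__name__, None) is t:
        return f"{t.__module__}.{t.__name__}"
    # Fallback mapping: search the types module for an alias of the same object.
    for alias, value in vars(types).items():
        if value is t:
            return f"types.{alias}"
    return None


print(types.FunctionType.__module__)  # builtins
print(hasattr(builtins, "function"))  # False
print(find_type(int))                 # builtins.int
print(find_type(types.FunctionType))  # an alias such as types.FunctionType
```

The fallback scans `vars(types)` instead of hard-coding names, so it picks up whichever aliases the running interpreter's `types` module provides. `FunctionType` and `LambdaType` name the same object, so either alias can be returned.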
this required me to create an artificial mapping in which "__builtin__.function" is mapped to types.FunctionType, and then use this mapping instead of sys.modules, which adds more special cases on my part. on the other hand, the exceptions module works differently. all builtin exceptions are defined in the exceptions module, but are exposed through __builtin__: >>> EOFError.__module__ 'exceptions' >>> exceptions.EOFError >>> __builtin__.EOFError so i thought why not do the same with all builtin types? currently the types module (types.py) exposes some type objects (not all), and uses witchcraft to obtain them: try: raise TypeError except TypeError: tb = sys.exc_info()[2] TracebackType = type(tb) FrameType = type(tb.tb_frame) instead, let's make it a builtin module, in which all types will be defined; the useful types (int, str, ...) would be exposed into __builtin__ (just as the exceptions module does), while the less useful will be kept unexposed. this would make FunctionType.__module__ == "types", rather than "__builtin__", which would allow me to fetch it by name from sys.modules. -tomer From aahz at pythoncraft.com Thu Feb 1 15:01:28 2007 From: aahz at pythoncraft.com (Aahz) Date: Thu, 1 Feb 2007 06:01:28 -0800 Subject: [Python-3000] Poll: Lazy Unicode Strings For Py3k In-Reply-To: <45C1B805.5030306@hastings.org> References: <45C07234.8070808@hastings.org> <20070201030220.GA9206@panix.com> <45C1B805.5030306@hastings.org> Message-ID: <20070201140128.GA24639@panix.com> On Thu, Feb 01, 2007, Larry Hastings wrote: > Aahz wrote: >> >> While I don't have an opinion about the patch itself, I do have an >> opinion about other people's opinions.
;-) That is, my opinion is that >> unless you get a +1 from at least one of Fredrik, MvL, or MAL (and no -1 >> from any of them), this patch should be abandoned. (The exact set of >> developers doesn't matter, though you should be focused on people with >> commits in unicodeobject.c, and I'd recommend that Fredrik or MvL be on >> that list regardless.) > > I should focus how? With offers of cash rewards? > > I'm happy to field questions from anybody, on the list or via email. > I'm sure all those folks are as aware of this thread as they need to > be. Beyond that I don't see how I can affect if or when they render a > vote. Maybe they are and maybe they aren't -- people don't always pay full attention to mailing lists. MvL at least has a standing offer to review patches if you review five other patches. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "I disrespectfully agree." --SJM From brett at python.org Thu Feb 1 20:12:20 2007 From: brett at python.org (Brett Cannon) Date: Thu, 1 Feb 2007 11:12:20 -0800 Subject: [Python-3000] the types module In-Reply-To: <1d85506f0702010343j26ddb0eeub63dafad8a83cf78@mail.gmail.com> References: <1d85506f0702010343j26ddb0eeub63dafad8a83cf78@mail.gmail.com> Message-ID: On 2/1/07, tomer filiba wrote: > i've had some difficulty with code that attempts to locate a type > by its __module__ and __name__, something like: > getattr(sys.modules[t.__module__], t.__name__) > > the trouble is, all builtin types claim to belong to the __builtin__ module. > for example: > >>> types.FunctionType > > >>> types.FunctionType.__name__ > 'function' > >>> types.FunctionType.__module__ > '__builtin__' > > but -- > >>> __builtin__.function > Traceback (most recent call last): > File "", line 1, in > AttributeError: 'module' object has no attribute 'function' > > most, but not all, of the types are exposed in __builtin__...
this required > me to create an artificial mapping in which "__builtin__.function" is mapped > to types.FunctionType, and then use this mapping instead of sys.modules, > which adds more special cases on my part. > This has come up before on python-dev, IIRC. Double-check the archives. -Brett From rhamph at gmail.com Fri Feb 2 05:18:11 2007 From: rhamph at gmail.com (Adam Olsen) Date: Thu, 1 Feb 2007 21:18:11 -0700 Subject: [Python-3000] reference leak when pressing Enter at interpreter prompt In-Reply-To: References: Message-ID: On 1/31/07, Georg Brandl wrote: > Brett Cannon wrote: > > Seems two references are leaking every time you press Enter at the > > interpreter prompt in a debug build. Anyone have an inkling of who > > introduced it? > > If anyone wants to look into it: > It was rev. 53421, the merging of the long-int-unification branch. long_richcompare doesn't Py_DECREF a and b allocated by CONVERT_BINOP. This exists in 53421 (and presumably earlier) by doing "1L == 2L" at the interpreter prompt. There might be another function or two with the same bug. -- Adam Olsen, aka Rhamphoryncus From brett at python.org Fri Feb 2 07:07:07 2007 From: brett at python.org (Brett Cannon) Date: Thu, 1 Feb 2007 22:07:07 -0800 Subject: [Python-3000] reference leak when pressing Enter at interpreter prompt In-Reply-To: References: Message-ID: On 2/1/07, Adam Olsen wrote: > On 1/31/07, Georg Brandl wrote: > > Brett Cannon wrote: > > > Seems two references are leaking every time you press Enter at the > > > interpreter prompt in a debug build. Anyone have an inkling of who > > > introduced it? > > > > If anyone wants to look into it: > > It was rev. 53421, the merging of the long-int-unification branch. > > long_richcompare doesn't Py_DECREF a and b allocated by CONVERT_BINOP. > This exists in 53421 (and presumably earlier) by doing "1L == 2L" at > the interpreter prompt. There might be another function or two with > the same bug. > Thanks for the debugging, Adam.
I personally don't have time right now to dig in to verify and patch, but hopefully someone does. Else I will try to get to it at some point between now and the end of PyCon. -Brett From martin at v.loewis.de Tue Feb 6 22:06:20 2007 From: martin at v.loewis.de (Martin v. Löwis) Date: Tue, 06 Feb 2007 22:06:20 +0100 Subject: [Python-3000] reference leak when pressing Enter at interpreter prompt In-Reply-To: References: Message-ID: <45C8EDCC.5060000@v.loewis.de> Brett Cannon wrote: > Thanks for the debugging, Adam. I personally don't have time right > now to dig in to verify and patch, but hopefully someone does. Else I > will try to get to it at some point between now and the end of PyCon. I just fixed this and a few related bugs. Regards, Martin From collinw at gmail.com Fri Feb 9 15:55:23 2007 From: collinw at gmail.com (Collin Winter) Date: Fri, 9 Feb 2007 08:55:23 -0600 Subject: [Python-3000] Pre-peps on raise and except changes (was: Warning for 2.6 and greater) In-Reply-To: <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> Message-ID: <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> The raise- and except-related PEPs from this discussion have been committed as PEP 3109 and PEP 3110, respectively. Thanks, everyone!
Collin Winter From g.brandl at gmx.net Fri Feb 9 19:41:53 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 09 Feb 2007 19:41:53 +0100 Subject: [Python-3000] Pre-peps on raise and except changes (was: Warning for 2.6 and greater) In-Reply-To: <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> Message-ID: Collin Winter wrote: > The raise- and except-related PEPs from this discussion have been > committed as PEP 3109 and PEP 3110, respectively. One question: will there be an exception keyword argument to set the traceback, to simplify e = Error(V) e.__traceback__ = tb raise e to raise Error(V, traceback=tb) I remember this being proposed, but could not find it in the PEPs. Georg From guido at python.org Fri Feb 9 21:09:55 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 9 Feb 2007 12:09:55 -0800 Subject: [Python-3000] Pre-peps on raise and except changes (was: Warning for 2.6 and greater) In-Reply-To: References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> Message-ID: I agree that this API is better. If it's not in PEP 344 it should be added. On 2/9/07, Georg Brandl wrote: > Collin Winter wrote: > > The raise- and except-related PEPs from this discussion have been > > committed as PEP 3109 and PEP 3110, respectively. > > One question: will there be an exception keyword argument to set the > traceback, to simplify > > e = Error(V) > e.__traceback__ = tb > raise e > > to > > raise Error(V, traceback=tb) > > I remember this being proposed, but could not find it in the PEPs.
> > Georg > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From collinw at gmail.com Fri Feb 9 23:51:22 2007 From: collinw at gmail.com (Collin Winter) Date: Fri, 9 Feb 2007 16:51:22 -0600 Subject: [Python-3000] Pre-peps on raise and except changes (was: Warning for 2.6 and greater) In-Reply-To: References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> Message-ID: <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> > On 2/9/07, Georg Brandl wrote: > > One question: will there be an exception keyword argument to set the > > traceback, to simplify > > > > e = Error(V) > > e.__traceback__ = tb > > raise e > > > > to > > > > raise Error(V, traceback=tb) > > > > I remember this being proposed, but could not find it in the PEPs. I believe the original proposal was something like raise E(V).with_traceback(T) My preference would be a method (as opposed to a keyword argument). On 2/9/07, Guido van Rossum wrote: > I agree that this API is better. If it's not in PEP 344 it should be added. Should this be added to PEP 344 or 3109? That is, do you want to see it before Python 3?
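Collin's method-based spelling is what Python 3 eventually shipped as `BaseException.with_traceback()`. The method sets `__traceback__` and returns the exception itself, so it fits in a single `raise` expression. Below is a minimal sketch of the old three-argument-raise use case, assuming a traceback saved from an earlier failure; `failing` and `saved_tb` are illustrative names.

```python
import sys
import traceback


def failing():
    raise KeyError("missing")


try:
    failing()
except KeyError:
    saved_tb = sys.exc_info()[2]  # traceback of the KeyError

# Python 2 spelled this: raise RuntimeError, "lookup failed", saved_tb
err = RuntimeError("lookup failed").with_traceback(saved_tb)
assert err.__traceback__ is saved_tb  # with_traceback() returns the same object

try:
    raise err
except RuntimeError as exc:
    # Raising extends the attached traceback, so the original frames remain.
    frame_names = [entry.name for entry in traceback.extract_tb(exc.__traceback__)]

print(frame_names[-1])  # failing
```

Because this is a method and not a constructor keyword, user-defined exception classes keep their own `__init__` signatures, which is the point Guido makes in reply.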
Collin Winter From guido at python.org Fri Feb 9 23:57:13 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 9 Feb 2007 14:57:13 -0800 Subject: [Python-3000] Pre-peps on raise and except changes (was: Warning for 2.6 and greater) In-Reply-To: <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> Message-ID: On 2/9/07, Collin Winter wrote: > > On 2/9/07, Georg Brandl wrote: > > > One question: will there be an exception keyword argument to set the > > > traceback, to simplify > > > > > > e = Error(V) > > > e.__traceback__ = tb > > > raise e > > > > > > to > > > > > > raise Error(V, traceback=tb) > > > > > > I remember this being proposed, but could not find it in the PEPs. > > I believe the original proposal was something like > > raise E(V).with_traceback(T) > > My preference would be a method (as opposed to a keyword argument). Fair enough; that way the signature of user-provided exceptions doesn't need to be messed with. > On 2/9/07, Guido van Rossum wrote: > > I agree that this API is better. If it's not in PEP 344 it should be added. > > Should this be added to PEP 344 or 3109? That is, do you want to see > it before Python 3? I think storing the traceback in the exception is a 3.0 feature, since it depends on the effective 'del e' at the end of the except clause for avoiding most cycles. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Sat Feb 10 00:54:47 2007 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Fri, 09 Feb 2007 18:54:47 -0500 Subject: [Python-3000] Pre-peps on raise and except changes (was: Warning for 2.6 and greater) In-Reply-To: References: <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> Message-ID: <5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com> At 02:57 PM 2/9/2007 -0800, Guido van Rossum wrote: > > On 2/9/07, Guido van Rossum wrote: > > > I agree that this API is better. If it's not in PEP 344 it should be > added. > > > > Should this be added to PEP 344 or 3109? That is, do you want to see > > it before Python 3? > >I think storing the traceback in the exception is a 3.0 feature, since >it depends on the effective 'del e' at the end of the except clause >for avoiding most cycles. We would then have to have a Python 3.0 API to fetch the traceback, otherwise there's no way to write code that works in both 2.6 and 3.0 and gets a traceback. Did we decide to keep sys.exc_info()? If so, then that would presumably work. From collinw at gmail.com Sat Feb 10 01:09:45 2007 From: collinw at gmail.com (Collin Winter) Date: Fri, 9 Feb 2007 18:09:45 -0600 Subject: [Python-3000] Pre-peps on raise and except changes (was: Warning for 2.6 and greater) In-Reply-To: <5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com> References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com> Message-ID: <43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com> On 2/9/07, Phillip J. 
Eby wrote: > At 02:57 PM 2/9/2007 -0800, Guido van Rossum wrote: > > > On 2/9/07, Guido van Rossum wrote: > > > > I agree that this API is better. If it's not in PEP 344 it should be > > added. > > > > > > Should this be added to PEP 344 or 3109? That is, do you want to see > > > it before Python 3? > > > >I think storing the traceback in the exception is a 3.0 feature, since > >it depends on the effective 'del e' at the end of the except clause > >for avoiding most cycles. > > We would then have to have a Python 3.0 API to fetch the traceback, > otherwise there's no way to write code that works in both 2.6 and 3.0 and > gets a traceback. Did we decide to keep sys.exc_info()? If so, then that > would presumably work. sys.exc_info() will be kept, while the sys.exc_{type,value,traceback} attributes will be dropped. As an aside, should sys.exc_clear() be added to the to-drop list? Is there still a need for it given Python 3's exception cleanup semantics? Collin Winter From greg.ewing at canterbury.ac.nz Sat Feb 10 01:33:39 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 10 Feb 2007 13:33:39 +1300 Subject: [Python-3000] Pre-peps on raise and except changes In-Reply-To: <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> Message-ID: <45CD12E3.9050803@canterbury.ac.nz> Collin Winter wrote: > I believe the original proposal was something like > > raise E(V).with_traceback(T) Does this mean you're not intending to have any syntactic variant of the raise statement that includes a traceback in 3.0? Or is this just so that forward-compatible code can be written in 2.6? 
If you wanted a distinctive syntax, it could be something like raise e with t -- Greg From guido at python.org Sat Feb 10 01:41:22 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 9 Feb 2007 16:41:22 -0800 Subject: [Python-3000] Pre-peps on raise and except changes In-Reply-To: <45CD12E3.9050803@canterbury.ac.nz> References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <45CD12E3.9050803@canterbury.ac.nz> Message-ID: On 2/9/07, Greg Ewing wrote: > Collin Winter wrote: > > > I believe the original proposal was something like > > > > raise E(V).with_traceback(T) > > Does this mean you're not intending to have any syntactic > variant of the raise statement that includes a traceback > in 3.0? Or is this just so that forward-compatible code > can be written in 2.6? > > If you wanted a distinctive syntax, it could be something > like > > raise e with t I can see uses for endowing an exception object with a traceback without raising it (yet), so we'd still need the method; since we have the method I'm not sure that we need syntax; I don't expect this to be needed a lot. (Isn't there also a proposal for automatic exception chaining? That might mean we'll need this even less.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Sat Feb 10 01:50:58 2007 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Fri, 09 Feb 2007 19:50:58 -0500 Subject: [Python-3000] Pre-peps on raise and except changes In-Reply-To: <45CD12E3.9050803@canterbury.ac.nz> References: <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> Message-ID: <5.1.1.6.0.20070209194256.03894be0@sparrow.telecommunity.com> At 01:33 PM 2/10/2007 +1300, Greg Ewing wrote: >Collin Winter wrote: > > > I believe the original proposal was something like > > > > raise E(V).with_traceback(T) > >Does this mean you're not intending to have any syntactic >variant of the raise statement that includes a traceback >in 3.0? That *is* the variant. ;) >Or is this just so that forward-compatible code >can be written in 2.6? Actually, forward compatible code would be easier with something syntactic, like your 'raise e with t' idea. It would allow the implementation to be different in 2.6 and 3.0, while using the same syntax. (In 2.6 it could use the existing machinery, while in 3.0 it could call the .with_traceback() method.) Hm. Actually, that's not necessary. We could include .with_traceback(T) in 2.6, and just have old-style except: clauses delete the traceback from the returned objects. New-style except: clauses would work just as they would in 3.0. To summarize, in 2.6 we could support .with_traceback() and create exception instances with traceback attributes, but the old-style except: clauses could discard them to prevent cycles. Raising an exception instance with a __traceback__ attribute would get some special handling so that it's equivalent to 3-argument raise in today's Python. Likewise, generator.throw() would need the same special handling in 2.6. Meanwhile, sys.exc_info() still lives in both versions.
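Phillip's point that `sys.exc_info()` survives is easy to check in Python 3. There, the traceback that `sys.exc_info()` returns is the same object that is now stored on the exception, so code that reads either source sees one traceback. A small sketch, with `inspect_handler` as an illustrative name:

```python
import sys


def inspect_handler():
    try:
        {}["absent"]
    except KeyError as e:
        exc_type, exc_value, exc_tb = sys.exc_info()
        # Inside the handler, sys.exc_info() and the exception object agree.
        return exc_type is KeyError, exc_value is e, exc_tb is e.__traceback__


print(inspect_handler())  # (True, True, True)
```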
To write 3.0-compatible code, you just use the 3.0 spellings of raise, throw(), and except. Sounds like a plan! From guido at python.org Sat Feb 10 02:03:14 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 9 Feb 2007 17:03:14 -0800 Subject: [Python-3000] Pre-peps on raise and except changes (was: Warning for 2.6 and greater) In-Reply-To: <43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com> References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com> <43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com> Message-ID: On 2/9/07, Collin Winter wrote: > sys.exc_info() will be kept, while the sys.exc_{type,value,traceback} > attributes will be dropped. I understand why, but that doesn't make me uncomfortable with keeping it. Maybe in "3.0 compatibility mode" 2.6 could attach tracebacks to exception objects so we could be weaned off it in 2.6? > As an aside, should sys.exc_clear() be added to the to-drop list? Is > there still a need for it given Python 3's exception cleanup > semantics? I don't think so -- AFAIK the same use case is handled well enough by the cleanup semantics of the except clause.
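Guido's answer about `sys.exc_clear()` matches how Python 3 behaves: Python 3 has no `exc_clear`, because the except clause itself clears the current exception and unbinds its target name. A sketch, assuming the function is called from outside any active exception handler; `handler_cleanup` is an illustrative name.

```python
import sys


def handler_cleanup():
    try:
        raise ValueError("boom")
    except ValueError as e:
        inside = sys.exc_info()[0]
    # Leaving the except clause resets the "current exception" ...
    after = sys.exc_info()
    # ... and unbinds the target name, as if "del e" ran at the end.
    try:
        e
    except NameError:  # UnboundLocalError is a subclass of NameError
        still_bound = False
    else:
        still_bound = True
    return inside, after, still_bound


print(handler_cleanup())  # (<class 'ValueError'>, (None, None, None), False)
```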
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From collinw at gmail.com Sat Feb 10 02:08:28 2007 From: collinw at gmail.com (Collin Winter) Date: Fri, 9 Feb 2007 19:08:28 -0600 Subject: [Python-3000] Pre-peps on raise and except changes In-Reply-To: References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <45CD12E3.9050803@canterbury.ac.nz> Message-ID: <43aa6ff70702091708v42d93ae5rb8a957088e955709@mail.gmail.com> On 2/9/07, Guido van Rossum wrote: > On 2/9/07, Greg Ewing wrote: > > Collin Winter wrote: > > > > > I believe the original proposal was something like > > > > > > raise E(V).with_traceback(T) > > > > Does this mean you're not intending to have any syntactic > > variant of the raise statement that includes a traceback > > in 3.0? Or is this just so that forward-compatible code > > can be written in 2.6? > > > > If you wanted a distinctive syntax, it could be something > > like > > > > raise e with t > > I can see uses for endowing an exception object with a traceback > without raising it (yet), so we'd still need the method; since we have > the method I'm not sure that we need syntax; I don't expect this to be > needed a lot. (Isn't there also a proposal for automatic exception > chaining? That might mean we'll need this even less.) The current 3-argument form of "raise" is used incredibly rarely (compared to other raise forms), so I don't see a need for this kind of syntactic support. Also, adding a "with" clause like that means we have to hash out whether it goes in front of "from" (in "raise ... from ...") or after it, etc, etc, and that's just begging for 100+-post bikeshedding threads. 
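The `raise ... from ...` form Collin mentions is Python 3's explicit exception chaining (PEP 3134). It covers much of what the three-argument raise was used for, because the original exception and its traceback stay reachable from the new one. A minimal sketch; `parse_port` is a hypothetical example function.

```python
def parse_port(text):
    try:
        return int(text)
    except ValueError as exc:
        # Explicit chaining keeps the original error, with its own traceback,
        # reachable as __cause__; no three-argument raise is needed.
        raise RuntimeError(f"bad port: {text!r}") from exc


try:
    parse_port("http")
except RuntimeError as err:
    cause = err.__cause__

print(type(cause).__name__)             # ValueError
print(cause.__traceback__ is not None)  # True
```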
Collin Winter From guido at python.org Sat Feb 10 02:09:45 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 9 Feb 2007 17:09:45 -0800 Subject: [Python-3000] Pre-peps on raise and except changes (was: Warning for 2.6 and greater) In-Reply-To: References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com> <43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com> Message-ID: On 2/9/07, Guido van Rossum wrote: > On 2/9/07, Collin Winter wrote: > > sys.exc_info() will be kept, while the sys.exc_{type,value,traceback} > > attributes will be dropped. > > I understand why, but that doesn't make me uncomfortable with keeping (of course I means "doesn't make me *comfortable*") > it. Maybe in "3.0 compatibility mode" 2.6 could attach tracebacks to > exception objects so we could be weened off it in 2.6? > > > As an aside, should sys.exc_clear() be added to the to-drop list? Is > > there still a need for it given Python 3's exception cleanup > > semantics? > > I don't think so -- AFAIK the same use case is handled well enough by > the cleanup semantics of the except clause. 
> > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Feb 10 02:14:47 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 9 Feb 2007 17:14:47 -0800 Subject: [Python-3000] Pre-peps on raise and except changes In-Reply-To: <5.1.1.6.0.20070209194256.03894be0@sparrow.telecommunity.com> References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <45CD12E3.9050803@canterbury.ac.nz> <5.1.1.6.0.20070209194256.03894be0@sparrow.telecommunity.com> Message-ID: On 2/9/07, Phillip J. Eby wrote: > At 01:33 PM 2/10/2007 +1300, Greg Ewing wrote: > >Collin Winter wrote: > > > > > I believe the original proposal was something like > > > > > > raise E(V).with_traceback(T) > > > >Does this mean you're not intending to have any syntactic > >variant of the raise statement that includes a traceback > >in 3.0? > > That *is* the variant. ;) > > > >Or is this just so that forward-compatible code > >can be written in 2.6? > > Actually, forward compatible code would be easier with something syntactic, > like your 'raise e with t' idea. It would allow the implementation to be > different in 2.6 and 3.0, while using the same syntax. (In 2.6 it could > use the existing machinery, while in 3.0 it could call the > .with_traceback() method. > > Hm. Actually, that's not necessary. We could include .with_traceback(T) > in 2.6, and just have old-style except: clauses delete the traceback from > the returned objects. New-style except: clauses would work just as they > would in 3.0. > > To summarize, in 2.6 we could support .with_traceback() and create > exception instances with traceback attributes, but the old-style except: > clauses could discard them to prevent cycles. 
Raising an exception > instance with a __traceback__ attribute would get some special handling so > that it's equivalent to 3-argument raise in today's Python. Likewise, > generator.throw() would need the same special handling in 2.6. Meanwhile, > sys.exc_info() still lives in both versions. > > To write 3.0-compatible code, you just use the 3.0 spellings of raise, > throw(), and except. Sounds like a plan! Can't see anything wrong with this either. Collin, do you have enough to update your PEPs? I wonder if we should try to keep PEP 344 up to date, or if we should just do this in the Py3k PEPs; I'm okay with adding some notes about 2.6 to Py3k PEPs so I guess the latter would work. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From collinw at gmail.com Sat Feb 10 02:27:14 2007 From: collinw at gmail.com (Collin Winter) Date: Fri, 9 Feb 2007 19:27:14 -0600 Subject: [Python-3000] Pre-peps on raise and except changes (was: Warning for 2.6 and greater) In-Reply-To: References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com> <43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com> Message-ID: <43aa6ff70702091727g51ea4cccpff5ac1843f7c04f7@mail.gmail.com> On 2/9/07, Guido van Rossum wrote: > On 2/9/07, Collin Winter wrote: > > sys.exc_info() will be kept, while the sys.exc_{type,value,traceback} > > attributes will be dropped. > > I understand why, but that doesn't make me comfortable with keeping > it. Maybe in "3.0 compatibility mode" 2.6 could attach tracebacks to > exception objects so we could be weened off it in 2.6? That would imply that 2.6's "3.0 compatibility mode" would also activate the cleanup semantics for "except" clauses. 
Switching that kind of deep, subtle functionality on or off based on a command-line switch makes me uncomfortable. There would also have to be a way of distinguishing .pyc files produced by 2.6 versus those produced by 2.6 in 3.0-mode (since the cleanup semantics are implemented by emitting extra bytecode for the implicit inner try/finally block). > > As an aside, should sys.exc_clear() be added to the to-drop list? Is > > there still a need for it given Python 3's exception cleanup > > semantics? > > I don't think so -- AFAIK the same use case is handled well enough by > the cleanup semantics of the except clause. I've added sys.exc_clear()'s demise to PEP 3100. Collin Winter From collinw at gmail.com Sat Feb 10 02:35:36 2007 From: collinw at gmail.com (Collin Winter) Date: Fri, 9 Feb 2007 19:35:36 -0600 Subject: [Python-3000] Pre-peps on raise and except changes In-Reply-To: References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <45CD12E3.9050803@canterbury.ac.nz> <5.1.1.6.0.20070209194256.03894be0@sparrow.telecommunity.com> Message-ID: <43aa6ff70702091735p2f281678mecaf416cc71ca361@mail.gmail.com> On 2/9/07, Guido van Rossum wrote: > On 2/9/07, Phillip J. Eby wrote: > > At 01:33 PM 2/10/2007 +1300, Greg Ewing wrote: > > >Collin Winter wrote: > > > > > > > I believe the original proposal was something like > > > > > > > > raise E(V).with_traceback(T) > > > > > >Does this mean you're not intending to have any syntactic > > >variant of the raise statement that includes a traceback > > >in 3.0? > > > > That *is* the variant. ;) > > > > > > >Or is this just so that forward-compatible code > > >can be written in 2.6? > > > > Actually, forward compatible code would be easier with something syntactic, > > like your 'raise e with t' idea. 
It would allow the implementation to be > > different in 2.6 and 3.0, while using the same syntax. (In 2.6 it could > > use the existing machinery, while in 3.0 it could call the > > .with_traceback() method. > > > > Hm. Actually, that's not necessary. We could include .with_traceback(T) > > in 2.6, and just have old-style except: clauses delete the traceback from > > the returned objects. New-style except: clauses would work just as they > > would in 3.0. > > > > To summarize, in 2.6 we could support .with_traceback() and create > > exception instances with traceback attributes, but the old-style except: > > clauses could discard them to prevent cycles. Raising an exception > > instance with a __traceback__ attribute would get some special handling so > > that it's equivalent to 3-argument raise in today's Python. Likewise, > > generator.throw() would need the same special handling in 2.6. Meanwhile, > > sys.exc_info() still lives in both versions. > > > > To write 3.0-compatible code, you just use the 3.0 spellings of raise, > > throw(), and except. Sounds like a plan! > > Can't see anything wrong with this either. Collin, do you have enough > to update your PEPs? I think so. I've already got language ready for the section on using BaseException.with_traceback() in the 2->3 raise translations, and I'll work up additional language for the transition plan sometime this weekend. > I wonder if we should try to keep PEP 344 up to date, or if we should > just do this in the Py3k PEPs; I'm okay with adding some notes about > 2.6 to Py3k PEPs so I guess the latter would work. If with_traceback() is going to be added in 2.6, I think at least that much should go in PEP 344. The rest falls under "transitioning to 3.0", so it should probably go in PEP 3109. Collin Winter From pje at telecommunity.com Sat Feb 10 04:44:22 2007 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Fri, 09 Feb 2007 22:44:22 -0500 Subject: [Python-3000] Pre-peps on raise and except changes (was: Warning for 2.6 and greater) In-Reply-To: References: <43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com> <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com> <43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com> Message-ID: <5.1.1.6.0.20070209223738.044ed8d0@sparrow.telecommunity.com> At 05:03 PM 2/9/2007 -0800, Guido van Rossum wrote: >On 2/9/07, Collin Winter wrote: > > sys.exc_info() will be kept, while the sys.exc_{type,value,traceback} > > attributes will be dropped. > >I understand why, but that doesn't make me uncomfortable with keeping >it. Maybe in "3.0 compatibility mode" 2.6 could attach tracebacks to >exception objects so we could be weened off it in 2.6? I notice that neither PEP addresses PEP 343 compatibility. Do we plan to make __exit__() only get one argument? Right now the protocol demands all three. I suppose we could pass one argument in 3.0, and if you want to support 2.6 you would have to add default arguments. Such code would be ugly as sin, but workable. I'm not 100% certain we *can't* ditch sys.exc_info(), but if we do, we still need *some* way to get the "current exception" and have it include a traceback, that will also work in 2.6. I don't believe there's any proposal for such an API currently outstanding. WSGI still uses sys.exc_info tuples, but we could always add a wsgiref.exc_info() that gets the current exception and turns it into such a tuple. ;-) Anyway, I suggest we either decide to deal with that sort of ugliness, or decide to live with sys.exc_info(), and then get on with whichever of those two choices you decide to make. 
:) From collinw at gmail.com Sat Feb 10 05:52:15 2007 From: collinw at gmail.com (Collin Winter) Date: Fri, 9 Feb 2007 22:52:15 -0600 Subject: [Python-3000] Pre-peps on raise and except changes (was: Warning for 2.6 and greater) In-Reply-To: <5.1.1.6.0.20070209223738.044ed8d0@sparrow.telecommunity.com> References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com> <43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com> <5.1.1.6.0.20070209223738.044ed8d0@sparrow.telecommunity.com> Message-ID: <43aa6ff70702092052o56a83545je2c7d570ddfdbb1b@mail.gmail.com> On 2/9/07, Phillip J. Eby wrote: > At 05:03 PM 2/9/2007 -0800, Guido van Rossum wrote: > >On 2/9/07, Collin Winter wrote: > > > sys.exc_info() will be kept, while the sys.exc_{type,value,traceback} > > > attributes will be dropped. > > > >I understand why, but that doesn't make me uncomfortable with keeping > >it. Maybe in "3.0 compatibility mode" 2.6 could attach tracebacks to > >exception objects so we could be weened off it in 2.6? > > I notice that neither PEP addresses PEP 343 compatibility. Do we plan to > make __exit__() only get one argument? Right now the protocol demands all > three. I suppose we could pass one argument in 3.0, and if you want to > support 2.6 you would have to add default arguments. Such code would be > ugly as sin, but workable. Couldn't __exit__() be passed (type(e), e, e.__traceback__) instead of *sys.exc_info()? 
That is, the source translation given in PEP 343 becomes

    mgr = (EXPR)
    exit = mgr.__exit__  # Not calling it yet
    value = mgr.__enter__()
    exc = True
    try:
        try:
            VAR = value  # Only if "as VAR" is present
            BLOCK
        except Exception as e:
            # The exceptional case is handled here
            exc = False
            if not exit(type(e), e, e.__traceback__):
                raise
            # The exception is swallowed if exit() returns true
    finally:
        # The normal and non-local-goto cases are handled here
        if exc:
            exit(None, None, None)

Collin Winter From ncoghlan at gmail.com Sat Feb 10 09:08:09 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 10 Feb 2007 18:08:09 +1000 Subject: [Python-3000] Pre-peps on raise and except changes In-Reply-To: <43aa6ff70702091735p2f281678mecaf416cc71ca361@mail.gmail.com> References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <45CD12E3.9050803@canterbury.ac.nz> <5.1.1.6.0.20070209194256.03894be0@sparrow.telecommunity.com> <43aa6ff70702091735p2f281678mecaf416cc71ca361@mail.gmail.com> Message-ID: <45CD7D69.4090709@gmail.com> Collin Winter wrote: > I think so. I've already got language ready for the section on using > BaseException.with_traceback() in the 2->3 raise translations, and > I'll work up additional language for the transition plan sometime this > weekend. If with_traceback() is an instance method, does it mutate the existing exception or create a new one? To avoid any confusion, perhaps it should instead be a class method equivalent to the following:

    @classmethod
    def with_traceback(*args, **kwds):
        cls = args[0]
        tb = args[1]
        args = args[2:]
        exc = cls(*args, **kwds)
        exc.__traceback__ = tb
        return exc

Usage would look like:

    raise E.with_traceback(T, V)

Cheers, Nick.
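Collin's PEP 343 translation above can be exercised as an ordinary function. In this minimal sketch, which assumes Python 3, `run_with` is a hypothetical name and the block body is passed in as a callable. The sketch catches BaseException rather than Exception so that, as with a real with statement, exceptions like KeyboardInterrupt also reach `__exit__`. Released Python 3 does call `__exit__` with this (type, value, traceback) triple.

```python
from contextlib import nullcontext, suppress

def run_with(mgr, body):
    """Hypothetical emulation of `with mgr as var: return body(var)`,
    following the translation above but passing the
    (type(e), e, e.__traceback__) triple instead of *sys.exc_info()."""
    exit_fn = mgr.__exit__          # looked up now, called later
    value = mgr.__enter__()
    exc = True
    try:
        try:
            return body(value)
        except BaseException as e:
            exc = False
            if not exit_fn(type(e), e, e.__traceback__):
                raise
            # Otherwise the exception is swallowed and we fall through.
    finally:
        if exc:
            exit_fn(None, None, None)

doubled = run_with(nullcontext(21), lambda v: v * 2)                 # normal exit
swallowed = run_with(suppress(ZeroDivisionError), lambda _: 1 / 0)   # suppressed
```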
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From pje at telecommunity.com Sat Feb 10 18:02:17 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat, 10 Feb 2007 12:02:17 -0500 Subject: [Python-3000] Pre-peps on raise and except changes (was: Warning for 2.6 and greater) In-Reply-To: <43aa6ff70702092052o56a83545je2c7d570ddfdbb1b@mail.gmail.com> References: <5.1.1.6.0.20070209223738.044ed8d0@sparrow.telecommunity.com> <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com> <43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com> <5.1.1.6.0.20070209223738.044ed8d0@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20070210120058.0271e800@sparrow.telecommunity.com> At 10:52 PM 2/9/2007 -0600, Collin Winter wrote: >On 2/9/07, Phillip J. Eby wrote: >>At 05:03 PM 2/9/2007 -0800, Guido van Rossum wrote: >> >On 2/9/07, Collin Winter wrote: >> > > sys.exc_info() will be kept, while the sys.exc_{type,value,traceback} >> > > attributes will be dropped. >> > >> >I understand why, but that doesn't make me uncomfortable with keeping >> >it. Maybe in "3.0 compatibility mode" 2.6 could attach tracebacks to >> >exception objects so we could be weened off it in 2.6? >> >>I notice that neither PEP addresses PEP 343 compatibility. Do we plan to >>make __exit__() only get one argument? Right now the protocol demands all >>three. I suppose we could pass one argument in 3.0, and if you want to >>support 2.6 you would have to add default arguments. Such code would be >>ugly as sin, but workable. > >Couldn't __exit__() be passed (type(e), e, e.__traceback__) instead of >*sys.exc_info()? Sure, but *why*?
After all, we're changing gen.throw() in the same way. My thought is, 2.6 would pass all three arguments, 3.0 just one. From guido at python.org Sat Feb 10 18:09:00 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 10 Feb 2007 09:09:00 -0800 Subject: [Python-3000] Pre-peps on raise and except changes In-Reply-To: <45CD7D69.4090709@gmail.com> References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <45CD12E3.9050803@canterbury.ac.nz> <5.1.1.6.0.20070209194256.03894be0@sparrow.telecommunity.com> <43aa6ff70702091735p2f281678mecaf416cc71ca361@mail.gmail.com> <45CD7D69.4090709@gmail.com> Message-ID: Why don't you want it to mutate the instance? On 2/10/07, Nick Coghlan wrote: > Collin Winter wrote: > > I think so. I've already got language ready for the section on using > > BaseException.with_traceback() in the 2->3 raise translations, and > > I'll work up additional language for the transition plan sometime this > > weekend. > > If with_traceback() is an instance method, does it mutate the existing > exception or create a new one? > > To avoid any confusion, perhaps it should instead be a class method > equivalent to the following: > > @classmethod > def with_traceback(*args, **kwds): > cls = args[0] > tb = args[1] > args = args[2:] > exc = cls(*args, **kwds) > exc.__traceback__ = tb > return exc > > Usage would look like: > > raise E.with_traceback(T, V) > > > Cheers, > Nick. 
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > --------------------------------------------------------------- > http://www.boredomandlaziness.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Feb 10 18:09:45 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 10 Feb 2007 09:09:45 -0800 Subject: [Python-3000] Pre-peps on raise and except changes (was: Warning for 2.6 and greater) In-Reply-To: <5.1.1.6.0.20070210120058.0271e800@sparrow.telecommunity.com> References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com> <43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com> <5.1.1.6.0.20070209223738.044ed8d0@sparrow.telecommunity.com> <5.1.1.6.0.20070210120058.0271e800@sparrow.telecommunity.com> Message-ID: WFM. On 2/10/07, Phillip J. Eby wrote: > At 10:52 PM 2/9/2007 -0600, Collin Winter wrote: > >On 2/9/07, Phillip J. Eby wrote: > >>At 05:03 PM 2/9/2007 -0800, Guido van Rossum wrote: > >> >On 2/9/07, Collin Winter wrote: > >> > > sys.exc_info() will be kept, while the sys.exc_{type,value,traceback} > >> > > attributes will be dropped. > >> > > >> >I understand why, but that doesn't make me uncomfortable with keeping > >> >it. Maybe in "3.0 compatibility mode" 2.6 could attach tracebacks to > >> >exception objects so we could be weened off it in 2.6? > >> > >>I notice that neither PEP addresses PEP 343 compatibility. Do we plan to > >>make __exit__() only get one argument? Right now the protocol demands all > >>three. I suppose we could pass one argument in 3.0, and if you want to > >>support 2.6 you would have to add default arguments. Such code would be > >>ugly as sin, but workable. 
> > > >Couldn't __exit__() be passed (type(e), e, e.__traceback__) instead of > >*sys.exc_info()? > > Sure, but *why*? After all, we're changing gen.throw() in the same way. > > My thought is, 2.6 would pass all three arguments, 3.0 just one. > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From collinw at gmail.com Sat Feb 10 23:31:48 2007 From: collinw at gmail.com (Collin Winter) Date: Sat, 10 Feb 2007 16:31:48 -0600 Subject: [Python-3000] Pre-peps on raise and except changes In-Reply-To: <45CD7D69.4090709@gmail.com> References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <45CD12E3.9050803@canterbury.ac.nz> <5.1.1.6.0.20070209194256.03894be0@sparrow.telecommunity.com> <43aa6ff70702091735p2f281678mecaf416cc71ca361@mail.gmail.com> <45CD7D69.4090709@gmail.com> Message-ID: <43aa6ff70702101431j555dd7f0v383d06e08d389529@mail.gmail.com> On 2/10/07, Nick Coghlan wrote: > Collin Winter wrote: > > I think so. I've already got language ready for the section on using > > BaseException.with_traceback() in the 2->3 raise translations, and > > I'll work up additional language for the transition plan sometime this > > weekend. > > If with_traceback() is an instance method, does it mutate the existing > exception or create a new one? I say it mutates the instance. > To avoid any confusion, perhaps it should instead be a class method > equivalent to the following: > [snip] > > Usage would look like: > > raise E.with_traceback(T, V) What confusion do you foresee? 
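The two designs under debate differ in object identity. In released Python 3, BaseException.with_traceback() mutates the instance and returns that same instance, as Collin proposes. Nick's alternate constructor would build a fresh instance instead. The sketch below assumes Python 3. `new_with_traceback` is a hypothetical stand-alone rendering of Nick's classmethod, not a real API, and `capture_traceback` is an illustrative helper.

```python
def capture_traceback():
    """Return a real traceback object to attach (illustrative helper)."""
    try:
        raise RuntimeError("probe")
    except RuntimeError as probe:
        return probe.__traceback__

def new_with_traceback(cls, tb, *args, **kwds):
    """Hypothetical alternate constructor along the lines Nick suggests:
    creates a new exception instead of modifying an existing one."""
    exc = cls(*args, **kwds)
    exc.__traceback__ = tb
    return exc

tb = capture_traceback()

original = ValueError("bad value")
returned = original.with_traceback(tb)                   # built-in: mutates, returns self
fresh = new_with_traceback(ValueError, tb, "bad value")  # hypothetical: new object

print(returned is original, fresh is original)  # True False
```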
Collin Winter From brett at python.org Sun Feb 11 00:07:59 2007 From: brett at python.org (Brett Cannon) Date: Sat, 10 Feb 2007 15:07:59 -0800 Subject: [Python-3000] Pre-peps on raise and except changes (was: Warning for 2.6 and greater) In-Reply-To: References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com> <43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com> <5.1.1.6.0.20070209223738.044ed8d0@sparrow.telecommunity.com> <5.1.1.6.0.20070210120058.0271e800@sparrow.telecommunity.com> Message-ID: On 2/10/07, Guido van Rossum wrote: > WFM. > Wow, I think that is the shortest way you can OK an idea, Guido, without just leaving off the period. =) And for what it's worth, I'm +1 on adding default args and passing a single argument in Py3K and all three in 2.6 as well. -Brett From collinw at gmail.com Sun Feb 11 00:14:33 2007 From: collinw at gmail.com (Collin Winter) Date: Sat, 10 Feb 2007 17:14:33 -0600 Subject: [Python-3000] Pre-peps on raise and except changes (was: Warning for 2.6 and greater) In-Reply-To: <5.1.1.6.0.20070210120058.0271e800@sparrow.telecommunity.com> References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com> <43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com> <5.1.1.6.0.20070209223738.044ed8d0@sparrow.telecommunity.com> <5.1.1.6.0.20070210120058.0271e800@sparrow.telecommunity.com> Message-ID: <43aa6ff70702101514r478086aenf43dfb33c2e56458@mail.gmail.com> On 2/10/07, Phillip J. Eby wrote: > At 10:52 PM 2/9/2007 -0600, Collin Winter wrote: > >On 2/9/07, Phillip J. 
Eby wrote: > >>At 05:03 PM 2/9/2007 -0800, Guido van Rossum wrote: > >> >On 2/9/07, Collin Winter wrote: > >> > > sys.exc_info() will be kept, while the sys.exc_{type,value,traceback} > >> > > attributes will be dropped. > >> > > >> >I understand why, but that doesn't make me uncomfortable with keeping > >> >it. Maybe in "3.0 compatibility mode" 2.6 could attach tracebacks to > >> >exception objects so we could be weened off it in 2.6? > >> > >>I notice that neither PEP addresses PEP 343 compatibility. Do we plan to > >>make __exit__() only get one argument? Right now the protocol demands all > >>three. I suppose we could pass one argument in 3.0, and if you want to > >>support 2.6 you would have to add default arguments. Such code would be > >>ugly as sin, but workable. > > > >Couldn't __exit__() be passed (type(e), e, e.__traceback__) instead of > >*sys.exc_info()? > > Sure, but *why*? After all, we're changing gen.throw() in the same way. > > My thought is, 2.6 would pass all three arguments, 3.0 just one. My only concern was that keeping the three-argument signature means one less thing to change when transitioning to 3.0. Anyone really concerned about their context managers working in 2.6 and 3.0 could just use a decorator to ensure compatibility, though, so count me in. 
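Collin's compatibility decorator might look something like the following hypothetical sketch. The one-argument `__exit__` discussed in this thread was not adopted, and Python 3 as released still calls `__exit__(exc_type, exc_value, traceback)`. The decorator name `single_arg_exit` and the `Recorder` class are illustrative. The adapter forwards only the exception instance, and the instance carries its traceback as `__traceback__`.

```python
import functools

def single_arg_exit(cls):
    """Hypothetical class decorator: lets a context manager define
    __exit__(self, exc) while the with statement keeps calling
    __exit__(exc_type, exc_value, traceback)."""
    one_arg_exit = cls.__exit__

    @functools.wraps(one_arg_exit)
    def __exit__(self, exc_type, exc_value, tb):
        # Forward only the instance; tb is also reachable as exc_value.__traceback__.
        return one_arg_exit(self, exc_value)

    cls.__exit__ = __exit__
    return cls

@single_arg_exit
class Recorder:
    """Remembers the exception (if any) and suppresses the given types."""
    def __init__(self, *suppressed):
        self.suppressed = suppressed
        self.seen = None

    def __enter__(self):
        return self

    def __exit__(self, exc):
        self.seen = exc
        return isinstance(exc, self.suppressed)

with Recorder(KeyError) as rec:
    {}["missing"]
print(type(rec.seen).__name__)  # KeyError
```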
Collin Winter From ncoghlan at gmail.com Sun Feb 11 01:31:26 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 11 Feb 2007 10:31:26 +1000 Subject: [Python-3000] Pre-peps on raise and except changes (was: Warning for 2.6 and greater) In-Reply-To: <43aa6ff70702101514r478086aenf43dfb33c2e56458@mail.gmail.com> References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com> <43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com> <5.1.1.6.0.20070209223738.044ed8d0@sparrow.telecommunity.com> <5.1.1.6.0.20070210120058.0271e800@sparrow.telecommunity.com> <43aa6ff70702101514r478086aenf43dfb33c2e56458@mail.gmail.com> Message-ID: <45CE63DE.60400@gmail.com> Collin Winter wrote: > On 2/10/07, Phillip J. Eby wrote: >> My thought is, 2.6 would pass all three arguments, 3.0 just one. > > My only concern was that keeping the three-argument signature means > one less thing to change when transitioning to 3.0. Anyone really > concerned about their context managers working in 2.6 and 3.0 could > just use a decorator to ensure compatibility, though, so count me in. A lot of context managers will also adjust automatically when contextlib.contextmanager is updated to handle the change. Cheers, Nick. 
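Nick's point about contextlib.contextmanager holds because a generator-based manager never sees the `__exit__` arguments directly. The decorator's `__exit__` throws the exception into the generator at the `yield`, where ordinary except clauses handle it. This minimal sketch assumes Python 3, and `recording` is an illustrative name.

```python
from contextlib import contextmanager

@contextmanager
def recording(log, *suppressed):
    """Append any exception raised in the with-block to `log`;
    swallow it if it matches one of `suppressed`, otherwise re-raise."""
    try:
        yield
    except BaseException as exc:
        # The exception arrives here as an instance, traceback attached.
        log.append(exc)
        if not isinstance(exc, suppressed):
            raise

log = []
with recording(log, KeyError):
    {}["missing"]
```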
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Sun Feb 11 01:35:36 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 11 Feb 2007 10:35:36 +1000 Subject: [Python-3000] Pre-peps on raise and except changes In-Reply-To: References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <45CD12E3.9050803@canterbury.ac.nz> <5.1.1.6.0.20070209194256.03894be0@sparrow.telecommunity.com> <43aa6ff70702091735p2f281678mecaf416cc71ca361@mail.gmail.com> <45CD7D69.4090709@gmail.com> Message-ID: <45CE64D8.1030704@gmail.com> Guido van Rossum wrote: > Why don't you want it to mutate the instance? The recent repeat of the API discussion about list.sort() & list.reversed() (mutate instance & return None) vs sorted() and reversed() (return new instance). I'm trying to see why mutating & returning self would be OK here, when it's not OK for a list to do the same thing. An alternate constructor as a class method ducks the question entirely. Cheers, Nick. > > On 2/10/07, Nick Coghlan wrote: >> Collin Winter wrote: >> > I think so. I've already got language ready for the section on using >> > BaseException.with_traceback() in the 2->3 raise translations, and >> > I'll work up additional language for the transition plan sometime this >> > weekend. >> >> If with_traceback() is an instance method, does it mutate the existing >> exception or create a new one? 
>> >> To avoid any confusion, perhaps it should instead be a class method >> equivalent to the following: >> >> @classmethod >> def with_traceback(*args, **kwds): >> cls = args[0] >> tb = args[1] >> args = args[2:] >> exc = cls(*args, **kwds) >> exc.__traceback__ = tb >> return exc >> >> Usage would look like: >> >> raise E.with_traceback(T, V) >> >> >> Cheers, >> Nick. >> >> -- >> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia >> --------------------------------------------------------------- >> http://www.boredomandlaziness.org >> > > -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From guido at python.org Sun Feb 11 05:08:26 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 10 Feb 2007 20:08:26 -0800 Subject: [Python-3000] Pre-peps on raise and except changes In-Reply-To: <45CE64D8.1030704@gmail.com> References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <45CD12E3.9050803@canterbury.ac.nz> <5.1.1.6.0.20070209194256.03894be0@sparrow.telecommunity.com> <43aa6ff70702091735p2f281678mecaf416cc71ca361@mail.gmail.com> <45CD7D69.4090709@gmail.com> <45CE64D8.1030704@gmail.com> Message-ID: Somehow it seems that exceptions keep getting permission to violate the rules... (E.g. the insistence on a fixed base class is also considered unpythonic in other contexts.) Maybe it's because they're "exceptions" ? :-) Anyway, I believe there's a use case for re-raising an existing exception with an added traceback. After all the __traceback__ attribute is mutable. Returning the mutated object is acceptable here because the *dominant* use case is creating and raising an exception in one go: raise FooException().with_traceback() --Guido On 2/10/07, Nick Coghlan wrote: > Guido van Rossum wrote: > > Why don't you want it to mutate the instance? 
> > The recent repeat of the API discussion about list.sort() & > list.reversed() (mutate instance & return None) vs sorted() and > reversed() (return new instance). > > I'm trying to see why mutating & returning self would be OK here, when > it's not OK for a list to do the same thing. > > An alternate constructor as a class method ducks the question entirely. > > Cheers, > Nick. > > > > > On 2/10/07, Nick Coghlan wrote: > >> Collin Winter wrote: > >> > I think so. I've already got language ready for the section on using > >> > BaseException.with_traceback() in the 2->3 raise translations, and > >> > I'll work up additional language for the transition plan sometime this > >> > weekend. > >> > >> If with_traceback() is an instance method, does it mutate the existing > >> exception or create a new one? > >> > >> To avoid any confusion, perhaps it should instead be a class method > >> equivalent to the following: > >> > >> @classmethod > >> def with_traceback(*args, **kwds): > >> cls = args[0] > >> tb = args[1] > >> args = args[2:] > >> exc = cls(*args, **kwds) > >> exc.__traceback__ = tb > >> return exc > >> > >> Usage would look like: > >> > >> raise E.with_traceback(T, V) > >> > >> > >> Cheers, > >> Nick. > >> > >> -- > >> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > >> --------------------------------------------------------------- > >> http://www.boredomandlaziness.org > >> > > > > > > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > --------------------------------------------------------------- > http://www.boredomandlaziness.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun Feb 11 07:26:28 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 10 Feb 2007 22:26:28 -0800 Subject: [Python-3000] how should we handle changes to the C API? 
In-Reply-To: References: <45BD94CB.6060107@canterbury.ac.nz> Message-ID: On 1/29/07, Brett Cannon wrote: > I was more generally wondering what the plan was for transitioning any > C API changes (if we were even going to do that level of transition). It's too early for much of a plan IMO. I'm not making radical changes (yet) but I'm mercilessly deleting APIs as they become obsolete. I expect that we need to wait until we've implemented the new I/O library and the str/unicode unification before we can say much about what to do about C APIs. But there's one thing we can do: not change existing APIs in incompatible ways. If you delete an API, code that uses it gets a compile-time error, and that should make it relatively simple to fix (assuming there's a replacement). But if you change the signature it's more questionable, and if you change the semantics (e.g. returning a different kind of PyObject*) it's painful. So let's commit to not changing signatures or semantics, but delete obsolete APIs in favor of new ones (with a different name). I guess this means some of the new names will be ugly. So what, it's C. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Sun Feb 11 18:50:34 2007 From: brett at python.org (Brett Cannon) Date: Sun, 11 Feb 2007 09:50:34 -0800 Subject: [Python-3000] how should we handle changes to the C API? In-Reply-To: References: <45BD94CB.6060107@canterbury.ac.nz> Message-ID: On 2/10/07, Guido van Rossum wrote: > On 1/29/07, Brett Cannon wrote: > > I was more generally wondering what the plan was for transitioning any > > C API changes (if we were even going to do that level of transition). > > It's too early for much of a plan IMO. I'm not making radical changes > (yet) but I'm mercilessly deleting APIs as they become obsolete. I > expect that we need to wait until we've implemented the new I/O > library and the str/unicode unification before we can say much about > what to do about C APIs. 
> OK, fair enough. I know Neal has some ideas on this so I can let him sweat some of the details when it comes time. =) > But there's one thing we can do: not change existing APIs in > incompatible ways. If you delete an API, code that uses it gets a > compile-time error, and that should make it relatively simple to fix > (assuming there's a replacement). But if you change the signature it's > more questionable, and if you change the semantics (e.g. returning a > different kind of PyObject*) it's painful. > > So let's commit to not changing signatures or semantics, but delete > obsolete APIs in favor of new ones (with a different name). I guess > this means some of the new names will be ugly. So what, it's C. :-) Thank goodness for documentation and the C API index then. =) Then I will probably try to come up with a reasonable name for something to replace PyErr_GivenExceptionMatches(), add it to 2.6, and delete PyErr_GivenExceptionMatches() in 3.0. -Brett From martin at v.loewis.de Sun Feb 11 19:18:35 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Sun, 11 Feb 2007 19:18:35 +0100 Subject: [Python-3000] the types module In-Reply-To: References: <1d85506f0702010343j26ddb0eeub63dafad8a83cf78@mail.gmail.com> Message-ID: <45CF5DFB.7090609@v.loewis.de> Brett Cannon wrote: > This has come up before on python-dev, IIRC. Double-check the archives. More specifically, see PEP 294. It claims the types module will be removed in Python 3000. Regards, Martin From martin at v.loewis.de Sun Feb 11 19:20:12 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Sun, 11 Feb 2007 19:20:12 +0100 Subject: [Python-3000] how should we handle changes to the C API? In-Reply-To: References: Message-ID: <45CF5E5C.1050703@v.loewis.de> Brett Cannon wrote: > My specific need is that PyErr_GivenExceptionMatches() does not have > an exception return value.
This sucks for me in 2.6 for deprecating > catching string exceptions, but it sucks more in 3.0 since only > subclasses of BaseException can be raised. But not allowing -1 to > represent that an error occurred is a pain for anyone who wants to > properly use the function. I don't understand what exceptional value you are talking about. If the given object cannot be an exception, it clearly doesn't match, so the outcome should be zero (not an error). Regards, Martin From brett at python.org Sun Feb 11 20:30:24 2007 From: brett at python.org (Brett Cannon) Date: Sun, 11 Feb 2007 11:30:24 -0800 Subject: [Python-3000] how should we handle changes to the C API? In-Reply-To: <45CF5E5C.1050703@v.loewis.de> References: <45CF5E5C.1050703@v.loewis.de> Message-ID: On 2/11/07, "Martin v. Löwis" wrote: > Brett Cannon wrote: > > My specific need is that PyErr_GivenExceptionMatches() does not have > > an exception return value. This sucks for me in 2.6 for deprecating > > catching string exceptions, but it sucks more in 3.0 since only > > subclasses of BaseException can be raised. But not allowing -1 to > > represent that an error occurred is a pain for anyone who wants to > > properly use the function. > > I don't understand what exceptional value you are talking about. > If the given object cannot be an exception, it clearly doesn't > match, so the outcome should be zero (not an error). > Right, but I wanted to be able to raise a warning. If that warning is supposed to be treated as an exception the caller needs to let that propagate. Right now PyErr_GivenExceptionMatches() can in no way let the caller know that fact; the caller needs to use PyErr_Occurred() after the call. I checked, and according to a Google Code search no one does that, either in the core or in 3rd-party libraries.
-Brett From collinw at gmail.com Mon Feb 12 01:43:24 2007 From: collinw at gmail.com (Collin Winter) Date: Sun, 11 Feb 2007 18:43:24 -0600 Subject: [Python-3000] the types module In-Reply-To: <45CF5DFB.7090609@v.loewis.de> References: <1d85506f0702010343j26ddb0eeub63dafad8a83cf78@mail.gmail.com> <45CF5DFB.7090609@v.loewis.de> Message-ID: <43aa6ff70702111643w57d8c8f1kd104c82d34d95551@mail.gmail.com> On 2/11/07, "Martin v. Löwis" wrote: > Brett Cannon wrote: > > This has come up before on python-dev, IIRC. Double-check the archives. > > More specifically, see PEP 294. It claims the types module will be > removed in Python 3000. Is removing the types module still a goal? It's not mentioned in either PEP 3100 or 3108. Collin Winter From guido at python.org Mon Feb 12 03:26:01 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 11 Feb 2007 18:26:01 -0800 Subject: [Python-3000] the types module In-Reply-To: <43aa6ff70702111643w57d8c8f1kd104c82d34d95551@mail.gmail.com> References: <1d85506f0702010343j26ddb0eeub63dafad8a83cf78@mail.gmail.com> <45CF5DFB.7090609@v.loewis.de> <43aa6ff70702111643w57d8c8f1kd104c82d34d95551@mail.gmail.com> Message-ID: Well, I would surely love to see it replaced by something more reasonable. Collecting type objects together just on the basis that they are all built-in type objects was a bad idea. I still hope to do something about Bill Janssen's ABC proposal. But that will have to wait until after PyCon. --Guido On 2/11/07, Collin Winter wrote: > On 2/11/07, "Martin v. Löwis" wrote: > > Brett Cannon wrote: > > > This has come up before on python-dev, IIRC. Double-check the archives. > > > > More specifically, see PEP 294. It claims the types module will be > > removed in Python 3000. > > Is removing the types module still a goal? It's not mentioned in > either PEP 3100 or 3108.
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Mon Feb 12 07:55:39 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Mon, 12 Feb 2007 07:55:39 +0100 Subject: [Python-3000] how should we handle changes to the C API? In-Reply-To: References: <45CF5E5C.1050703@v.loewis.de> Message-ID: <45D00F6B.7020207@v.loewis.de> Brett Cannon wrote: > Right, but I wanted to be able to raise a warning. If that warning is > supposed to be treated as an exception the caller needs to let that > propagate. Right now PyErr_GivenExceptionMatches() can in no way let > the caller know that fact I'm unclear why you want to warn in PyErr_GivenExceptionMatches: shouldn't you rather warn when the exception is raised? Regards, Martin From ncoghlan at gmail.com Mon Feb 12 10:35:29 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 12 Feb 2007 19:35:29 +1000 Subject: [Python-3000] Pre-peps on raise and except changes In-Reply-To: References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <45CD12E3.9050803@canterbury.ac.nz> <5.1.1.6.0.20070209194256.03894be0@sparrow.telecommunity.com> <43aa6ff70702091735p2f281678mecaf416cc71ca361@mail.gmail.com> <45CD7D69.4090709@gmail.com> <45CE64D8.1030704@gmail.com> Message-ID: <45D034E1.2090506@gmail.com> Guido van Rossum wrote: > Somehow it seems that exceptions keep getting permission to violate > the rules... (E.g. the insistence on a fixed base class is also > considered unpythonic in other contexts.) Maybe it's because they're > "exceptions" ? :-) > > Anyway, I believe there's a use case for re-raising an existing > exception with an added traceback. After all the __traceback__ > attribute is mutable. Returning the mutated object is acceptable here > because the *dominant* use case is creating and raising an exception > in one go: > > raise FooException().with_traceback() Works for me. Cheers, Nick.
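Here is a minimal sketch contrasting the two designs in this exchange. It assumes the API that Python 3 eventually shipped: BaseException.with_traceback(tb) takes the traceback as a required argument, stores it in __traceback__, and returns the same instance, so the bare with_traceback() call quoted above would need a tb argument. The helpers make_with_traceback() and capture_traceback() are hypothetical. The first only mirrors Nick's class-method idea and is not part of any library.

```python
import sys


def make_with_traceback(cls, tb, *args, **kwds):
    """Hypothetical helper mirroring the class-method proposal:
    build a *new* exception instance and attach the traceback to it."""
    exc = cls(*args, **kwds)
    exc.__traceback__ = tb
    return exc


def capture_traceback():
    """Raise and catch an exception, returning its traceback object."""
    try:
        raise KeyError("original failure")
    except KeyError:
        return sys.exc_info()[2]


tb = capture_traceback()

# Mutate-and-return-self style: the same object comes back, so creating
# and raising fit in one expression: raise RuntimeError(...).with_traceback(tb)
exc = RuntimeError("wrapped")
assert exc.with_traceback(tb) is exc
assert exc.__traceback__ is tb

# Alternate-constructor style: a fresh instance; nothing existing is mutated.
fresh = make_with_traceback(RuntimeError, tb, "wrapped")
assert isinstance(fresh, RuntimeError) and fresh.__traceback__ is tb
```

The first style is the trade-off Guido accepts: a mutating method that also returns its receiver, justified by the create-and-raise-in-one-go use case. Greg objects to exactly that below.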
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From brett at python.org Mon Feb 12 23:11:05 2007 From: brett at python.org (Brett Cannon) Date: Mon, 12 Feb 2007 14:11:05 -0800 Subject: [Python-3000] how should we handle changes to the C API? In-Reply-To: <45D00F6B.7020207@v.loewis.de> References: <45CF5E5C.1050703@v.loewis.de> <45D00F6B.7020207@v.loewis.de> Message-ID: On 2/11/07, "Martin v. Löwis" wrote: > Brett Cannon wrote: > > Right, but I wanted to be able to raise a warning. If that warning is > > supposed to be treated as an exception the caller needs to let that > > propagate. Right now PyErr_GivenExceptionMatches() can in no way let > > the caller know that fact > > I'm unclear why you want to warn in PyErr_GivenExceptionMatches: > shouldn't you rather warn when the exception is raised? > Guido wants both so that you don't end up with useless values in the 'except' clause. So yes, things are checked at the time of raising an exception, but that does not prevent someone from putting something in an 'except' clause that is useless. -Brett From guido at python.org Mon Feb 12 23:55:21 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 12 Feb 2007 14:55:21 -0800 Subject: [Python-3000] how should we handle changes to the C API? In-Reply-To: References: <45CF5E5C.1050703@v.loewis.de> <45D00F6B.7020207@v.loewis.de> Message-ID: But I only want the latter in Py3k, and I don't mind using a different API there, even potentially a separate check after evaluating 'E' but before checking whether it matches. I think it's fine not to catch this in 2.6; after all it's a bug anyway so we're not expecting many occurrences. I don't think the 3.0 mode in 2.6 needs to catch existing bugs; it only needs to catch code that *works* in 2.6 but won't in 3.0. On 2/12/07, Brett Cannon wrote: > On 2/11/07, "Martin v.
Löwis" wrote: > Brett Cannon wrote: > > > Right, but I wanted to be able to raise a warning. If that warning is > > > supposed to be treated as an exception the caller needs to let that > > > propagate. Right now PyErr_GivenExceptionMatches() can in no way let > > > the caller know that fact > > > > I'm unclear why you want to warn in PyErr_GivenExceptionMatches: > > shouldn't you rather warn when the exception is raised? > > > > Guido wants both so that you don't end up with useless values in the > 'except' clause. So yes, things are checked at the time of raising an > exception, but that does not prevent someone from putting something in > an 'except' clause that is useless. > > -Brett > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Tue Feb 13 00:08:09 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 13 Feb 2007 12:08:09 +1300 Subject: [Python-3000] Pre-peps on raise and except changes In-Reply-To: <45D034E1.2090506@gmail.com> References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com> <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com> <45CD12E3.9050803@canterbury.ac.nz> <5.1.1.6.0.20070209194256.03894be0@sparrow.telecommunity.com> <43aa6ff70702091735p2f281678mecaf416cc71ca361@mail.gmail.com> <45CD7D69.4090709@gmail.com> <45CE64D8.1030704@gmail.com> <45D034E1.2090506@gmail.com> Message-ID: <45D0F359.2040802@canterbury.ac.nz> Nick Coghlan wrote: > Guido van Rossum wrote: > Someone else wrote: > > > raise FooException().with_traceback() > > Works for me. I don't like that somehow -- it looks too clever. Also it violates the general principle of mutating methods not returning things.
I know Guido said he's willing to waive that rule for exceptions, but it still bothers me. -- Greg From martin at v.loewis.de Tue Feb 13 06:52:29 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Tue, 13 Feb 2007 06:52:29 +0100 Subject: [Python-3000] how should we handle changes to the C API? In-Reply-To: References: <45CF5E5C.1050703@v.loewis.de> <45D00F6B.7020207@v.loewis.de> Message-ID: <45D1521D.7000903@v.loewis.de> Brett Cannon wrote: >> I'm unclear why you want to warn in PyErr_GivenExceptionMatches: >> shouldn't you rather warn when the exception is raised? >> > > Guido wants both so that you don't end up with useless values in the > 'except' clause. So yes, things are checked at the time of raising an > exception, but that does not prevent someone from putting something in > an 'except' clause that is useless. Ok: but why does this check need to happen in PyErr_GivenExceptionMatches? The deprecation of string exceptions already happens in cmp_outcome; if you check for bad base exceptions there also, you would find them all, no? So I still don't see a need to modify GivenExceptionMatches. Regards, Martin From brett at python.org Tue Feb 13 21:46:59 2007 From: brett at python.org (Brett Cannon) Date: Tue, 13 Feb 2007 12:46:59 -0800 Subject: [Python-3000] how should we handle changes to the C API? In-Reply-To: <45D1521D.7000903@v.loewis.de> References: <45CF5E5C.1050703@v.loewis.de> <45D00F6B.7020207@v.loewis.de> <45D1521D.7000903@v.loewis.de> Message-ID: On 2/12/07, "Martin v. Löwis" wrote: > Brett Cannon wrote: > >> I'm unclear why you want to warn in PyErr_GivenExceptionMatches: > >> shouldn't you rather warn when the exception is raised? > >> > > > > Guido wants both so that you don't end up with useless values in the > > 'except' clause. So yes, things are checked at the time of raising an > > exception, but that does not prevent someone from putting something in > > an 'except' clause that is useless.
> > Ok: but why does this check need to happen in PyErr_GivenExceptionMatches? > It doesn't need to, it just would have been convenient and consistent. It seems odd that C code can compare an exception against other objects that an 'except' clause won't. > The deprecation of string exceptions already happens in cmp_outcome; > if you check for bad base exceptions there also, you would find them > all, no? It wouldn't be checked in both places, just PyErr_GivenExceptionMatches(). -Brett From cvrebert at gmail.com Wed Feb 14 04:25:55 2007 From: cvrebert at gmail.com (Chris Rebert) Date: Tue, 13 Feb 2007 19:25:55 -0800 Subject: [Python-3000] pre-PEP: Default Argument Expressions Message-ID: <45D28143.9010502@gmail.com> Requesting comments on the following pre-PEP. pybench runs both with and without the patch applied would also be appreciated. - Chris R Title: Default Argument Expressions Author: Christopher Rebert Status: Draft Type: Standards Track Requires: 3000 Python-Version: 3.0 Abstract This PEP proposes new semantics for default arguments to remove boilerplate code associated with non-constant default argument values, allowing them to be expressed more clearly and succinctly. Specifically, all default argument expressions are re-evaluated at each call as opposed to just once at definition-time as they are now. Motivation Currently, to write functions using non-constant default arguments, one must use the idiom: def foo(non_const=None): if non_const is None: non_const = some_expr #rest of function or equivalent code. Naive programmers desiring mutable default arguments often make the mistake of writing the following: def foo(mutable=some_expr_producing_mutable): #rest of function However, this does not work as intended, as 'some_expr_producing_mutable' is evaluated only *once* at definition-time, rather than once per call at call-time. This results in all calls to 'foo' using the same default value, which can result in unintended consequences.
This necessitates the previously mentioned idiom. This unintuitive behavior is such a frequent stumbling block for newbies that it is present in at least 3 lists of Python's deficiencies [0] [1] [2]. Python's tutorial even mentions the issue explicitly [3]. There are currently few, if any, known good uses of the current behavior of mutable default arguments. The most common one is to preserve function state between calls. However, as one of the lists [2] comments, this purpose is much better served by decorators, classes, or (though less preferred) global variables. Therefore, since the current semantics aren't useful for non-constant default values and an idiom is necessary to work around this deficiency, why not change the semantics so that people can write what they mean more directly, without the tedious boilerplate? Removing this idiom would help make code more readable and self-documenting. Rationale The discussion referenced herein is based on two threads [4] [5] on the python-ideas mailing list. Originally, it was proposed that all default argument values be deep-copied from the original (evaluated at definition-time) at each invocation of the function where the default value was required. However, this doesn't take into account default values that are not literals, e.g. function calls, subscripts, attribute accesses. Thus, the new idea was to re-evaluate the default arguments at each call where they were needed. There was some concern over the possible performance hit this could cause, and whether there should be new syntax so that code could use the existing semantics for performance reasons. Some of the proposed syntaxes were: def foo(bar=<baz>): #code def foo(bar=new baz): #code def foo(bar=fresh baz): #code def foo(bar=separate baz): #code def foo(bar=another baz): #code def foo(bar=unique baz): #code def foo(bar or baz): #code where the keyword (or angle brackets) would indicate that the default value 'baz' of parameter 'bar' should use the new semantics.
Other parameters would continue to use the old semantics. Alternately, the new semantics could be the default, with the old semantics accessible using: def foo(bar=once baz): #code where 'once' indicates the old default argument semantics. A similar idea is mentioned in PEP 3103 [6] under "Option 4". However, having two sets of semantics could be confusing, and leaving in the old semantics might be considered premature optimization. So this PEP proposes having just one set of semantics. Refactorings to deal with the possible performance hit from the new semantics are discussed later. A more radical proposed solution was to restrict default arguments to being hash()-able values, thus theoretically restricting default arguments to immutable values only. While this would solve the newbie-confusion issue, it does not suggest a better way to specify that a default value should be recomputed at every function call. Throughout the discussion, several decorators were shown as alternatives to the aforementioned idiom. These do allow the programmer to express their intent more clearly, at the cost of some extra complexity. Also, no one decorator could be applied to all situations. The programmer would have to figure out which one to use each time. This PEP's proposed solution would make these decorators unnecessary and allow a more general solution to the issue than these decorators. The question was also raised as to whether the problem this PEP seeks to solve is significant enough to warrant a language change. The statistics in the Compatibility Issues section should help demonstrate the necessity of the changes that this PEP proposes. The next question was exactly how default argument expressions should be scoped. By way of demonstration: a = 42 def foo(b=a): a = 3.14 Now, does the variable 'a' in the default expression for 'b' refer to the lexical variable 'a', or the local variable 'a'?
If it refers to a local variable, then this code is basically equivalent to: a = 42 def foo(b=None): if b is None: b = a a = 3.14 in which case, 'a' is being referenced before it's been assigned to in the function, causing an UnboundLocalError. The alternative is to have Python treat 'a' within the function's body differently from the 'a' in the default expression. In this case, the code would behave as if it were: a = 42 def foo(b=None): if b is None: b = __a a = 3.14 where __a indicates Python 'magically' treating it as a lexical variable that is distinct from the local variable 'a'. This would increase backward-compatibility, allowing you to use a lexical variable with the same name as a local variable as a default expression, which is more similar to Python's current behavior. However, this would complicate the semantics of default expressions. For simplicity's sake, this PEP endorses treating variables in default expressions as normal function variables. Suggestions for dealing with the incompatibilities this would introduce are discussed later. Specification The current semantics for default arguments are replaced by the following semantics: - Whenever a function is called, and the caller does not provide a value for a parameter with a default expression, the parameter's default expression is evaluated in the function's scope. The resulting value is then assigned to a local variable in the function's scope with the same name as the parameter. - The default argument expressions are evaluated before the body of the function. - The evaluation of default argument expressions proceeds in the same order as that of the parameter list in the function's definition. - Variables in a default expression are treated like normal function variables (i.e. global/lexical variables unless assigned to in the function).
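The Specification bullets can be approximated in today's Python 3 with a decorator. This is a minimal sketch that rests on assumptions of our own. The name call_time_defaults is hypothetical, and each default is given as a zero-argument callable instead of an expression written in the def line, so unlike the proposal it cannot refer to other parameters. It re-runs a factory on every call that omits that argument, lets explicit arguments win, and visits parameters in definition order.

```python
import inspect
from functools import wraps
from itertools import count


def call_time_defaults(**factories):
    """Hypothetical decorator: each keyword maps a parameter name to a
    zero-argument callable that is re-run on every call omitting it."""
    def decorate(func):
        sig = inspect.signature(func)

        @wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind_partial(*args, **kwargs)
            # Walk parameters in definition order so factories run in the
            # same order as the parameter list; explicit arguments win.
            for name in sig.parameters:
                if name in factories and name not in bound.arguments:
                    bound.arguments[name] = factories[name]()
            return func(*bound.args, **bound.kwargs)
        return wrapper
    return decorate


# Current semantics: the [] below is evaluated once, when "def" runs.
def append_shared(item, bucket=[]):
    bucket.append(item)
    return bucket


# Approximation of the proposed semantics: a fresh list on every call.
@call_time_defaults(bucket=list)
def append_fresh(item, bucket=None):  # None is only a placeholder here
    bucket.append(item)
    return bucket


ticket = count()

@call_time_defaults(g=lambda: next(ticket), h=lambda: next(ticket))
def frobnicate(g=None, h=None):
    return g, h


print(append_shared(1), append_shared(2))  # [1, 2] [1, 2]
print(append_fresh(1), append_fresh(2))    # [1] [2]
print(frobnicate(), frobnicate(h=99))      # (0, 1) (2, 99)
```

Callables stand in for expressions because present-day Python evaluates whatever follows = in a def line exactly once, when the def statement runs.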
Given these semantics, it makes more sense to refer to default argument expressions rather than default argument values, as the expression is re-evaluated at each call, rather than just once at definition-time. Therefore, we shall do so hereafter. Demonstrative examples: #default argument expressions can refer to #variables in the enclosing scope... CONST = "hi" def foo(a=CONST): print a >>> foo() hi >>> CONST="bye" >>> foo() bye #...or even other arguments def ncopies(container, n=len(container)): return [container for i in range(n)] >>> ncopies([1, 2], 5) [[1, 2], [1, 2], [1, 2], [1, 2], [1, 2]] >>> ncopies([1, 2, 3]) [[1, 2, 3], [1, 2, 3], [1, 2, 3]] >>> #ncopies grabbed n from [1, 2, 3]'s length (3) #default argument expressions are arbitrary expressions def my_sum(lst): cur_sum = lst[0] for i in lst[1:]: cur_sum += i return cur_sum def bar(b=my_sum((["b"] * (2 * 3))[:4])): print b >>> bar() bbbb #default argument expressions are re-evaluated at every call... from random import randint def baz(c=randint(1,3)): print c >>> baz() 2 >>> baz() 3 #...but only when they're required def silly(): print "spam" return 42 def qux(d=silly()): pass >>> qux() spam >>> qux(17) >>> qux(d=17) >>> qux(*[17]) >>> qux(**{'d':17}) >>> #no output since silly() never called >>> #because d's value was specified in the calls #default argument expressions are evaluated in calling sequence order count = 0 def next(): global count count += 1 return count - 1 def frobnicate(g=next(), h=next(), i=next()): print g, h, i >>> frobnicate() 0 1 2 >>> #g, h, and i's default argument expressions are evaluated >>> #in the same order as in the parameter definition #variables in default expressions refer to lexical/global variables... 
j = "holy grail" def frenchy(k=j): print k #...unless assigned to in the function (or its parameters) def arthur(j="swallow", m=j): print m >>> frenchy() holy grail >>> arthur() swallow Compatibility Issues This change in semantics breaks code which uses mutable default argument expressions and depends on those expressions being evaluated only once. It will also break code that assigns new incompatible values in a parent scope to variables used in default expressions. Code relying on such behavior can be refactored from: def foo(bar=mutable): #code to state = mutable def foo(bar=state): #code or class Baz(object): state = mutable @classmethod def foo(cls, bar=cls.state): #code or from functools import wraps def stateify(states): def _wrap(func): @wraps(func) def _wrapper(*args, **kwds): new_kwargs = states.copy() new_kwargs.update(kwds) return func(*args, **new_kwargs) return _wrapper return _wrap @stateify({'bar' : mutable}) def foo(bar): #code Code such as the following (which was also mentioned in the Rationale): b = 42 #outer b def foo(a=b): #ERROR: refers to local b, not outer b! b = 7 #local b which has default values that refer to variables in enclosing scopes and contains assignments to local variables of the same names will also be incompatible, as the 'b' in the default argument refers to the local 'b' rather than the outer 'b', resulting in an UnboundLocalError because the local variable 'b' has not been assigned to at the time "a"'s default expression is evaluated. Such code will need to rename the affected variables. The changes in this PEP are backwards-compatible with all code whose default argument values are immutable, including code using the idiom mentioned in the 'Motivation' section. However, such values will now be recomputed for each call for which they are required. This may cause performance degradation. If such recomputation is significantly expensive, the same refactoring mentioned above can be used.
A survey of the standard library for Python v2.5, produced via a script [7], gave the following statistics for the standard library (608 files, test suites were excluded): total number of non-None immutable default arguments: 1585 (41.5%) total number of mutable default arguments: 186 (4.9%) total number of default arguments with a value of None: 1813 (47.4%) total number of default arguments with unknown mutability: 238 (6.2%) total number of comparisons to None: 940 Note: The number of comparisons to None refers to *all* such comparisons, not necessarily just those used in the idiom mentioned in the Motivation section. Looking more closely at the script's output, it appears that Tix.py and Tkinter.py are the primary users of mutable default arguments in the standard library. Similarly, examination of the unknown default arguments reveals that a significant fraction are functions, classes, or constants, which should, for the most part, not be functionally affected by this proposal. Assuming the standard library is indicative of Python code in general, the change in semantics will have comparatively little impact on the correct operation of Python programs. Running pybench with modifications to simulate the proposed semantics [8] shows that Python function/method calls using default arguments run about 4.4%-6.5% slower versus the current semantics. However, as the simulation of the proposed semantics is crude, this should be considered an upper bound for any performance decreases this proposal might cause. In relation to Python 3.0, this PEP's proposal is compatible with those of PEP 3102 [9] and PEP 3107 [10], though it does not depend on the acceptance of either of those PEPs.
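A quick arithmetic check on the survey figures. The four category counts sum to 3,822 default arguments, and each quoted percentage is that category's share of the total. The 940 comparisons to None are a separate tally and are not part of that total. The numbers below are copied from the survey text.

```python
counts = {
    "non-None immutable": 1585,
    "mutable": 186,
    "None": 1813,
    "unknown mutability": 238,
}
total = sum(counts.values())
print(total)  # 3822

# Prints 41.5%, 4.9%, 47.4% and 6.2%, matching the survey.
for name, n in counts.items():
    print(f"{name}: {100 * n / total:.1f}%")
```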
Reference Implementation All code of the form: def foo(bar=some_expr, baz=other_expr): #body Should be compiled as if it had read (in pseudo-Python): def foo(bar=_undefined, baz=_undefined): if bar is _undefined: bar = some_expr if baz is _undefined: baz = other_expr #body where '_undefined' is the value given to a parameter when the caller didn't specify a value for it. This is not intended to be a literal translation, but rather a demonstration as to how Python's argument-handling machinery should act. Specifically, there should be no Python-level value corresponding to _undefined, nor should a literal translation such as that shown necessarily be used. References [0] 10 Python pitfalls http://zephyrfalcon.org/labs/python_pitfalls.html [1] Python Gotchas http://www.ferg.org/projects/python_gotchas.html#contents_item_6 [2] When Pythons Attack http://www.onlamp.com/pub/a/python/2004/02/05/learn_python.html?page=2 [3] 4. More Control Flow Tools http://docs.python.org/tut/node6.html#SECTION006710000000000000000 [4] [Python-ideas] fixing mutable default argument values http://mail.python.org/pipermail/python-ideas/2007-January/000073.html [5] [Python-ideas] proto-PEP: Fixing Non-constant Default Arguments http://mail.python.org/pipermail/python-ideas/2007-January/000121.html [6] A Switch/Case Statement http://www.python.org/dev/peps/pep-3103/ [7] Script to generate default argument statistics See attachment. [8] Patch to pybench/Calls.py See attachment. [9] Keyword-Only Arguments http://www.python.org/dev/peps/pep-3102/ [10] Function Annotations http://www.python.org/dev/peps/pep-3107/ Copyright This document has been placed in the public domain. -------------- next part -------------- A non-text attachment was scrubbed... Name: defargs.diff Type: text/x-patch Size: 794 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070213/12125965/attachment.bin -------------- next part -------------- A non-text attachment was scrubbed... 
Name: new_find.py Type: text/x-python Size: 4245 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070213/12125965/attachment.py From mike.klaas at gmail.com Wed Feb 14 05:44:08 2007 From: mike.klaas at gmail.com (Mike Klaas) Date: Tue, 13 Feb 2007 20:44:08 -0800 Subject: [Python-3000] pre-PEP: Default Argument Expressions In-Reply-To: <45D28143.9010502@gmail.com> References: <45D28143.9010502@gmail.com> Message-ID: <3d2ce8cb0702132044r423bbfb5l6a73eb1203081e92@mail.gmail.com> On 2/13/07, Chris Rebert wrote: > This PEP proposes new semantics for default arguments to remove > boilerplate code associated with non-constant default argument values, > allowing them to be expressed more clearly and succinctly. > Specifically, > all default argument expressions are re-evaluated at each call as > opposed > to just once at definition-time as they are now. Seems like a huge barrel of worms. The binding semantics are not only a problem for mutable arguments, as you state in your PEP: In [2]: def a(): ...: g = 1 ...: def b(): ...: print g ...: g = 2 ...: return b ...: In [4]: a()() 2 In [5]: def a(): ...: g = 1 ...: def b(g=g): ...: print g ...: g = 2 ...: return b In [6]: a()() 1 Creating closures and define-time local bindings is certainly not as common as a "regular" function definition, but it is an important part of Python when programming in a semi-functional style. Imagine that "def b" is in a for loop. Your presented alternatives either don't work or go to rather extreme effort to duplicate this simple and useful functionality. I agree that newbies stumble over mutable default arguments. I did. If we could improve that learning process, I would be all for it. However, besides this being a significant change in semantics, two main stumbling blocks in my mind are: 1. Scoping. Scoping issues are not minor consequences of changes to default argument behaviour, but are integral.
I think that you'd have to come up with a more obvious way to accomplish
all the various current behaviours of def args before changing their
semantics. This is probably a larger project than the original proposal.

2. Performance. The speed of python is influenced greatly by the
performance of function dispatch. This may not show up in pystone.

-Mike

From anthony at interlink.com.au  Wed Feb 14 06:55:45 2007
From: anthony at interlink.com.au (Anthony Baxter)
Date: Wed, 14 Feb 2007 16:55:45 +1100
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <3d2ce8cb0702132044r423bbfb5l6a73eb1203081e92@mail.gmail.com>
References: <45D28143.9010502@gmail.com> <3d2ce8cb0702132044r423bbfb5l6a73eb1203081e92@mail.gmail.com>
Message-ID: <200702141655.46596.anthony@interlink.com.au>

On Wednesday 14 February 2007 15:44, Mike Klaas wrote:
> 2. Performance. The speed of python is influenced greatly by the
> performance of function dispatch. This may not show up in
> pystone.

pystone is an utterly useless benchmark. It should not be used, ever.
The pre-PEP references pybench, which does a much better job of showing
this sort of thing.

From jcarlson at uci.edu  Wed Feb 14 08:27:39 2007
From: jcarlson at uci.edu (Josiah Carlson)
Date: Tue, 13 Feb 2007 23:27:39 -0800
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <45D28143.9010502@gmail.com>
References: <45D28143.9010502@gmail.com>
Message-ID: <20070213231036.AD24.JCARLSON@uci.edu>

Chris Rebert wrote:
> Requesting comments on the following pre-PEP. pybench runs both with and
> without the patch applied would also be appreciated.
> - Chris R

One Glyph Lefkowitz posted today [1] in response to dynamic attribute
access the following, which is surely applicable here.

> I also strongly dislike every syntax that has thus far been proposed,
> but even if I loved them, there is just no motivating use-case.
> New syntax is not going to make dynamic attribute access easier to
> understand, and it *is* going to cause even more version-compatibility
> headaches.
>
> I really, really wish that every feature proposal for Python had to meet
> some burden of proof, or submit a cost/benefit analysis. Who is this
> going to help? How much is this going to help them? "Who is this going
> to hurt" is easy, but should also be included for completeness -
> everyone who wants to be able to deploy new code on old Pythons.
>
> I suspect this would kill 90% of "hey wouldn't this syntax be neat"
> proposals on day zero, and the ones that survived would be a lot more
> interesting to talk about.

Replace "dynamic attribute access" with "default argument expressions".

With that said, please provide:
1a) Proof as to what is to be gained over an explicit if statement or
conditional expression.
or
1b) A cost/benefit analysis of the time it would take to "fix" the
standard library and/or user code with any of the provided new
syntax/semantics.
2) Who is this going to help (and do we care)?
3) How much is this going to help them?
4) Who is this going to hurt (in addition to everyone who wants to run
new code in older Pythons)?

As stated by most respondents to the original threads, a conditional
statement is generally preferable (which answers #1a). It is really only
going to help new users of Python (as seasoned users don't have the
issue, and generally don't seem to mind using an additional line to
"solve" the "problem") (which answers #2). It isn't going to help very
many people terribly much - a 2-line addition and a 1-line modification
*in the worst case*, if you include a new None-like sentinel (which
answers #3). Further, it's going to hurt everyone who is used to the
'execute once' default argument semantics currently in place (which
answers #4).

Using Glyph's requirements, we see that the syntax is just not
worthwhile, as stated by most people in the original thread.
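The explicit alternative Josiah prefers can be made exactly as precise as the PEP's reference implementation. Below is a minimal sketch in present-day Python 3 (the thread's own snippets are Python 2). It assumes a module-private sentinel object plays the role of the PEP's `_undefined`. The names `_undefined`, `append_item` and `lookup` are hypothetical, chosen for illustration only.

```python
# Hypothetical module-private sentinel standing in for the PEP's _undefined.
# Unlike None, no caller can pass it by accident, so None stays a usable value.
_undefined = object()


def append_item(item, target=_undefined):
    """Append item to target; build a fresh list per call when target is omitted."""
    if target is _undefined:
        target = []  # this expression runs on every call, not once at def time
    target.append(item)
    return target


def lookup(mapping, key, default=_undefined):
    """Return mapping[key]; if the key is missing, return default if one was given."""
    try:
        return mapping[key]
    except KeyError:
        if default is _undefined:
            raise  # no default supplied: propagate the KeyError
        return default  # may legitimately be None
```

With this spelling, `append_item(1)` followed by `append_item(2)` returns two separate lists, `[1]` and `[2]`, and `lookup({}, "x", None)` returns `None` instead of raising. The price is the extra lines per parameter that Josiah counts above.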
- Josiah

[1] http://mail.python.org/pipermail/python-dev/2007-February/071061.html

From martin at v.loewis.de  Wed Feb 14 09:17:04 2007
From: martin at v.loewis.de (Martin v. Löwis)
Date: Wed, 14 Feb 2007 09:17:04 +0100
Subject: [Python-3000] how should we handle changes to the C API?
In-Reply-To:
References: <45CF5E5C.1050703@v.loewis.de> <45D00F6B.7020207@v.loewis.de> <45D1521D.7000903@v.loewis.de>
Message-ID: <45D2C580.6090406@v.loewis.de>

Brett Cannon schrieb:
> It doesn't need to, it just would have been convenient and consistent.
> It seems odd that C code can compare an exception against other
> objects that an 'except' clause won't.

If you look at the C code, you find that there are very few callers
to GivenExceptionMatches (even if you also count ExceptionMatches
callers), and they either pass a PyExc_ object (which will automatically
be permitted), or one of their own exceptions. If you were to remove
PyErr_GivenExceptionMatches, and replace it with something else
where
a) people have to change the functions in their code, and
b) have to check the return value for errors (which they can
   statically determine to never happen)
I think the authors would be unhappy about this gratuitous change.

>> The deprecation of string exceptions already happens in cmp_outcome;
>> if you check for bad base exceptions there also, you would find them
>> all, no?
>
> It wouldn't be checked in both places, just PyErr_GivenExceptionMatches().

Please don't.

Martin

From eopadoan at altavix.com  Wed Feb 14 14:24:32 2007
From: eopadoan at altavix.com (Eduardo "EdCrypt" O. Padoan)
Date: Wed, 14 Feb 2007 11:24:32 -0200
Subject: [Python-3000] fixing test_dict
Message-ID:

If someone is already working on this, please ignore this mail: I just
picked it because it was easy enough to do while I wait for some other
code to run at work.

I've created two patches to p3yk.
They are two alternatives to fix the broken test_dict.py:

test_dict_1.patch uses the same approach as test_dictviews.py:
transform the dict_view into a set.

test_dict_2.patch is an alternative: I'm not sure if .items(),
.values() and .keys() should be covered twice (test_dict.py and
test_dictviews.py), so this solves the problem by removing these tests
from test_dict.py.

--
EduardoOPadoan (eopadoan->altavix::com)
Bookmarks: http://del.icio.us/edcrypt
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_dict_1.patch
Type: text/x-patch
Size: 1029 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070214/3fafcdd1/attachment.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_dict_2.patch
Type: text/x-patch
Size: 1018 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070214/3fafcdd1/attachment-0001.bin

From guido at python.org  Wed Feb 14 18:49:27 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 14 Feb 2007 09:49:27 -0800
Subject: [Python-3000] fixing test_dict
In-Reply-To:
References:
Message-ID:

Thanks! I decided to use your first approach; one can never have too
many unit tests! :-)

On 2/14/07, Eduardo EdCrypt O. Padoan wrote:
> If someone is already working on this, please ignore this mail: I just
> picked it because it was easy enough to do while I wait for some other
> code to run at work.
> I've created two patches to p3yk. They are two alternatives to fix the
> broken test_dict.py:
> test_dict_1.patch uses the same approach as test_dictviews.py:
> transform the dict_view into a set.
> test_dict_2.patch is an alternative: I'm not sure if .items(),
> .values() and .keys() should be covered twice (test_dict.py and
> test_dictviews.py), so this solves the problem by removing these tests
> from test_dict.py.
>
> --
> EduardoOPadoan (eopadoan->altavix::com)
> Bookmarks: http://del.icio.us/edcrypt
>

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From brett at python.org  Wed Feb 14 19:53:57 2007
From: brett at python.org (Brett Cannon)
Date: Wed, 14 Feb 2007 10:53:57 -0800
Subject: [Python-3000] how should we handle changes to the C API?
In-Reply-To: <45D2C580.6090406@v.loewis.de>
References: <45CF5E5C.1050703@v.loewis.de> <45D00F6B.7020207@v.loewis.de> <45D1521D.7000903@v.loewis.de> <45D2C580.6090406@v.loewis.de>
Message-ID:

On 2/14/07, "Martin v. Löwis" wrote:
> Brett Cannon schrieb:
> > It doesn't need to, it just would have been convenient and consistent.
> > It seems odd that C code can compare an exception against other
> > objects that an 'except' clause won't.
>
> If you look at the C code, you find that there are very few callers
> to GivenExceptionMatches (even if you also count ExceptionMatches
> callers), and they either pass a PyExc_ object (which will automatically
> be permitted), or one of their own exceptions. If you were to remove
> PyErr_GivenExceptionMatches, and replace it with something else
> where
> a) people have to change the functions in their code, and

Which is why this was a Py3K question.

> b) have to check the return value for errors (which they can
> statically determine to never happen)
> I think the authors would be unhappy about this gratuitous change.
>

Well, I happen to not think it is gratuitous, but I think we are just
going to agree to disagree on this one. =)

> >> The deprecation of string exceptions already happens in cmp_outcome;
> >> if you check for bad base exceptions there also, you would find them
> >> all, no?
> >
> > It wouldn't be checked in both places, just PyErr_GivenExceptionMatches().
>
> Please don't.

I'm not. At this point I am not going to bother to touch anything and
just continue forward with how I did things in 2.6.

-Brett

From bjourne at gmail.com  Thu Feb 15 01:36:18 2007
From: bjourne at gmail.com (Björn Lindqvist)
Date: Thu, 15 Feb 2007 01:36:18 +0100
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <20070213231036.AD24.JCARLSON@uci.edu>
References: <45D28143.9010502@gmail.com> <20070213231036.AD24.JCARLSON@uci.edu>
Message-ID: <740c3aec0702141636s381b57d8k465e020a2a04d6a2@mail.gmail.com>

On 2/14/07, Josiah Carlson wrote:
>
> Chris Rebert wrote:
> > Requesting comments on the following pre-PEP. pybench runs both with and
> > without the patch applied would also be appreciated.
> > - Chris R
>
> One Glyph Lefkowitz posted today [1] in response to dynamic attribute
> access the following, which is surely applicable here.

To be fair, the two ideas are fairly different. Dynamic attribute
access was about adding new syntax which makes the language more
complex. This idea is more about fine-tuning existing syntax; it does
not add to the language, it just makes it different.

> > I also strongly dislike every syntax that has thus far been proposed,
> > but even if I loved them, there is just no motivating use-case. New
> > syntax is not going to make dynamic attribute access easier to
> > understand, and it *is* going to cause even more version-compatibility
> > headaches.
> >
> > I really, really wish that every feature proposal for Python had to meet
> > some burden of proof, or submit a cost/benefit analysis. Who is this
> > going to help? How much is this going to help them? "Who is this going
> > to hurt" is easy, but should also be included for completeness -
> > everyone who wants to be able to deploy new code on old Pythons.
> >
> > I suspect this would kill 90% of "hey wouldn't this syntax be neat"
> > proposals on day zero, and the ones that survived would be a lot more
> > interesting to talk about.
>
> Replace "dynamic attribute access" with "default argument expressions".
> With that said, please provide:
> 1a) Proof as to what is to be gained over an explicit if statement or
> conditional expression.

Two fewer lines of code? It is hard to grep for it, but I bet there are
a few hundred occurrences of the following in the standard library:

def something(x = None):
    if x is None:
        x = [1, 2, 3] # <- default

If you remember, it was constructs like this that were one of the big
motivations behind the ternary operator. So now you write the above like
this:

def something(x = None):
    x = [1, 2, 3] if x is None else x

If I remember correctly, the discussion about the ternary operator was
sparked by Raymond Hettinger finding a bug in some code that
erroneously used the and-or ternary trick.

But also, often the choice is not between "explicit if statement" and
this. It is between having an obscure and hard to find bug and the new
semantics. I have many, MANY times written bugged code like this:

def something(x = None):
    if not x:
        x = 42 # <- Oh noe!

or:

def something(x = []):
    x += ["foobar"] # <- Even worse!

I guess bugs like these could be explained by stupidity, laziness or
some combination of both. :) Or they could, if other programmers
experience them, be a sign of a deficiency in the language.

> or
> 1b) A cost/benefit analysis of the time it would take to "fix" the
> standard library and/or user code with any of the provided new
> syntax/semantics.

I naively think that Python's test suite would discover most of the
problems. If not, fix the test suite. :) This idea is for py3k, so one
would guess that the allowed cost is higher.

> 2) Who is this going to help (and do we care)?

Me, newbies, lazy programmers or programmers with not enough attention
to details.
The last group is fairly big, I think.

> 3) How much is this going to help them?

I think a lot. Especially newbies. As said in the PEP, Python's current
default argument evaluation is mentioned in three different lists of
Python's deficiencies.

> 4) Who is this going to hurt (in addition to everyone who wants to run
> new code in older Pythons)?

Everyone that is accustomed to the old behavior. Every book author
whose books become deprecated. On the other hand, the more changes to
the language the more books they can write. :)

I agree that the cost probably is "huge," but so is the benefit, IMHO.
If Python were created today, I bet that default arguments would be
reevaluated at each invocation of the callable.

> Using Glyph's requirements, we see that the syntax is just not
> worthwhile, as stated by most people in the original thread.

Maybe, but it certainly would make some code look much nicer. From
cookielib.py:

    def is_expired(self, now=None):
        if now is None: now = time.time()
        if (self.expires is not None) and (self.expires <= now):
            return True
        return False

With new semantics:

    def is_expired(self, now = time.time()):
        if (self.expires is not None) and (self.expires <= now):
            return True
        return False

--
mvh Björn

From greg.ewing at canterbury.ac.nz  Thu Feb 15 01:54:29 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 15 Feb 2007 13:54:29 +1300
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <740c3aec0702141636s381b57d8k465e020a2a04d6a2@mail.gmail.com>
References: <45D28143.9010502@gmail.com> <20070213231036.AD24.JCARLSON@uci.edu> <740c3aec0702141636s381b57d8k465e020a2a04d6a2@mail.gmail.com>
Message-ID: <45D3AF45.7070402@canterbury.ac.nz>

Björn Lindqvist wrote:
> I have many, MANY times written bugged code like this:
>
> def something(x = None):
>     if not x:
>         x = 42 # <- Oh noe!

You can get exactly the same bug in many other contexts besides default
arguments.
It's something you need to be on the alert for generally, and if you
are, you are no more likely to encounter it here than anywhere else.

> def something(x = []):
>     x += ["foobar"] # <- Even worse!

I'm skeptical that people really write functions that do things like
that. It smells wrong: Is the function intended to mutate an argument
that's passed in? If not, then it shouldn't be touching the argument, in
which case it doesn't matter if the default value is evaluated only once.
If so, and no argument is passed, it would be more efficient to just skip
the code that does the mutation, rather than create a new list, mutate
it, and then throw it away.

> Me, newbies, lazy programmers or programmers with not enough attention
> to details.

Anyone who can't pay attention to details is going to have much bigger
problems with programming than just dealing with default arguments.

--
Greg

From jcarlson at uci.edu  Thu Feb 15 02:10:21 2007
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 14 Feb 2007 17:10:21 -0800
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <740c3aec0702141636s381b57d8k465e020a2a04d6a2@mail.gmail.com>
References: <20070213231036.AD24.JCARLSON@uci.edu> <740c3aec0702141636s381b57d8k465e020a2a04d6a2@mail.gmail.com>
Message-ID: <20070214165418.AD39.JCARLSON@uci.edu>

"Björn Lindqvist" wrote:
> On 2/14/07, Josiah Carlson wrote:
> > Chris Rebert wrote:
> > > Requesting comments on the following pre-PEP. pybench runs both with and
> > > without the patch applied would also be appreciated.
> > > - Chris R
> >
> > One Glyph Lefkowitz posted today [1] in response to dynamic attribute
> > access the following, which is surely applicable here.
>
> To be fair, the two ideas are fairly different. Dynamic attribute
> access was about adding new syntax which makes the language more
> complex. This idea is more about fine-tuning existing syntax; it does
> not add to the language, it just makes it different.
There are about a dozen different syntax proposals in the pre-PEP to
determine whether something is executed at compilation or during call.
Re-read it.

[snip]
> > 1a) Proof as to what is to be gained over an explicit if statement or
> > conditional expression.
>
> Two fewer lines of code? It is hard to grep for it, but I bet there are
> a few hundred occurrences of the following in the standard library:
>
> def something(x = None):
>     if x is None:
>         x = [1, 2, 3] # <- default

If some 500+ examples of dynamic attribute access in the Python standard
library weren't sufficient, then the 'few hundred' surely isn't,
especially without actual counts. Yes, coming up with good counts is
hard, but that's one of the requirements Glyph pointed out. If no one is
willing to go through and see what it would fix, then it's obviously not
worth it.

> If you remember, it was constructs like this that were one of the big
> motivations behind the ternary operator. So now you write the above like
> this:
>
> def something(x = None):
>     x = [1, 2, 3] if x is None else x

That is certainly an *application* of the ternary operator, but
conditional expressions can be used *anywhere* a decision is made to
choose a value, not merely in the function signature.

[snip]
> > or
> > 1b) A cost/benefit analysis of the time it would take to "fix" the
> > standard library and/or user code with any of the provided new
> > syntax/semantics.
>
> I naively think that Python's test suite would discover most of the
> problems. If not, fix the test suite. :) This idea is for py3k, so one
> would guess that the allowed cost is higher.

The costs of syntax changes are allowed to be higher, *but only if their
benefits actually outweigh their costs*. So far, all you or really anyone
else has shown in the default argument expressions discussion is that:
1) a few lines
2) on occasion
3) written by new or sloppy Python developers
will be:
1a) less buggy
1b) or not buggy
2) at most 2 lines shorter

Bite the bullet. Spend the two lines.
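Below is a minimal runnable sketch, in present-day Python 3 (the thread's own snippets are Python 2), of Björn's two buggy snippets next to the explicit "two lines" Josiah recommends. The function names are hypothetical. The behaviour shown is standard: a default expression is evaluated once, when `def` executes, and the resulting object is stored on the function, where it is visible as `__defaults__`.

```python
def add_foobar_shared(x=[]):
    # The [] was evaluated once, when the def statement ran, so every call
    # that omits x mutates that same list object.
    x += ["foobar"]
    return x


def add_foobar_fresh(x=None):
    # The explicit two-line idiom: a new list is built on each call.
    if x is None:
        x = []
    x += ["foobar"]
    return x


def with_default_falsy_bug(x=None):
    # "if not x" also replaces legitimate falsy arguments such as 0 or "".
    if not x:
        x = 42
    return x


def with_default(x=None):
    # Testing identity against None only replaces a genuinely omitted value.
    if x is None:
        x = 42
    return x
```

Calling `add_foobar_shared()` twice returns the same list, which by then holds `['foobar', 'foobar']`. Each call to `add_foobar_fresh()` returns a new `['foobar']`. `with_default_falsy_bug(0)` returns 42, while `with_default(0)` returns 0.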
- Josiah

From guido at python.org  Thu Feb 15 02:14:35 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 14 Feb 2007 17:14:35 -0800
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <20070214165418.AD39.JCARLSON@uci.edu>
References: <20070213231036.AD24.JCARLSON@uci.edu> <740c3aec0702141636s381b57d8k465e020a2a04d6a2@mail.gmail.com> <20070214165418.AD39.JCARLSON@uci.edu>
Message-ID:

Nobody has asked me yet, but I'm not going to support this PEP. It's
too big a departure from existing semantics. Next are we going to turn
class variables initialized with expressions into automatic instance
variable initializers implicitly executed in the __init__ code? Newbies
are just as likely to run into the aliasing problem there as in the
argument default case.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From cvrebert at gmail.com  Thu Feb 15 02:35:04 2007
From: cvrebert at gmail.com (Chris Rebert)
Date: Wed, 14 Feb 2007 17:35:04 -0800
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <3d2ce8cb0702132044r423bbfb5l6a73eb1203081e92@mail.gmail.com>
References: <45D28143.9010502@gmail.com> <3d2ce8cb0702132044r423bbfb5l6a73eb1203081e92@mail.gmail.com>
Message-ID: <45D3B8C8.8080502@gmail.com>

Mike Klaas wrote:
> On 2/13/07, Chris Rebert wrote:
>> This PEP proposes new semantics for default arguments to remove
>> boilerplate code associated with non-constant default argument
>> values, allowing them to be expressed more clearly and succinctly.
>> Specifically, all default argument expressions are re-evaluated at
>> each call as opposed to just once at definition-time as they are now.
>
> Seems like a huge barrel of worms.
> The binding semantics are not only a problem for mutable arguments, as
> you state in your PEP:
>
> In [2]: def a():
>    ...:     g = 1
>    ...:     def b():
>    ...:         print g
>    ...:     g = 2
>    ...:     return b
>    ...:
> In [4]: a()()
> 2
>
> In [5]: def a():
>    ...:     g = 1
>    ...:     def b(g=g):
>    ...:         print g
>    ...:     g = 2
>    ...:     return b
> In [6]: a()()
> 1
>
> Creating closures and define-time local bindings is certainly not as
> common as a "regular" function definition, but it is an important part
> of python when programming in a semi-functional style. Imagine that "def
> b" is in a for loop. Your presented alternatives either don't work or
> go to rather extreme effort to duplicate this simple and useful
> functionality.

The refactorings mentioned in the PEP were specifically for mutable
arguments. I didn't consider the case you mentioned. The first snippet
you give would be unaffected by the PEP's changes. As for the second
case:

while whatever:
    #code
    g = 1
    def b(g=g):
        print g
    g = 2
    b() #=> 1
    #code

It could be modified like so:

while whatever:
    #code
    g = 1
    retro_g = g
    def b(g=retro_g):
        print g
    g = 2
    b() #=> 1
    #code

> I agree that newbies stumble over mutable default arguments. I did.
> If we could improve that learning process, I would be all for it.
> However, besides this being a significant change in semantics, two
> main stumbling blocks in my mind are:
>
> 1. Scoping. Scoping issues are not minor consequences of changes to
> default argument behavior, but are integral. I think that you'd have
> to come up with a more obvious way to accomplish all the various
> current behaviors of def args before changing their semantics. This
> is probably a larger project than the original proposal.

Well, as the PEP mentions, new syntax could be added to access the old
semantics, or alternatively, to enable the new semantics, though I'd
prefer to avoid adding syntax. However, finding refactorings for various
uses of the current semantics is very relevant to the PEP.
I'll be sure to add your case and any others mentioned to the PEP.

> 2. Performance. The speed of python is influenced greatly by the
> performance of function dispatch. This may not show up in pystone.

Clarification: as Anthony Baxter mentioned, I used pybench, not pystone.
However, if someone recommends a better benchmark to measure the
performance impact of the proposed change, I'd be all for it.

- Chris Rebert

From cvrebert at gmail.com  Thu Feb 15 03:20:31 2007
From: cvrebert at gmail.com (Chris Rebert)
Date: Wed, 14 Feb 2007 18:20:31 -0800
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <20070214165418.AD39.JCARLSON@uci.edu>
References: <20070213231036.AD24.JCARLSON@uci.edu> <740c3aec0702141636s381b57d8k465e020a2a04d6a2@mail.gmail.com> <20070214165418.AD39.JCARLSON@uci.edu>
Message-ID: <45D3C36F.2050501@gmail.com>

Josiah Carlson wrote:
> "Björn Lindqvist" wrote:
>> On 2/14/07, Josiah Carlson wrote:
>>> Chris Rebert wrote:
>>>> Requesting comments on the following pre-PEP. pybench runs both with and
>>>> without the patch applied would also be appreciated.
>>>> - Chris R
>>> One Glyph Lefkowitz posted today [1] in response to dynamic attribute
>>> access the following, which is surely applicable here.
>> To be fair, the two ideas are fairly different. Dynamic attribute
>> access was about adding new syntax which makes the language more
>> complex. This idea is more about fine-tuning existing syntax; it does
>> not add to the language, it just makes it different.
>
> There are about a dozen different syntax proposals in the pre-PEP to
> determine whether something is executed at compilation or during call.
> Re-read it.

Those syntaxes were only raised during discussion of changing default
argument semantics. If you read the PEP, it doesn't endorse any of them.
I'm against adding new syntax. However, that could always change based
on community feedback.
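The exchange turns on real counts of the `x=None ... if x is None: x = ...` idiom. Chris's statistics script was only posted as an attachment and is not reproduced in this archive, so what follows is an independent, hypothetical sketch, not his script. It uses the standard `ast` module (Python 3.8 or later, for `ast.Constant` and `posonlyargs`) to count `None` defaults and, separately, the narrower fill-in idiom, which plain grep cannot isolate. The function names are invented for illustration.

```python
import ast


def _is_none_test(test, params):
    """True if test has the form '<param> is None' for one of params."""
    return (
        isinstance(test, ast.Compare)
        and isinstance(test.left, ast.Name)
        and test.left.id in params
        and len(test.ops) == 1
        and isinstance(test.ops[0], ast.Is)
        and isinstance(test.comparators[0], ast.Constant)
        and test.comparators[0].value is None
    )


def count_none_defaults(source):
    """Return (defaults that are literally None, all defaults) over every def/lambda."""
    none_count = total = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)):
            # kw_defaults holds a Python None for keyword-only params without a default.
            defaults = node.args.defaults + [d for d in node.args.kw_defaults if d is not None]
            for default in defaults:
                total += 1
                if isinstance(default, ast.Constant) and default.value is None:
                    none_count += 1
    return none_count, total


def count_none_fill_idiom(source):
    """Count top-level 'if p is None: p = ...' statements where p is a parameter."""
    hits = 0
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        a = node.args
        params = {p.arg for p in a.posonlyargs + a.args + a.kwonlyargs}
        for stmt in node.body:
            if isinstance(stmt, ast.If) and _is_none_test(stmt.test, params):
                name = stmt.test.left.id
                # Only count branches that actually rebind that same parameter.
                rebinds = any(
                    isinstance(s, ast.Assign)
                    and any(isinstance(t, ast.Name) and t.id == name for t in s.targets)
                    for s in stmt.body
                )
                if rebinds:
                    hits += 1
    return hits


if __name__ == "__main__":
    sample = (
        "def is_expired(self, now=None):\n"
        "    if now is None: now = time.time()\n"
        "    return False\n"
        "\n"
        "def f(a, b=1, *, c=None, d=[]):\n"
        "    pass\n"
    )
    print(count_none_defaults(sample))    # (2, 4)
    print(count_none_fill_idiom(sample))  # 1
```

Running it over every file of a source tree, for example via `pathlib.Path.rglob("*.py")`, would give counts of the specific idiom rather than proxies such as "comparisons to None". It deliberately matches only the simplest spelling; `if not x:` or conditional-expression variants would need extra patterns.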
> [snip]
>>> 1a) Proof as to what is to be gained over an explicit if statement or
>>> conditional expression.
>> Two fewer lines of code? It is hard to grep for it, but I bet there are
>> a few hundred occurrences of the following in the standard library:
>>
>> def something(x = None):
>>     if x is None:
>>         x = [1, 2, 3] # <- default
>
> If some 500+ examples of dynamic attribute access in the Python standard
> library weren't sufficient, then the 'few hundred' surely isn't,
> especially without actual counts. Yes, coming up with good counts is
> hard, but that's one of the requirements Glyph pointed out. If no one
> is willing to go through and see what it would fix, then it's obviously
> not worth it.

Under "Compatibility Issues" in the PEP, I mention that my
statistics-generating script found in the standard library (among other
things):

total number of default arguments with a value of None: 1813 (47.4% of all default arguments)
total number of comparisons to None: 940

Yes, these aren't specific counts of uses of the
'x=None...if x is None: x=whatever' idiom, but you can't get much closer
without looking over the files manually.

> [snip]
>>> or
>>> 1b) A cost/benefit analysis of the time it would take to "fix" the
>>> standard library and/or user code with any of the provided new
>>> syntax/semantics.
[snip]

I'm going to respond to this in the original email that asked these
questions.

- Chris Rebert

From guido at python.org  Thu Feb 15 05:51:13 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 14 Feb 2007 20:51:13 -0800
Subject: [Python-3000] UserDict revamp
Message-ID:

I tried to fix a few more unit tests tonight that had started failing
after the introduction of dict views.
Looking over UserDict.py, it's clear that this module needs more work --
while I banged it into submission with minimal effort, it would really
make a lot more sense to redesign UserDict and MixinDict so they are
more like dict, even if this means that their users will have to be
fixed, too.

Perhaps the most egregious example is MixinDict, which currently assumes
that keys() is a primitive operation returning a list, and builds
__iter__() out of that. Obviously a better approach is to turn this
around. (I'd have thought that ever since 2.2 this would have been the
better design, but perhaps it was too late then already.)

Is someone interested in looking at a redesign and cleanup of these
classes? I suppose that they also need a Python implementation of
dictionary views -- some of this can be lifted straight out of PEP 3106,
fortunately.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Thu Feb 15 06:29:36 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 15 Feb 2007 18:29:36 +1300
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <45D28143.9010502@gmail.com>
References: <45D28143.9010502@gmail.com>
Message-ID: <45D3EFC0.5030401@canterbury.ac.nz>

I just noticed that my Thunderbird marked the posting with this PEP in
it as spam. Not sure what that says about the proposal...

--
Greg

From eopadoan at altavix.com  Thu Feb 15 13:24:42 2007
From: eopadoan at altavix.com (Eduardo "EdCrypt" O. Padoan)
Date: Thu, 15 Feb 2007 10:24:42 -0200
Subject: [Python-3000] UserDict revamp
In-Reply-To:
References:
Message-ID:

Oops, sending to the whole list.

On 2/15/07, Guido van Rossum wrote:
> I tried to fix a few more unit tests tonight that had started failing
> after the introduction of dict views.
Looking over UserDict.py, it's > clear that this module needs more work -- while I banged it into > submission with minimal effort, it would reallly make a lot more sense > to redesign UserDict and MixinDict so they are more like dict, even if > this means that their users will have to be fixed, too. > > Perhaps the most egregious example is MixinDict, which currently > assumes that keys() is a primitive operation returning a list, and > builds __iter__() out of that. Obviously a better approach is to turn > this around. (I'd have thought that ever since 2.2 this would have > been the better design, but perhaps it was too late then already.) s/MixinDict/DictMixin ? :) > Is someone interested in looking at a redesign and cleanup of these > classes? I suppose that they also need a Python implementation of > dictionary views -- some of this can be lifted straight out of PEP > 3106, fortunately. > I would love to spend my weekend looking into this. I already read the PEP 3106 and I think I understand it. It is carnival, and I'm no fan of samba music. -- EduardoOPadoan (eopadoan->altavix::com) Bookmarks: http://del.icio.us/edcrypt From steven.bethard at gmail.com Thu Feb 15 16:44:24 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Thu, 15 Feb 2007 08:44:24 -0700 Subject: [Python-3000] UserDict revamp In-Reply-To: References: Message-ID: On 2/15/07, Guido van Rossum wrote: > Perhaps the most egregious example is MixinDict, which currently > assumes that keys() is a primitive operation returning a list, and > builds __iter__() out of that. Obviously a better approach is to turn > this around. (I'd have thought that ever since 2.2 this would have > been the better design, but perhaps it was too late then already.) I asked the same thing back in early 2005: http://mail.python.org/pipermail/python-list/2005-January/300042.html Glad to hear I wasn't too out of my mind. ;-) On 2/15/07, Eduardo EdCrypt O. 
Padoan wrote: > I would love to spend my weekend looking into this. I already read the > PEP 3106 and I think I understand it. Let me know if you need any help with this. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From jimjjewett at gmail.com Thu Feb 15 17:48:02 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu, 15 Feb 2007 11:48:02 -0500 Subject: [Python-3000] pre-PEP: Default Argument Expressions In-Reply-To: <45D28143.9010502@gmail.com> References: <45D28143.9010502@gmail.com> Message-ID: On 2/13/07, Chris Rebert wrote: > There are currently few, if any, known good uses of the current > behavior of mutable default arguments. Then are there *any* good use cases for the proposed semantics? Here are the use cases that I can remember seeing for mutable default arguments. (1) Not really (treated as) mutable. ==> Doesn't care about the mutability semantics. >>> def f(extra_settings={}) ... usually doesn't modify or even store extra_settings; it just wants an empty (and perhaps iterable) mapping. (Sometimes, it doesn't even need that, and is really just providing type information.) (2) Storing state between calls. ==> Keep the current semantics We disagree on how useful this is, and how easily it can be replaced, but agree that the use exists. We also agree that it smells bad -- but the problem isn't mutable arguments. The problem is that the state variable really isn't (intended as) a parameter at all, and it feels wrong to pretend that it is. In theory, we could fix this with something like C's static >>> def f(): ... once state_var={} but in practice, there is some value in leaving it accessible, because of (2a) A test harness may wish to pass in its own state_var to get extra information, or to avoid cluttering the production logs. (3) Collecting results ==> the code is buggy, don't encourage it. >>> def squares(data, results=[]) ... for e in data: ... 
results.append(e*e) should instead be written as >>> def squares(data): ... results=[] ... for e in data: ... results.append(e*e) ... return results to make it clear that results is a newly constructed container. (3b) Adding stuff to a container ==> ??? I think this is the real motivation; I've done it myself. But I realized later that it was bad code. For example, I may want a filter to return a list of candidates for further processing. >>> def still_valid(data): ... results = [] ... for e in data: ... if good_enough(e): ... results.append(e) ... return results Hey, and maybe I have some other candidates already ... >>> candidates.extend(still_valid(data)) hmm ... but what if I don't usually have any previous candidates? Couldn't I use a default argument? >>> def still_valid(data, results=[]): ... And now I have buggy code. The right answer isn't to force re-evaluation of []; it is to be clear about when your functions will have side effects. If you don't want call sites littered with >>> candidates = [] >>> candidates.extend(still_valid(data)) then write a helper, such as >>> def extra_candidates(data, known_candidates): ... known_candidates.extend(still_valid(data)) -jJ From guido at python.org Thu Feb 15 18:15:56 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 15 Feb 2007 09:15:56 -0800 Subject: [Python-3000] UserDict revamp In-Reply-To: References: Message-ID: On 2/15/07, Steven Bethard wrote: > On 2/15/07, Guido van Rossum wrote: > > Perhaps the most egregious example is MixinDict, which currently > > assumes that keys() is a primitive operation returning a list, and > > builds __iter__() out of that. Obviously a better approach is to turn > > this around. (I'd have thought that ever since 2.2 this would have > > been the better design, but perhaps it was too late then already.) > > I asked the same thing back in early 2005: > > http://mail.python.org/pipermail/python-list/2005-January/300042.html > > Glad to hear I wasn't too out of my mind.
;-) Reading that post, I think that __len__ should also be part of the primitive operations, at least optionally. The dict view code to compare two views (or a view and a set; always excluding the values view which is not a set) for equality makes good use of this since it knows that if the lengths are unequal the objects cannot be equal. Determining equality without knowing the length would double the cost of the operation, because you'd end up having to iterate over each side, checking that all its elements are contained in the other side. With a length check, you only have to iterate over one side, and only if the lengths are equal. Another distinction I'd like to make is between mutable and immutable mappings. But maybe this is outside the realm of a *dict* mixin, and belongs in the (more speculative) discussion on abstract base classes. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From python at rcn.com Thu Feb 15 18:38:16 2007 From: python at rcn.com (Raymond Hettinger) Date: Thu, 15 Feb 2007 09:38:16 -0800 Subject: [Python-3000] UserDict revamp References: Message-ID: <00fd01c75128$16a67d60$ea146b0a@RaymondLaptop1> Since I contributed DictMixin and have been responsible for its maintenance, if no one minds, I would like to be the one to migrate it to Py3.0. Raymond ----- Original Message ----- From: "Guido van Rossum" To: "Steven Bethard" Cc: "Python 3000" ; "Eduardo EdCrypt O. Padoan" Sent: Thursday, February 15, 2007 9:15 AM Subject: Re: [Python-3000] UserDict revamp > On 2/15/07, Steven Bethard wrote: >> On 2/15/07, Guido van Rossum wrote: >> > Perhaps the most egregious example is MixinDict, which currently >> > assumes that keys() is a primitive operation returning a list, and >> > builds __iter__() out of that. Obviously a better approach is to turn >> > this around. (I'd have thought that ever since 2.2 this would have >> > been the better design, but perhaps it was too late then already.)
>> >> I asked the same thing back in early 2005: >> >> http://mail.python.org/pipermail/python-list/2005-January/300042.html >> >> Glad to hear I wasn't too out of my mind. ;-) > > Reading that post, I think that __len__ should also be part of the > primitive operations, at least optionally. The dict view code to > compare two views (or a view and a set; always excluding the values > view which is not a set) for equality makes good use of this since it > knows that if the lengths are unequal the objects cannot be equal. > Determining equality without knowing the length would double the > cost of the operation, because you'd end up having to iterate over each > side, checking that all its elements are contained in the other side. > With a length check, you only have to iterate over one side, and only > if the lengths are equal. > > Another distinction I'd like to make is between mutable and immutable > mappings. But maybe this is outside the realm of a *dict* mixin, and > belongs in the (more speculative) discussion on abstract base classes. > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/python%40rcn.com From bjourne at gmail.com Thu Feb 15 20:08:20 2007 From: bjourne at gmail.com (Björn Lindqvist) Date: Thu, 15 Feb 2007 20:08:20 +0100 Subject: [Python-3000] pre-PEP: Default Argument Expressions In-Reply-To: References: <45D28143.9010502@gmail.com> Message-ID: <740c3aec0702151108y2232290dqd104bb5609f7bab4@mail.gmail.com> On 2/15/07, Jim Jewett wrote: > On 2/13/07, Chris Rebert wrote: > > There are currently few, if any, known good uses of the current > > behavior of mutable default arguments. > > Then are there *any* good use cases for the proposed semantics?
Note that the PEP says _currently_; with the change in semantics, the number of use cases increases drastically. See below. > Here are the use cases that I can remember seeing for mutable > default arguments. > > (1) Not really (treated as) mutable. ==> Doesn't care about the > mutability semantics. > > >>> def f(extra_settings={}) ... > > usually doesn't modify or even store extra_settings; it just wants an > empty (and perhaps iterable) mapping. (Sometimes, it doesn't even > need that, and is really just providing type information.) That is dangerous code. Sooner or later someone will modify the extra_settings dict. For me, that is the main attraction of the PEP: it removes that source of bugs (along with the annoying "if blaha is None:" thingy). class Vector: def __init__(self, x, y, z): self.x = x self.y = y self.z = z class Ray: def __init__(self, direction, origin = Vector(0, 0, 0)): self.direction = direction self.origin = origin ray1 = Ray(Vector(0, 0, -1)) ray2 = Ray(Vector(0, 0, 1)) ray3 = Ray(Vector(-1, 0, 0), Vector(2, 3, 4)) The above code looks quite nice, but is wrong. Not that it matters much, Guido has already rejected the PEP. But the use cases do exist and there is a problem with how default argument values are evaluated. Hopefully someone can invent a fix even if this PEP wasn't it. -- mvh Björn From python at rcn.com Fri Feb 16 01:48:51 2007 From: python at rcn.com (Raymond Hettinger) Date: Thu, 15 Feb 2007 16:48:51 -0800 Subject: [Python-3000] Py3.0 Library Ideas References: <00fd01c75128$16a67d60$ea146b0a@RaymondLaptop1> Message-ID: <012901c75164$3b1e00f0$ea146b0a@RaymondLaptop1> * Remove the unreliable empty() and full() methods from Queue.py * Remove jumpahead() from the random API. It is somewhat uncommon for PRNGs to have a closed-form solution that jumps ahead N steps. * Make the primitive for random be something generating random bytes rather than random floats.
Currently, to get a random integer, a generator like the Mersenne Twister generates two blocks of 4 bytes, which are then turned into a C double, and then the random.py module converts the float back into an integer in the desired range. The long-->float-->long dance could be abbreviated. This would also make it easier to substitute in other generators without making them responsible for the long-->float step. * Get rid of Cookie.SerialCookie and Cookie.SmartCookie * Modify the heapq.heapreplace() API to compare the new value to the top of the heap. This has come up more than once. When using a heap for a priority queue, sometimes there is a need to revise the priority of an entry in the middle of the heap. This can be done with heapreplace substituting the new priority/task pair and then running a _siftup operation to restore the heap condition. Raymond From cvrebert at gmail.com Fri Feb 16 06:37:27 2007 From: cvrebert at gmail.com (Chris Rebert) Date: Thu, 15 Feb 2007 21:37:27 -0800 Subject: [Python-3000] pre-PEP: Default Argument Expressions In-Reply-To: <740c3aec0702151108y2232290dqd104bb5609f7bab4@mail.gmail.com> References: <45D28143.9010502@gmail.com> <740c3aec0702151108y2232290dqd104bb5609f7bab4@mail.gmail.com> Message-ID: <45D54317.4020204@gmail.com> Okay, in light of Guido's comments, alternate idea: We require all default values to be hash()-able, thus reasonably ensuring their immutability. This doesn't deal with the 'x=None...' dance, but at least it might stop dangerous code from being written. Or if anyone else has ideas, that's great too. Anything to stop the abuses of mutable default arguments. - Chris Rebert Björn Lindqvist wrote: > On 2/15/07, Jim Jewett wrote: >> On 2/13/07, Chris Rebert wrote: >> > There are currently few, if any, known good uses of the current >> > behavior of mutable default arguments. >> >> Then are there *any* good use cases for the proposed semantics?
> > Note that the PEP says _currently_; with the change in semantics, the > number of use cases increases drastically. See below. > >> Here are the use cases that I can remember seeing for mutable >> default arguments. >> >> (1) Not really (treated as) mutable. ==> Doesn't care about the >> mutability semantics. >> >> >>> def f(extra_settings={}) ... >> >> usually doesn't modify or even store extra_settings; it just wants an >> empty (and perhaps iterable) mapping. (Sometimes, it doesn't even >> need that, and is really just providing type information.) > > That is dangerous code. Sooner or later someone will modify the > extra_settings dict. For me, that is the main attraction of the PEP: > it removes that source of bugs (along with the annoying "if blaha is > None:" thingy). > > class Vector: > def __init__(self, x, y, z): > self.x = x > self.y = y > self.z = z > > class Ray: > def __init__(self, direction, origin = Vector(0, 0, 0)): > self.direction = direction > self.origin = origin > > ray1 = Ray(Vector(0, 0, -1)) > ray2 = Ray(Vector(0, 0, 1)) > ray3 = Ray(Vector(-1, 0, 0), Vector(2, 3, 4)) > > The above code looks quite nice, but is wrong. > > Not that it matters much, Guido has already rejected the PEP. But the > use cases do exist and there is a problem with how default argument > values are evaluated. Hopefully someone can invent a fix even if this > PEP wasn't it.
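A quick way to see what the hash()-able rule proposed above would and would not catch. This is a minimal sketch: `is_hashable` is a hypothetical helper written only for this illustration, and the `Vector` class mirrors the one in the Ray example. An ordinary user-defined class inherits an identity-based `__hash__` from `object`, so a mutable Vector passes the check, while list and dict defaults are rejected.

```python
class Vector:
    """Mutable, like the Vector in the Ray example."""

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


def is_hashable(value):
    """Hypothetical check: would hash() accept this default value?"""
    try:
        hash(value)
    except TypeError:
        return False
    return True


origin = Vector(0, 0, 0)
print(is_hashable(origin))    # True: object's default hash is identity-based
origin.x = 42                 # ...yet the instance is still freely mutable
print(is_hashable(origin))    # True: mutation does not change the answer
print(is_hashable([]))        # False
print(is_hashable({}))        # False
print(is_hashable((1, [2])))  # False: a tuple holding a list is unhashable
print(is_hashable((1, 2)))    # True
```

In other words, under this sketch the rule blocks the literal `[]` and `{}` cases but lets the Ray example through unchanged.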
> From ferringb at gmail.com Fri Feb 16 08:01:04 2007 From: ferringb at gmail.com (Brian Harring) Date: Thu, 15 Feb 2007 23:01:04 -0800 Subject: [Python-3000] pre-PEP: Default Argument Expressions In-Reply-To: <45D54317.4020204@gmail.com> References: <45D28143.9010502@gmail.com> <740c3aec0702151108y2232290dqd104bb5609f7bab4@mail.gmail.com> <45D54317.4020204@gmail.com> Message-ID: <20070216070104.GA22681@seldon> On Thu, Feb 15, 2007 at 09:37:27PM -0800, Chris Rebert wrote: > Okay, in light of Guido's comments, alternate idea: > > We require all default values to be hash()-able, thus reasonably > ensuring their immutability. Offhand, that's a pretty arbitrary restriction- default __hash__ for objects is their address. The majority of objects *are* mutable also, so about all you've managed to block is usage of [] and {}, or objects that specifically castrate their __hash__ > but at least it might stop dangerous code from being written. > Anything to stop the abuses of mutable default arguments. You may not have usage for mutable default args, but others may- namely memoization. Store the cache in the default arg. Upshot of it, the cache isn't sitting out in the global namespace; you can achieve the same with a memoization object/descriptor, but those approaches break down since the key calculation can only be args/kwargs based, rather than generating a key in a simpler way. Further, if there *are* kwargs involved, the memoizer has to know the default args for the target, and slip those in every time, which gets fairly ugly. Personally, I'm -1 on suggestions thus far- further, -1 on trying to block mutables from default args. Would suggest creating a tool to scan for potential issues rather than trying to strip mutable default args from the language. ~harring From eopadoan at altavix.com Fri Feb 16 12:30:40 2007 From: eopadoan at altavix.com (Eduardo "EdCrypt" O. Padoan) Date: Fri, 16 Feb 2007 09:30:40 -0200 Subject: [Python-3000] [Python-Dev] UserDict revamp In-Reply-To: <002701c75145$75ece080$ea146b0a@RaymondLaptop1> References: <00fd01c75128$16a67d60$ea146b0a@RaymondLaptop1> <002701c75145$75ece080$ea146b0a@RaymondLaptop1> Message-ID: [Steve] > No complaints here. Not that you need my permission of course. ;-) Same here, obviously. [Raymond] > Thanks, I had already started working on this one. > Of course, everyone is welcome to contribute. Ok, you can count on that. -- EduardoOPadoan (eopadoan->altavix::com) Bookmarks: http://del.icio.us/edcrypt From eopadoan at altavix.com Sat Feb 17 04:57:16 2007 From: eopadoan at altavix.com (Eduardo "EdCrypt" O. Padoan) Date: Sat, 17 Feb 2007 01:57:16 -0200 Subject: [Python-3000] PEPs 3xxx status Message-ID: All the 3xxx PEPs except 3100 and the "meta" ones are marked as drafts. While I understand that many have some open issues, even the implemented ones (3102, 3105, 3106, 3107, 3110) still run the risk of being withdrawn? -- EduardoOPadoan (eopadoan->altavix::com) Bookmarks: http://del.icio.us/edcrypt From talin at acm.org Sat Feb 17 06:39:01 2007 From: talin at acm.org (Talin) Date: Fri, 16 Feb 2007 21:39:01 -0800 Subject: [Python-3000] PEPs 3xxx status In-Reply-To: References: Message-ID: <45D694F5.50109@acm.org> Eduardo "EdCrypt" O. Padoan wrote: > All the 3xxx PEPs except 3100 and the "meta" ones are marked as drafts. > While I understand that many have some open issues, even the > implemented ones (3102, 3105, 3106, 3107, 3110) still run the risk of > being withdrawn?
I don't know about the others, however I want to speak to the issue of 3101 and 3102, since I wrote them - the main reason that those PEPs haven't been accepted is that there's no sample implementation to evaluate. (At least, I'm not aware of any implementation of them, unless someone did it while I wasn't looking :) As I stated early on in this process, I don't really enjoy working on the innards of Python as much as I enjoy working *in* Python - and Guido seemed to find this acceptable when I asked him about it. In addition, I've been very busy lately, as my absence from this list illustrates. I have written a Python implementation of 3101 that can be used as a model, but the actual implementation needs to be in C, since it's anticipated that it will be a built-in function. Some of the number formatting operations are best done in C anyway. Several people have put forward tentative offers to implement these two PEPs, however there's been no follow up that I know of. I should also note that these PEPs should really be targeted at the 2.x series, since there's nothing fundamentally "3000-ish" about them, so the 31xx numbering is kind of a misnomer. There's no backwards compatibility impact in either case. -- Talin From eopadoan at altavix.com Sat Feb 17 14:07:59 2007 From: eopadoan at altavix.com (Eduardo "EdCrypt" O. Padoan) Date: Sat, 17 Feb 2007 11:07:59 -0200 Subject: [Python-3000] PEPs 3xxx status In-Reply-To: <45D694F5.50109@acm.org> References: <45D694F5.50109@acm.org> Message-ID: On 2/17/07, Talin wrote: > I don't know about the others, however I want to speak to the issue of > 3101 and 3102, since I wrote them - the main reason that those PEPs > haven't been accepted is that there's no sample implementation to > evaluate. 
(At least, I'm not aware of any implementation of them, unless > someone did it while I wasn't looking :) At least for 3102, yes, someone did: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1549670&group_id=5470 -- EduardoOPadoan (eopadoan->altavix::com) Bookmarks: http://del.icio.us/edcrypt From guido at python.org Sat Feb 17 19:02:37 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 17 Feb 2007 10:02:37 -0800 Subject: [Python-3000] PEPs 3xxx status In-Reply-To: References: <45D694F5.50109@acm.org> Message-ID: And it's in the p3yk branch, too. The main reason these are all still drafts is that I expect that implementing them may cause a certain amount of redesign, and in some cases the spec isn't entirely clear. The "real" acceptance status (in my head) is all over the map -- 3102 is obviously accepted, 3101 likely, 3103 unlikely, 3104 possibly, 3108 is too early to tell, and the rest (3105, 06, 07, 09, 10) are accepted. --Guido On 2/17/07, Eduardo EdCrypt O. Padoan wrote: > On 2/17/07, Talin wrote: > > I don't know about the others, however I want to speak to the issue of > > 3101 and 3102, since I wrote them - the main reason that those PEPs > > haven't been accepted is that there's no sample implementation to > > evaluate. 
(At least, I'm not aware of any implementation of them, unless > > someone did it while I wasn't looking :) > > At least for 3102, yes, someone did: > https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1549670&group_id=5470 > > -- > EduardoOPadoan (eopadoan->altavix::com) > Bookmarks: http://del.icio.us/edcrypt > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From g.brandl at gmx.net Sat Feb 17 19:42:52 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 17 Feb 2007 19:42:52 +0100 Subject: [Python-3000] PEPs 3xxx status In-Reply-To: References: <45D694F5.50109@acm.org> Message-ID: Guido van Rossum wrote: > And it's in the p3yk branch, too. > > The main reason these are all still drafts is that I expect that > implementing them may cause a certain amount of redesign, and in some > cases the spec isn't entirely clear. The "real" acceptance status (in > my head) is all over the map -- 3102 is obviously accepted, 3101 > likely, 3103 unlikely, 3104 possibly, 3108 is too early to tell, and > the rest (3105, 06, 07, 09, 10) are accepted. I updated the PEP index to reflect that. Georg From jimjjewett at gmail.com Sun Feb 18 04:18:53 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Sat, 17 Feb 2007 22:18:53 -0500 Subject: [Python-3000] immutable classes [was: pre-PEP: Default Argument Expressions] Message-ID: I have added python-ideas to the Cc list, and suggest removing python-3000 from additional replies. Björn Lindqvist gave an example explaining why he might want to re-evaluate mutable default arguments. It still looks like buggy code, but it isn't the error I was expecting -- and I think it comes from the difficulty of declaring something immutable.
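The error in the Ray example can be reproduced directly. In this minimal sketch the default `Vector(0, 0, 0)` is evaluated once, when the `def` statement runs, so every `Ray` created without an explicit origin shares one object. `SafeRay` is a name made up for this illustration; it uses the "if ... is None:" idiom mentioned earlier in the thread to build a fresh default on each call.

```python
class Vector:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


class Ray:
    # The default Vector is created once, at function-definition time.
    def __init__(self, direction, origin=Vector(0, 0, 0)):
        self.direction = direction
        self.origin = origin


class SafeRay:
    # Sentinel idiom: construct a new default Vector on every call.
    def __init__(self, direction, origin=None):
        if origin is None:
            origin = Vector(0, 0, 0)
        self.direction = direction
        self.origin = origin


ray1 = Ray(Vector(0, 0, -1))
ray2 = Ray(Vector(0, 0, 1))
ray1.origin.x = 5                    # mutates the shared default...
print(ray2.origin.x)                 # ...so ray2 sees it: prints 5
print(ray1.origin is ray2.origin)    # True

safe1 = SafeRay(Vector(0, 0, -1))
safe2 = SafeRay(Vector(0, 0, 1))
safe1.origin.x = 5
print(safe2.origin.x)                # prints 0
print(safe1.origin is safe2.origin)  # False
```

Making `Vector` itself immutable, as discussed below, removes the hazard without needing the sentinel.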
On 2/15/07, Björn Lindqvist wrote: > On 2/15/07, Jim Jewett wrote: > > Then are there *any* good use cases for [non-persistent mutable defaults] > > (1) Not really (treated as) mutable. ==> Doesn't care > > >>> def f(extra_settings={}) ... > > usually doesn't modify or even store extra_settings; ... > That is dangerous code. Sooner or later someone will modify the > extra_settings dict. How? >>> f.func_defaults[0]['key']=value may be misguided, but it probably isn't an accident. Björn's example does store the mutable directly, but it makes a bit more sense because it looks like a complex object rather than just a mapping. > class Vector: > def __init__(self, x, y, z): > self.x = x > self.y = y > self.z = z > class Ray: > def __init__(self, direction, origin = Vector(0, 0, 0)): > self.direction = direction > self.origin = origin > > ray1 = Ray(Vector(0, 0, -1)) > ray2 = Ray(Vector(0, 0, 1)) > ray3 = Ray(Vector(-1, 0, 0), Vector(2, 3, 4)) > The above code looks quite nice, but is wrong. Why is Vector mutable? Is the real problem that it is too hard to declare objects or attributes immutable? My solution is below, but I'll grant that it isn't as straightforward as I would have liked. Is this something that could be solved with a recipe, or a factory to make immutable classes? >>> class Vector3D(tuple): ... def __new__(cls, x, y, z): ... return super(Vector3D, cls).__new__(cls, (x, y, z)) ... x=property(lambda self: self[0]) ... y=property(lambda self: self[1]) ... z=property(lambda self: self[2]) -jJ From andre.roberge at gmail.com Mon Feb 19 19:32:05 2007 From: andre.roberge at gmail.com (Andre Roberge) Date: Mon, 19 Feb 2007 14:32:05 -0400 Subject: [Python-3000] Pre-PEP: Simple input built-in in Python 3000 In-Reply-To: References: <7528bcdd0612220545u147f07a4gb476dd43733dfe46@mail.gmail.com> Message-ID: <7528bcdd0702191032n4347e8c8p6987553deb9be445@mail.gmail.com> Any possibility that (some of) the following can be done before Pycon? Respectfully yours, André
Roberge On 12/23/06, Guido van Rossum wrote: [http://mail.python.org/pipermail/python-3000/2006-December/005257.html] > BTW, can someone clean up and check in the proto-PEP and start working > on an implementation or patch? Should be really simple. I'd like to > see a patch for the refactoring tool (sandbox/2to3) as well. This was a follow-up to: http://mail.python.org/pipermail/python-3000/2006-December/005249.html On 12/22/06, Guido van Rossum wrote: > I like the exact proposal made here better than any of the > alternatives mentioned so far. > > - Against naming it readline(): the "real" readline doesn't strip the > \n and returns an empty string for EOF instead of raising EOFError; I > believe the latter is more helpful for true beginners' code. > > - Against naming it ask() and renaming print() to say(): I find those > rather silly names that belong in toy or AI languages. Changing print > from statement to function maintains Pythonicity; renaming it say() > does not. > > - I don't expect there will be much potential confusion with the 2.x > input(); that function is used extremely rarely. It will be trivial to > add rules to the refactoring tool (sandbox/2to3/) that replace input() > with eval(input()) and replace raw_input() with input(). > > --Guido > > On 12/22/06, Andre Roberge wrote: > > A few months ago, there was an active discussion on edu-sig regarding > > the proposed fate of raw_input(). The text below is an attempt at > > summarizing the discussion in the form of a tentative PEP. > > It is respectfully submitted for your consideration. > > > > If it is to be considered, in some form, as an official PEP, I have > > absolutely no objection to a regular python-dev contributor taking over > > the > > ownership/authorship. > > > > André
Roberge > > > > ----------------------------------------------------------------- > > PEP: XXX > > Title: Simple input built-in in Python 3000 > > Version: $Revision: 0.2 $ > > Last-Modified: $Date: 2006/12/22 10:00:00 $ > > Author: André Roberge > > Status: Draft > > Type: Standards Track > > Content-Type: text/x-rst > > Created: 13-Sep-2006 > > Python-Version: 3.0 > > Post-History: > > > > Abstract > > ======== > > > > Input and output are core features of computer programs. Currently, > > Python provides a simple means of output through the print keyword > > and two simple means of interactive input through the input() > > and raw_input() built-in functions. > > > > Python 3.0 will introduce various changes that are incompatible with previous > > Python versions[1]. Among the proposed changes, print will become a > > built-in > > function, print(), while input() and raw_input() would be removed completely > > from the built-in namespace, requiring importing some module to provide > > even the most basic input capability. > > > > This PEP proposes that Python 3.0 retain some simple interactive user > > input capability, equivalent to raw_input(), within the built-in namespace. > > > > Motivation > > ========== > > > > With its easy readability and its support for many programming styles > > (e.g. procedural, object-oriented, etc.) among others, Python is perhaps > > the best computer language to use in introductory programming classes. > > Simple programs often need to provide information to the user (output) > > and to obtain information from the user (interactive input). > > Any computer language intended to be used in an educational setting should > > provide straightforward methods for both output and interactive input. > > > > The current proposals for Python 3.0 [1] include a simple output pathway > > via a built-in function named print(), but a more complicated method for > > input [e.g.
via sys.stdin.readline()], one that requires importing an > > external > > module. Current versions of Python (pre-3.0) include raw_input() as a > > built-in function. With the availability of such a function, programs that > > require simple input/output can be written from day one, without requiring > > discussions of importing modules, streams, etc. > > > > Rationale > > ========= > > > > Current built-in functions, like input() and raw_input(), are found to be > > extremely useful in traditional teaching settings. (For more details, > > see [2] and the discussion that followed.) > > While the BDFL has clearly stated [3] that input() was not to be kept in > > Python 3000, he has also stated that he was not against revisiting the > > decision to kill raw_input(). > > > > raw_input() provides a simple means to ask a question and obtain a response > > from a user. The proposed plans for Python 3.0 would require the > > replacement > > of the single statement > > > > name = raw_input("What is your name?") > > > > by the more complicated > > > > import sys > > print("What is your name?") > > name = sys.stdin.readline() > > > > However, from the point of view of many Python beginners and educators, the > > use of sys.stdin.readline() presents the following problems: > > > > 1. Compared to the name "raw_input", the name "sys.stdin.readline()" > > is clunky and inelegant. > > > > 2. The names "sys" and "stdin" have no meaning for most beginners, > > who are mainly interested in *what* the function does, and not *where* > > in the package structure it is located. The lack of meaning also makes > > it difficult to remember: > > is it "sys.stdin.readline()", or "stdin.sys.readline()"? > > To a programming novice, there is not any obvious reason to prefer > > one over the other. In contrast, functions with simple and direct names like > > print, input, raw_input, and open are easier to remember. > > > > 3. The use of "."
notation is unmotivated and confusing to many beginners. > > For example, it may lead some beginners to think "." is a standard > > character that could be used in any identifier. > > > > 4. There is an asymmetry with the print function: why is print not called > > sys.stdout.print()? > > > > > > Specification > > ============= > > > > The built-in input function should be totally equivalent to the existing > > raw_input() function. > > > > Open issues > > =========== > > > > With input() effectively removed from the language, the name raw_input() > > makes much less sense and alternatives should be considered. The > > various possibilities mentioned in various forums include: > > > > ask() > > ask_user() > > get_string() > > input() # rejected by BDFL > > prompt() > > read() > > user_input() > > get_response() > > > > While it has been rejected by the BDFL, it has been suggested that the most > > direct solution would be to rename "raw_input" to "input" in Python 3000. > > The main objection is that Python 2.x already has a function named "input", > > and, even though it is not going to be included in Python 3000, > > having a built-in function with the same name but different semantics may > > confuse programmers migrating from 2.x to 3000. Certainly, this is no > > problem > > for beginners, and the scope of the problem is unclear for more experienced > > programmers, since raw_input(), while popular with many, is not in > > universal use. In this instance, the good it does for beginners could be > > seen to outweigh the harm it does to experienced programmers - > > although it could cause confusion for people reading older books or > > tutorials. > > > > > > References > > ========== > > > > .. [1] PEP 3100, Miscellaneous Python 3.0 Plans, Kuchling, Cannon > > (http://www.python.org/dev/peps/pep-3100/) > > .. [2] The fate of raw_input() in Python 3000 > > (http://mail.python.org/pipermail/edu-sig/2006-September/006967.html) > > ..
[3] Educational aspects of Python 3000 > > ( > > http://mail.python.org/pipermail/python-3000/2006-September/003589.html) > > > > > > Copyright > > ========= > > > > This document has been placed in the public domain. > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > From jan.kanis at phil.uu.nl Mon Feb 19 23:55:45 2007 From: jan.kanis at phil.uu.nl (Jan Kanis) Date: Mon, 19 Feb 2007 23:55:45 +0100 Subject: [Python-3000] [Python-ideas] immutable classes [was: pre-PEP: Default Argument Expressions] In-Reply-To: References: Message-ID: On the 'using old semantics when you really want to' part, that's quite possible with a decorator under the proposed semantics: def caching(**cachevars): def inner(func): def wrapper(**argdict): for var in cachevars: if var not in argdict: argdict[var] = cachevars[var] return func(**argdict) return wrapper return inner @caching(cache={}) def foo(arg, cache): result = bar(arg) cache[arg] = result return result This implementation of caching doesn't handle positional args, but it can be made to. One such decorator would still be a net win of several hundred lines of code in the standard lib. Of course, IMHO, the real fix to this is to 1) have default expressions be evaluated at call time, and 2) have _all_ lexical variables be bound at definition time and 3) make them immutable. Then something like lst = [] for i in range(10): lst.append(lambda: i*i) would work. That would be a real win for functional programming. (good thing) Unfortunately Guido has decided not to support (1), and (2) has been proposed some time ago and didn't make it. In both cases, because it would be too big a departure from how Python currently works.
(3) is quite impossible in a language like Python. I just hope that if Python were designed today, it would do these. - Jan From raymond.hettinger at verizon.net Tue Feb 20 09:55:57 2007 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Tue, 20 Feb 2007 00:55:57 -0800 Subject: [Python-3000] Thoughts on dictionary views Message-ID: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> The Java concept of dictionary views seems to have caught on here while I wasn't looking. At the risk of covering some old ground, I would like to re-open the question. Here are a few thoughts on the subject to kick off the discussion: * Maintaining a live (self-updating) view is a bit tricky from an implementation point of view. While it is clearly doable for dictionaries, it is not clear that it is a good idea for a general mapping API which can be wrapped around dbms, shelves, elementtrees, b-trees, and other wrascally rabbits. I doubt that the underlying structures of other mapping types support the observer pattern necessary to keep views updated -- this is doubly true if the underlying data is on disk and can be updated by other processes, threads, etc. * One of the purported benefits is to provide set-like behavior without the expense of copying to a new set object. FWIW, I've updated the set implementation to be more interoperable with dictionaries so that the conversion costs are negligible (about the same as a dict resize operation -- one pass, no calls to PyObject_Hash, insertion into a presized, sparse table with very few collisions). * A dict is also one of Python's most basic APIs (along with lists). Ideally, we should keep those two APIs as simple as possible (getting rid of setdefault() and unneeded methods is a step in the right direction).
IMO, the views will be the hardest part of the API to explain and interact with when learning the language -- to learn about dicts and lists, you already have to learn about mutability and hashability -- it doesn't help this situation if you then need to learn about self-updating views that can be deleted, have modified values, but cannot be added, and that have their own set-like operations but aren't really sets . . . * ISTM that views offer three benefits: re-iterability, set behavior, and self-updates. IMO, the first is not commonly needed and is trivially served by writing list(mydict.items()) or somesuch. The second is best served by an explicit conversion to a set or frozenset type -- those two types have been enormously successful in that they seem to offer a near zero learning curve -- people seem to intuitively know how to use them right out of the box. As long as that conversion is fast, I think the explicit conversion is the way to go -- it is the way you would do it with any other Python type where you wanted set behavior. Adding a handful of set methods to dict views would only complicate an otherwise simple situation and introduce unnecessary complexity (i.e. what should isinstance(d.d_keys, set) return?). The third benefit (self-updates) is more interesting and does not have a direct analog with existing python tools, so the question is how valuable is self-updating behavior and are there compelling use cases that warrant a more complex API? My recommendation is to take a more conservative route. Let's make dicts as simple as possible and then introduce a new collections module entry with the views bells and whistles. If the collections version proves itself as enormously popular, useful, understandable, and without a good equivalent, then it can ask for a promotion. 
The collections module is a nice place to put alternate datatypes that meet the more demanding needs of advanced users who know exactly what they want/need in terms of special behaviors or performance. And, if we take the collections module route, there is no reason that it cannot be put into Py2.6 where people will either flock to it or ignore it, with either result providing us with good guidance for Py3.0. my-two-cents, Raymond From ncoghlan at gmail.com Tue Feb 20 14:24:35 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 20 Feb 2007 23:24:35 +1000 Subject: [Python-3000] Thoughts on dictionary views In-Reply-To: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> Message-ID: <45DAF693.1000004@gmail.com> Raymond Hettinger wrote: > The Java concept of dictionary views seems to have caught-on here while I wasn't > looking. At the risk of covering some old ground, I would like to re-open the > question. Here are a few thoughts on the subject to kick-off the discussion: > > * Maintaining a live (self-updating) view is a bit tricky from an implementation > point-of-view. While it is clearly doable for dictionaries, it is not clear > that it is a good idea for a general mapping API which can be wrapped around > dbms, shelves, elementtrees, b-trees, and other wrascally rabbits. I doubt that > the underlying structures of other mapping types support the observer pattern > necessary to keep views updated -- this is doubly true if the underlying data is > on disk and can be updated by other processes, threads, etc. FWIW, the py3k trunk is still somewhat broken from the implementation of this change.
Without any test resources enabled, I still get more than half a dozen failures which appear at a glance to be related to unexpected mutation of dictionaries (test_anydbm, test_dumbdbm, test_mutants, test_compile, test_iter, test_iterlen, test_minidom, test_os, test_importhooks, test_unittest) > > * One of the purported benefits is to provide set-like behavior without the > expense of copying to a new set object. FWIW, I've updated the set > implementation to be more interoperable with dictionaries so that the conversion > costs are negligible (about the same as a dict resize operation -- one pass, no > calls to PyObject_Hash, insertion into a presized, sparse table with very few > collisions). The speed costs may become negligible, but I believe the main concern here is memory consumption (minimising memory usage is certainly the only reason I've ever made sure to use the dict.iter* methods). However, the string discussion has given me another view on that front, too... (more on that below) > My recommendation is to take a more conservative route. Let's make dicts as > simple as possible and then introduce a new collections module entry with the > views bells and whistles. If the collections version proves itself as > enormously popular, useful, understandable, and without a good equivalent, then > it can ask for a promotion. The collections module is nice place to put in > alternate datatypes that meet the more demanding needs of advanced users who > know exactly what they want/need in terms of special behaviors or performance. > And, if we take the collections module route, there is no reason that it cannot > be put into Py2.6 where people will either flock to it or ignore it, with either > result providing us with good guidance for Py3.0. 
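Nick's memory concern can be illustrated with a rough CPython experiment. This is a sketch that assumes a present-day Python 3, where dict.keys() already returns a view. The view object stays the same size however large the dict grows, while a list or set copy of the keys grows with the number of entries. sys.getsizeof() reports shallow, implementation-specific sizes, so read the output as an order-of-magnitude comparison only. The footprint() helper is hypothetical and written just for this check.

```python
import sys

def footprint(n):
    """Shallow sizes (in bytes) of a keys view versus copies of the keys.

    Hypothetical helper for this comparison only; the byte counts are
    CPython-specific and exclude the key objects themselves.
    """
    d = {i: str(i) for i in range(n)}
    return {
        "view": sys.getsizeof(d.keys()),        # small wrapper holding a reference to d
        "list": sys.getsizeof(list(d.keys())),  # array of n references
        "set": sys.getsizeof(set(d.keys())),    # hash table sized for n entries
    }

small = footprint(10)
large = footprint(100_000)
print("n=10:     ", small)
print("n=100_000:", large)
```

The view's size is fixed by its type. Materializing the keys into a list or set is the step that costs O(N) in both time and memory.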
One of the things that's been suggested for working with strings in Py3k is a stringview type - a wrapper type around a string where slicing (and similar operations, like partition()) creates objects holding a reference & offset into the original string rather than actually copying data around. Standard strings would be entirely unaffected. The intended usage is that a function would accept a string-like object as an argument, wrap a stringview around it, then convert the result back to a normal string before passing it back to the caller. Couldn't something similar serve as a replacement for the iter*() methods on dictionaries? That is, rather than copying the data into a different data structure, provide a group of wrapper classes, each of which operates on the wrapped mapping in a different way? The wrapper classes needed would be:
- set API exposing the keys of the original mapping
- multiset API exposing the values of the underlying mapping
- keyed set API exposing the (key, value) pairs of the original mapping
- mapping API that uses the above for keys(), values() & items(), but otherwise delegates operations to the original mapping
All except the last already exist in the Py3k branch (the role of the last suggested wrapper type is currently being handled by the dict data type itself). Similar to Raymond's suggestion of a new concrete container type (rather than a wrapper type), this could be included in the standard library for Python 2.6, making it significantly easier to write forward-compatible code.
Given such a new dict wrapper type (or standalone container type, if Raymond's approach is taken), it would also be possible to change the basic mapping API to define different return types for the 3 methods that currently return lists:
- keys() would be changed to return a set
- values() would be changed to return a multiset (non-hash based)
- items() would be changed to return a keyed set (à la sort keys)
The difference from the current Py3k branch is that these would still involve copying the data from the original dictionary to a new concrete container object which is then returned. The more I've seen of these discussions (the original dict method one, as well as the string concatenation/slicing one), the more leery I become of including any view type behaviour in the basic data types. One factor in this is that I've been getting back into C++ coding lately, and keep getting reminded of the various cases where the C++ standard defaults to behaviours that are faster (use less memory, whatever) when they're valid, but silently do the wrong thing when they're inappropriate (the default assignment operator and copy constructor, which can lead to significant double-free problems, are a nice example, as is the fact that methods are non-virtual by default). So rather than spend the time figuring out whether or not the default behaviour is safe in each case, it becomes quicker and easier to just stick in the boilerplate to tell the compiler "don't do the default thing, it is probably wrong", thus completely invalidating the supposed performance improvement that the default behaviour was meant to provide (and requiring a programmer to put in a comment saying so when the default behaviour really is what they want).
Typically, Python doesn't work that way - it defaults to 'safe' behaviour, but provides sufficient flexibility to permit optimisation when it is necessary (e.g., with the size of the data sets the NumPy folks sling around, view-based behaviour is essential, and Python lets them do it that way). My apologies for rambling a bit - I can't currently give a succinct explanation for why the current direction feels wrong, but I felt it was worth supporting Raymond on this point. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Tue Feb 20 14:28:23 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 20 Feb 2007 23:28:23 +1000 Subject: [Python-3000] [Python-ideas] immutable classes [was: pre-PEP: Default Argument Expressions] In-Reply-To: References: Message-ID: <45DAF777.8070708@gmail.com> Jan Kanis wrote: I just hope if > python were designed today it would have done these. If Python had done these, it wouldn't be Python ;) There are many, many programming language design decisions which have good arguments on each side (and some which seem obviously correct may involve hidden costs which aren't appreciated until after it is too late to change them). That's one of the major reasons why there are so many different programming languages out there. Cheers, Nick. 
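For reference, here is how current Python behaves on the two points Jan raises. These are today's semantics, not the proposed ones. First, a default argument is evaluated once, when def runs, so a mutable default is shared between calls. His caching decorator reproduces that behaviour on purpose. Second, a lambda created in a loop looks up the loop variable when it is called, so every function sees the variable's final value. Binding the value through a default argument (i=i) is the usual workaround. append_to is a made-up example function.

```python
def append_to(item, bucket=[]):
    # The default list is created once, when `def` executes, and then reused.
    bucket.append(item)
    return bucket

first = append_to(1)
second = append_to(2)
print(first is second, second)        # True [1, 2]

late = []
for i in range(4):
    late.append(lambda: i * i)        # `i` is looked up when the lambda is called
early = []
for i in range(4):
    early.append(lambda i=i: i * i)   # the default freezes the current value of `i`

print([f() for f in late])            # [9, 9, 9, 9]
print([f() for f in early])           # [0, 1, 4, 9]
```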
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Tue Feb 20 14:59:58 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 20 Feb 2007 23:59:58 +1000 Subject: [Python-3000] Pre-PEP: Simple input built-in in Python 3000 In-Reply-To: <7528bcdd0702191032n4347e8c8p6987553deb9be445@mail.gmail.com> References: <7528bcdd0612220545u147f07a4gb476dd43733dfe46@mail.gmail.com> <7528bcdd0702191032n4347e8c8p6987553deb9be445@mail.gmail.com> Message-ID: <45DAFEDE.4030109@gmail.com> Andre Roberge wrote: > Any possibility that (some of) the following can be done before Pycon? > Respectfully yours, > André Roberge I've added the PEP as 3111. I made a few small modifications (and committed it directly as Accepted) based on Guido's comments in this thread. The actual change still needs to be made, though. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From jason.orendorff at gmail.com Tue Feb 20 15:42:20 2007 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Tue, 20 Feb 2007 09:42:20 -0500 Subject: [Python-3000] Thoughts on dictionary views In-Reply-To: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> Message-ID: On 2/20/07, Raymond Hettinger wrote: > * A dict is also one of Python's most basic APIs (along with lists). Ideally, > we should keep those two APIs as simple as possible (getting rid of setdefault() > and unneeded methods is a step in the right direction). IMO, the views will be > the hardest part of the API to explain and interact with when learning the > language [...] I agree. Views will make dicts harder to learn for newcomers and trickier to use even for experts. The current non-aliasing behavior is a feature.
Seen that way, the switch to views seems like a broken optimization. > * ISTM that views offer three benefits: re-iterability, set behavior, and > self-updates. [...] I think the benefit the team really liked was #4, "delete iterkeys(), itervalues(), and iteritems() from the mapping API". But this now seems like false economy to me. -j From steven.bethard at gmail.com Tue Feb 20 16:08:12 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Tue, 20 Feb 2007 08:08:12 -0700 Subject: [Python-3000] Thoughts on dictionary views In-Reply-To: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> Message-ID: On 2/20/07, Raymond Hettinger wrote: > * ISTM that views offer three benefits: re-iterability, set behavior, and > self-updates. IMO, the first is not commonly needed and is trivially served by > writing list(mydict.items()) or somesuch. The second is best served by an > explicit conversion to a set or frozenset type [snip] > My recommendation is to take a more conservative route. Let's make dicts as > simple as possible and then introduce a new collections module entry with the > views bells and whistles. Just to clarify, you're suggesting that we still change .keys() .values() and .items() to iterators, right? If so, +1. I was also starting to get a bit nervous about the new complexity of dict(). Putting the view-like behavior into the collections module makes good sense. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity.
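The re-iterability point is easy to see in present-day Python 3, where keys() returns a view. This sketch compares three things. An iterator, which is what an iterator-returning keys() would hand back, is exhausted after one pass. A view can be iterated repeatedly. An explicit list(...) copy, as Raymond suggests, is a snapshot that no longer tracks the dict.

```python
d = {"a": 1, "b": 2}

it = iter(d.keys())              # single-pass iterator over the keys
print(list(it), list(it))        # ['a', 'b'] []: the second pass is empty

view = d.keys()                  # view: can be iterated as often as needed
print(list(view), list(view))    # ['a', 'b'] ['a', 'b']

snapshot = list(d.items())       # explicit copy
d["c"] = 3
print(len(snapshot), len(view))  # 2 3: the copy is frozen, the view is live
```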
--- Bucky Katt, Get Fuzzy From p.f.moore at gmail.com Tue Feb 20 16:13:57 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 20 Feb 2007 15:13:57 +0000 Subject: [Python-3000] Thoughts on dictionary views In-Reply-To: References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> Message-ID: <79990c6b0702200713s69c90510o617595344fd17af@mail.gmail.com> On 20/02/07, Steven Bethard wrote: > On 2/20/07, Raymond Hettinger wrote: > > My recommendation is to take a more conservative route. Let's make dicts as > > simple as possible and then introduce a new collections module entry with the > > views bells and whistles. > > Just to clarfiy, you're suggesting that we still change .keys() > .values() and .items() to iterators, right? > > If so, +1. I was also starting to get a bit nervous about the new > complexity of dict(). Putting the view-like behavior into the > collections module makes good sense. I'm also +1. (I have similar concerns over the "new IO" proposals I've seen, but there's nothing concrete there yet, so I'll save that argument for another day...) Paul From aahz at pythoncraft.com Tue Feb 20 16:42:04 2007 From: aahz at pythoncraft.com (Aahz) Date: Tue, 20 Feb 2007 07:42:04 -0800 Subject: [Python-3000] Thoughts on dictionary views In-Reply-To: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> Message-ID: <20070220154204.GB20369@panix.com> On Tue, Feb 20, 2007, Raymond Hettinger wrote: > > My recommendation is to take a more conservative route. Let's make > dicts as simple as possible and then introduce a new collections > module entry with the views bells and whistles. If the collections > version proves itself as enormously popular, useful, understandable, > and without a good equivalent, then it can ask for a promotion. +1, and thank you for cogently writing up the unease that I was feeling -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "I disrespectfully agree." 
--SJM From guido at python.org Tue Feb 20 16:51:07 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 20 Feb 2007 07:51:07 -0800 Subject: [Python-3000] Pre-PEP: Simple input built-in in Python 3000 In-Reply-To: <45DAFEDE.4030109@gmail.com> References: <7528bcdd0612220545u147f07a4gb476dd43733dfe46@mail.gmail.com> <7528bcdd0702191032n4347e8c8p6987553deb9be445@mail.gmail.com> <45DAFEDE.4030109@gmail.com> Message-ID: Why do you want this *before* PyCon? It would be much easier to do this as part of the Py3k sprint. On 2/20/07, Nick Coghlan wrote: > Andre Roberge wrote: > > Any possibility that (some of) the following can be done before Pycon? > > Respectfully yours, > > André Roberge > > I've added the PEP as 3111. I made a few small modifications (and > committed it directly as Accepted) based on Guido's comments in this thread. > > The actual change still needs to be made, though. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > --------------------------------------------------------------- > http://www.boredomandlaziness.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Feb 20 18:09:16 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 20 Feb 2007 09:09:16 -0800 Subject: [Python-3000] Pre-PEP: Simple input built-in in Python 3000 In-Reply-To: <7528bcdd0702200901r62f8cc4fu7ea7f1e59725e4b6@mail.gmail.com> References: <7528bcdd0612220545u147f07a4gb476dd43733dfe46@mail.gmail.com> <7528bcdd0702191032n4347e8c8p6987553deb9be445@mail.gmail.com> <45DAFEDE.4030109@gmail.com> <7528bcdd0702200901r62f8cc4fu7ea7f1e59725e4b6@mail.gmail.com> Message-ID: Consider the PEP accepted.
Regarding the conversion, please do use the sandbox/2to3 framework. Write me if you have trouble understanding the many examples already in fixes/. On 2/20/07, Andre Roberge wrote: > On 2/20/07, Guido van Rossum wrote: > > Why do you want this *before* PyCon? It would be much easier to do > > this as part of the Py3k sprint. > > > > My main interest was to have, prior to Pycon, the PEP recorded as > such; it had been close to 2 months since the last post on this issue > on the list. > > As for the actual work, I'd be willing to volunteer to write the > required code (with test cases) that could be use to do the conversion > input(...) -> eval(input(...)) > raw_input(...) -> input(...) > > Unfortunately, I will not be participating in any sprints. > > André > > > > > On 2/20/07, Nick Coghlan wrote: > > > Andre Roberge wrote: > > > > Any possibility that (some of) the following can be done before Pycon? > > > > Respectfully yours, > > > > André Roberge > > > > > > I've added the PEP as 3111. I made a few small modifications (and > > > committed it directly as Accepted) based on Guido's comments in this thread. > > > > > > The actual change still needs to be made, though. > > > > > > Cheers, > > > Nick.
> > > > > > -- > > > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > > --------------------------------------------------------------- > > > http://www.boredomandlaziness.org > > > > > > > > > -- > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From andre.roberge at gmail.com Tue Feb 20 17:58:11 2007 From: andre.roberge at gmail.com (Andre Roberge) Date: Tue, 20 Feb 2007 12:58:11 -0400 Subject: [Python-3000] Pre-PEP: Simple input built-in in Python 3000 In-Reply-To: References: <7528bcdd0612220545u147f07a4gb476dd43733dfe46@mail.gmail.com> <7528bcdd0702191032n4347e8c8p6987553deb9be445@mail.gmail.com> <45DAFEDE.4030109@gmail.com> Message-ID: <7528bcdd0702200858j74653284x1d368920c9b34e5e@mail.gmail.com> On 2/20/07, Guido van Rossum wrote: > Why do you want this *before* PyCon? It would be much easier to do > this as part of the Py3k sprint. > My main interest was to have, prior to Pycon, the PEP recorded as such; it had been close to 2 months since the last post on this issue on the list. As for the actual work, if no regular developer is interested, I'd be willing to volunteer to write the required code (with test cases) that could be used to do the conversion: input(...) -> eval(input(...)); raw_input(...) -> input(...). Unfortunately, I will not be participating in any sprints. André > On 2/20/07, Nick Coghlan wrote: > > Andre Roberge wrote: > > > Any possibility that (some of) the following can be done before Pycon? > > > Respectfully yours, > > > André Roberge > > > > I've added the PEP as 3111.
I made a few small modifications (and > > committed it directly as Accepted) based on Guido's comments in this thread. > > > > The actual change still needs to be made, though. > > > > Cheers, > > Nick. > > > > -- > > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > --------------------------------------------------------------- > > http://www.boredomandlaziness.org > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > From andre.roberge at gmail.com Tue Feb 20 18:01:38 2007 From: andre.roberge at gmail.com (Andre Roberge) Date: Tue, 20 Feb 2007 13:01:38 -0400 Subject: [Python-3000] Pre-PEP: Simple input built-in in Python 3000 In-Reply-To: References: <7528bcdd0612220545u147f07a4gb476dd43733dfe46@mail.gmail.com> <7528bcdd0702191032n4347e8c8p6987553deb9be445@mail.gmail.com> <45DAFEDE.4030109@gmail.com> Message-ID: <7528bcdd0702200901r62f8cc4fu7ea7f1e59725e4b6@mail.gmail.com> On 2/20/07, Guido van Rossum wrote: > Why do you want this *before* PyCon? It would be much easier to do > this as part of the Py3k sprint. > My main interest was to have, prior to Pycon, the PEP recorded as such; it had been close to 2 months since the last post on this issue on the list. As for the actual work, I'd be willing to volunteer to write the required code (with test cases) that could be used to do the conversion: input(...) -> eval(input(...)); raw_input(...) -> input(...). Unfortunately, I will not be participating in any sprints. André > On 2/20/07, Nick Coghlan wrote: > > Andre Roberge wrote: > > > Any possibility that (some of) the following can be done before Pycon? > > > Respectfully yours, > > > André Roberge > > > > I've added the PEP as 3111.
I made a few small modifications (and > > committed it directly as Accepted) based on Guido's comments in this thread. > > > > The actual change still needs to be made, though. > > > > Cheers, > > Nick. > > > > -- > > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > --------------------------------------------------------------- > > http://www.boredomandlaziness.org > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > From greg.ewing at canterbury.ac.nz Wed Feb 21 00:02:16 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 21 Feb 2007 12:02:16 +1300 Subject: [Python-3000] Thoughts on dictionary views In-Reply-To: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> Message-ID: <45DB7DF8.1010109@canterbury.ac.nz> Raymond Hettinger wrote: > * Maintaining a live (self-updating) view is a bit tricky from an implementation > point-of-view. I don't understand what the alternative is. If mutating the underlying object doesn't affect the view, then you don't really have a view, just a copy of the data -- no different from the existing dict keys() etc. If you're saying that you shouldn't be able to mutate the underlying object *through* the view, that's okay -- I don't mind if the views are read-only in some or all cases. > Let's make dicts as > simple as possible and then introduce a new collections module entry with the > views bells and whistles. If the view methods are only available on a special dict subclass and not on ordinary dicts, their usefulness will be severely crippled, so you wouldn't learn much from the experiment.
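Greg's point, that a view which ignores later mutations is just a copy, can be checked against what Python 3 eventually shipped. The sketch below assumes a current interpreter and contrasts three objects. A set copy of the keys is a snapshot. The keys view tracks the dict but offers no way to add to it. types.MappingProxyType, an existing stdlib wrapper, gives a read-only live view of the whole mapping. try_write, add_key and set_item are hypothetical throwaway helpers for the demonstration.

```python
from types import MappingProxyType

d = {"a": 1}
keys_copy = set(d)               # snapshot of the keys
keys_view = d.keys()             # live, read-only view of the keys
proxy = MappingProxyType(d)      # live, read-only view of the whole mapping

d["b"] = 2                       # mutate the underlying dict directly
print("b" in keys_copy, "b" in keys_view, proxy["b"])  # False True 2

def try_write(action):
    """Run `action` and report which exception, if any, it raised."""
    try:
        action()
    except (AttributeError, TypeError) as exc:
        return type(exc).__name__
    return "no error"

def add_key():
    keys_view.add("c")           # dict_keys has no add() method

def set_item():
    proxy["c"] = 3               # mappingproxy rejects item assignment

print(try_write(add_key), try_write(set_item))  # AttributeError TypeError
print("c" in d)                  # False: neither attempt reached the dict
```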
-- Greg From guido at python.org Wed Feb 21 00:25:28 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 20 Feb 2007 15:25:28 -0800 Subject: [Python-3000] Thoughts on dictionary views In-Reply-To: <45DB7DF8.1010109@canterbury.ac.nz> References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> <45DB7DF8.1010109@canterbury.ac.nz> Message-ID: On 2/20/07, Greg Ewing wrote: > Raymond Hettinger wrote: > > > * Maintaining a live (self-updating) view is a bit tricky from an implementation > > point-of-view. > > I don't understand what the alternative is. If mutating the > underlying object doesn't affect the view, then you don't > really have a view, just a copy of the data -- no different > from the existing dict keys() etc. FWIW, I didn't find the implementation tricky at all -- the views are very small objects that simply contain a reference to the underlying dict. All operations on the view defer to the dict one way or another. The code in the PEP also shows how simple this is to do generically for any underlying mapping object that implements __getitem__, __contains__, __len__, and __iter__. (Or it will, once I am done updating it. :-) > If you're saying that you shouldn't be able to mutate the > underlying object *through* the view, that's okay -- I don't > mind if the views are read-only in some or all cases. While the PEP has some mutability, the implementation currently has all views be read-only, and I like this enough to want to keep it that way. > > Let's make dicts as > > simple as possible and then introduce a new collections module entry with the > > views bells and whistles. > > If the view methods are only available on a special dict > subclass and not on ordinary dicts, their usefulness will > be severely crippled, so you wouldn't learn much from > the experiment. True. I'm also unclear on what "as simple as possible" would mean. Perhaps delete iterkeys etc. and make keys etc. return iterators? 
That was the *old* plan, which was never really challenged, and IMO it is in every aspect inferior to the current plan. BTW the PEP was incorrectly marked as accepted. I'll unmark it, and remove the mutability. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Feb 21 00:46:11 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 20 Feb 2007 15:46:11 -0800 Subject: [Python-3000] Thoughts on dictionary views In-Reply-To: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> Message-ID: On 2/20/07, Raymond Hettinger wrote: > The Java concept of dictionary views seems to have caught-on here while I wasn't > looking. At the risk of covering some old ground, I would like to re-open the > question. Because it's coming from you I am reopening the discussion; didn't mean to exclude you. But it is a bit of a pain that you weren't looking while this was discussed. If we decide to roll it back it will be painful (depending on the shape of the roll-back). > Here are a few thoughts on the subject to kick-off the discussion: > > * Maintaining a live (self-updating) view is a bit tricky from an implementation > point-of-view. While it is clearly doable for dictionaries, it is not clear > that it is a good idea for a general mapping API which can be wrapped around > dbms, shelves, elementtrees, b-trees, and other wrascally rabbits. I doubt that > the underlying structures of other mapping types support the observer pattern > necessary to keep views updated -- this is doubly true if the underlying data is > on disk and can be updated by other processes, threads, etc. No observer pattern is required. See the (updated) PEP 3106. > * One of the purported benefits is to provide set-like behavior without the > expense of copying to a new set object. 
FWIW, I've updated the set > implementation to be more interoperable with dictionaries so that the conversion > costs are negligible (about the same as a dict resize operation -- one pass, no > calls to PyObject_Hash, insertion into a presized, sparse table with very few > collisions). But it is still O(N) in time and space. Creating a dict view is O(1) in both. > * A dict is also one of Python's most basic APIs (along with lists). Ideally, > we should keep those two APIs as simple as possible (getting rid of setdefault() > and unneeded methods is a step in the right direction). IMO, the views will be > the hardest part of the API to explain and interact with when learning the > language -- to learn about dicts and lists, you already have to learn about > mutability and hashability -- it doesn't help this situation if you then need to > learn about self-updating views that can be deleted, have modified values, but > cannot be added, and that have their own set-like operations but aren't really > sets . . . Perhaps it will be more palatable now that the views aren't mutable? Also, I think you may have the wrong semantic model -- it's not a self-updating view, it's just a different way to look at the same underlying mapping. (Did you see PEP 3106? Since you don't quote it this is not clear.) > * ISTM that views offer three benefits: re-iterability, set behavior, and > self-updates. IMO, the first is not commonly needed and is trivially served by > writing list(mydict.items()) or somesuch. The second is best served by an > explicit conversion to a set or frozenset type -- those two types have been > enormously successful in that they seem to offer a near zero learning curve -- > people seem to intuitively know how to use them right out of the box. Yes, Greg Wilson did a super job on the API design. (Though I keep having to remind Googlers about sets; they seem to have lived in a world limited to Python 2.2 for too long. 
:-( ) > As long > as that conversion is fast, I think the explicit conversion is the way to go -- > it is the way you would do it with any other Python type where you wanted set > behavior. Adding a handful of set methods to dict views would only complicate > an otherwise simple situation and introduce unnecessary complexity (i.e. what > should isinstance(d.d_keys, set) return?). This I hope to address by introducing Abstract Base Classes. Unfortunately that proposal isn't at all worked out, the best we have is a wiki page by Bill Janssen (), but that is quite far removed from what I would like to see. > The third benefit (self-updates) is > more interesting and does not have a direct analog with existing python tools, > so the question is how valuable is self-updating behavior and are there > compelling use cases that warrant a more complex API? Just because it's new doesn't make it suspect does it? It's been very well received in Java. > My recommendation is to take a more conservative route. Let's make dicts as > simple as possible I'd like to see a concrete proposal here before I can judge which is the better proposal. > and then introduce a new collections module entry with the > views bells and whistles. If the collections version proves itself as > enormously popular, useful, understandable, and without a good equivalent, then > it can ask for a promotion. The collections module is nice place to put in > alternate datatypes that meet the more demanding needs of advanced users who > know exactly what they want/need in terms of special behaviors or performance. But that's not what dict views are about. They ar about making the mapping API easier for *all* users. (Anyway, Greg Ewing already shot this down.) > And, if we take the collections module route, there is no reason that it cannot > be put into Py2.6 where people will either flock to it or ignore it, with either > result providing us with good guidance for Py3.0. 
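The abstract-base-class route is what later Python 3 releases took (PEP 3119; today in collections.abc). That design gives a concrete answer to Raymond's isinstance question. Dict views are not instances of the concrete set type. Keys and items views do count as collections.abc.Set, through KeysView and ItemsView, while values views do not. Here is a quick check, assuming a current interpreter:

```python
from collections.abc import ItemsView, KeysView, Set, ValuesView

d = {"a": 1, "b": 2}

print(isinstance(d.keys(), set))           # False: not the concrete set type
print(isinstance(d.keys(), Set))           # True: provides the abstract Set API
print(isinstance(d.keys(), KeysView))      # True
print(isinstance(d.items(), ItemsView), isinstance(d.items(), Set))     # True True
print(isinstance(d.values(), ValuesView), isinstance(d.values(), Set))  # True False

both = d.keys() & {"b", "c"}               # set operators return a plain set
print(type(both).__name__, both)           # set {'b'}
```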
Dict views can easily be added to 2.6 by using different method names that can be automatically converted by the 2to3 converter. E.g. d.viewkeys(), d.viewitems(), d.viewvalues(). The implementation should plug right in. (Anthony and Thomas also have some more advanced ideas on how to make keys/items/values return views when used in a module declaring "from __future__ import dict_views".) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Feb 21 00:48:50 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 20 Feb 2007 15:48:50 -0800 Subject: [Python-3000] Thoughts on dictionary views In-Reply-To: <79990c6b0702200713s69c90510o617595344fd17af@mail.gmail.com> References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> <79990c6b0702200713s69c90510o617595344fd17af@mail.gmail.com> Message-ID: On 2/20/07, Paul Moore wrote: > (I have similar concerns over the "new IO" proposals I've > seen, but there's nothing concrete there yet, so I'll save that > argument for another day...) Then you should also have misgivings about the Unicode/str unification. If you are cool with that, I don't see how we can avoid redoing the I/O library. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Feb 21 00:51:01 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 20 Feb 2007 15:51:01 -0800 Subject: [Python-3000] Thoughts on dictionary views In-Reply-To: References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> Message-ID: On 2/20/07, Steven Bethard wrote: > Just to clarfiy, you're suggesting that we still change .keys() > .values() and .items() to iterators, right? But this isn't really easier to explain to noobs than views, is it? What's the advantage of >>> {}.keys() >>> over >>> {}.keys() >>> ??? 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From steven.bethard at gmail.com Wed Feb 21 01:10:53 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Tue, 20 Feb 2007 17:10:53 -0700 Subject: [Python-3000] Thoughts on dictionary views In-Reply-To: References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> Message-ID: On 2/20/07, Guido van Rossum wrote: > On 2/20/07, Steven Bethard wrote: > > Just to clarfiy, you're suggesting that we still change .keys() > > .values() and .items() to iterators, right? > > But this isn't really easier to explain to noobs than views, is it? > What's the advantage of > > >>> {}.keys() > > >>> > > over > > >>> {}.keys() > > >>> > > ??? No advantage at the interactive prompt of course. ;-) The advantage is only in what you have to explain about the object. In the former case, you can simply say "it's an iterator over the keys" and they can understand it with their existing knowledge of iterators. And if they don't know what iterators are, once they learn about them for this case, they'll also know how iterators work in other situations, e.g. list iterators, set iterators, deque iterators, etc. On the other hand, when they're told "it's a dict key view object", they can't use any existing knowledge. They have to go and look up the API for what exactly a dict key view object does. And once they've learned what API a dict key view object supports, that knowledge is not really helpful in any new situations. They won't see key views on lists, sets or deques, for example. So it's mainly about keeping the mental footprint small. Knowing how iterators work is a useful bit of knowledge that is widely applicable across a variety of Python objects. Knowing how the various dict views work is not so generally useful. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. 
--- Bucky Katt, Get Fuzzy From guido at python.org Wed Feb 21 01:13:42 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 20 Feb 2007 16:13:42 -0800 Subject: [Python-3000] Thoughts on dictionary views In-Reply-To: <45DAF693.1000004@gmail.com> References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> <45DAF693.1000004@gmail.com> Message-ID: On 2/20/07, Nick Coghlan wrote: > Raymond Hettinger wrote: > > The Java concept of dictionary views seems to have caught-on here while I wasn't > > looking. At the risk of covering some old ground, I would like to re-open the > > question. Here are a few thoughts on the subject to kick-off the discussion: > > > > * Maintaining a live (self-updating) view is a bit tricky from an implementation > > point-of-view. While it is clearly doable for dictionaries, it is not clear > > that it is a good idea for a general mapping API which can be wrapped around > > dbms, shelves, elementtrees, b-trees, and other wrascally rabbits. I doubt that > > the underlying structures of other mapping types support the observer pattern > > necessary to keep views updated -- this is doubly true if the underlying data is > > on disk and can be updated by other processes, threads, etc. > > FWIW, the py3k trunk is still somewhat broken from the implementation of > this change. Without any test resources enabled, I still get more than > half a dozen failures which appear at a glance to be related to > unexpected mutation of dictionaries (test_anydbm, test_dumbdbm, > test_mutants, test_compile, test_iter, test_iterlen, test_minidom, > test_os, test_importhooks, test_unittest) Yes, I plan to have those fixed before PyCon (so we won't have to waste time on them at the sprint). But if someone wants to help that would be great! > > * One of the purported benefits is to provide set-like behavior without the > > expense of copying to a new set object. 
FWIW, I've updated the set > > implementation to be more interoperable with dictionaries so that the conversion > > costs are negligible (about the same as a dict resize operation -- one pass, no > > calls to PyObject_Hash, insertion into a presized, sparse table with very few > > collisions). > > The speed costs may become negligible, but I believe the main concern > here is memory consumption (minimising memory usage is certainly the > only reason I've ever made sure to use the dict.iter* methods). As I said, it's still O(N) time and space, vs. O(1) for creating a view. [...] > One of the things that's been suggested for working with strings in Py3k > is a stringview type - a wrapper type around a string where slicing (and > similar operations, like partition()) creates objects with a reference & > offset into the original string rather than actually copying data > around. That must've been proposed while *I* was away. :-) I'm not at all convinced that this kind of complexity is helpful. But note that it's a different case from dict views, since string views can be turned into copies with only some cost (or savings :-) in time and space, but without semantic changes. Dict views have semantics. > Standard strings would be entirely unaffected. The concept of > use being that a functions would accept a string-like object as an > object, wrap the stringview around it, then convert the result back to a > normal string before passing it back to the caller. > > Couldn't something similar serve as a replacement for the iter*() > methods on dictionaries? That is, rather than copying the data into a > different data structure, Um, neither iterkeys() nor dict views do any copying. They just reference the underlying dict in a tiny fixed-size object (literally one pointer for dict views, a bit more for iterkeys(), in order to detect mutations to the dict). > instead provide a group of wrapper classes, > each of which operate on the wrapped mapping in a different way?
> > The necessary wrapper classes needed would be: > - set API exposing keys of the original mapping > - multiset API exposing values of the underlying mapping > - keyed set API exposing (key, value) pairs of the original mapping > - mapping API that uses the above for keys(), values() & items(), but > otherwise delegates operations to the original mapping > > All except the last already exist in the Py3k branch (as the role of the > last suggested wrapper type is currently being handled by the dict data > type itself) This is a clear explanation of the implementation; but can you also explain the benefits? > Similar to Raymond's suggestion of a new concrete container type (rather > than a wrapper type), this could be included in the standard library for > Python 2.6, making it significantly easier to write forward compatible code. I doubt that writing forward compatible code will be hard anyways. The most compatible code simply doesn't use any of the six affected methods (keys(), iterkeys(), etc.) and instead relies on directly manipulating or iterating over the dict. This is backwards compatible all the way back to 2.2. Also, the conversion tool will make it easy to write compatible code that uses iterkeys() but not keys(). > Given such a new dict wrapper type (or standalone container type, if > Raymond's approach is taken), then it would also be possible to change > the basic mapping API to define different return types for the 3 methods > that currently return lists: > - keys() would be changed to return a set > - values() would be changed to return a multiset (non-hash based) > - items() would be changed to return a keyed set (ala sort keys) I'm not sure what you mean by "ala sort keys". I hope there's no requirement that items() be sorted. Note that the implementation of a multiset type will be quite tricky (I'm punting on this in the rewrite of PEP 3106 that just got refreshed on python.org). 
Also, this implementation will make it hard to ensure that list(zip(d.keys(), d.values())) == list(d.items()) as the keys() and values() return different object types. > The difference from the current Py3k branch is that these would still > involve copying the data from the original dictionary to a new concrete > container object which is then returned. But that's the main thing I'm trying to *avoid* with the new API! > The more I've seen of these discussions (the original dict method one, > as well as the string concatenation/slicing one), the more leery I > become of including any view type behaviour in the basic data types. I can't say I see much similarity between the two discussions. The issues are all completely different -- for dicts they focus on API semantics, while for strings they focus on performance in all sorts of odd cases. > One factor in this is that I've been getting back into C++ coding > lately, and keep getting reminded of the various cases where the C++ > standard defaults to behaviours that are faster (use less memory, > whatever) when they're valid, but silently do the wrong thing when > they're inappropriate (default assignment and copy constructions > operators that lead to significant memory double-free problems are a > nice example, as is the fact that methods are non-virtual by default). > So rather than spend the time to figure out whether or not the default > behaviour is safe for each case, it becomes quicker and easier to just > stick in the boilerplate to tell the compiler "don't do the default > thing, it is probably wrong", thus completely invalidating the supposed > performance improvement that was meant to be provided by the default > behaviour (and requiring a programmer to put in a comment saying so when > the default behaviour really is what they want). 
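Guido's consistency requirement is guaranteed for the views that eventually shipped in Python 3: if the dict is not modified in between, keys(), values() and items() iterate in corresponding order. The sketch below (illustrative names) checks that, and also shows the "most compatible" style from earlier in the thread, which iterates over the dict itself and never calls keys() or iterkeys().

```python
d = {"x": 1, "y": 2, "z": 3}

# With no intervening modification, the three views walk the dict in
# the same order, so pairing keys with values reproduces items().
assert list(zip(d.keys(), d.values())) == list(d.items())

# Iterating the dict directly needs none of the six affected methods.
for k in d:
    print(k, d[k])
```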
Well, unless you plan to get rid of dict.__iter__(), that's a default behavior that is wrong whenever you mutate the dict in the loop -- but what are you going to do about it? > Typically, Python doesn't work that way - it defaults to 'safe' > behaviour, but provides sufficient flexibility to permit optimisation > when it is necessary (e.g., with the size of the data sets the NumPy > folks sling around, view-based behaviour is essential, and Python lets > them do it that way). > > My apologies for rambling a bit - I can't currently give a succinct > explanation for why the current direction feels wrong, but I felt it was > worth supporting Raymond on this point. Apologies accepted -- but yes, you did ramble a bit, and I still wish you'd collected your thoughts a bit more. If there are simple, clear arguments it's easier for me to accept or reject them than with a bunch of ramblings. Sorry to be grumpy, but given the implementation stage this is in and how long the PEP has been sitting unchanged I'm a bit annoyed that the criticism, valid or not, comes so late. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Feb 21 01:17:55 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 20 Feb 2007 16:17:55 -0800 Subject: [Python-3000] Thoughts on dictionary views In-Reply-To: References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> Message-ID: On 2/20/07, Steven Bethard wrote: > On 2/20/07, Guido van Rossum wrote: > > On 2/20/07, Steven Bethard wrote: > > > Just to clarfiy, you're suggesting that we still change .keys() > > > .values() and .items() to iterators, right? > > > > But this isn't really easier to explain to noobs than views, is it? > > What's the advantage of > > > > >>> {}.keys() > > > > >>> > > > > over > > > > >>> {}.keys() > > > > >>> > > > > ??? > > No advantage at the interactive prompt of course. ;-) > > The advantage is only in what you have to explain about the object.
In > the former case, you can simply say "it's an iterator over the keys" > and they can understand it with their existing knowledge of iterators. Uhm, you gotta be kidding. You don't seriously expect noobs to have an a priori understanding of iterators, do you? Those most likely come *after* dict views. > And if they don't know what iterators are, once they learn about them > for this case, they'll also know how iterators work in other > situations, e.g. list iterators, set iterators, deque iterators, etc. Most of which one rarely needs to know about. In fact, I'd say that if it wasn't for dict.iterkeys() we could probably hide iterators quite effectively for a long time from noobs. > On the other hand, when they're told "it's a dict key view object", > they can't use any existing knowledge. They have to go and look up the > API for what exactly a dict key view object does. And once they've > learned what API a dict key view object supports, that knowledge is > not really helpful in any new situations. They won't see key views on > lists, sets or deques, for example. But they will see them (I hope) on other mappings. > So it's mainly about keeping the mental footprint small. Knowing how > iterators work is a useful bit of knowledge that is widely applicable > across a variety of Python objects. Knowing how the various dict views > work is not so generally useful. Since they mostly behave like sets (that you can't mutate directly) they should carry very low conceptual overhead. (Raymond already remarked on the success of the set API and I wholeheartedly agree.)
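The claim that key views "mostly behave like sets (that you can't mutate directly)" matches the views that landed in Python 3. A minimal sketch with illustrative dicts: the usual set operators work on keys views and return ordinary sets, while the view itself offers no add() or discard().

```python
a = {"apple": 1, "banana": 2, "cherry": 3}
b = {"banana": 20, "cherry": 30, "date": 40}
ka, kb = a.keys(), b.keys()

# Set operators work directly on keys views; the results are plain sets.
common = ka & kb   # {'banana', 'cherry'}
either = ka | kb   # all four keys
only_a = ka - kb   # {'apple'}
print(sorted(common), sorted(only_a))

# Views also mix with ordinary sets in comparisons and set methods.
print(ka >= {"apple"})          # True
print(ka.isdisjoint({"date"}))  # True

# But the view is not a mutable set: there is no add() method.
print(hasattr(ka, "add"))       # False
```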
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From steven.bethard at gmail.com Wed Feb 21 01:35:15 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Tue, 20 Feb 2007 17:35:15 -0700 Subject: [Python-3000] Thoughts on dictionary views In-Reply-To: References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> Message-ID: On 2/20/07, Steven Bethard wrote: > Just to clarfiy, you're suggesting that we still change .keys() > .values() and .items() to iterators, right? On 2/20/07, Guido van Rossum wrote: > But this isn't really easier to explain to noobs than views, is it? On 2/20/07, Steven Bethard wrote: > On the other hand, when they're told "it's a dict key view object", > they can't use any existing knowledge. They have to go and look up the > API for what exactly a dict key view object does. And once they've > learned what API a dict key view object supports, that knowledge is > not really helpful in any new situations. They won't see key views on > lists, sets or deques, for example. On 2/20/07, Guido van Rossum wrote: > But they will see them (I hope) on other mappings. Presumably. All I was really pointing out is that your average Python programmer encounters more iterable objects than they do mapping-like objects. (Inevitable, of course, since all mapping-like objects are iterable.) My conclusion was therefore that iterability was a more basic part of Python. IMVHO, the fewer building blocks you have to understand to use the basic Python types, the better. But I'm going to let the discussion go for a while now, because it's a much better use of your time convincing Raymond than it is convincing me. ;-) STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. 
--- Bucky Katt, Get Fuzzy From tdelaney at avaya.com Wed Feb 21 01:33:42 2007 From: tdelaney at avaya.com (Delaney, Timothy (Tim)) Date: Wed, 21 Feb 2007 11:33:42 +1100 Subject: [Python-3000] Thoughts on dictionary views Message-ID: <2773CAC687FD5F4689F526998C7E4E5F074468@au3010avexu1.global.avaya.com> Steven Bethard wrote: > The advantage is only in what you have to explain about the object. In > the former case, you can simply say "it's an iterator over the keys" > and they can understand it with their existing knowledge of iterators. "it's an iterator over the keys" They use their knowledge of iterators (a standard concept in Python 2.2+). > On the other hand, when they're told "it's a dict key view object", > they can't use any existing knowledge. They have to go and look up the "it's a set view of the keys" They use their knowledge of sets (a standard concept in Python 2.3+) and views (a standard concept in Python 2.6+). The standard concept of a view will be something like: A view is a lightweight object that implements an interface by delegating to an underlying object. The underlying object cannot be changed through the view, but could be changed directly, in which case the view will reflect the new contents of the object. Note that some changes to the underlying object may invalidate the view, in which case using it will throw an exception. Note also that there is nothing preventing someone from creating a view-like class that allows changing the underlying object through it, but such a class should probably not be described as a view. 
Tim Delaney From guido at python.org Wed Feb 21 01:45:24 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 20 Feb 2007 16:45:24 -0800 Subject: [Python-3000] Thoughts on dictionary views In-Reply-To: <2773CAC687FD5F4689F526998C7E4E5F074468@au3010avexu1.global.avaya.com> References: <2773CAC687FD5F4689F526998C7E4E5F074468@au3010avexu1.global.avaya.com> Message-ID: On 2/20/07, Delaney, Timothy (Tim) wrote: > Steven Bethard wrote: > > > The advantage is only in what you have to explain about the object. In > > the former case, you can simply say "it's an iterator over the keys" > > and they can understand it with their existing knowledge of iterators. > > "it's an iterator over the keys" > > They use their knowledge of iterators (a standard concept in Python > 2.2+). > > > On the other hand, when they're told "it's a dict key view object", > > they can't use any existing knowledge. They have to go and look up the > > "it's a set view of the keys" > > They use their knowledge of sets (a standard concept in Python 2.3+) and > views (a standard concept in Python 2.6+). > > The standard concept of a view will be something like: > > A view is a lightweight object that implements an interface by > delegating to an underlying object. The underlying object cannot be > changed through the view, but could be changed directly, in which case > the view will reflect the new contents of the object. > > Note that some changes to the underlying object may invalidate the view, > in which case using it will throw an exception. No, this only invalidates an in-progress iterator. > Note also that there is nothing preventing someone from creating a > view-like class that allows changing the underlying object through it, > but such a class should probably not be described as a view. You can also think of dict views as a straightforward application of the GoF adapter pattern. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From larry at hastings.org Wed Feb 21 01:51:16 2007 From: larry at hastings.org (Larry Hastings) Date: Tue, 20 Feb 2007 16:51:16 -0800 Subject: [Python-3000] Thoughts on dictionary views In-Reply-To: <2773CAC687FD5F4689F526998C7E4E5F074468@au3010avexu1.global.avaya.com> References: <2773CAC687FD5F4689F526998C7E4E5F074468@au3010avexu1.global.avaya.com> Message-ID: <45DB9784.4030903@hastings.org> Delaney, Timothy (Tim) wrote: > A view is a lightweight object that implements an interface by > delegating to an underlying object. The underlying object cannot be > changed through the view, but could be changed directly, in which case > the view will reflect the new contents of the object. It certainly makes sense that views would *usually* be read-only, but is that really a *requirement*? attrview(), recently discussed in Python-Dev, allowed changing the underlying object. See Martin v. Lowis's implementation of attrview() here: http://mail.python.org/pipermail/python-dev/2007-February/071044.html It allowed setting attributes on the underlying object, like this: attrview(self)[method_name] = attrview(self.metadata)[method_name] Cheers, /larry/ From tdelaney at avaya.com Wed Feb 21 02:03:28 2007 From: tdelaney at avaya.com (Delaney, Timothy (Tim)) Date: Wed, 21 Feb 2007 12:03:28 +1100 Subject: [Python-3000] Thoughts on dictionary views Message-ID: <2773CAC687FD5F4689F526998C7E4E5FF1ECAA@au3010avexu1.global.avaya.com> Larry Hastings wrote: > Delaney, Timothy (Tim) wrote: >> A view is a lightweight object that implements an interface by >> delegating to an underlying object. The underlying object cannot be >> changed through the view, but could be changed directly, in which >> case the view will reflect the new contents of the object. > > It certainly makes sense that views would *usually* be read-only, but > is that really a *requirement*? 
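Tim's definition of a (read-only) view, together with Guido's correction that mutation invalidates only an in-progress iterator, can be illustrated in current Python 3. The sketch below uses types.MappingProxyType, a real read-only mapping wrapper added to the standard library later (Python 3.3); it is chosen here only as an illustration and was not part of this thread.

```python
from types import MappingProxyType

# Read-only view in Tim's sense: it delegates to config and reflects
# direct changes to config, but rejects writes made through it.
config = {"debug": False}
ro = MappingProxyType(config)

config["debug"] = True
print(ro["debug"])  # True

write_rejected = False
try:
    ro["debug"] = False
except TypeError:  # mappingproxy does not support item assignment
    write_rejected = True
print(write_rejected)  # True

# Changing a dict's size invalidates an iterator already in progress,
# not the view the iterator was created from.
d = {1: "a", 2: "b"}
keys = d.keys()
it = iter(keys)
next(it)
d[3] = "c"
iterator_invalidated = False
try:
    next(it)
except RuntimeError:  # "dictionary changed size during iteration"
    iterator_invalidated = True
print(iterator_invalidated)  # True
print(sorted(keys))          # [1, 2, 3]
```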
No, but I think it would be worthwhile (and definitely simplest) if the standard concept of a view in python was read-only. Then any non-read-only view becomes the exception, and needs to be flagged as such. Tim Delaney From tdelaney at avaya.com Wed Feb 21 02:08:36 2007 From: tdelaney at avaya.com (Delaney, Timothy (Tim)) Date: Wed, 21 Feb 2007 12:08:36 +1100 Subject: [Python-3000] Thoughts on dictionary views Message-ID: <2773CAC687FD5F4689F526998C7E4E5FF1ECAB@au3010avexu1.global.avaya.com> Guido van Rossum wrote: >> Note that some changes to the underlying object may invalidate the >> view, in which case using it will throw an exception. > > No, this only invalidates an in-progress iterator. Yeah - that's what I meant - just couldn't think if there were any other situations that might (at least with the standard views). >> Note also that there is nothing preventing someone from creating a >> view-like class that allows changing the underlying object through >> it, but such a class should probably not be described as a view. > > You can also think of dict views as a straightforward application of > the GoF adapter pattern. Yep - and I think that would be a good secondary explanation, instantly understandable by anyone with much programming experience. I think it's important though to set the expectations of what a view will normally be used for, so that any unqualified use of the term "view" will have a common understanding. And I think that the unqualified "view" should mean read-only. 
Tim Delaney From jcarlson at uci.edu Wed Feb 21 03:07:57 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Tue, 20 Feb 2007 18:07:57 -0800 Subject: [Python-3000] Thoughts on dictionary views In-Reply-To: <2773CAC687FD5F4689F526998C7E4E5FF1ECAB@au3010avexu1.global.avaya.com> References: <2773CAC687FD5F4689F526998C7E4E5FF1ECAB@au3010avexu1.global.avaya.com> Message-ID: <20070220174008.ADCE.JCARLSON@uci.edu> (merging a few replies to reduce traffic) "Delaney, Timothy (Tim)" wrote: > Guido van Rossum wrote: > > You can also think of dict views as a straightforward application of > > the GoF adapter pattern. > > Yep - and I think that would be a good secondary explanation, instantly > understandable by anyone with much programming experience. Not necessarily. I have been programming for almost a decade (most of it in Python), and I haven't yet taken a software engineering course (none were offered or required during my undergrad, and I didn't take any in my masters program); so the whole "patterns" thing is generally opaque to me. Some of them are self-evident in the name (observer, visitor, etc.), but "GoF adapter pattern" is Greek to me. "Guido van Rossum" wrote: > Perhaps it will be more palatable now that the views aren't mutable? > Also, I think you may have the wrong semantic model -- it's not a > self-updating view, it's just a different way to look at the same > underlying mapping. (Did you see PEP 3106? Since you don't quote it > this is not clear.) I was "eh, why bother?" prior to reading the updated PEP 3106, but now can see the benefit to keys(), values(), and items() returning views. I'm not sure I would use the added features (I don't believe I've ever compared the equalities of keys or values of two dictionaries separately, and I tend to stick to .iter*() methods), but I can also see how it would be useful to some users. 
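Comparing the keys (or items) of two dictionaries separately, as Josiah mentions, is set-like for the views in today's Python 3: insertion order does not matter. A small sketch with illustrative dicts:

```python
d1 = {"a": 1, "b": 2}
d2 = {"b": 2, "a": 1}   # same contents, different insertion order
d3 = {"a": 1, "b": 99}

# keys() and items() views compare like sets.
print(d1.keys() == d2.keys())    # True
print(d1.items() == d2.items())  # True
print(d1.keys() == d3.keys())    # True  (same keys...)
print(d1.items() == d3.items())  # False (...different values)

# A keys view also compares equal to a plain set with the same members.
print(d1.keys() == {"a", "b"})   # True
```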
"Guido van Rossum" wrote: > On 2/20/07, Nick Coghlan wrote: > > The speed costs may become negligible, but I believe the main concern > > here is memory consumption (minimising memory usage is certainly the > > only reason I've ever made sure to use the dict.iter* methods). > > As I said, it's still O(N) time and space, vs. O(1) for creating a view. But that's only if the .keys(), .values(), .items() produced actual sets versus producing a view. In the case of .values(), I don't see how one can do *any* better than O(n) for the a.values() == b.values() (or really O(nlogn) for comparable objects, and O(n^2) when they are not). There are going to be special cases that ruin performance with all 3 options (use Python 2.x equivalent .iter*(), use a view, use a set variant). While I can *see* the use of views, I can also see the benefit with just renaming .iter*() as .*() . - Josiah From jcarlson at uci.edu Wed Feb 21 03:10:37 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Tue, 20 Feb 2007 18:10:37 -0800 Subject: [Python-3000] Thoughts on dictionary views In-Reply-To: References: <79990c6b0702200713s69c90510o617595344fd17af@mail.gmail.com> Message-ID: <20070220171429.ADC5.JCARLSON@uci.edu> "Guido van Rossum" wrote: > On 2/20/07, Paul Moore wrote: > > (I have similar concerns over the "new IO" proposals I've > > seen, but there's nothing concrete there yet, so I'll save that > > argument for another day...) > > Then you should also have misgivings about the Unicode/str > unification. If you are cool with that, I don't see how we can avoid > redoing the I/O library. I'm not so sure. The return type on socket.recv and os.read could be changed to bytes (seemingly without much difficulty), and likely could even be changed to *take* a bytes object as the destination buffer (ditto for files opened as 'raw'). 
From there, aside from updating the standard library to handle socket, os.read, etc., for incoming data expecting a bytes object, and raising an exception when trying to write a unicode object, that is the limit to the changes. Of course, even with the proposed updated I/O library, every one of those modules would have to be changed anyways. Then again, I've been "eh?" on the whole I/O library thing, and generally annoyed at the "everything is unicode" idea. Converting all libraries that currently deal with IO is going to be a pain, especially if it does any sort of parsing of mixed binary and non-unicode textual data (like http headers combined with binary posted data or a utf-8 encoded stream). As a heavy user of quite a few of the current standard library IO modules (SocketServer, asyncore, urllib, socket, etc.) and as someone who has the "opportunity" to write line-level protocols, I'd be quite happy with the following... 1) add bytes (or add features to array) 2) rename unicode to text (or str) 3) renaming str to bin (or some other sufficiently clear name) 4) making string literals 'hello' be unicode 5) allow for b'constant' be the renamed str 6) add a mandatory 3rd argument to file/open which is the codec to use for reading 7) offer a new function for opening 'binary' files (which are opened as 'rb' or 'wb' whenever 'r' or 'w' are passed, respectively), which will remove confusion on Windows platforms Indeed, it isn't as revolutionary as "everything is unicode", but it would allow the standard library to be updated with a relative minimum of fuss and muss, without needing to intermix... x = bytes.decode('latin-1').USEFUL_UNICODE_METHOD(...) 
or sock.send(unicode.encode('latin-1')) - Josiah From guido at python.org Wed Feb 21 04:32:19 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 20 Feb 2007 19:32:19 -0800 Subject: [Python-3000] Thoughts on dictionary views In-Reply-To: <20070220174008.ADCE.JCARLSON@uci.edu> References: <2773CAC687FD5F4689F526998C7E4E5FF1ECAB@au3010avexu1.global.avaya.com> <20070220174008.ADCE.JCARLSON@uci.edu> Message-ID: On 2/20/07, Josiah Carlson wrote: > I was "eh, why bother?" prior to reading the updated PEP 3106, but now > can see the benefit to keys(), values(), and items() returning views. > I'm not sure I would use the added features (I don't believe I've ever > compared the equalities of keys or values of two dictionaries separately, > and I tend to stick to .iter*() methods), but I can also see how it > would be useful to some users. Thanks for the (relative) vote of confidence. Was it an update I made to the PEP, or did you not read it at all before? > "Guido van Rossum" wrote: > > On 2/20/07, Nick Coghlan wrote: > > > The speed costs may become negligible, but I believe the main concern > > > here is memory consumption (minimising memory usage is certainly the > > > only reason I've ever made sure to use the dict.iter* methods). > > > > As I said, it's still O(N) time and space, vs. O(1) for creating a view. > > But that's only if the .keys(), .values(), .items() produced actual sets > versus producing a view. Which (producing an actual set) is Nick's (and Raymond's) proposal. > In the case of .values(), I don't see how one > can do *any* better than O(n) for the a.values() == b.values() (or > really O(nlogn) for comparable objects, and O(n^2) when they are not). The PEP's algorithm for comparing values is O(N**2); I'm not sure it's worth attempting to optimize it, since I'm not aware of any use case; but it still seems better to do this than to compare values views by object identity.
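Here is a minimal sketch of an O(N**2) multiset comparison of two dicts' values that uses only ==, so it also handles unhashable, unorderable values. It shows the general technique under discussion; it is not the PEP's own code, and values_equal is a hypothetical helper name. For context, the Python 3 that eventually shipped does not give values views content-based equality: comparing two values views is documented to always return False.

```python
def values_equal(a, b):
    """Multiset equality of a.values() and b.values() using only ==.

    Each value in a must be matched against a distinct value of b.
    O(N**2) comparisons; assumes == behaves like an equivalence relation.
    """
    remaining = list(b.values())
    if len(remaining) != len(a):
        return False
    for v in a.values():
        for i, w in enumerate(remaining):
            if v == w:
                del remaining[i]  # a matched value can't be reused
                break
        else:
            return False          # v has no unmatched partner in b
    return True


# Unhashable values (lists) work, and duplicates are counted properly.
print(values_equal({"x": [1], "y": [2]}, {"p": [2], "q": [1]}))  # True
print(values_equal({"x": 1, "y": 1}, {"p": 1, "q": 2}))          # False

# Shipped Python 3: values views compare by identity, not contents.
print({"x": 1}.values() == {"x": 1}.values())                    # False
```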
> There are going to be special cases that ruin performance with all 3 > options (use Python 2.x equivalent .iter*(), use a view, use a set > variant). While I can *see* the use of views, I can also see the > benefit with just renaming .iter*() as .*() . Name a special case that ruins performance with either PEP 3106 or renaming .iter*() to .*()? Methinks that both views and iterators can be optimal, at least for .keys() and .items(), certainly in terms of O() notation; there may be edge cases where many, many lookups are done, where making a copy into a set could be faster, but that requires the total # of lookups to be much larger than N. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Feb 21 04:44:22 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 20 Feb 2007 19:44:22 -0800 Subject: [Python-3000] Thoughts on new I/O library and bytecode Message-ID: [Note: changed subject] On 2/20/07, Josiah Carlson wrote: > > "Guido van Rossum" wrote: > > On 2/20/07, Paul Moore wrote: > > > (I have similar concerns over the "new IO" proposals I've > > > seen, but there's nothing concrete there yet, so I'll save that > > > argument for another day...) > > > > Then you should also have misgivings about the Unicode/str > > unification. If you are cool with that, I don't see how we can avoid > > redoing the I/O library. > > I'm not so sure. The return type on socket.recv and os.read could be > changed to bytes (seemingly without much difficulty), Yes, that's the plan anyway. > and likely could > even be changed to *take* a bytes object as the destination buffer > (ditto for files opened as 'raw'). This already works -- bytes support the buffer API. > From there, aside from updating the > standard library to handle socket, os.read, etc., for incoming data > expecting a bytes object, and raising an exception when trying to write > a unicode object, that is the limit to the changes. Sure.
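Josiah's idea of passing a preallocated destination buffer to a socket read has a direct equivalent in the Python 3 that shipped. There, bytes ended up immutable, so the mutable buffer type is bytearray and the method is socket.recv_into(). The sketch below uses socket.socketpair() only to keep the demo self-contained; the message and buffer size are illustrative.

```python
import socket

# One fixed-size buffer, allocated once and reused: each receive fills
# existing memory instead of allocating a new object.
buf = bytearray(4096)
view = memoryview(buf)

msg = b"hello, buffer"
left, right = socket.socketpair()  # connected pair, keeps the demo local
try:
    left.sendall(msg)
    got = 0
    # A stream socket may deliver fewer bytes per call, so loop until done.
    while got < len(msg):
        n = right.recv_into(view[got:], len(msg) - got)
        if n == 0:  # peer closed early
            break
        got += n
finally:
    left.close()
    right.close()

print(got, bytes(view[:got]))  # 13 b'hello, buffer'
```

Slicing the memoryview (rather than the bytearray) matters here, because it lets each call write into the right offset of the same buffer without copying.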
> Of course, even with the proposed updated I/O library, every one of > those modules would have to be changed anyways. Right. But I expect the higher-level APIs (sock.makefile()) to be relatively stable. > Then again, I've been "eh?" on the whole I/O library thing, and > generally annoyed at the "everything is unicode" idea. Well, unless you remove the str type, how are you going to get rid of the endless problems with unicode where mixing unicode and str sometimes works and sometimes doesn't? > Converting all > libraries that currently deal with IO is going to be a pain, especially > if it does any sort of parsing of mixed binary and non-unicode textual > data (like http headers combined with binary posted data or a utf-8 > encoded stream). Yeah, I'm not looking forward to that, but I expect it'll be relatively straightforward once we figure out the right patterns; there's just a lot of code to convert. But that's the whole Py3k plan. > As a heavy user of quite a few of the current standard library IO > modules (SocketServer, asyncore, urllib, socket, etc.) and as someone > who has the "opportunity" to write line-level protocols, I'd be quite > happy with the following... > > 1) add bytes (or add features to array) > 2) rename unicode to text (or str) > 3) renaming str to bin (or some other sufficiently clear name) So you'd have THREE types (bytes, text, bin)? Or are you proposing bin instead of bytes, contrary to what you suggested above? > 4) making string literals 'hello' be unicode > 5) allow for b'constant' be the renamed str > 6) add a mandatory 3rd argument to file/open which is the codec to use > for reading And how does that help users or compatibility? > 7) offer a new function for opening 'binary' files (which are opened as > 'rb' or 'wb' whenever 'r' or 'w' are passed, respectively), which will > remove confusion on Windows platforms This is a red herring. Or I'm not sure I understand this part of your proposal. What's wrong with 'rb'? 
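On the question of 'rb': in Python 3 as released, the mode string selects both the result type and whether newline translation happens, which covers the Windows \r\n concern. A minimal sketch using a temporary file; read_both_ways is a hypothetical helper, and latin-1 is chosen only because it can decode any byte.

```python
import os
import tempfile


def read_both_ways(raw):
    """Write raw bytes to a temp file, then read it back in binary and text mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        with open(path, "rb") as f:                       # binary: bytes, no translation
            as_bytes = f.read()
        with open(path, "r", encoding="latin-1") as f:    # text: str, universal newlines
            as_text = f.read()
        return as_bytes, as_text
    finally:
        os.remove(path)


as_bytes, as_text = read_both_ways(b"line1\r\nline2\r\n")
print(repr(as_bytes))   # b'line1\r\nline2\r\n'
print(repr(as_text))    # 'line1\nline2\n'
```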
> Indeed, it isn't as revolutionary as "everything is unicode", but it > would allow the standard library to be updated with a relative minimum > of fuss and muss, without needing to intermix... > x = bytes.decode('latin-1').USEFUL_UNICODE_METHOD(...) > or > sock.send(unicode.encode('latin-1')) Actually, with the renamings and everything, it's just about as disruptive as the current proposal, so I'm unclear why you think this is so different. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jcarlson at uci.edu Wed Feb 21 06:52:08 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Tue, 20 Feb 2007 21:52:08 -0800 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: References: Message-ID: <20070220205651.ADD7.JCARLSON@uci.edu> "Guido van Rossum" wrote: > [Note: changed subject] > On 2/20/07, Josiah Carlson wrote: > > I'm not so sure. The return type on socket.recv and os.read could be > > changed to bytes (seemingly without much difficulty), > > Yes, that's the plan anyway. Better than returning unicode, but not as good as returning "binary". > > and likely could > > even be changed to *take* a bytes object as the destination buffer > > (ditto for files opened as 'raw'). > > This already works -- bytes support the buffer API. I was thinking of... buff = bytes(4096*[0]) received = sock.recv(buff) It's really only useful when you have a known protocol with fixed size blocks, but need it to run more or less forever. By fixing the buffer size, you can have significantly reduced memory fragmentation. > > Then again, I've been "eh?" on the whole I/O library thing, and > > generally annoyed at the "everything is unicode" idea. > > Well, unless you remove the str type, how are you going to get rid of > the endless problems with unicode where mixing unicode and str > sometimes works and sometimes doesn't? Ooh, one of my favorite games! * Explicit is better than implicit. 
* In the face of ambiguity, refuse the temptation to guess . * Errors should never pass silently. There are at least two approaches to solving the problem: 1) make everything unicode 2) make all implicit conversions an error. Adding strings to unicode should produce an exception. The fact that it doesn't right now, I believe, is both a result of implementation details getting in the way of what should happen. Remove the ambiguity, codec guessing, etc., raise a TypeError("cannot concatenate str and unicode objects"), and move on. Don't allow up-casting in u''.join() or ''.join() (or their equivalents in py3k). > > Converting all > > libraries that currently deal with IO is going to be a pain, especially > > if it does any sort of parsing of mixed binary and non-unicode textual > > data (like http headers combined with binary posted data or a utf-8 > > encoded stream). > > Yeah, I'm not looking forward to that, but I expect it'll be > relatively straightforward once we figure out the right patterns; > there's just a lot of code to convert. But that's the whole Py3k plan. No offense, but the plan to convert it all to use bytes, stinks. Starting with the API defined in PEP 358, I started converting smtpd (as an example), and I found myself *wanting* to use unicode because the whole numeric constants and/or bytes('unicode', 'latin-1') got really old really fast. > > As a heavy user of quite a few of the current standard library IO > > modules (SocketServer, asyncore, urllib, socket, etc.) and as someone > > who has the "opportunity" to write line-level protocols, I'd be quite > > happy with the following... > > > > 1) add bytes (or add features to array) > > 2) rename unicode to text (or str) > > 3) renaming str to bin (or some other sufficiently clear name) > > So you'd have THREE types (bytes, text, bin)? Or are you proposing bin > instead of bytes, contrary to what you suggested above? 
While I would have some personal uses for bytes, all of them could be fulfilled with an expanded array type. If I could have my way I'd rename string and unicode, fold some of the features of bytes into array, and make socket, etc., return the renamed string type. In the case of the standard library modules that deal with sockets, the only changes would generally be replacing 'const' with b'const'. That could *almost* be automatic, and would be significantly faster (for a computer + human) than converting all of the .split(), .find(), etc., uses in the ftplib, *Server, smtplib, smtpd, etc. to bytes equivalents (or converting to and from unicode). It would take me perhaps 20 minutes to update asyncore, asynchat and smtpd with the b'binary' semantic. Based on the last list of methods I saw for bytes in PEP 358, I would be, more or less, doing bytes.decode('latin-1') instead of trying to deal with the *crippled* interface that bytes offers. Regardless, the performance of those modules would likely suffer when confronted with bytes rather than a renamed str, as the current bytes type lacks a large number of convenience methods, that I previously complained about it not having (which is why I brought up the string view and sample implementation in late August/early September 2006). > > 4) making string literals 'hello' be unicode > > 5) allow for b'constant' be the renamed str > > 6) add a mandatory 3rd argument to file/open which is the codec to use > > for reading > > And how does that help users or compatibility? Users who need binary literals (like every socket module in the standard library, anyone who does processing of any non-unicode disk/socket/pipe data, like marshal or pickle, etc.) wouldn't go insane and add bugs trying to switch to the bytes type, or add performance overhead trying to convert the received bytes to unicode to get a useful API.
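For comparison with the smtpd-style code Josiah describes, here is a minimal sketch of header handling written directly against bytes as it was eventually released in Python 3. Those bytes did gain .split(), .find(), .partition(), .strip() and ASCII-only case methods such as .lower(). The helper names are hypothetical.

```python
def parse_header_line(line):
    """Split a raw header line such as b'Host: example.com\\r\\n' into (name, value).

    Everything stays bytes; no decode step is needed for ASCII protocol text.
    """
    name, sep, value = line.rstrip(b"\r\n").partition(b":")
    if not sep:
        raise ValueError("not a header line: %r" % (line,))
    return name.strip().lower(), value.strip()


def build_header_line(name, value):
    """Frame a header line from bytes parts using b'' literals."""
    return name + b": " + value + b"\r\n"


print(parse_header_line(b"Content-Length: 42\r\n"))    # (b'content-length', b'42')
print(build_header_line(b"Host", b"example.com"))      # b'Host: example.com\r\n'

try:
    build_header_line(b"Host", "example.com")          # mixing bytes and str
except TypeError as exc:
    print("TypeError:", exc)
```

The last call shows the "make all implicit conversions an error" behaviour from this thread: bytes and str never combine silently.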
> > 7) offer a new function for opening 'binary' files (which are opened as > > 'rb' or 'wb' whenever 'r' or 'w' are passed, respectively), which will > > remove confusion on Windows platforms > > This is a red herring. Or I'm not sure I understand this part of your > proposal. What's wrong with 'rb'? Presumption: a = open(filename, 'r' or 'w' ['+'], codec) will open a file as unicode in Py3k (if I am wrong, please correct me). Proposal: b = somename(filename, 'r' or 'w' ['+']) will be equivalent to: b = open(filename, 'rb' or 'wb' ['+']) today. This prevents the confusion over different argument values resulting in different types being returned and accepted by certain methods. > > Indeed, it isn't as revolutionary as "everything is unicode", but it > > would allow the standard library to be updated with a relative minimum > > of fuss and muss, without needing to intermix... > > x = bytes.decode('latin-1').USEFUL_UNICODE_METHOD(...) > > or > > sock.send(unicode.encode('latin-1')) > > Actually, with the renamings and everything, it's just about as > disruptive as the current proposal, so I'm unclear why you think this > is so different. sock.send(b'Header: value\r\n') ^ The above change can be more or less automatic. The below? sock.send(bytes('Header: value\r\n', 'latin-1')) sock.send('Header: value\r\n'.encode('latin-1')) Either of the above is 17 characters of noise that really shouldn't need to be there. - Josiah From guido at python.org Wed Feb 21 07:33:27 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 20 Feb 2007 22:33:27 -0800 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: <20070220205651.ADD7.JCARLSON@uci.edu> References: <20070220205651.ADD7.JCARLSON@uci.edu> Message-ID: On 2/20/07, Josiah Carlson wrote: > > "Guido van Rossum" wrote: > > [Note: changed subject] > > On 2/20/07, Josiah Carlson wrote: > > > I'm not so sure. 
The return type on socket.recv and os.read could be > > > changed to bytes (seemingly without much difficulty), > > > > Yes, that's the plan anyway. > > Better than returning unicode, but not as good as returning "binary". It never was the plan to have this return unicode BTW. What's the difference between "binary" and "bytes"? To me, bytes *means* binary. > > > and likely could > > > even be changed to *take* a bytes object as the destination buffer > > > (ditto for files opened as 'raw'). > > > > This already works -- bytes support the buffer API. > > I was thinking of... > > buff = bytes(4096*[0]) > received = sock.recv(buff) > > It's really only useful when you have a known protocol with fixed size > blocks, but need it to run more or less forever. By fixing the buffer > size, you can have significantly reduced memory fragmentation. You can do that already with recv_into(), which takes anything that supports the writable buffer API. > > > Then again, I've been "eh?" on the whole I/O library thing, and > > > generally annoyed at the "everything is unicode" idea. > > > > Well, unless you remove the str type, how are you going to get rid of > > the endless problems with unicode where mixing unicode and str > > sometimes works and sometimes doesn't? > > Ooh, one of my favorite games! > > * Explicit is better than implicit. > * In the face of ambiguity, refuse the temptation to guess to use to decode the string>. > * Errors should never pass silently. > > There are at least two approaches to solving the problem: > 1) make everything unicode > 2) make all implicit conversions an error. The plan is both. > Adding strings to unicode should produce an exception. The fact that it > doesn't right now, I believe, is both a result of implementation details > getting in the way of what should happen. No, it was by design to make things more compatible. I think we can say that was a mistake; but it was done for that reason, not for reasons of implementation details. 
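The recv_into() answer covers Josiah's fixed-buffer case. In Python 3 as released, the mutable type is bytearray; the immutable bytes cannot be a destination buffer. A minimal sketch of reusing one preallocated block looks like this. recv_fixed_block is a hypothetical helper, and socket.socketpair() is used only to keep the example self-contained.

```python
import socket


def recv_fixed_block(sock, buf):
    """Fill the preallocated buffer `buf` completely from `sock`.

    recv_into() writes directly into `buf`, so no new bytes object is
    allocated per call; a memoryview slice selects the still-empty tail.
    """
    view = memoryview(buf)
    filled = 0
    while filled < len(buf):
        n = sock.recv_into(view[filled:])
        if n == 0:
            raise ConnectionError("peer closed before the block was complete")
        filled += n
    return filled


a, b = socket.socketpair()
try:
    block = bytearray(16)              # reused for every block
    a.sendall(b"0123456789abcdef")
    recv_fixed_block(b, block)
    print(bytes(block))                # b'0123456789abcdef'
finally:
    a.close()
    b.close()
```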
> Remove the ambiguity, codec > guessing, etc., raise a TypeError("cannot concatenate str and unicode > objects"), and move on. > > Don't allow up-casting in u''.join() or ''.join() (or their equivalents > in py3k). So what would you use the str type for? > > > Converting all > > > libraries that currently deal with IO is going to be a pain, especially > > > if it does any sort of parsing of mixed binary and non-unicode textual > > > data (like http headers combined with binary posted data or a utf-8 > > > encoded stream). > > > > Yeah, I'm not looking forward to that, but I expect it'll be > > relatively straightforward once we figure out the right patterns; > > there's just a lot of code to convert. But that's the whole Py3k plan. > > No offense, but the plan to convert it all to use bytes, stinks. > Starting with the API defined in PEP 358, I started converting smtpd (as > an example), and I found myself *wanting* to use unicode because the > whole numeric constants and/or bytes('unicode', 'latin-1') got really > old really fast. Have you actually looked at the Py3k implementation? It's quite different from that PEP. But nevertheless, it's a good experiment; I'll have a look at this myself. > > > As a heavy user of quite a few of the current standard library IO > > > modules (SocketServer, asyncore, urllib, socket, etc.) and as someone > > > who has the "opportunity" to write line-level protocols, I'd be quite > > > happy with the following... > > > > > > 1) add bytes (or add features to array) > > > 2) rename unicode to text (or str) > > > 3) renaming str to bin (or some other sufficiently clear name) > > > > So you'd have THREE types (bytes, text, bin)? Or are you proposing bin > > instead of bytes, contrary to what you suggested above? > > While I would have some personal uses for bytes, all of them could be > fulfilled with an expanded array type. 
Well, that's what it is, but without the baggage of being able how it maps to Python objects (that's up to the encode/decode operations instead). > If I could have my way > I'd rename string and unicode, fold some of the features of > bytes into array, and make socket, etc., return the renamed string > type. But which of the two renamed string types? The 8-bit or the unicode string? > In the case of the standard library that deal with > sockets, the only changes would generally be a replacing of 'const' to > b'const'. That could *almost* be automatic, and would be significantly > faster (for a computer + human) than converting all of the .split(), > .find(), etc., uses in the ftplib, *Server, smtplib, smtpd, etc. to > bytes eqivalents (or converting to and from unicode). Actually, while they don't exist now, I plan for the bytes type to have .split() and .find() and most other string methods *except* .lower() and .islower() and everything else that interprets bytes as characters. > It would take me perhaps 20 minutes to update asyncore, asynchat and > smtpd with the b'binary' semantic. Based on the last list of methods I > saw for bytes in PEP 358, I would be, more or less, doing bytes.decode > ('latin-1') instead of trying to deal with the *crippled* interface that > bytes offers. So forget that PEP and help adding these methods to the bytes type in the p3yk branch. The b"..." literal proposal is not unpleasant, as long as we can limit it to ASCII characters and hex/octal escapes. > Regardless, the performance of those modules would likely suffer when > confronted with bytes rather than a renamed str, as the current bytes > type lacks a large number of convenience methods, that I previously > complained about it not having (which is why I brought up the string > view and sample implementation in late August/early September 2006). I think you misunderstood the plans for bytes. 
The plan is for the performance with bytes to scream, in part because they are immutable so one would occasionally save copying a buffer an extra time. > > > 4) making string literals 'hello' be unicode > > > 5) allow for b'constant' be the renamed str > > > 6) add a mandatory 3rd argument to file/open which is the codec to use > > > for reading > > > > And how does that help users or compatibility? > > Users who need binary literals (like every socket module in the standard > library, anyone who does processing of any non-unicode disk/socket/pipe > data, like marshal or pickle, etc.) wouldn't go insane and add bugs > trying to switch to the bytes type, or add performance overhead trying > to convert the received bytes to unicode to get a useful API. Let's drop the hyperbole. > > > 7) offer a new function for opening 'binary' files (which are opened as > > > 'rb' or 'wb' whenever 'r' or 'w' are passed, respectively), which will > > > remove confusion on Windows platforms > > > > This is a red herring. Or I'm not sure I understand this part of your > > proposal. What's wrong with 'rb'? > > Presumption: > a = open(filename, 'r' or 'w' ['+'], codec) > will open a file as unicode in Py3k (if I am wrong, please correct me). Right. > Proposal: > b = somename(filename, 'r' or 'w' ['+']) > will be equivalent to: > b = open(filename, 'rb' or 'wb' ['+']) > today. This prevents the confusion over different argument values > resulting in different types being returned and accepted by certain > methods. Possibly. Though if we keep the 'rb' semantics for open() and this is just an alias, I'm not sure what we gain except Two Ways To Do It. In your view, what *do* we gain by using separate factories for binary and text files? (Except some opportunity for static typechecking, as binary files don't have the same API!) 
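Python 3 as released kept the single open() factory and let the mode pick the class. This gives much of the type separation mentioned here. Binary modes return an io.BufferedIOBase subclass whose read() yields bytes, and text modes return an io.TextIOBase subclass that yields str. A minimal sketch that checks this; both helper names are hypothetical.

```python
import io
import os
import tempfile


def file_kind(path, mode, **kwargs):
    """Open `path` with `mode`; report the I/O family and the type read() returns."""
    with open(path, mode, **kwargs) as f:
        if isinstance(f, io.TextIOBase):
            return "text", type(f.read()).__name__
        if isinstance(f, io.BufferedIOBase):
            return "binary", type(f.read()).__name__
        return "other", type(f.read()).__name__


def describe_modes(payload=b"data"):
    """Write `payload` to a temp file, then classify it in binary and text mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        return file_kind(path, "rb"), file_kind(path, "r", encoding="ascii")
    finally:
        os.remove(path)


print(describe_modes())   # (('binary', 'bytes'), ('text', 'str'))
```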
> > > Indeed, it isn't as revolutionary as "everything is unicode", but it > > > would allow the standard library to be updated with a relative minimum > > > of fuss and muss, without needing to intermix... > > > x = bytes.decode('latin-1').USEFUL_UNICODE_METHOD(...) > > > or > > > sock.send(unicode.encode('latin-1')) > > > > Actually, with the renamings and everything, it's just about as > > disruptive as the current proposal, so I'm unclear why you think this > > is so different. > > sock.send(b'Header: value\r\n') > ^ > The above change can be more or less automatic. The below? > > sock.send(bytes('Header: value\r\n', 'latin-1')) > > sock.send('Header: value\r\n'.encode('latin-1')) > > Either of the above is 17 characters of noise that really shouldn't need > to be there. If the spelling of a bytes string with an ASCII character value is all you are complaining about, you should have said so right away. IMO the hard part with automatically converting sock.send('abc') to either alternative is to know when to convert and when not to convert; the conversion itself is trivial using the sandbox/2to3 refactoring tool. You really should have a look at that. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jcarlson at uci.edu Wed Feb 21 09:22:56 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 21 Feb 2007 00:22:56 -0800 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: References: <20070220205651.ADD7.JCARLSON@uci.edu> Message-ID: <20070220231135.ADDD.JCARLSON@uci.edu> "Guido van Rossum" wrote: > On 2/20/07, Josiah Carlson wrote: > > Better than returning unicode, but not as good as returning "binary". > > It never was the plan to have this return unicode BTW. > > What's the difference between "binary" and "bytes"? To me, bytes *means* binary. Bytes as the type defined in PEP 358 and in the p3yk branch. Binary is a renamed Python 2.x str. > > Ooh, one of my favorite games!
> > > > * Explicit is better than implicit. > > * In the face of ambiguity, refuse the temptation to guess > to use to decode the string>. > > * Errors should never pass silently. > > > > There are at least two approaches to solving the problem: > > 1) make everything unicode > > 2) make all implicit conversions an error. > > The plan is both. Indeed, but this train of thought was more or less along the lines of 'rename str to binary, rename unicode to text, make adding binary and text raise an exception'. > > Adding strings to unicode should produce an exception. The fact that it > > doesn't right now, I believe, is both a result of implementation details > > getting in the way of what should happen. > > No, it was by design to make things more compatible. I think we can > say that was a mistake; but it was done for that reason, not for > reasons of implementation details. Fair enough. I didn't start using unicode until Python 2.3. > > Remove the ambiguity, codec > > guessing, etc., raise a TypeError("cannot concatenate str and unicode > > objects"), and move on. > > > > Don't allow up-casting in u''.join() or ''.join() (or their equivalents > > in py3k). > > So what would you use the str type for? The bytes API as defined in PEP 358 is crap. Using that API for anything involving sockets, file IO, marshal/pickle, etc., is worse than writing in pure C. But I'll get into how happy I am with that later. > > No offense, but the plan to convert it all to use bytes, stinks. > > Starting with the API defined in PEP 358, I started converting smtpd (as > > an example), and I found myself *wanting* to use unicode because the > > whole numeric constants and/or bytes('unicode', 'latin-1') got really > > old really fast. > > Have you actually looked at the Py3k implementation? It's quite > different from that PEP. Really? 
The source tells me that it's more or less the same: http://svn.python.org/view/python/branches/p3yk/Objects/bytesobject.c?rev=53064&view=auto About the only thing it has gained is a .join() method, but seems to have lost append, count, extend, index, insert, pop, remove. From your later comments, it seems as though the methods I'm looking for just haven't been implemented yet, but are going in. > > While I would have some personal uses for bytes, all of them could be > > fulfilled with an expanded array type. > > Well, that's what it is, but without the baggage of being able how it > maps to Python objects (that's up to the encode/decode operations > instead). Except that bytes(...)[0] is an integer in range(256). That smells like array.array('B', ...) to me. > > If I could have my way > > I'd rename string and unicode, fold some of the features of > > bytes into array, and make socket, etc., return the renamed string > > type. > > But which of the two renamed string types? The 8-bit or the unicode string? 8-bit; unicode strings being returned from sockets, os.read(), etc., would be a waste of time and memory. > > In the case of the standard library that deal with > > sockets, the only changes would generally be a replacing of 'const' to > > b'const'. That could *almost* be automatic, and would be significantly > > faster (for a computer + human) than converting all of the .split(), > > .find(), etc., uses in the ftplib, *Server, smtplib, smtpd, etc. to > > bytes eqivalents (or converting to and from unicode). > > Actually, while they don't exist now, I plan for the bytes type to > have .split() and .find() and most other string methods *except* > .lower() and .islower() and everything else that interprets bytes as > characters. Thanks, Guido. If bytes gets those methods, then 30% of my concerns regarding the unicode conversion go out the window. > > It would take me perhaps 20 minutes to update asyncore, asynchat and > > smtpd with the b'binary' semantic.
Based on the last list of methods I > > saw for bytes in PEP 358, I would be, more or less, doing bytes.decode > > ('latin-1') instead of trying to deal with the *crippled* interface that > > bytes offers. > > So forget that PEP and help adding these methods to the bytes type in > the p3yk branch. > > The b"..." literal proposal is not unpleasant, as long as we can limit > it to ASCII characters and hex/octal escapes. With a b"..." literal producing bytes (or even a renamed 8-bit string type), another 30% of my concerns regarding the unicode conversion go out the window. Limiting it to ascii and hex/octal escapes is perfectly reasonable to me, though I don't know enough about the underlying parser to know if such restrictions are possible, with or without a defined coding: directive at the beginning of the file. > > Regardless, the performance of those modules would likely suffer when > > confronted with bytes rather than a renamed str, as the current bytes > > type lacks a large number of convenience methods, that I previously > > complained about it not having (which is why I brought up the string > > view and sample implementation in late August/early September 2006). > > I think you misunderstood the plans for bytes. The plan is for the > performance with bytes to scream, in part because they are immutable > so one would occasionally save copying a buffer an extra time. ...mutable, but yeah - prior to your above statements saying 'we are going to add find, split, and a bunch of other goodies', I was under the impression that PEP 358 was more or less the API that we would be getting - which just about made me cry, until I remembered Python 2.x. > > > > 4) making string literals 'hello' be unicode > > > > 5) allow for b'constant' be the renamed str > > > > 6) add a mandatory 3rd argument to file/open which is the codec to use > > > > for reading > > > > > > And how does that help users or compatibility?
> > > > Users who need binary literals (like every socket module in the standard > > library, anyone who does processing of any non-unicode disk/socket/pipe > > data, like marshal or pickle, etc.) wouldn't go insane and add bugs > > trying to switch to the bytes type, or add performance overhead trying > > to convert the received bytes to unicode to get a useful API. > > Let's drop the hyperbole. If bytes didn't get .find(), .split(), (hopefully .partition()), etc., that isn't hyperbole. The PEP 358 API is horrible. With bytes getting those methods, the above statements are no longer relevant. > > Presumption: > > a = open(filename, 'r' or 'w' ['+'], codec) > > will open a file as unicode in Py3k (if I am wrong, please correct me). > > Right. > > > Proposal: > > b = somename(filename, 'r' or 'w' ['+']) > > will be equivalent to: > > b = open(filename, 'rb' or 'wb' ['+']) > > today. This prevents the confusion over different argument values > > resulting in different types being returned and accepted by certain > > methods. > > Possibly. Though if we keep the 'rb' semantics for open() and this is > just an alias, I'm not sure what we gain except Two Ways To Do It. Well, if we moved bytes reading/writing off to the alternate constructor, then there would be one way to open a file containing unicode, and another way to open a file containing binary data, which by definition isn't text, so we should be able to ignore '\r\n' conversions (though I would miss it, it may be a good idea). > In your view, what *do* we gain by using separate factories for binary > and text files? (Except some opportunity for static typechecking, as > binary files don't have the same API!) At one time there was a fairly substantial argument over foo(a, b) returning different types if the *value* of b changed, or in the case of a.foo(b). For example... 
    def decode_codec(a, b):
        return a.decode(b)

    decode_codec('68656c6c6f20776f726c64', 'hex') -> 'hello world'
    decode_codec('hello world', 'latin-1') -> u'hello world'

By offering a secondary function that *only* dealt with the reading and writing of bytes (or 8-bit renamed str), then we wouldn't have to worry about...

    open(filename, 'r', 'latin-1').read()
    open(filename, 'r').read()

returning different types. The latter would be spelled...

    somename(filename, 'r')

And it would be obvious to all readers that one is opening a binary file and should expect to have .read() return bytes. > > sock.send(b'Header: value\r\n') > > ^ > > The above change can be more or less automatic. The below? > > > > sock.send(bytes('Header: value\r\n', 'latin-1')) > > > > sock.send('Header: value\r\n'.encode('latin-1')) > > > > Either of the above is 17 characters of noise that really shouldn't need > > to be there. > > If the spelling of a bytes string with an ASCII character value is all > you are complaining about, you should have said so right away. Not just bytes with ascii character values, but not needing to jump through hoops to send, write, etc., more or less 'fixed' data to a handle. > IMO the hard part with automatically converting sock.send('abc') to > either alternative is to know when to convert and when not to > convert; the conversion itself is trivial using the sandbox/2to3 > refactoring tool. You really should have a look at that. In a few weeks when I'm done with my thesis defense. If one adds my "concerns are reduced by X%" statements above, one will notice that it only adds up to 60%. The remaining 40% of my concerns are more or less related to the pain of conversion. b"..." and a usable bytes API do help things significantly, but all conversions are a pain, especially with a standard library the size of Python's.
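Josiah's decode_codec example shows the return type depending on an argument's value. Python 3 as released resolved this by splitting the directions across types. str only has .encode(), which produces bytes, and bytes only has .decode(), which produces str. Hex conversion moved to bytes.fromhex() and bytes.hex(). A short sketch:

```python
raw = bytes.fromhex("68656c6c6f20776f726c64")   # hex digits -> bytes
text = raw.decode("latin-1")                      # bytes -> str, always
back = text.encode("latin-1")                     # str -> bytes, always

print(repr(raw))    # b'hello world'
print(repr(text))   # 'hello world'
print(raw.hex())    # 68656c6c6f20776f726c64

# The value-dependent spellings are gone: str has no .decode(), bytes no .encode().
print(hasattr(str, "decode"), hasattr(bytes, "encode"))   # False False
```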
A pessimist would say, "leave everything as it is, but make str+unicode raise an exception" - and aside from pointing to the half-dozen "you are so wrong" posts in response to my "unicode is easy" claim some time last year, it would be hard for me to disagree with the "don't change" position. I'm sure I can get along *after* the changes, but the changes aren't going to be pleasant. Speaking of which, do all of the modules have maintainers? Make the maintainers convert them! - Josiah From ncoghlan at gmail.com Wed Feb 21 13:41:15 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 21 Feb 2007 22:41:15 +1000 Subject: [Python-3000] Thoughts on dictionary views In-Reply-To: References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> <45DAF693.1000004@gmail.com> Message-ID: <45DC3DEB.1060702@gmail.com> Guido van Rossum wrote: > On 2/20/07, Nick Coghlan wrote: >> My apologies for rambling a bit - I can't currently give a succinct >> explanation for why the current direction feels wrong, but I felt it was >> worth supporting Raymond on this point. > > Apologies accepted -- but yes, you did ramble a bit, and I still wish > you'd collected your thoughts a bit more. if there are simple clear > arguments it's easier for me to accept or reject them than with a > bunch of ramblings. Sorry to be grumpy, but given the implementation > stage this is in and how long the PEP has been sitting unchanged I'm a > bit annoyed that the criticism, valid or not, comes so late. Views that don't allow you to modify the contents of the original mapping make me *much* happier - I hadn't realised you'd left that aspect out of the implementation. Simply having different views of the underlying object is something with a strong precedent in normal iterators (and, in fact, it would be perfectly possible to *teach* keys(), values() and items() that way, leaving the introduction of their other features until later in the learning process). 
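Nick's point is that a view is just another window onto the dict, like an iterator but re-iterable, and that it cannot be used to edit the mapping. A short sketch of that behaviour as it shipped in Python 3:

```python
d = {"a": 1, "b": 2}
keys = d.keys()                  # O(1): no copy of the keys is made

d["c"] = 3                       # mutate the dict itself, not the view
print(sorted(keys))              # ['a', 'b', 'c']: the view sees the change

print(keys & {"a", "z"})         # {'a'}: keys views support set operations
print(("b", 2) in d.items())     # True

# The view offers no way to modify the mapping through it.
print(hasattr(keys, "add"), hasattr(keys, "remove"))   # False False
```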
Using multiple access points to edit the same data set, while a powerful idea, can be pretty difficult to keep straight while writing code - and I think having such a feature in the basic dict API is what was really bothering me. (It bothers me significantly less in more advanced API's, like NumPy, or the attrview wrapper class recipe) As penance for my doubts, I've committed fixes for various dict-related test failures in the py3k branch :) Cheers, Nick. P.S. I don't have bsddb in my devel tree, so I couldn't fix that, and there are a couple of other failures that require further investigation to figure out what is going on. I updated the BROKEN file to reflect the current status (that seems like a good way to avoid cluttering the SF tracker until we get the tree into a state where we want to start running buildbots on it) -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From guido at python.org Wed Feb 21 17:01:40 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 21 Feb 2007 08:01:40 -0800 Subject: [Python-3000] Thoughts on dictionary views In-Reply-To: <45DC3DEB.1060702@gmail.com> References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1> <45DAF693.1000004@gmail.com> <45DC3DEB.1060702@gmail.com> Message-ID: On 2/21/07, Nick Coghlan wrote: > As penance for my doubts, I've committed fixes for various dict-related > test failures in the py3k branch :) Thanks!!! That was well beyond penance. :-) > Cheers, > Nick. > > P.S. I don't have bsddb in my devel tree, so I couldn't fix that, and > there are a couple of other failures that require further investigation > to figure out what is going on. I updated the BROKEN file to reflect the > current status (that seems like a good way to avoid cluttering the SF > tracker until we get the tree into a state where we want to start > running buildbots on it) Right -- thanks for this too. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From jimjjewett at gmail.com Wed Feb 21 19:03:10 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 21 Feb 2007 13:03:10 -0500 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: References: <20070220205651.ADD7.JCARLSON@uci.edu> Message-ID: On 2/21/07, Guido van Rossum wrote: > If the spelling of a bytes string with an ASCII character value is all > you are complaining about, you should have said so right away. That is my main objection. A literal form does clear it up, though I'm not sure "b" is the right prefix. (I keep wanting to read "binary" or "boolean", rather than "ASCII") To be honest, it would probably be enough if there were an ascii builtin, or if the example uses of the bytes constructor showed bytes(text) # no encoding just copying the low-order byte, and raising exceptions if any high-order bytes were non-zero. -jJ From jimjjewett at gmail.com Wed Feb 21 19:11:17 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 21 Feb 2007 13:11:17 -0500 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: References: <20070220205651.ADD7.JCARLSON@uci.edu> Message-ID: Are bytes supposed to be mutable? Josiah: > even be changed to *take* a bytes object as the destination buffer Guido: > This already works -- bytes support the buffer API. but later: > I think you misunderstood the plans for bytes. The plan is for the > performance with bytes to scream, in part because they are immutable > so one would occasionally save copying a buffer an extra time. Or did you mean that (C code only?) could pass a newly constructed bytes object to be filled in? Josiah mentioned several dropped methods (append, extend, remove, pop) that don't really make sense with an immutable. Was this just a set-difference observation, or are those methods you actually need on a bytes type? 
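For later readers: in released Python 3 the mutable byte type described in this thread ended up as bytearray, while bytes itself became immutable (PEP 3137). Assuming that released behaviour, this sketch checks the two properties Guido names in his reply, an in-place += and slices that copy, plus a couple of the list-style methods Jim asks about.

```python
# Sketch assuming released Python 3: bytearray is the mutable byte type.
buf = bytearray(b"abc")
alias = buf                  # a second reference to the same object

buf += b"def"                # in-place extend: no new object is created
assert buf is alias
assert alias == bytearray(b"abcdef")

part = buf[1:3]              # slicing returns a new, independent copy
part[0] = ord("X")
assert part == bytearray(b"Xc")
assert buf == bytearray(b"abcdef")   # the original is unchanged

# List-style methods such as append and pop do exist on bytearray.
buf.append(ord("!"))
assert buf.pop() == ord("!")
```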
-jJ From guido at python.org Wed Feb 21 19:13:34 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 21 Feb 2007 10:13:34 -0800 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: References: <20070220205651.ADD7.JCARLSON@uci.edu> Message-ID: Sorry, that was an unfortunate typo. bytes are Mutable. (It's the same as in Java, really.) On 2/21/07, Jim Jewett wrote: > Are bytes supposed to be mutable? > > Josiah: > > even be changed to *take* a bytes object as the destination buffer > > Guido: > > This already works -- bytes support the buffer API. > > but later: > > > I think you misunderstood the plans for bytes. The plan is for the > > performance with bytes to scream, in part because they are immutable > > so one would occasionally save copying a buffer an extra time. > > Or did you mean that (C code only?) could pass a newly constructed > bytes object to be filled in? > > Josiah mentioned several dropped methods (append, extend, remove, pop) > that don't really make sense with an immutable. Was this just a > set-difference observation, or are those methods you actually need on > a bytes type? Even though bytes are mutable sequences, I'm not sure that they need to support every method that lists have. I expect an in-place += operator solves most needs. Slices copy. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jcarlson at uci.edu Wed Feb 21 19:32:50 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 21 Feb 2007 10:32:50 -0800 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: References: Message-ID: <20070221102243.ADF1.JCARLSON@uci.edu> "Jim Jewett" wrote: > > On 2/21/07, Guido van Rossum wrote: > > If the spelling of a bytes string with an ASCII character value is all > > you are complaining about, you should have said so right away. > > That is my main objection. > > A literal form does clear it up, though I'm not sure "b" is the right > prefix. 
(I keep wanting to read "binary" or "boolean", rather than > "ASCII") > > To be honest, it would probably be enough if there were an ascii > builtin, or if the example uses of the bytes constructor showed > > bytes(text) # no encoding > > just copying the low-order byte, and raising exceptions if any > high-order bytes were non-zero. That's more or less changing the signature of bytes to be bytes(, codec='ascii'), but it breaks when faced with hex or octal escapes greater than 127. Making it codec='latin-1' is marginally better, but having a default, regardless of the default, is begging for trouble (especially when dealing with unicode). - Josiah From guido at python.org Wed Feb 21 20:22:50 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 21 Feb 2007 11:22:50 -0800 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: <20070221102243.ADF1.JCARLSON@uci.edu> References: <20070221102243.ADF1.JCARLSON@uci.edu> Message-ID: Right. The b"..." literal doesn't have this problem because problems always show up in the bytecode compilation stage; that's the beauty of b"...". Patch anyone? On 2/21/07, Josiah Carlson wrote: > > "Jim Jewett" wrote: > > > > On 2/21/07, Guido van Rossum wrote: > > > If the spelling of a bytes string with an ASCII character value is all > > > you are complaining about, you should have said so right away. > > > > That is my main objection. > > > > A literal form does clear it up, though I'm not sure "b" is the right > > prefix. (I keep wanting to read "binary" or "boolean", rather than > > "ASCII") > > > > To be honest, it would probably be enough if there were an ascii > > builtin, or if the example uses of the bytes constructor showed > > > > bytes(text) # no encoding > > > > just copying the low-order byte, and raising exceptions if any > > high-order bytes were non-zero. 
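Jim's "no encoding" idea (copy the low-order byte, raise if any higher bits are set) behaves like a strict Latin-1 encode, while an ASCII default rejects every code point above 127, which is the breakage Josiah points to. The helper below is hypothetical, written only to illustrate the comparison; it is not part of any proposed bytes API.

```python
def low_byte_copy(text: str) -> bytes:
    """Hypothetical helper: one byte per code point, refusing anything > 0xFF."""
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if code > 0xFF:
            raise ValueError(f"U+{code:04X} does not fit in a single byte")
        out.append(code)
    return bytes(out)


sample = "caf\xe9"                    # U+00E9 is above ASCII's 127
assert low_byte_copy(sample) == sample.encode("latin-1")

try:
    sample.encode("ascii")            # an ASCII default rejects it
except UnicodeEncodeError:
    ascii_failed = True
else:
    ascii_failed = False
assert ascii_failed

try:
    low_byte_copy("\u20ac")           # euro sign, U+20AC, needs two bytes
except ValueError:
    euro_rejected = True
else:
    euro_rejected = False
assert euro_rejected
```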
> > That's more or less changing the signature of bytes to be bytes(, > codec='ascii'), but it breaks when faced with hex or octal escapes > greater than 127. Making it codec='latin-1' is marginally better, but > having a default, regardless of the default, is begging for trouble > (especially when dealing with unicode). > > - Josiah > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Thu Feb 22 01:21:23 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 22 Feb 2007 13:21:23 +1300 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: References: <20070220205651.ADD7.JCARLSON@uci.edu> Message-ID: <45DCE203.2010804@canterbury.ac.nz> Jim Jewett wrote: > A literal form does clear it up, though I'm not sure "b" is the right > prefix. (I keep wanting to read "binary" or "boolean", rather than > "ASCII") It means "bytes". The ASCII part is that you've written characters in quotes after it. -- Greg From guido at python.org Fri Feb 23 01:01:42 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 22 Feb 2007 16:01:42 -0800 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: <45DCE203.2010804@canterbury.ac.nz> References: <20070220205651.ADD7.JCARLSON@uci.edu> <45DCE203.2010804@canterbury.ac.nz> Message-ID: FWIW, I've updated PEP 358 (the bytes object) to more closely reflect my plans for it, showing the preservation of most string methods. It should be updated on the website in a few minutes. If someone would like to volunteer a small PEP on the b"..." literal I would appreciate it. The main concern here is that bytes objects are mutable; I think the right semantics will be that each time a b"..." 
literal is evaluated a *new* bytes object is created, just like [1, 2, 3] constructs a new list each time it is evaluated. The alternative would be a literal that could be modified in place, which reminds me of the worst of Fortran. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From tjreedy at udel.edu Fri Feb 23 02:11:36 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 22 Feb 2007 20:11:36 -0500 Subject: [Python-3000] Thoughts on new I/O library and bytecode References: <20070220205651.ADD7.JCARLSON@uci.edu><45DCE203.2010804@canterbury.ac.nz> Message-ID: "Guido van Rossum" wrote in message news:ca471dc20702221601s703a9fe4i8fc69810b8fb8da6 at mail.gmail.com... | FWIW, I've updated PEP 358 (the bytes object) to more closely reflect | my plans for it, showing the preservation of most string methods. It | should be updated on the website in a few minutes. | | If someone would like to volunteer a small PEP on the b"..." literal I | would appreciate it. The main concern here is that bytes objects are | mutable; I think the right semantics will be that each time a b"..." | literal is evaluated a *new* bytes object is created, just like [1, 2, | 3] constructs a new list each time it is evaluated. I always thought that aliasing of immutable objects was an implementation-dependent optimization, so that seems right. Certainly,
    a = []
    for i in range(3):
        a.append(b'bytes')
had better append three separate objects. Someone who wants just one can write
    a = 3*[b'bytes']
Is a separate PEP really needed, rather than a few lines in PEP358?
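Terry's loop can be checked directly. Because literals of the immutable bytes type in released Python 3 may be shared as constants, this sketch assumes bytearray(b"...") as a stand-in for the mutable literal under discussion. The point is the difference between evaluating the expression three times and repeating one reference three times.

```python
# Sketch assuming bytearray stands in for the mutable b"..." literal.
fresh = []
for i in range(3):
    fresh.append(bytearray(b"bytes"))   # a new object on every evaluation
assert len({id(x) for x in fresh}) == 3

shared = 3 * [bytearray(b"bytes")]      # one object referenced three times
assert len({id(x) for x in shared}) == 1

shared[0][0] = ord("B")                 # the mutation shows through every alias
assert shared[2] == bytearray(b"Bytes")

fresh[0][0] = ord("B")                  # but here it touches only one object
assert fresh[2] == bytearray(b"bytes")
```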
tjr From jason.orendorff at gmail.com Fri Feb 23 17:29:02 2007 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Fri, 23 Feb 2007 11:29:02 -0500 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: References: <20070220205651.ADD7.JCARLSON@uci.edu> <45DCE203.2010804@canterbury.ac.nz> Message-ID: On 2/22/07, Guido van Rossum wrote: > If someone would like to volunteer a small PEP on the b"..." literal I > would appreciate it. I'll do this, unless someone tells me not to. A few questions. The grammar for string literals is already changing in py3k (removing the tolerance of bogus escape sequences and the u"" prefix, I think). Is the new grammar documented anywhere? p3yk/Doc/ref/ref2.tex seems to still have the 2.x grammar, and I didn't see anything in the PEPs. How do you feel about raw byte-strings (br'a\b\c') and long byte-strings (b'''...''')? > The main concern here is that bytes objects are > mutable; I think the right semantics will be that each time a b"..." > literal is evaluated a *new* bytes object is created, just like [1, 2, > 3] constructs a new list each time it is evaluated. The alternative > would be a literal that could be modified in place, which reminds me > of the worst of Fortran. Yes, that seems clear. -j From bwinton at latte.ca Fri Feb 23 17:48:58 2007 From: bwinton at latte.ca (Blake Winton) Date: Fri, 23 Feb 2007 11:48:58 -0500 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: References: <20070220205651.ADD7.JCARLSON@uci.edu> <45DCE203.2010804@canterbury.ac.nz> Message-ID: <45DF1AFA.2090909@latte.ca> Jason Orendorff wrote: > On 2/22/07, Guido van Rossum wrote: >> If someone would like to volunteer a small PEP on the b"..." literal I >> would appreciate it. > How do you feel about raw byte-strings (br'a\b\c') Not that my opinion particularly matters, but I would say "sure" to this one. 
On the other hand, I really don't use raw strings that often, and the places I do are pretty much solely regexes, which shouldn't really be passed bytes. > and long byte-strings (b'''...''')? What would: b"""abc def""" translate into, exactly? [ 97, 98, 99, 10, 100, 101, 102 ]? [ 97, 98, 99, 13, 10, 100, 101, 102 ]? Platform-dependent? (Ewwww!) Later, Blake. From g.brandl at gmx.net Fri Feb 23 18:09:20 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 23 Feb 2007 18:09:20 +0100 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: <45DF1AFA.2090909@latte.ca> References: <20070220205651.ADD7.JCARLSON@uci.edu> <45DCE203.2010804@canterbury.ac.nz> <45DF1AFA.2090909@latte.ca> Message-ID: Blake Winton schrieb: > Jason Orendorff wrote: >> On 2/22/07, Guido van Rossum wrote: >>> If someone would like to volunteer a small PEP on the b"..." literal I >>> would appreciate it. >> How do you feel about raw byte-strings (br'a\b\c') > > Not that my opinion particularly matters, but I would say "sure" to this > one. On the other hand, I really don't use raw strings that often, and > the places I do are pretty much solely regexes, which shouldn't really > be passed bytes. > > > and long byte-strings (b'''...''')? > > What would: > b"""abc > def""" > translate into, exactly? > [ 97, 98, 99, 10, 100, 101, 102 ]? > [ 97, 98, 99, 13, 10, 100, 101, 102 ]? > Platform-dependent? (Ewwww!) The same that """abc def""" translates to today, which is, "abc\ndef" on every platform. 
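Georg's answer also holds in released Python 3, assuming the source file is read by the normal tokenizer, which normalizes line endings: the line break inside a triple-quoted bytes literal always becomes a single 0x0A byte. The quick check below also shows the b'...' style repr that later became the default.

```python
data = b"""abc
def"""

# The embedded newline is exactly one LF byte.
assert data == bytes([0x61, 0x62, 0x63, 0x0A, 0x64, 0x65, 0x66])
assert data == b"abc\ndef"
assert b"\r" not in data

# Released Python 3 prints a bytes literal rather than bytes([...]).
print(repr(data))
```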
Georg From thomas at python.org Fri Feb 23 18:17:08 2007 From: thomas at python.org (Thomas Wouters) Date: Fri, 23 Feb 2007 09:17:08 -0800 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: References: <20070220205651.ADD7.JCARLSON@uci.edu> <45DCE203.2010804@canterbury.ac.nz> Message-ID: <9e804ac0702230917j2112c083ld49704657365d77e@mail.gmail.com> I'm not telling you not to do this, but I already wrote a preliminary patch (well, it's not actually *working* yet, but the hard part, the grammar changes, are working ;) Of course, it may be fun to compare implementations. On 2/23/07, Jason Orendorff wrote: > > On 2/22/07, Guido van Rossum wrote: > > If someone would like to volunteer a small PEP on the b"..." literal I > > would appreciate it. > > I'll do this, unless someone tells me not to. A few questions. > > The grammar for string literals is already changing in py3k (removing > the tolerance of bogus escape sequences and the u"" prefix, I think). > Is the new grammar documented anywhere? p3yk/Doc/ref/ref2.tex seems > to still have the 2.x grammar, and I didn't see anything in the PEPs. > > How do you feel about raw byte-strings (br'a\b\c') and long > byte-strings (b'''...''')? > > > The main concern here is that bytes objects are > > mutable; I think the right semantics will be that each time a b"..." > > literal is evaluated a *new* bytes object is created, just like [1, 2, > > 3] constructs a new list each time it is evaluated. The alternative > > would be a literal that could be modified in place, which reminds me > > of the worst of Fortran. > > Yes, that seems clear. > > -j > -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
From brett at python.org Fri Feb 23 18:16:52 2007 From: brett at python.org (Brett Cannon) Date: Fri, 23 Feb 2007 09:16:52 -0800 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: References: <20070220205651.ADD7.JCARLSON@uci.edu> <45DCE203.2010804@canterbury.ac.nz> Message-ID: On 2/23/07, Jason Orendorff wrote: > On 2/22/07, Guido van Rossum wrote: > > If someone would like to volunteer a small PEP on the b"..." literal I > > would appreciate it. > > I'll do this, unless someone tells me not to. A few questions. > Thomas Wouters has been working on it while here at PyCon. I would wait to see what he says before you dive into it. -Brett > The grammar for string literals is already changing in py3k (removing > the tolerance of bogus escape sequences and the u"" prefix, I think). > Is the new grammar documented anywhere? p3yk/Doc/ref/ref2.tex seems > to still have the 2.x grammar, and I didn't see anything in the PEPs. > > How do you feel about raw byte-strings (br'a\b\c') and long > byte-strings (b'''...''')? > > > The main concern here is that bytes objects are > > mutable; I think the right semantics will be that each time a b"..." > > literal is evaluated a *new* bytes object is created, just like [1, 2, > > 3] constructs a new list each time it is evaluated. The alternative > > would be a literal that could be modified in place, which reminds me > > of the worst of Fortran. > > Yes, that seems clear.
> > -j > From thomas at python.org Fri Feb 23 19:02:53 2007 From: thomas at python.org (Thomas Wouters) Date: Fri, 23 Feb 2007 10:02:53 -0800 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: References: <20070221102243.ADF1.JCARLSON@uci.edu> Message-ID: <9e804ac0702231002p58f99b7bg557116c3168de633@mail.gmail.com> On 2/21/07, Guido van Rossum wrote: > > Patch anyone? See attachement. It's preliminary -- it just calls the global name 'bytes' currently (and not even using the 'right' AST concretion mechanism) which means you can override what the bytes literal creates by assigning to 'bytes' (although I'm sure there's people out there that would love to keep it that way ;-P) It should probably get its own bytecode (no pun intended.) On 2/21/07, Josiah Carlson wrote: > > > > "Jim Jewett" wrote: > > > > > > On 2/21/07, Guido van Rossum wrote: > > > > If the spelling of a bytes string with an ASCII character value is all > > > > you are complaining about, you should have said so right away. > > > > > > That is my main objection. > > > > > > A literal form does clear it up, though I'm not sure "b" is the right > > > prefix. (I keep wanting to read "binary" or "boolean", rather than > > > "ASCII") > > > > > > To be honest, it would probably be enough if there were an ascii > > > builtin, or if the example uses of the bytes constructor showed > > > > > > bytes(text) # no encoding > > > > > > just copying the low-order byte, and raising exceptions if any > > > high-order bytes were non-zero. > > > > That's more or less changing the signature of bytes to be bytes(, > > codec='ascii'), but it breaks when faced with hex or octal escapes > > greater than 127.
Making it codec='latin-1' is marginally better, but > > having a default, regardless of the default, is begging for trouble > > (especially when dealing with unicode). > > > > - Josiah > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! -------------- next part -------------- A non-text attachment was scrubbed... Name: bytesliteral.diff Type: text/x-patch Size: 7011 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070223/4a12c812/attachment-0001.bin From jason.orendorff at gmail.com Fri Feb 23 19:49:15 2007 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Fri, 23 Feb 2007 13:49:15 -0500 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: <9e804ac0702231002p58f99b7bg557116c3168de633@mail.gmail.com> References: <20070221102243.ADF1.JCARLSON@uci.edu> <9e804ac0702231002p58f99b7bg557116c3168de633@mail.gmail.com> Message-ID: On 2/23/07, Thomas Wouters wrote: > On 2/21/07, Guido van Rossum wrote: > > Patch anyone? > > See attachement.
It's preliminary -- it just calls the global name 'bytes' > currently (and not even using the 'right' AST concretion mechanism) which > means you can override what the bytes literal creates by assigning to > 'bytes' (although I'm sure there's people out there that would love to keep > it that way ;-P) It should probably get its own bytecode (no pun intended.) Cool! I finished writing up the PEP about the same time I got this, but the PEP isn't executable. :) I would attach it, but I have a feeling the PEP is probably unnecessary at this point...? -j From g.brandl at gmx.net Fri Feb 23 22:22:16 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 23 Feb 2007 22:22:16 +0100 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: References: <20070221102243.ADF1.JCARLSON@uci.edu> <9e804ac0702231002p58f99b7bg557116c3168de633@mail.gmail.com> Message-ID: Jason Orendorff schrieb: > On 2/23/07, Thomas Wouters wrote: >> On 2/21/07, Guido van Rossum wrote: >> > Patch anyone? >> >> See attachement. It's preliminary -- it just calls the global name 'bytes' >> currently (and not even using the 'right' AST concretion mechanism) which >> means you can override what the bytes literal creates by assigning to >> 'bytes' (although I'm sure there's people out there that would love to keep >> it that way ;-P) It should probably get its own bytecode (no pun intended.) > > Cool! I finished writing up the PEP about the same time I got this, > but the PEP isn't executable. :) I would attach it, but I have a > feeling the PEP is probably unnecessary at this point...? Not really - I wrote one for the print function too, when most of the semantics were already fixed - but I think this could be added to the existing bytes PEP, as a new section. 
Georg From greg.ewing at canterbury.ac.nz Sat Feb 24 00:40:10 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 24 Feb 2007 12:40:10 +1300 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: <45DF1AFA.2090909@latte.ca> References: <20070220205651.ADD7.JCARLSON@uci.edu> <45DCE203.2010804@canterbury.ac.nz> <45DF1AFA.2090909@latte.ca> Message-ID: <45DF7B5A.7080503@canterbury.ac.nz> Blake Winton wrote: > What would: > b"""abc > def""" > translate into, exactly? > [ 97, 98, 99, 10, 100, 101, 102 ]? > [ 97, 98, 99, 13, 10, 100, 101, 102 ]? > Platform-dependent? (Ewwww!) No, presumably it would always translate the newline into "\n" regardless of platform, as with current strings. -- Greg From thomas at python.org Sat Feb 24 01:09:20 2007 From: thomas at python.org (Thomas Wouters) Date: Fri, 23 Feb 2007 16:09:20 -0800 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: <45DF7B5A.7080503@canterbury.ac.nz> References: <20070220205651.ADD7.JCARLSON@uci.edu> <45DCE203.2010804@canterbury.ac.nz> <45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz> Message-ID: <9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com> That's exactly what it does in current p3yk: Python 3.0x (p3yk:53867M, Feb 23 2007, 20:06:03) [GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> b"""abc ... def""" bytes([0x61, 0x62, 0x63, 0x0a, 0x64, 0x65, 0x66]) On 2/23/07, Greg Ewing wrote: > > Blake Winton wrote: > > > What would: > > b"""abc > > def""" > > translate into, exactly? > > [ 97, 98, 99, 10, 100, 101, 102 ]? > > [ 97, 98, 99, 13, 10, 100, 101, 102 ]? > > Platform-dependent? (Ewwww!) > > No, presumably it would always translate the newline > into "\n" regardless of platform, as with current strings. 
> > -- > Greg > -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From g.brandl at gmx.net Sat Feb 24 11:20:46 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 24 Feb 2007 11:20:46 +0100 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: <9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com> References: <20070220205651.ADD7.JCARLSON@uci.edu> <45DCE203.2010804@canterbury.ac.nz> <45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz> <9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com> Message-ID: Thomas Wouters schrieb: > > That's exactly what it does in current p3yk: > > Python 3.0x (p3yk:53867M, Feb 23 2007, 20:06:03) > [GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> b"""abc > ... def""" > bytes([0x61, 0x62, 0x63, 0x0a, 0x64, 0x65, 0x66]) Seeing that, I made a patch that makes bytes_repr output a bytes literal, see attached diff. Happy PyCon-ing, Georg -------------- next part -------------- An embedded and charset-unspecified text was scrubbed...
Name: bytes-repr.diff Url: http://mail.python.org/pipermail/python-3000/attachments/20070224/cac06f37/attachment.diff From rasky at develer.com Sat Feb 24 12:21:04 2007 From: rasky at develer.com (Giovanni Bajo) Date: Sat, 24 Feb 2007 12:21:04 +0100 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: References: <20070220205651.ADD7.JCARLSON@uci.edu> <45DCE203.2010804@canterbury.ac.nz> <45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz> <9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com> Message-ID: On 24/02/2007 11.20, Georg Brandl wrote: > Thomas Wouters schrieb: >> >> That's exactly what it does in current p3yk: >> >> Python 3.0x (p3yk:53867M, Feb 23 2007, 20:06:03) >> [GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. >> >>> b"""abc >> ... def""" >> bytes([0x61, 0x62, 0x63, 0x0a, 0x64, 0x65, 0x66]) > > Seeing that, I made a patch that makes bytes_repr output a bytes literal, > see attached diff. I thought that the repr format of bytes was a deliberate choice to make life harder to people trying to use bytes to handle text. -- Giovanni Bajo From g.brandl at gmx.net Sat Feb 24 13:50:23 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 24 Feb 2007 13:50:23 +0100 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: References: <20070220205651.ADD7.JCARLSON@uci.edu> <45DCE203.2010804@canterbury.ac.nz> <45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz> <9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com> Message-ID: Giovanni Bajo schrieb: > On 24/02/2007 11.20, Georg Brandl wrote: > >> Thomas Wouters schrieb: >>> >>> That's exactly what it does in current p3yk: >>> >>> Python 3.0x (p3yk:53867M, Feb 23 2007, 20:06:03) >>> [GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2 >>> Type "help", "copyright", "credits" or "license" for more information. >>> >>> b"""abc >>> ... 
def""" >>> bytes([0x61, 0x62, 0x63, 0x0a, 0x64, 0x65, 0x66]) >> >> Seeing that, I made a patch that makes bytes_repr output a bytes literal, >> see attached diff. > > I thought that the repr format of bytes was a deliberate choice to make life > harder to people trying to use bytes to handle text. That contradicts the "consenting adults" mantra. If a bytes object contains readable text (and that's not going to be exceptional), it should not be obscured -- in any case, I can just call str() on it and get my text. PEP 358 now states "Now that a b"..." literal exists, shouldn't repr() return one?" which suggests that the repr was the most canonical way to represent the bytes object at a time when there was no literal. Georg From guido at python.org Sat Feb 24 15:25:59 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 24 Feb 2007 08:25:59 -0600 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: References: <45DCE203.2010804@canterbury.ac.nz> <45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz> <9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com> Message-ID: Georg is channeling me well. Also, my thinking has evolved some after talking to various folks here at PyCon. Georg, please check it in! Feel free to update the PEP if you will. --Guido On 2/24/07, Georg Brandl wrote: > Giovanni Bajo schrieb: > > On 24/02/2007 11.20, Georg Brandl wrote: > > > >> Thomas Wouters schrieb: > >>> > >>> That's exactly what it does in current p3yk: > >>> > >>> Python 3.0x (p3yk:53867M, Feb 23 2007, 20:06:03) > >>> [GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2 > >>> Type "help", "copyright", "credits" or "license" for more information. > >>> >>> b"""abc > >>> ... def""" > >>> bytes([0x61, 0x62, 0x63, 0x0a, 0x64, 0x65, 0x66]) > >> > >> Seeing that, I made a patch that makes bytes_repr output a bytes literal, > >> see attached diff. 
> > > > I thought that the repr format of bytes was a deliberate choice to make life > > harder to people trying to use bytes to handle text. > > That contradicts the "consenting adults" mantra. If a bytes object contains > readable text (and that's not going to be exceptional), it should not be > obscured -- in any case, I can just call str() on it and get my text. > > PEP 358 now states "Now that a b"..." literal exists, shouldn't repr() > return one?" which suggests that the repr was the most canonical way > to represent the bytes object at a time when there was no literal. > > Georg > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From python-dev at zesty.ca Sat Feb 24 20:07:54 2007 From: python-dev at zesty.ca (Ka-Ping Yee) Date: Sat, 24 Feb 2007 13:07:54 -0600 (CST) Subject: [Python-3000] Bytes <-> string conversion methods Message-ID: Hi Guido, I'm in your keynote and looking at a slide right now that says * bytes has .encode() method returning a string * str has a .decode() method returning bytes Should the names of those two methods be swapped? I think it makes more sense to say that an encoding is something that transforms a string into a sequence of bytes. -- ?!ng From g.brandl at gmx.net Sat Feb 24 20:44:46 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 24 Feb 2007 20:44:46 +0100 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: References: <45DCE203.2010804@canterbury.ac.nz> <45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz> <9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com> Message-ID: Guido van Rossum schrieb: > Georg is channeling me well. Also, my thinking has evolved some after
> > Georg, please check it in! Feel free to update the PEP if you will. I will, if you answer me one question: in Python 2.6, should the repr() return "bytes()" or still "bytes()"? Georg From collinw at gmail.com Sat Feb 24 22:27:43 2007 From: collinw at gmail.com (Collin Winter) Date: Sat, 24 Feb 2007 15:27:43 -0600 Subject: [Python-3000] Transition to Python 3's raise syntax Message-ID: <43aa6ff70702241327m67a70812odd414ad2c0428db2@mail.gmail.com> (Finally getting back around to this) On 2/9/07, Phillip J. Eby wrote: [snip] > Hm. Actually, that's not necessary. We could include .with_traceback(T) > in 2.6, and just have old-style except: clauses delete the traceback from > the returned objects. New-style except: clauses would work just as they > would in 3.0. What do you mean by "new-style" and "old-style except: clauses"? Are "new-style" except clauses the ones spelled "except E as NAME" while "old-style" ones are spelled "except E, NAME"? > To summarize, in 2.6 we could support .with_traceback() and create > exception instances with traceback attributes, but the old-style except: > clauses could discard them to prevent cycles. Clear enough. > Raising an exception > instance with a __traceback__ attribute would get some special handling so > that it's equivalent to 3-argument raise in today's Python. Likewise, > generator.throw() would need the same special handling in 2.6. What happens in this case: e = Exception() e.__traceback__ = T1 raise Exception, e, T2 Which traceback takes precedence? My preference would be to raise an exception in this case. Collin Winter From pje at telecommunity.com Sat Feb 24 23:23:53 2007 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Sat, 24 Feb 2007 17:23:53 -0500 Subject: [Python-3000] Transition to Python 3's raise syntax In-Reply-To: <43aa6ff70702241327m67a70812odd414ad2c0428db2@mail.gmail.com> Message-ID: <5.1.1.6.0.20070224172003.01be64d8@sparrow.telecommunity.com> At 03:27 PM 2/24/2007 -0600, Collin Winter wrote: >Are "new-style" except clauses the ones spelled "except E as NAME" while >"old-style" ones are spelled "except E, NAME"? Yes. >What happens in this case:
>
>e = Exception()
>e.__traceback__ = T1
>raise Exception, e, T2
>
>Which traceback takes precedence? My preference would be to raise an
>exception in this case.

Hm. How would you get that case in normal code? I guess if you had a new-style except: that then used a 3-argument raise, you could end up with that. I'm not sure if that's really a problem though. In 2.6, we'll still be using the old exception machinery, so the raise will "do the right thing", it's just that the exception instance will have a redundant __traceback__. I guess, if anything, my inclination would be to have the three-argument "raise" delete e.__traceback__. T2 will get put on it if it's caught by a new-style except: clause. From oliphant.travis at ieee.org Sat Feb 24 23:47:12 2007 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Sat, 24 Feb 2007 15:47:12 -0700 Subject: [Python-3000] Pre-PEP: Altering buffer protocol (tp_as_buffer) In-Reply-To: References: Message-ID: Hi everybody, It was great to see so many of you at PyCon --- even if we saw each other for too long during the PSF meeting.
After hearing Guido's keynote talk today and realizing that the alpha release of Python 3.0 is so soon, I decided that the right approach to pushing the array protocol/interface is to propose it as a replacement/enhancement of the buffer protocol for Python 3.0.

I will write a PEP and the implementation, but I would like to start a discussion about what concerns or issues developers have with a proposal like this and try to weed most of these out. I've started a Wiki page where we can document the issues that are raised. Perhaps this Wiki can become the PEP within a few weeks. The Wiki is at http://wiki.python.org/moin/ArrayInterface

The basic idea is given below (a copy of the web page). Thanks for any and all feedback.

Best regards,

-Travis Oliphant

This pre-PEP proposes enhancing the buffer protocol in Python 3000 to implement the array interface (protocol).

= Overview =

The buffer protocol allows different Python types to exchange a pointer to a sequence of internal buffers. This functionality is *extremely* useful for sharing large segments of memory between different high-level objects, but it's too limited and has issues.

1. There is the little-used "sequence-of-segments" option.
2. There is no way for a consumer to tell the protocol-exporting object it is "finished" with its view of the memory, and therefore no way for the object to be sure that it can reallocate the pointer to the memory that it owns (the array object reallocating its memory after sharing it with the buffer object led to the infamous buffer-object problem).
3. Memory is just a pointer. There is no way to describe what's "in" the memory (float, int, C-structure, etc.).
4. There is no shape information provided for the memory. But several array-like Python types could make use of a standard way to describe the shape of the memory (wxPython, GTK, CVXOPT, PyVox, Audio and Video Libraries, ctypes, NumPy).

= Proposal =

1.
Replace the buffer protocol that allows sharing of a single pointer to memory.
2. Have the protocol define a way to describe what's in the memory location (this should unify what is done now in struct, array, ctypes, and NumPy).
3. Have the protocol be able to share information about shape (and striding if any).
4. Allow exporting objects to define some function that should be called when the consumer object is "done" with the view.

= Idea =

All that is needed is to create a Python "memory_view" object that can contain all the information needed and be returned when the buffer protocol is called --- when it is garbage-collected, the "bp_release_view" function is called on the exporting object. This "memory_view" is essentially the old Numeric C-structure (including the fact that the data-format is described by another C-structure). This object is what the buffer protocol should return.

From guido at python.org Sun Feb 25 05:46:11 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 24 Feb 2007 22:46:11 -0600 Subject: [Python-3000] Bytes <-> string conversion methods In-Reply-To: References: Message-ID: Yup, that was a typo. Someone else noticed it too. It's fixed in the version of the slides I'll post to python.org. --Guido On 2/24/07, Ka-Ping Yee wrote: > Hi Guido, > > I'm in your keynote and looking at a slide right now that says > > * bytes has .encode() method returning a string > * str has a .decode() method returning bytes > > Should the names of those two methods be swapped? I think it > makes more sense to say that an encoding is something that > transforms a string into a sequence of bytes.
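For reference, here is a minimal sketch of the direction Ping's correction describes, written against the API that Python 3 eventually shipped rather than the slide or the 2007 prototype: `str.encode()` produces `bytes`, and `bytes.decode()` produces `str`. The sample text and the choice of UTF-8 are illustrative assumptions.

```python
# Encoding turns text (str) into bytes; decoding turns bytes back into text.
text = "caf\u00e9"                 # 4 characters: c, a, f, e-acute
data = text.encode("utf-8")        # str -> bytes
print(type(data).__name__, data)   # bytes b'caf\xc3\xa9'
print(len(text), len(data))        # 4 5  (e-acute takes two bytes in UTF-8)

roundtrip = data.decode("utf-8")   # bytes -> str
print(roundtrip == text)           # True

# The swapped spellings from the slide do not exist in Python 3:
print(hasattr(data, "encode"), hasattr(text, "decode"))  # False False
```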
> > > -- ?!ng > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun Feb 25 05:52:00 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 24 Feb 2007 22:52:00 -0600 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: References: <45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz> <9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com> Message-ID: Why not add the literal to 2.6 too? If that's deemed undesirable, make it bytes(), as long as that actually works when read back. --Guido On 2/24/07, Georg Brandl wrote: > Guido van Rossum wrote: > > Georg is channeling me well. Also, my thinking has evolved some after > > talking to various folks here at PyCon. > > > > Georg, please check it in! Feel free to update the PEP if you will. > > I will, if you answer me one question: in Python 2.6, should the repr() > return "bytes()" or still "bytes()"?
> > Georg > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Sun Feb 25 22:21:42 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 26 Feb 2007 10:21:42 +1300 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: References: <20070220205651.ADD7.JCARLSON@uci.edu> <45DCE203.2010804@canterbury.ac.nz> <45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz> <9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com> Message-ID: <45E1FDE6.6060200@canterbury.ac.nz> Georg Brandl wrote: > Seeing that, I made a patch that makes bytes_repr output a bytes literal, I'm not sure that's a good idea. Any given bytes object is as likely to have been constructed using bytes(...) as using b"...". There's no way of being sure whether displaying it as a string is appropriate or not. I suppose you could scan it for non-ascii codes or something, but that seems a bit dwimish. -- Greg From thomas at python.org Sun Feb 25 22:29:09 2007 From: thomas at python.org (Thomas Wouters) Date: Sun, 25 Feb 2007 13:29:09 -0800 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: <45E1FDE6.6060200@canterbury.ac.nz> References: <45DCE203.2010804@canterbury.ac.nz> <45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz> <9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com> <45E1FDE6.6060200@canterbury.ac.nz> Message-ID: <9e804ac0702251329k216d69d4odb96a8feee53250f@mail.gmail.com> I'm not sure what makes you say that. There isn't anyone actually using bytes() right now, so what makes you think how it's created? Besides, lists can be created with list("foo") too, but they still repr() as ['f', 'o', 'o'].
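Thomas's point can be checked against the behaviour Python 3 eventually adopted, where `repr()` of a `bytes` object is a `b'...'` literal no matter how the object was built. This sketch uses the final Python 3 semantics (immutable `bytes`), not the prototype under discussion in this thread; the sample values are illustrative.

```python
# Two ways of constructing the same bytes value.
from_list = bytes([102, 111, 111])
from_literal = b"foo"

print(from_list == from_literal)   # True
print(repr(from_list))             # b'foo'

# Lists behave the same way: the repr does not remember the constructor.
print(repr(list("foo")))           # ['f', 'o', 'o']

# Non-printable bytes are shown as escape sequences (\r, \n, or \xNN),
# so the literal-style repr still round-trips through eval().
mixed = bytes([69, 72, 76, 79, 13, 10])
print(repr(mixed))                 # b'EHLO\r\n'
print(eval(repr(mixed)) == mixed)  # True
```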
On 2/25/07, Greg Ewing wrote: > > Georg Brandl wrote: > > > Seeing that, I made a patch that makes bytes_repr output a bytes > literal, > > I'm not sure that's a good idea. Any given bytes object > is as likely to have been constructed using bytes(...) > as using b"...". There's no way of being sure whether > displaying it as a string is appropriate or not. > > I suppose you could scan it for non-ascii codes or > something, but that seems a bit dwimish. > > -- > Greg > -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From greg.ewing at canterbury.ac.nz Sun Feb 25 22:46:55 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 26 Feb 2007 10:46:55 +1300 Subject: [Python-3000] Pre-PEP: Altering buffer protocol (tp_as_buffer) In-Reply-To: References: Message-ID: <45E203CF.2030307@canterbury.ac.nz> Travis Oliphant wrote: > 2. There is no way for a consumer to tell the protocol-exporting > object it is "finished" with its view of the memory and therefore no way > for the object to be sure that it can reallocate the pointer to the > memory that it owns (the array object reallocating its memory after > sharing it with the buffer object led to the infamous buffer-object > problem). I'm not sure I'd categorise this problem that way -- it was more the buffer object's fault for assuming that it could hold on to a C pointer to the memory long-term.
I'm a bit worried about having a get/release kind of thing in the protocol, because it risks forcing all objects which implement the protocol to provide some kind of refcounting and locking mechanism for their data. Some objects may not be able to do that easily or efficiently, especially if they're wrapping some external library that has no such notion. > All that is needed is to create a Python "memory_view" object that can > contain all the information needed and be returned when the buffer > protocol is called --- when it is garbage-collected, the > "bp_release_view" function is called on the exporting object. That sounds too heavyweight. Getting a memory view through this protocol should be a very lightweight operation -- ideally it shouldn't require allocating any memory at all, and it certainly shouldn't require creating a Python object. -- Greg From greg.ewing at canterbury.ac.nz Sun Feb 25 23:15:20 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 26 Feb 2007 11:15:20 +1300 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: <9e804ac0702251329k216d69d4odb96a8feee53250f@mail.gmail.com> References: <45DCE203.2010804@canterbury.ac.nz> <45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz> <9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com> <45E1FDE6.6060200@canterbury.ac.nz> <9e804ac0702251329k216d69d4odb96a8feee53250f@mail.gmail.com> Message-ID: <45E20A78.60902@canterbury.ac.nz> Thomas Wouters wrote: > > I'm not sure what makes you say that. There isn't anyone actually using > bytes() right now, so what makes you think how it's created? That's my point -- you *don't* know how any given bytes object was created, so there's no reason to display it in anything other than the most generic way. Another thing is that the idea of displaying a mutable object in a way that closely resembles a non-mutable literal makes me uncomfortable. 
Actually, writing that sort of literal makes me uncomfortable too, but I'm not sure what to do about that. -- Greg From tdelaney at avaya.com Sun Feb 25 23:28:20 2007 From: tdelaney at avaya.com (Delaney, Timothy (Tim)) Date: Mon, 26 Feb 2007 09:28:20 +1100 Subject: [Python-3000] Thoughts on new I/O library and bytecode Message-ID: <2773CAC687FD5F4689F526998C7E4E5F07446B@au3010avexu1.global.avaya.com> Greg Ewing wrote: > Another thing is that the idea of displaying a mutable > object in a way that closely resembles a non-mutable > literal makes me uncomfortable. Actually, writing that > sort of literal makes me uncomfortable too, but I'm > not sure what to do about that. We obviously need another quote character. I think we're going to have to dip into unicode to get it ;) Just to get it out of the way - as a totally unfeasible and bad idea ... why not use double quotes for unicode literals, and single quotes for byte literals? "this is a unicode literal" 'this is a byte literal' There - got it out of my system. Although, now something else is suggesting that backticks could be repurposed for the job ... `this is a byte literal` Tim Delaney From exarkun at divmod.com Sun Feb 25 23:28:47 2007 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Sun, 25 Feb 2007 17:28:47 -0500 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: <45E20A78.60902@canterbury.ac.nz> Message-ID: <20070225222847.25807.213332275.divmod.quotient.32012@ohm> On Mon, 26 Feb 2007 11:15:20 +1300, Greg Ewing wrote: >Thomas Wouters wrote: >> >> I'm not sure what makes you say that. There isn't anyone actually using >> bytes() right now, so what makes you think how it's created? > >That's my point -- you *don't* know how any given bytes >object was created, so there's no reason to display it >in anything other than the most generic way. 
> >Another thing is that the idea of displaying a mutable >object in a way that closely resembles a non-mutable >literal makes me uncomfortable. Actually, writing that >sort of literal makes me uncomfortable too, but I'm >not sure what to do about that. > [1, 2, 3] (1, 2, 3) :) Jean-Paul From thomas at python.org Sun Feb 25 23:29:11 2007 From: thomas at python.org (Thomas Wouters) Date: Sun, 25 Feb 2007 14:29:11 -0800 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: <45E20A78.60902@canterbury.ac.nz> References: <45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz> <9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com> <45E1FDE6.6060200@canterbury.ac.nz> <9e804ac0702251329k216d69d4odb96a8feee53250f@mail.gmail.com> <45E20A78.60902@canterbury.ac.nz> Message-ID: <9e804ac0702251429i36623d1bg7ea69d0ad7945429@mail.gmail.com> I think you're confused. There isn't anything 'less generic' about the bytes literal. Both bytes([...]) and b"..." can express the full 256 value range. On 2/25/07, Greg Ewing wrote: > > Thomas Wouters wrote: > > > > I'm not sure what makes you say that. There isn't anyone actually using > > bytes() right now, so what makes you think how it's created? > > That's my point -- you *don't* know how any given bytes > object was created, so there's no reason to display it > in anything other than the most generic way. > > Another thing is that the idea of displaying a mutable > object in a way that closely resembles a non-mutable > literal makes me uncomfortable. Actually, writing that > sort of literal makes me uncomfortable too, but I'm > not sure what to do about that. > > -- > Greg > -- Thomas Wouters Hi! I'm a .signature virus!
copy me into your .signature file to help me spread! From nas at arctrix.com Sun Feb 25 23:37:09 2007 From: nas at arctrix.com (Neil Schemenauer) Date: Sun, 25 Feb 2007 22:37:09 +0000 (UTC) Subject: [Python-3000] Thoughts on new I/O library and bytecode References: <45DCE203.2010804@canterbury.ac.nz> <45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz> <9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com> <45E1FDE6.6060200@canterbury.ac.nz> <9e804ac0702251329k216d69d4odb96a8feee53250f@mail.gmail.com> <45E20A78.60902@canterbury.ac.nz> Message-ID: Greg Ewing wrote: > That's my point -- you *don't* know how any given bytes > object was created, so there's no reason to display it > in anything other than the most generic way. Practicality beats purity here, I think. For example, if I'm debugging a network protocol, I'd prefer b"EHLO ...\x0d\x0a" over bytes([69, 72, 76, 79, ..., 13, 10]) Cheers, Neil From nas at arctrix.com Mon Feb 26 00:25:38 2007 From: nas at arctrix.com (Neil Schemenauer) Date: Sun, 25 Feb 2007 23:25:38 +0000 (UTC) Subject: [Python-3000] Weird error message from bytes type Message-ID:

>>> x = b'a'
>>> x[0] = b'a'
Traceback (most recent call last):
  File "", line 1, in
TypeError: 'bytes' object cannot be interpreted as an index

Huh? 0 is not a 'bytes' object and I don't see how the RHS is being used as an index. Obviously I wanted something like:

>>> x[0] = ord(b'a')

From guido at python.org Mon Feb 26 00:26:36 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 25 Feb 2007 17:26:36 -0600 Subject: [Python-3000] Pre-PEP: Altering buffer protocol (tp_as_buffer) In-Reply-To: <45E203CF.2030307@canterbury.ac.nz> References: <45E203CF.2030307@canterbury.ac.nz> Message-ID: On 2/25/07, Greg Ewing wrote: > Travis Oliphant wrote: > > > 2.
There is no way for a consumer to tell the protocol-exporting > > object it is "finished" with its view of the memory and therefore no way > > for the object to be sure that it can reallocate the pointer to the > > memory that it owns (the array object reallocating its memory after > > sharing it with the buffer object led to the infamous buffer-object > > problem). > > I'm not sure I'd categorise this problem that way -- it was > more the buffer object's fault for assuming that it could > hold on to a C pointer to the memory long-term. > > I'm a bit worried about having a get/release kind of thing > in the protocol, because it risks forcing all objects which > implement the protocol to provide some kind of refcounting > and locking mechanism for their data. Some objects may not > be able to do that easily or efficiently, especially if > they're wrapping some external library that has no such > notion. Only if their buffer can actually move; if the buffer can't be moved or resized once the object is created, the acquire and release can be no-ops. Another problem that would be solved by this is the current unsafety of blocking I/O operations like file.readinto() and socket.recv_into(). These operations do roughly the following: (1) get the pointer and length from the buffer API (2) release the GIL (3) call the blocking read() or recv() system call with the pointer and length (4) reacquire the GIL The problem is that while the GIL is released, another thread with access to the object whose buffer is being read into, could modify it causing the buffer to be moved in memory, and the read() or recv() operation will be overwriting freed memory (or worse, memory allocated for a different purpose). I realized this thinking about the 3.0 bytes object, but the 2.x array object has the same problems, and probably every other object that uses the buffer API and has a mutable size (if there are any). 
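The acquire/release idea Guido describes is close to what the buffer protocol that shipped with Python 3 provides. Here is a minimal sketch using the final Python 3 types (`bytearray` as a resizable exporter, `memoryview` as the consumer), not the 2007 prototype: while a view is alive the exporter refuses to resize, and releasing the view lifts that restriction. The last lines also show the struct-style item format and shape information discussed in the pre-PEP; the buffer contents are illustrative.

```python
buf = bytearray(b"abcdefgh")

view = memoryview(buf)              # "acquire": buf now has an active export
print(view.format, view.shape)      # B (8,)  -- struct-style item format, 1-D shape

refused = False
try:
    buf.extend(b"ijkl")             # resizing could move the memory under the view
except BufferError as exc:
    refused = True
    print("resize refused:", exc)

view[0] = ord("A")                  # writes through the view land in buf itself
print(buf)                          # bytearray(b'Abcdefgh')

view.release()                      # "release": no exports remain
buf.extend(b"ijkl")                 # so resizing is allowed again
print(len(buf))                     # 12

# The same protocol carries shape information: view 6 bytes as a 2x3 grid.
grid = memoryview(bytes(range(6))).cast("B", shape=[2, 3])
print(grid.shape, grid.strides)     # (2, 3) (3, 1)
print(grid.tolist())                # [[0, 1, 2], [3, 4, 5]]
```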
> > All that is needed is to create a Python "memory_view" object that can > > contain all the information needed and be returned when the buffer > > protocol is called --- when it is garbage-collected, the > > "bp_release_view" function is called on the exporting object. > > That sounds too heavyweight. Getting a memory view through > this protocol should be a very lightweight operation -- ideally > it shouldn't require allocating any memory at all, and it > certainly shouldn't require creating a Python object. I agree that getting the pointer and length should be separated from finding out how the bytes should be interpreted. I'd like to propose a simple stack or hierarchy of classes to address (what I think are) Travis's needs: - At the bottom is a redesigned buffer API: add locking, remove segcount and char buffers. - This API is implemented by things like mmap, and also by a "raw bytes" object which allocates a buffer from the heap; other libraries may have their own objects that implement this (e.g. numpy, PIL). - There is a mixin class (at least conceptually it's a mixin) which takes anything implementing the redesigned buffer API and adds the bytes API (see recently updated PEP 358); operations like .strip() or slicing should return copies (of the same or a different type) or views at the discretion of the underlying object. (Maybe there should be a read-only and read-write version of this; note that read-only is not the same as immutable, since the underlying buffer may be modified by other APIs, if it allows this.) - *Another* API built on top of the redesigned buffer API would be something more aligned with numpy's needs, adding (a) a shape descriptor indicating the size, offset and stride of each dimension, and (b) a record descriptor indicating the interpretation of one element of the array. 
For (a), a list of 3-tuples of ints would probably be sufficient (constrained so that no valid combination of indexes points outside the buffer); for (b), I propose (with Jim Hugunin who first suggested this at PyCon) to use the same concise but expressive format-string-like notation used by the struct module. (The bytes API is not quite a special case of this, since it provides more string-like operations.) The crucial idea here (like so often :-) is not to use inheritance but composition. This means that we can separate management of the buffer (e.g. malloc, mmap, whatever) from providing APIs on top of this (either the bytes API or the multi-dimensional array API). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at python.org Mon Feb 26 00:35:19 2007 From: thomas at python.org (Thomas Wouters) Date: Sun, 25 Feb 2007 15:35:19 -0800 Subject: [Python-3000] Weird error message from bytes type In-Reply-To: References: Message-ID: <9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com> This is because a bytes object is not a sequence of bytes objects, like strings. It's a sequence of small integer values, so you need to assign a small integer value to it. You can assign b'a'[0] to it, or assign b'a' to x[:1]. I guess we could specialcase length-1 bytes to make this work 'naturally', but I'm not sure that's the right approach. Guido? On 2/25/07, Neil Schemenauer wrote: > > >>> x = b'a' > >>> x[0] = b'a' > Traceback (most recent call last): > File "", line 1, in > TypeError: 'bytes' object cannot be interpreted as an index > > Huh? 0 is not a 'bytes' object and I don't see how the RHS is being > used as an index.
Obviously I wanted something like: > > >>> x[0] = ord(b'a') > -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at python.org Mon Feb 26 00:40:12 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 25 Feb 2007 17:40:12 -0600 Subject: [Python-3000] Weird error message from bytes type In-Reply-To: <9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com> References: <9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com> Message-ID: Thomas is correct. You can only assign ints in range(256) to a single index. This would work: x[:1] = b"a" The error comes from the call to PyNumber_AsSsize_t() in bytes_setitem(), which apparently looks for __index__ or the tp_index slot. --Guido On 2/25/07, Thomas Wouters wrote: > > This is because a bytes object is not a sequence of bytes objects, like > strings. It's a sequence of small integer values, so you need to assign a > small integer value to it. You can assign b'a'[0] to it, or assign b'a' to > x[:1]. I guess we could specialcase length-1 bytes to make this work > 'naturally', but I'm not sure that's the right approach. Guido? > > > On 2/25/07, Neil Schemenauer wrote: > > > > >>> x = b'a' > > >>> x[0] = b'a' > > Traceback (most recent call last): > > File "", line 1, in > > TypeError: 'bytes' object cannot be interpreted as an index > > > > Huh? 0 is not a 'bytes' object and I don't see how the RHS is being > > used as an index.
> > > Obviously I wanted something like: > > > > >>> x[0] = ord(b'a') -- --Guido van Rossum (home page: http://www.python.org/~guido/) From g.brandl at gmx.net Mon Feb 26 00:52:30 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 26 Feb 2007 00:52:30 +0100 Subject: [Python-3000] Weird error message from bytes type In-Reply-To: <9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com> References: <9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com> Message-ID: Thomas Wouters wrote: > > This is because a bytes object is not a sequence of bytes objects, like > strings. It's a sequence of small integer values, so you need to assign > a small integer value to it. You can assign b'a'[0] to it, or assign > b'a' to x[:1]. I guess we could specialcase length-1 bytes to make this > work 'naturally', but I'm not sure that's the right approach. Guido? If it is deemed right, see attached patch. BTW, is it intentional that the setitem/setslice code is duplicated in bytesobject.c? Georg -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: bytes-ass.diff Url: http://mail.python.org/pipermail/python-3000/attachments/20070226/faef94aa/attachment.diff From guido at python.org Mon Feb 26 00:56:42 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 25 Feb 2007 17:56:42 -0600 Subject: [Python-3000] Weird error message from bytes type In-Reply-To: References: <9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com> Message-ID: No, I don't want length-1-bytes to get special treatment here. That would just perpetuate confusion, since b[0] *returns* an int no matter what you might have set it to. --Guido On 2/25/07, Georg Brandl wrote: > Thomas Wouters wrote: > > > > This is because a bytes object is not a sequence of bytes objects, like > > strings. It's a sequence of small integer values, so you need to assign > > a small integer value to it. You can assign b'a'[0] to it, or assign > > b'a' to x[:1].
I guess we could specialcase length-1 bytes to make this > > work 'naturally', but I'm not sure that's the right approach. Guido? > > If it is deemed right, see attached patch. > > BTW, is it intentional that the setitem/setslice code is duplicated in > bytesobject.c? > > Georg
>
> Index: Objects/bytesobject.c
> ===================================================================
> --- Objects/bytesobject.c (Revision 53912)
> +++ Objects/bytesobject.c (working copy)
> @@ -451,7 +451,18 @@
>              slicelen = 1;
>          }
>          else {
> -            Py_ssize_t ival = PyNumber_AsSsize_t(values, PyExc_ValueError);
> +            Py_ssize_t ival;
> +            /* if the value is a length-one bytes object, assign it */
> +            if (PyBytes_Check(values)) {
> +                if (PyBytes_GET_SIZE(values) != 1) {
> +                    PyErr_SetString(PyExc_ValueError, "cannot assign bytes "
> +                                    "object of length != 1");
> +                    return -1;
> +                }
> +                self->ob_bytes[i] = ((PyBytesObject *)values)->ob_bytes[0];
> +                return 0;
> +            }
> +            ival = PyNumber_AsSsize_t(values, PyExc_ValueError);
>              if (ival == -1 && PyErr_Occurred())
>                  return -1;
>              if (ival < 0 || ival >= 256) {

-- --Guido van Rossum (home page: http://www.python.org/~guido/) From nas at arctrix.com Mon Feb 26 01:29:31 2007 From: nas at arctrix.com (Neil Schemenauer) Date: Sun, 25 Feb 2007 18:29:31 -0600 Subject: [Python-3000] Weird error message from bytes type In-Reply-To: References: <9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com> Message-ID: <20070226002930.GB3067@python.ca> On Sun, Feb 25, 2007 at 05:40:12PM -0600, Guido van Rossum wrote: > Thomas is correct. You can only assign ints in range(256) to a single > index. Yes, I understand that. I think the error message is bad though.
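The rule Thomas and Guido describe (a single index holds an int in range(256), while slices take byte strings) is how the mutable `bytearray` type behaves in the released Python 3, where `bytes` itself ended up immutable. This is a sketch against that final API, not the 2007 `bytes` prototype. The wording of the `TypeError` message has changed between Python versions, so the example prints it rather than asserting a particular text.

```python
x = bytearray(b"a")
print(x[0])                 # 97: indexing yields an int, not a length-1 bytes

try:
    x[0] = b"b"             # a single slot only accepts an int
except TypeError as exc:
    print("TypeError:", exc)

x[0] = ord(b"b")            # fine: ord() of a length-1 bytes object gives 98
print(x)                    # bytearray(b'b')

x[:1] = b"c"                # fine: slice assignment takes bytes
print(x)                    # bytearray(b'c')

try:
    x[0] = 256              # ints must be in range(256)
except ValueError as exc:
    print("ValueError:", exc)
```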
> The error comes from the call to PyNumber_AsSsize_t() in > bytes_setitem(), which apparently looks for __index__ or the tp_index > slot. I think PyNumber_AsSsize_t is being used on the RHS operand. That's perhaps convenient but makes for a confusing message. There was nothing wrong with the value I was using for an index. Neil From guido at python.org Mon Feb 26 01:39:16 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 25 Feb 2007 18:39:16 -0600 Subject: [Python-3000] Weird error message from bytes type In-Reply-To: <20070226002930.GB3067@python.ca> References: <9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com> <20070226002930.GB3067@python.ca> Message-ID: Correct (I wasn't saying it was used on the lhs operand :-). I find it important to use that API since anything that wants to behave like a (small) int should be acceptable. Can you suggest a better way to formulate the error from that API? --Guido On 2/25/07, Neil Schemenauer wrote: > On Sun, Feb 25, 2007 at 05:40:12PM -0600, Guido van Rossum wrote: > > Thomas is correct. You can only assign ints in range(256) to a single > > index. > > Yes, I understand that. I think the error message is bad though. > > > The error comes from the call to PyNumber_AsSsize_t() in > > bytes_setitem(), which apparently looks for __index__ or the tp_index > > slot. > > I think PyNumber_AsSsize_t is being used on the RHS operand. That's > perhaps convenient but makes for a confusing message. There was > nothing wrong with the value I was using for an index. 
> > Neil > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Mon Feb 26 04:04:22 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 26 Feb 2007 16:04:22 +1300 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: <20070225222847.25807.213332275.divmod.quotient.32012@ohm> References: <20070225222847.25807.213332275.divmod.quotient.32012@ohm> Message-ID: <45E24E36.7030005@canterbury.ac.nz> Jean-Paul Calderone wrote: > > Actually, writing that > > sort of literal makes me uncomfortable too, but I'm > > not sure what to do about that. > > [1, 2, 3] > (1, 2, 3) Not quite sure what your point is. My point is that I'm thoroughly conditioned to think of anything in quotes as immutable, and it will take a while to get out of that habit, I suspect. Also I'm a little worried about the pedagogical implications of teaching people that x"..." is a unicode string for all values of x *except* b, whereupon it's not unicode and isn't even a string. I'm wondering whether it would be better to have the compiler recognise bytes("...") and special case it. At least it *looks* like a constructor call then, which is what b"..." would actually be. -- Greg From greg.ewing at canterbury.ac.nz Mon Feb 26 04:12:37 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 26 Feb 2007 16:12:37 +1300 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: <9e804ac0702251429i36623d1bg7ea69d0ad7945429@mail.gmail.com> References: <45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz> <9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com> <45E1FDE6.6060200@canterbury.ac.nz> <9e804ac0702251329k216d69d4odb96a8feee53250f@mail.gmail.com> <45E20A78.60902@canterbury.ac.nz> <9e804ac0702251429i36623d1bg7ea69d0ad7945429@mail.gmail.com> Message-ID: <45E25025.6020107@canterbury.ac.nz> Thomas Wouters wrote: > > I think you're confused. 
There isn't anything 'less generic' about the > bytes literal. Both bytes([...]) and b"..." can express the full 256 > value range. Yes, but it only makes sense to try to display it as characters if it's meant to represent characters in the first place. Otherwise you get something that looks like line noise. BTW, I don't really think that bytes([104, 101, 108, 108, 111]) is the right way to display it either. There ought to be some kind of compact hex format. Maybe something like $[68656C6C6F] -- Greg From greg.ewing at canterbury.ac.nz Mon Feb 26 04:19:47 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 26 Feb 2007 16:19:47 +1300 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: References: <45DCE203.2010804@canterbury.ac.nz> <45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz> <9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com> <45E1FDE6.6060200@canterbury.ac.nz> <9e804ac0702251329k216d69d4odb96a8feee53250f@mail.gmail.com> <45E20A78.60902@canterbury.ac.nz> Message-ID: <45E251D3.7050601@canterbury.ac.nz> Neil Schemenauer wrote: > Practicality beats purity here, I think. For example, if I'm > debugging a network protocol, I'd prefer > > b"EHLO ...\x0d\x0a" But what if I'm *not* debugging a network protocol, and my bytes objects all look like random gibberish when displayed as characters? To put it another way: If bytes objects are displayed in hex by default (see previous post), I can easily get them displayed as characters if that's what I want using str(b, suitable_encoding). But if they're displayed as characters by default, what do I do to get them displayed as not-characters? 
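For comparison, this is how the bytes type that eventually shipped in Python 3 handles both displays Greg describes. repr() shows ASCII-printable bytes as characters and escapes the rest. .hex() and bytes.fromhex() give a compact hex form. Both forms round-trip back to the original object. This reflects modern Python 3 (3.5+ for .hex()), not the draft design under discussion, and the helper name is hypothetical:

```python
def show_both_ways(data: bytes) -> tuple:
    """Hypothetical helper: return the (repr, hex) renderings of a bytes object."""
    return repr(data), data.hex()

greeting = bytes([104, 101, 108, 108, 111])
text_form, hex_form = show_both_ways(greeting)
print(text_form)  # b'hello'
print(hex_form)   # 68656c6c6f

# Both renderings round-trip to an equal object (the eval(repr(obj)) loop).
assert eval(text_form) == greeting
assert bytes.fromhex(hex_form) == greeting
assert bytes.fromhex("68656C6C6F") == greeting  # fromhex accepts either case

# Bytes that are not ASCII printables are backslash-escaped in the repr.
print(repr(b"EHLO ...\x0d\x0a"))  # b'EHLO ...\r\n'
```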
-- Greg From greg.ewing at canterbury.ac.nz Mon Feb 26 04:42:46 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 26 Feb 2007 16:42:46 +1300 Subject: [Python-3000] Weird error message from bytes type In-Reply-To: References: <9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com> <20070226002930.GB3067@python.ca> Message-ID: <45E25736.2010404@canterbury.ac.nz> Guido van Rossum wrote: > I find it important to use that API since anything that wants to > behave like a (small) int should be acceptable. But by calling __index__ and giving error messages about indexes, PyInt_AsSsize_t seems to be assuming that the value is going to be used as an index. If that's the true purpose of PyInt_AsSsize_t, then it shouldn't be getting called in this situation. If it's not, then it shouldn't be giving error messages that talk about indexes, and there should be another API such as PyObject_AsIndex for values that really are going to be used as indexes. -- Greg From jcarlson at uci.edu Mon Feb 26 04:58:39 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Sun, 25 Feb 2007 19:58:39 -0800 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: <45E25025.6020107@canterbury.ac.nz> References: <9e804ac0702251429i36623d1bg7ea69d0ad7945429@mail.gmail.com> <45E25025.6020107@canterbury.ac.nz> Message-ID: <20070225194742.AE4B.JCARLSON@uci.edu> Greg Ewing wrote: > > Thomas Wouters wrote: > > > > I think you're confused. There isn't anything 'less generic' about the > > bytes literal. Both bytes([...]) and b"..." can express the full 256 > > value range. > > Yes, but it only makes sense to try to display it as > characters if it's meant to represent characters in > the first place. Otherwise you get something that > looks like line noise. > > BTW, I don't really think that bytes([104, 101, 108, > 108, 111]) is the right way to display it either. > There ought to be some kind of compact hex format. 
> Maybe something like > > $[68656C6C6F] I think it's a bad idea to choose a representation with any format that isn't able to do the eval(repr(obj)) loop. I'm not a fan of 'bytes([101, 108, ...])', nor do I like 'bytes([0xd7, 0x19, ...])'. 'bytes(b"stuff")' is a bit redundant, but it would get the point across. I'm not sure I *like* b"stuff", but I don't loathe it like I do the other two that are passed lists. Maybe 'bytes("stuff", "latin-1")', but then it is underlying platform and/or file encoding sensitive. It may be the case that b"stuff" is the most concise and reasonable repr form... - Josiah From guido at python.org Mon Feb 26 06:36:39 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 25 Feb 2007 23:36:39 -0600 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: <45E251D3.7050601@canterbury.ac.nz> References: <45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz> <9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com> <45E1FDE6.6060200@canterbury.ac.nz> <9e804ac0702251329k216d69d4odb96a8feee53250f@mail.gmail.com> <45E20A78.60902@canterbury.ac.nz> <45E251D3.7050601@canterbury.ac.nz> Message-ID: On 2/25/07, Greg Ewing wrote: > But if they're displayed as characters by default, > what do I do to get them displayed as not-characters? Well anything that's not an ASCII printable is \x escaped anyway. If you want all hex, use the .hex() method described in the PEP. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Feb 26 06:38:26 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 25 Feb 2007 23:38:26 -0600 Subject: [Python-3000] Weird error message from bytes type In-Reply-To: <45E25736.2010404@canterbury.ac.nz> References: <9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com> <20070226002930.GB3067@python.ca> <45E25736.2010404@canterbury.ac.nz> Message-ID: Please give the poor function a break. It was added to 2.5 and used only for indexing there. 
In 3.0 it is more generally useful (I want to use it whenever an int is needed). but in our pre-alpha code the error messages haven't been fixed yet. That's the whole story. On 2/25/07, Greg Ewing wrote: > Guido van Rossum wrote: > > > I find it important to use that API since anything that wants to > > behave like a (small) int should be acceptable. > > But by calling __index__ and giving error messages about > indexes, PyInt_AsSsize_t seems to be assuming that the > value is going to be used as an index. > > If that's the true purpose of PyInt_AsSsize_t, then it > shouldn't be getting called in this situation. > > If it's not, then it shouldn't be giving error messages > that talk about indexes, and there should be another > API such as PyObject_AsIndex for values that really > are going to be used as indexes. > > -- > Greg > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ncoghlan at gmail.com Mon Feb 26 11:10:36 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 26 Feb 2007 20:10:36 +1000 Subject: [Python-3000] Weird error message from bytes type In-Reply-To: References: <9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com> <20070226002930.GB3067@python.ca> <45E25736.2010404@canterbury.ac.nz> Message-ID: <45E2B21C.1030101@gmail.com> Guido van Rossum wrote: > Please give the poor function a break. It was added to 2.5 and used > only for indexing there. In 3.0 it is more generally useful (I want to > use it whenever an int is needed). but in our pre-alpha code the error > messages haven't been fixed yet. That's the whole story. A couple of locations in the 2.5 standard library actually had to deal with the same problem. They currently make their own calls to PyIndex_Check() in order to override the default error message. 
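In the Python 3 that eventually shipped, the mutable byte sequence is bytearray, and its item assignment follows the rule Guido describes: anything implementing __index__ (exposed at the Python level as operator.index) is accepted as a small int. Below is a minimal sketch with a hypothetical SmallInt class. The exact error texts differ between Python versions, so only the exception types are relied on:

```python
import operator

class SmallInt:
    """Hypothetical int-like type that opts in through __index__."""

    def __init__(self, value: int) -> None:
        self.value = value

    def __index__(self) -> int:
        return self.value

buf = bytearray(3)
buf[0] = SmallInt(104)  # accepted: __index__ makes it usable as an int
buf[1] = 105            # plain ints work too
print(buf)              # bytearray(b'hi\x00')
print(operator.index(SmallInt(7)))  # 7, the Python-level __index__ conversion

def try_store(value) -> str:
    """Attempt buf[2] = value and report the exception type and message, if any."""
    try:
        buf[2] = value
    except (TypeError, ValueError) as exc:
        return f"{type(exc).__name__}: {exc}"
    return "ok"

float_result = try_store(2.0)  # TypeError: a float has no __index__
range_result = try_store(300)  # ValueError: not in range(256)
print(float_result)
print(range_result)
```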
This happens in: sequence_repeat (abstract.c), _GetMapSize (mmapmodule.c). slice_indices (sliceobject.c) also uses PyNumber_AsSsize_t to check the length argument that is passed in, but it just allows the PyNumber_Index error message to flow through: .>>> slice(2).indices('1') Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'str' object cannot be interpreted as an index > On 2/25/07, Greg Ewing wrote: >> But by calling __index__ and giving error messages about >> indexes, PyInt_AsSsize_t seems to be assuming that the >> value is going to be used as an index. >> >> If that's the true purpose of PyInt_AsSsize_t, then it >> shouldn't be getting called in this situation. >> >> If it's not, then it shouldn't be giving error messages >> that talk about indexes, and there should be another >> API such as PyObject_AsIndex for values that really >> are going to be used as indexes. Generating a different error message when passing invalid types to PyNumber_AsSsize_t (as opposed to PyNumber_Index) wasn't particularly high on the to-do list when we were trying to fix the __index__() clipping bugs for the 2.5 release - the exception raised by the eventual implementation was of the correct type, even if the message wasn't perfect. Further complicating a C API that was already somewhat complex (a type checking function, plus two different conversion functions, one with an extra argument relating to overflow handling) didn't seem to be a desirable thing to do. With additional usage in non-index contexts (and a bit more time to do the work!), it probably makes sense to modify PyNumber_Index and PyNumber_AsSsize_t to call a common static function which allows them to specify different format strings for the type error. However, given that the error message has to make sense even when the object involved is a float, finding appropriate wording that doesn't mention the __index__ slot is somewhat challenging.
For example, simply replacing 'index' with 'integer' could lead to a different kind of confusion: TypeError: 'float' object cannot be interpreted as an integer Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From oliphant.travis at ieee.org Mon Feb 26 19:43:26 2007 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon, 26 Feb 2007 11:43:26 -0700 Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer) Message-ID: It was so nice to see many of you at PyCon this year. The event was very well handled and congratulations are deserved all around. I brought up the idea of the array interface several times. After I heard Guido's keynote and saw the scheduled time-lines, I realized that my approach should be to push for the array interface into Python 3.0 as an enhancement/adaptation of the buffer protocol (which I have not heard or seen much discussion about). Later we can back-port the result to Python 2.6. To encourage a useful discussion, I've started a Wiki that describes the idea behind my proposal and placed it at: http://wiki.python.org/moin/ArrayInterface The basic idea is to define a memory-view object which is returned by the buffer-protocol and contains not just a pointer to the memory but also shape, stride, and data-format information. It would be nice if there were also some additions to the Python C-API to make it easy to work with this memory-view object, but I don't envision needing to make this object available to Python directly. I'm willing to work on this buffer protocol and maintain it as well in the future. Thanks for any comments and/or feedback.
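For reference only (this is not part of Travis's proposal as posted), Python 3's built-in memoryview, which grew out of the later PEP 3118 buffer protocol, carries this kind of information alongside the memory: a struct-style format string, the item size, the shape and the strides. The sketch below uses only the standard library and assumes a platform where a C double is 8 bytes:

```python
import array

# Six C doubles in one contiguous buffer.
data = array.array("d", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

# cast() only reshapes when one side is a byte format, hence 'd' -> 'B' -> 'd'.
as_bytes = memoryview(data).cast("B")
view = as_bytes.cast("d", shape=[2, 3])

descriptor = {
    "format": view.format,      # struct-module code for one element
    "itemsize": view.itemsize,  # bytes per element
    "ndim": view.ndim,
    "shape": view.shape,        # elements along each dimension
    "strides": view.strides,    # bytes to step along each dimension
}
rows = view.tolist()
print(descriptor)
print(rows)  # [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]

# Release the views so `data` is no longer exported and may be resized again.
view.release()
as_bytes.release()
```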
Best regards, -Travis From oliphant.travis at ieee.org Mon Feb 26 19:46:42 2007 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon, 26 Feb 2007 11:46:42 -0700 Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer) In-Reply-To: References: Message-ID: I'm sorry for creating two threads on the buffer protocol. I didn't see the first one because I mistakenly put it as a reply to another thread. This one is independent and should be more helpful as a discussion place-holder. -Travis From oliphant.travis at ieee.org Mon Feb 26 19:53:35 2007 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon, 26 Feb 2007 11:53:35 -0700 Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer) In-Reply-To: References: Message-ID: > >> 2. There is no way for a consumer to tell the protocol-exporting >>object it is "finished" with its view of the memory and therefore no way >>for the object to be sure that it can reallocate the pointer to the >>memory that it owns (the array object reallocating its memory after >>sharing it with the buffer object led to the infamous buffer-object >>problem). > > > I'm a bit worried about having a get/release kind of thing > in the protocol, because it risks forcing all objects which > implement the protocol to provide some kind of refcounting > and locking mechanism for their data. Some objects may not > be able to do that easily or efficiently, especially if > they're wrapping some external library that has no such > notion. If they can't do it easily, then they don't have to define the release-function and Python will never call it. > > >>All that is needed is to create a Python "memory_view" object that can >>contain all the information needed and be returned when the buffer >>protocol is called --- when it is garbage-collected, the >>"bp_release_view" function is called on the exporting object. > > > That sounds too heavyweight. 
Getting a memory view through > this protocol should be a very lightweight operation -- ideally > it shouldn't require allocating any memory at all, and it > certainly shouldn't require creating a Python object. If you want shape information you are going to have to allocate memory. If you are going to do that you might as well return a Python object so you can manage this memory easily. If you don't want or need shape or detailed type information, I could also see and have no objection to keeping a lightweight version of the protocol that only returns simple integers. I'll put that in the PEP. -Travis From oliphant.travis at ieee.org Mon Feb 26 20:24:39 2007 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon, 26 Feb 2007 12:24:39 -0700 Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer) In-Reply-To: References: Message-ID: Guido van Rossum wrote: > On 2/25/07, Greg Ewing wrote: > >>Travis Oliphant wrote: >> >> >>> 2. There is no way for a consumer to tell the protocol-exporting >>>object it is "finished" with its view of the memory and therefore no way >>>for the object to be sure that it can reallocate the pointer to the >>>memory that it owns (the array object reallocating its memory after >>>sharing it with the buffer object led to the infamous buffer-object >>>problem). >> > > Another problem that would be solved by this is the current unsafety > of blocking I/O operations like file.readinto() and > socket.recv_into(). 
These operations do roughly the following: > > (1) get the pointer and length from the buffer API > (2) release the GIL > (3) call the blocking read() or recv() system call with the pointer and length > (4) reacquire the GIL > > The problem is that while the GIL is released, another thread with > access to the object whose buffer is being read into, could modify it > causing the buffer to be moved in memory, and the read() or recv() > operation will be overwriting freed memory (or worse, memory allocated > for a different purpose). > > I realized this thinking about the 3.0 bytes object, but the 2.x array > object has the same problems, and probably every other object that > uses the buffer API and has a mutable size (if there are any). Yes, the NumPy object has this problem as well (although it has *very* conservative checks so that if the reference count on the array is not 1, memory is not reallocated). > > I agree that getting the pointer and length should be separated from > finding out how the bytes should be interpreted. I'd like to propose a > simple stack or hierarchy of classes to address (what I think are) > Travis's needs: > > - At the bottom is a redesigned buffer API: add locking, remove > segcount and char buffers. Great. I have no problem with this. Is your idea of locking the same as mine (i.e. a function in the API for release?) > > - There is a mixin class (at least conceptually it's a mixin) which > takes anything implementing the redesigned buffer API and adds the > bytes API (see recently updated PEP 358); operations like .strip() or > slicing should return copies (of the same or a different type) or > views at the discretion of the underlying object. (Maybe there should > be a read-only and read-write version of this; note that read-only is > not the same as immutable, since the underlying buffer may be modified > by other APIs, if it allows this.) I'm not sure what this mixin class is. Is this a base class for the bytes object? 
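For comparison, the get/release locking discussed here is roughly what CPython's bytearray ended up doing. While a memoryview is exported, the object refuses to resize, so a blocking readinto() that writes through the view cannot end up writing into freed memory. This sketch shows current CPython behaviour, not the API being designed in this thread:

```python
import io

buf = bytearray(b"hello")
view = memoryview(buf)  # acquiring a view registers an export on buf

refused = False
try:
    buf.extend(b" world")  # would reallocate memory that view still points at
except BufferError:
    refused = True  # CPython refuses while an export is outstanding

view.release()  # the "release" half: buf may reallocate again
buf.extend(b" world")
print(refused, buf)  # True bytearray(b'hello world')

# A readinto()-style call writes through a view that stays locked until released.
target = bytearray(5)
with memoryview(target) as window:  # leaving the with-block releases the view
    n = io.BytesIO(b"abcdefg").readinto(window)
print(n, target)  # 5 bytearray(b'abcde')
```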
I need to understand this better in order to write a PEP. > > - *Another* API built on top of the redesigned buffer API would be > something more aligned with numpy's needs, adding (a) a shape > descriptor indicating the size, offset and stride of each dimension, > and (b) a record descriptor indicating the interpretation of one > element of the array. For (a), a list of 3-tuples of ints would > probably be sufficient (constrained so that no valid combination of > indexes points outside the buffer); for (b), I propose (with Jim > Hugunin who first suggested this at PyCon) to use the same concise but > expressing format-string-like notation used by the struct module. (The > bytes API is not quite a special case of this, since it provides more > string-like operations.) > Great. NumPy has already adopted the struct standard for it's "hidden" character codes. We also need to add some format codes for complex-data ('F','D','G') and for long doubles ('g'). I would also propose that we make an enumeration in Python so we can refer to these codes in C/C++ as constants: PYFORMAT_LONG PYFORMAT_UINT etc. a) I would prefer a 3-tuple of lists for the shape descriptor (shape list, stride list, offset list) That way default striding could be given as None and there would not have to be any offset as well. My view on the offset is that it is not necessary as the start of the array is already given by the memory pointer. But, if others see a strong need for it, I have no problem with including it. b) I'm also fine with just returning a string for the record descriptor like the struct module uses. -Travis > The crucial idea here (like so often :-) is not to use inheritance but > composition. This means that we can separate management of the buffer > (e.g. malloc, mmap, whatever) from providing APIs on top of this > (either the bytes API or the multi-dimensional array API). 
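As an illustration of the struct-style record descriptor Guido and Travis agree on, here is a hypothetical record layout described with today's struct module. The complex ('F', 'D', 'G') and long double ('g') codes Travis proposes are left out, since they were proposals at the time. The last line shows why a descriptor has to state its alignment convention:

```python
import struct

# Hypothetical record: int32 id, float64 value, 4-byte tag.
# '<' means little-endian with standard sizes, so there is no alignment padding.
record = struct.Struct("<id4s")
print(record.size)  # 16 = 4 + 8 + 4

packed = record.pack(7, 2.5, b"abcd") + record.pack(8, -1.0, b"wxyz")
rows = list(record.iter_unpack(packed))
print(rows)  # [(7, 2.5, b'abcd'), (8, -1.0, b'wxyz')]

# With native alignment ('@', the default) the same fields may be padded,
# so the two sizes printed here can differ depending on the platform.
print(struct.calcsize("@id"), struct.calcsize("<id"))
```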
> From guido at python.org Mon Feb 26 21:28:47 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 26 Feb 2007 14:28:47 -0600 Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer) In-Reply-To: References: Message-ID: On 2/26/07, Travis Oliphant wrote: > Guido van Rossum wrote: > > I realized this thinking about the 3.0 bytes object, but the 2.x array > > object has the same problems, and probably every other object that > > uses the buffer API and has a mutable size (if there are any). > > Yes, the NumPy object has this problem as well (although it has *very* > conservative checks so that if the reference count on the array is not > 1, memory is not reallocated). That would be *too* conservative for me -- just passing it as an argument to another function increfs it (for the duration of the call). > > I agree that getting the pointer and length should be separated from > > finding out how the bytes should be interpreted. I'd like to propose a > > simple stack or hierarchy of classes to address (what I think are) > > Travis's needs: > > > > - At the bottom is a redesigned buffer API: add locking, remove > > segcount and char buffers. > > Great. I have no problem with this. Is your idea of locking the same > as mine (i.e. a function in the API for release?) Right. > > - There is a mixin class (at least conceptually it's a mixin) which > > takes anything implementing the redesigned buffer API and adds the > > bytes API (see recently updated PEP 358); operations like .strip() or > > slicing should return copies (of the same or a different type) or > > views at the discretion of the underlying object. (Maybe there should > > be a read-only and read-write version of this; note that read-only is > > not the same as immutable, since the underlying buffer may be modified > > by other APIs, if it allows this.) > > I'm not sure what this mixin class is. Is this a base class for the > bytes object? 
I need to understand this better in order to write a PEP. Yes, that's a good way to describe it. > > - *Another* API built on top of the redesigned buffer API would be > > something more aligned with numpy's needs, adding (a) a shape > > descriptor indicating the size, offset and stride of each dimension, > > and (b) a record descriptor indicating the interpretation of one > > element of the array. For (a), a list of 3-tuples of ints would > > probably be sufficient (constrained so that no valid combination of > > indexes points outside the buffer); for (b), I propose (with Jim > > Hugunin who first suggested this at PyCon) to use the same concise but > > expressing format-string-like notation used by the struct module. (The > > bytes API is not quite a special case of this, since it provides more > > string-like operations.) > > Great. NumPy has already adopted the struct standard for it's "hidden" > character codes. Glad to get agreement. > We also need to add some format codes for complex-data ('F','D','G') and > for long doubles ('g'). No problem. Just make this a separate section in your PEP ("proposed additions for the struct module"). > I would also propose that we make an > enumeration in Python so we can refer to these codes in C/C++ as constants: > > PYFORMAT_LONG > PYFORMAT_UINT > > etc. Not sure I follow but sounds fine; hopefully the PEP draft will clarify this. > a) I would prefer a 3-tuple of lists for the shape descriptor > (shape list, stride list, offset list) > > That way default striding could be given as None and there would not > have to be any offset as well. Of course. I don't know much about the traditional way of representing MD array structure. > My view on the offset is that it is not necessary as the start of the > array is already given by the memory pointer. But, if others see a > strong need for it, I have no problem with including it. Well don't you end up with an offset as soon as you take a rectangular slice out of a 2d array? 
> b) I'm also fine with just returning a string for the record descriptor > like the struct module uses. Excellent. Are we all set then? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From oliphant.travis at ieee.org Mon Feb 26 21:37:32 2007 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon, 26 Feb 2007 13:37:32 -0700 Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer) In-Reply-To: References: Message-ID: Guido van Rossum wrote: > On 2/26/07, Travis Oliphant wrote: > >>Guido van Rossum wrote: >> >>>I realized this thinking about the 3.0 bytes object, but the 2.x array >>>object has the same problems, and probably every other object that >>>uses the buffer API and has a mutable size (if there are any). >> >>Yes, the NumPy object has this problem as well (although it has *very* >>conservative checks so that if the reference count on the array is not >>1, memory is not reallocated). > > > That would be *too* conservative for me -- just passing it as an > argument to another function increfs it (for the duration of the > call). > It's too conservative for us too. We just don't see any way around it without the locking mechanism (right now you can over-ride the ref-count checking if you know what you are doing). >> >>I'm not sure what this mixin class is. Is this a base class for the >>bytes object? I need to understand this better in order to write a PEP. > > > Yes, that's a good way to describe it. > > >>>- *Another* API built on top of the redesigned buffer API would be >>>something more aligned with numpy's needs, adding (a) a shape >>>descriptor indicating the size, offset and stride of each dimension, >>>and (b) a record descriptor indicating the interpretation of one >>>element of the array.
For (a), a list of 3-tuples of ints would >>>probably be sufficient (constrained so that no valid combination of >>>indexes points outside the buffer); for (b), I propose (with Jim >>>Hugunin who first suggested this at PyCon) to use the same concise but >>>expressing format-string-like notation used by the struct module. (The >>>bytes API is not quite a special case of this, since it provides more >>>string-like operations.) >> >>Great. NumPy has already adopted the struct standard for it's "hidden" >>character codes. > > > Glad to get agreement. > > >>We also need to add some format codes for complex-data ('F','D','G') and >>for long doubles ('g'). > > > No problem. Just make this a separate section in your PEP ("proposed > additions for the struct module"). > O.K. great. > >>I would also propose that we make an >>enumeration in Python so we can refer to these codes in C/C++ as constants: >> >>PYFORMAT_LONG >>PYFORMAT_UINT >> >>etc. > > > Not sure I follow but sounds fine; hopefully the PEP draft will clarify this. > This is just some header magic (either defines or an enum statement so you don't have to remember character codes in C/C++). > >>a) I would prefer a 3-tuple of lists for the shape descriptor >>(shape list, stride list, offset list) >> >>That way default striding could be given as None and there would not >>have to be any offset as well. > > > Of course. I don't know much about the traditional way of representing > MD array structure. > > >>My view on the offset is that it is not necessary as the start of the >>array is already given by the memory pointer. But, if others see a >>strong need for it, I have no problem with including it. > > > Well don't you end up with an offset as soon as you take a rectangular > slice out of a 2d array? You can either 1) keep the same base memory pointer and create an offset list, or 2) have no offset and change the starting memory pointer. NumPy uses option 2 (it stores the starting point of the array). 
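The two options Travis describes can be checked with a small pure-Python sketch. A C-ordered matrix lives in a plain bytes buffer. A rectangular slice keeps the strides, shrinks the shape and moves the start (here a byte offset standing in for the memory pointer), so no separate offset list is needed. All helper names are hypothetical:

```python
import struct

ITEM = struct.Struct("<d")  # one float64 element

def make_matrix(rows, cols):
    """Pack a C-ordered rows x cols matrix whose element (i, j) is 10*i + j."""
    values = [float(10 * i + j) for i in range(rows) for j in range(cols)]
    data = struct.pack(f"<{len(values)}d", *values)
    strides = (cols * ITEM.size, ITEM.size)
    return data, 0, (rows, cols), strides  # buffer, start, shape, strides

def slice2d(start, shape, strides, row_range, col_range):
    """Rectangular slice, "option 2" style: same strides, new shape, moved start."""
    (r0, r1), (c0, c1) = row_range, col_range
    new_start = start + r0 * strides[0] + c0 * strides[1]
    return new_start, (r1 - r0, c1 - c0), strides

def element(data, start, strides, i, j):
    return ITEM.unpack_from(data, start + i * strides[0] + j * strides[1])[0]

def to_lists(data, start, shape, strides):
    return [[element(data, start, strides, i, j) for j in range(shape[1])]
            for i in range(shape[0])]

data, start, shape, strides = make_matrix(3, 4)
sub = slice2d(start, shape, strides, (1, 3), (1, 3))
print(sub)                   # (40, (2, 2), (32, 8))
print(to_lists(data, *sub))  # [[11.0, 12.0], [21.0, 22.0]]
```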
> > >>b) I'm also fine with just returning a string for the record descriptor >>like the struct module uses. > > > Excellent. Are we all set then? I think so. I have some additional ideas about the string format description that I will explain in the PEP. The draft is coming along at http://wiki.python.org/moin/ArrayInterface Feel free to make changes there. -Travis From oliphant.travis at ieee.org Mon Feb 26 21:44:01 2007 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon, 26 Feb 2007 13:44:01 -0700 Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer) In-Reply-To: References: Message-ID: Guido van Rossum wrote: > On 2/26/07, Travis Oliphant wrote: > >>Guido van Rossum wrote: >> > > > Excellent. Are we all set then? > One more question? What is the reason for separate read/write getbuffer calls. What is the problem with just one getbuffer call with a flag to indicate whether or not you want a writeable memory area? I prefer fewer function pointers because it means that extension types must implement fewer functions. But, either way. I know there is some stylistic distaste for "flags" in APIs. One could still keep two C-API calls for getting read-only and writeable buffers. -Travis From guido at python.org Mon Feb 26 22:12:47 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 26 Feb 2007 15:12:47 -0600 Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer) In-Reply-To: References: Message-ID: On 2/26/07, Travis Oliphant wrote: > One more question? What is the reason for separate read/write getbuffer > calls. What is the problem with just one getbuffer call with a flag to > indicate whether or not you want a writeable memory area? I'm not sure; that API grew somewhat organically. I guess having separate functions makes it possible to test whether the buffer is writable at all, but IMO checking for an error is just as expedient, so as long as we're redesigning the whole API you can design whatever you want. 
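A single-call design with a "writable" flag can be sketched at the Python level as follows. Here get_buffer is a hypothetical helper built on memoryview and its readonly attribute. As Guido suggests, the caller finds out that a buffer is not writable by checking for an error:

```python
def get_buffer(obj, writable=False):
    """Hypothetical single entry point: one call plus a 'writable' flag
    instead of separate read-only and writable getters."""
    view = memoryview(obj)
    if writable and view.readonly:
        view.release()  # give the export back before failing
        raise BufferError(f"{type(obj).__name__} does not expose writable memory")
    return view

with get_buffer(bytearray(b"abc"), writable=True) as rw:
    rw[0] = ord("x")
    edited = rw.tobytes()
print(edited)  # b'xbc'

with get_buffer(b"abc") as ro:
    ro_flag = ro.readonly
print(ro_flag)  # True: bytes objects only export read-only memory

try:
    get_buffer(b"abc", writable=True)
    refused = False
except BufferError:
    refused = True  # "not writable" is reported as an error
print(refused)  # True
```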
> I prefer fewer function pointers because it means that extension types > must implement fewer functions. But, either way. Right. > I know there is some stylistic distaste for "flags" in APIs. That's more a Python-level preference. > One could still keep two C-API calls for getting read-only and writeable buffers. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From mike.verdone at gmail.com Mon Feb 26 22:35:54 2007 From: mike.verdone at gmail.com (Mike Verdone) Date: Mon, 26 Feb 2007 15:35:54 -0600 Subject: [Python-3000] Draft PEP for New IO system Message-ID: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> Hi all, Daniel Stutzbach and I have prepared a draft PEP for the new IO system for Python 3000. This document is, hopefully, true to the info that Guido wrote on the whiteboards here at PyCon. This is still a draft and there's quite a few decisions that need to be made. Feedback is welcomed. We've published it on Google Docs here: http://docs.google.com/Doc?id=dfksfvqd_1cn5g5m What follows is a plaintext version. Thanks, Mike. PEP: XXX Title: New IO Version: Last-Modified: Authors: Daniel Stutzbach, Mike Verdone Status: Draft Type: Created: 26-Feb-2007 Rationale and Goals Python allows for a variety of file-like objects that can be worked with via bare read() and write() calls using duck typing. Anything that provides read() and write() is stream-like. However, more exotic and extremely useful functions like readline() or seek() may or may not be available on a file-like object. Python needs a specification for basic byte-based IO streams to which we can add buffering and text-handling features. Once we have a defined raw byte-based IO interface, we can add buffering and text-handling layers on top of any byte-based IO class. The same buffering and text handling logic can be used for files, sockets, byte arrays, or custom IO classes developed by Python programmers. 
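Python 3's io module ended up with essentially this layering, although the capability checks became readable(), writable() and seekable() rather than the is_* names. Below is a minimal sketch of a hypothetical raw source, standing in for a socket or pipe that returns short reads, with the standard buffered and text layers stacked on top:

```python
import io

class ChunkSource(io.RawIOBase):
    """Hypothetical raw stream that returns at most `chunk` bytes per call."""

    def __init__(self, payload: bytes, chunk: int = 3) -> None:
        self._payload = payload
        self._pos = 0
        self._chunk = chunk

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        n = min(len(b), self._chunk, len(self._payload) - self._pos)
        b[:n] = self._payload[self._pos:self._pos + n]
        self._pos += n
        return n  # 0 signals EOF

raw = ChunkSource("naïve café\nsecond line\n".encode("utf-8"))
buffered = io.BufferedReader(raw)                    # buffer layer
text = io.TextIOWrapper(buffered, encoding="utf-8")  # text layer

first = text.readline()   # 'naïve café\n', even though raw reads split UTF-8 sequences
second = text.readline()  # 'second line\n'
print(repr(first), repr(second))
print(text.buffer is buffered, buffered.raw is raw)  # True True
```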
Developing a standard definition of a stream lets us separate stream-based operations like read() and write() from implementation-specific operations like fileno() and isatty(). It encourages programmers to write code that uses streams as streams and not to require that all streams support file-specific or socket-specific operations. The new IO spec is intended to be similar to the Java IO libraries, but generally less confusing. Programmers who don't want to muck about in the new IO world can expect that the open() factory method will produce an object backwards-compatible with old-style file objects. Specification The Python I/O Library will consist of three layers: a raw I/O layer, a buffer I/O layer, and a text I/O layer. Each layer is defined by an abstract base class, which may have multiple implementations. The raw I/O and buffer I/O layers deal with units of bytes, while the text I/O layer deals with units of characters. Raw I/O The abstract base class for raw I/O is RawIOBase. It has several methods which are wrappers around the appropriate operating system call. If one of these functions would not make sense on the object, the implementation must raise an IOError exception. For example, if a file is opened read-only, the .write() method will raise an IOError. As another example, if the object represents a socket, then .seek(), .tell(), and .truncate() will raise an IOError. .read() .write() .seek() .tell() .truncate() .close() Additionally, it defines a few other methods: (should these "is_" functions be attributes instead? "file.readable == True") .is_readable() Returns True if the object was opened for reading, False otherwise. If False, .read() will raise an IOError if called. .is_writable() Returns True if the object was opened for writing, False otherwise. If False, .write() and .truncate() will raise an IOError if called. .is_seekable() (Should this be called .is_random()? or .is_sequential() with opposite return values?)
Returns True if the object supports random access (such as disk files), or False if the object only supports sequential access (such as sockets, pipes, and ttys). If False, .seek(), .tell(), and .truncate() will raise an IOError if called. Iff a RawIOBase implementation operates on an underlying file descriptor, it must additionally provide a .fileno() member function. This could be defined specifically by the implementation, or a mix-in class could be used (need to decide about this). .fileno() Returns the underlying file descriptor (an integer). Initially, three implementations will be provided that implement the RawIOBase interface: FileIO, SocketIO, and ByteIO (also MMapIO?). Each implementation must determine whether the object supports random access, as the information provided by the user may not be sufficient (consider open("/dev/tty", "rw") or open("/tmp/named-pipe", "rw")). As an example, FileIO can determine this by calling the seek() system call; if it returns an error, the object does not support random access. Each implementation may provide additional methods appropriate to its type. The ByteIO object is analogous to Python 2's cStringIO library, but operates on the new bytes type instead of strings. Buffered I/O The next layer is the Buffer I/O layer, which provides more efficient access to file-like objects. The abstract base class for all Buffered I/O implementations is BufferedIOBase, which provides similar methods to RawIOBase: .read() .write() .seek() .tell() .truncate() .close() .is_readable() .is_writable() .is_seekable() Additionally, the abstract base class provides one member variable: .raw Provides a reference to the underlying RawIOBase object. The BufferedIOBase methods' syntax is identical to that of RawIOBase, but may have different semantics. In particular, BufferedIOBase implementations may read more data than requested or delay writing data using buffers.
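Both buffering effects can be observed with Python 3's io.BufferedReader and io.BufferedWriter, which correspond to the classes described here. CountingRaw is a hypothetical raw stream that counts how often it is asked for data. The 16-byte buffer size is an arbitrary choice for the demonstration:

```python
import io
import os
import tempfile

class CountingRaw(io.RawIOBase):
    """Hypothetical in-memory raw stream that counts calls to readinto()."""

    def __init__(self, payload: bytes) -> None:
        self._payload = payload
        self._pos = 0
        self.calls = 0

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        self.calls += 1
        n = min(len(b), len(self._payload) - self._pos)
        b[:n] = self._payload[self._pos:self._pos + n]
        self._pos += n
        return n

# Reading: three 1-byte reads are served from a single read-ahead into the buffer.
raw = CountingRaw(b"abcdefgh")
reader = io.BufferedReader(raw, buffer_size=16)
got = [reader.read(1) for _ in range(3)]
print(got, raw.calls)  # [b'a', b'b', b'c'] 1

# Writing: bytes stay in the buffer until flush() pushes them to the raw file.
fd, path = tempfile.mkstemp()
os.close(fd)
with io.BufferedWriter(io.FileIO(path, "w")) as writer:
    writer.write(b"abc")
    size_before = os.path.getsize(path)  # 0: nothing written through yet
    writer.flush()
    size_after = os.path.getsize(path)   # 3
os.remove(path)
print(size_before, size_after)  # 0 3
```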
For the most part, this will be transparent to the user (unless, for example, they open the same file through a different descriptor). There are four implementations of the BufferedIOBase abstract base class, described below.

BufferedReader

The BufferedReader implementation is for sequential-access read-only objects. It does not provide a .flush() method, since there is no sensible circumstance where the user would want to discard the read buffer.

BufferedWriter

The BufferedWriter implementation is for sequential-access write-only objects. It provides a .flush() method, which forces all cached data to be written to the underlying RawIOBase object.

BufferedRWPair

The BufferedRWPair implementation is for sequential-access read-write objects such as sockets and ttys. As the read and write streams of these objects are completely independent, it could be implemented by simply incorporating a BufferedReader and a BufferedWriter instance. It provides a .flush() method that has the same semantics as a BufferedWriter's .flush() method.

BufferedRandom

The BufferedRandom implementation is for all random-access objects, whether they are read-only, write-only, or read-write. Compared to the previous classes that operate on sequential-access objects, the BufferedRandom class must contend with the user calling .seek() to reposition the stream. Therefore, an instance of BufferedRandom must keep track of both the logical and the true position within the object. It provides a .flush() method that forces all cached write data to be written to the underlying RawIOBase object and all cached read data to be forgotten (so that future reads are forced to go back to the disk).

Q: Do we want to mandate in the specification that switching from reading to writing on a read-write object implies a .flush()? Or is that an implementation convenience that users should not rely on?

For a read-only BufferedRandom object, .is_writable() returns False and the .write() and .truncate() methods throw IOError.
For a write-only BufferedRandom object, .is_readable() returns False and the .read() method throws IOError.

Text I/O

The text I/O layer provides functions to read and write strings from streams. Some new features include universal newlines and character set encoding and decoding. The Text I/O layer is defined by a TextIOBase abstract base class. It provides several methods that are similar to the BufferedIOBase methods, but operate on a per-character basis instead of a per-byte basis. These methods are:

    .read()
    .write()
    .seek()
    .tell()
    .truncate()

TextIOBase implementations also provide several methods that are pass-throughs to the underlying BufferedIOBase objects:

    .close()
    .is_readable()
    .is_writable()
    .is_seekable()

TextIOBase class implementations additionally provide the following methods:

    .readline(self)

        Read until newline or EOF and return the line.

    .readlinesiter()

        Returns an iterator that returns lines from the file (which happens to be 'self').

    .next()

        Same as readline().

    .__iter__()

        Same as readlinesiter().

    .__enter__()

        Context management protocol. Returns self.

    .__exit__()

        Context management protocol. No-op.

Two implementations will be provided by the Python library. The primary implementation, TextIOWrapper, wraps a buffered I/O object. Each TextIOWrapper object has a property named ".buffer" that provides a reference to the underlying BufferedIOBase object. Its initializer has the following signature:

    .__init__(self, buffer, encoding=None, universal_newlines=True, crlf=None)

        Buffer is a reference to the BufferedIOBase object to be wrapped with the TextIOWrapper. "Encoding" refers to an encoding to be used for translating between the byte-representation and character-representation. If "None", then the system's locale setting will be used as the default. If "universal_newlines" is true, then the TextIOWrapper will automatically translate the bytes "\r\n" into a single newline character during reads. If "crlf" is False, then a newline will be written as "\r\n".
        If "crlf" is True, then a newline will be written as "\n". If "crlf" is None, then a system-specific default will be used.

Another way to do it is as follows (we should pick one or the other):

    .__init__(self, buffer, encoding=None, newline=None)

        Same as above, but if newline is not None, use that as the newline pattern (for reading and writing); if newline is not set, attempt to find the newline pattern from the file, and if we can't for some reason, use the system default newline pattern.

Another implementation, StringIO, creates a file-like TextIO implementation without an underlying buffered I/O object. While similar functionality could be provided by wrapping a BytesIO object in a buffered I/O object in a TextIOWrapper, the StringIO object allows for much greater efficiency, as it does not need to actually perform encoding and decoding. A StringIO object can just store the strings as-is. The StringIO object's __init__ signature is similar to the TextIOWrapper's, but without the "buffer" parameter.

END OF PEP

From steven.bethard at gmail.com Tue Feb 27 00:00:39 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Mon, 26 Feb 2007 16:00:39 -0700 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> Message-ID: On 2/26/07, Mike Verdone wrote: > Daniel Stutzbach and I have prepared a draft PEP for the new IO system > for Python 3000. Thanks for doing this! Generally, it looks pretty good. > Additionally, it defines a few other methods: > > (should these "is_" functions be attributes instead? > "file.readable == True") > > .is_readable() [snip] > .is_writable() [snip] > .is_seekable() [snip] > Additionally, the abstract base class provides one member variable: > > .raw [snip] I gather that the reason for methods instead of attributes is that it's easier to delegate to a method than it is to an attribute?
That is:: def is_readable(self): return self.raw.is_readable() is easier to write than:: @property def readable(self): return self.raw.readable If that's the motivation, I'd assume that we'd want a ``get_raw()`` method instead of the ``.raw`` attribute. FWLIW, as a user, I'd rather just work with attributes. > TextIOBase class implementations additionally provide the following methods: > > .readline(self) > Read until newline or EOF and return the line. > > .readlinesiter() > Returns an iterator that returns lines from the file (which > happens to be 'self'). > > .next() > Same as readline() > > .__iter__() > Same as readlinesiter() If they do the same thing, why do we want them? I gather that the next()/readline() duplication is for backwards compatibility, but why the __iter__()/readlinesiter() duplication? > Another way to do it is as follows (we should pick one or the other): > > .__init__(self, buffer, encoding=None, newline=None) > > Same as above but if newline is not None use that as the > newline pattern (for reading and writing), and if newline is not set > attempt to find the newline pattern from the file and if we can't for > some reason use the system default newline pattern. I like this API better, but I'm not certain I understand the proposal. If I call:: TextIOWrapper(buffer, newline='\n') does that mean that any '\r\n' strings in the file will appear as '\n'? Likewise, if I call:: TextIOWrapper(buffer, newline='\r\n') does that mean that any bare '\n' strings will appear as '\r\n'? If not, how do I get universal newline support with this API? (FWLIW, I'd be happy with the you-only-see-newlines-like-you-asked-for-them semantics above.) > Another implementation, StringIO, creates a file-like TextIO > implementation without an underlying Buffer I/O object. 
While similar > functionality could be provided by wrapping a BytesIO object in a > Buffered I/O object in a TextIOWrapper, the String I/O object allows > for much greater efficiency as it does not need to actually performing > encoding and decoding. Sorry, I didn't understand this part. The StringIO won't have to do encoding/decoding when ``.next()`` is called? Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From greg.ewing at canterbury.ac.nz Tue Feb 27 00:22:52 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 27 Feb 2007 12:22:52 +1300 Subject: [Python-3000] Weird error message from bytes type In-Reply-To: References: <9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com> <20070226002930.GB3067@python.ca> <45E25736.2010404@canterbury.ac.nz> Message-ID: <45E36BCC.6030102@canterbury.ac.nz> Guido van Rossum wrote: > In 3.0 it is more generally useful (I want to > use it whenever an int is needed). but in our pre-alpha code the error > messages haven't been fixed yet. Okay, that's fine, thanks. -- Greg From guido at python.org Tue Feb 27 00:48:13 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 26 Feb 2007 17:48:13 -0600 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> Message-ID: On 2/26/07, Steven Bethard wrote: > On 2/26/07, Mike Verdone wrote: > > Daniel Stutzbach and I have prepared a draft PEP for the new IO system > > for Python 3000. > > Thanks for doing this! Generally, it looks pretty good. Agreed. I made some changes to the published doc, you may want to refresh it. > > Additionally, it defines a few other methods: > > > > (should these "is_" functions be attributes instead? > > "file.readable == True") > > > > .is_readable() > [snip] > > .is_writable() > [snip] > > .is_seekable() > [snip] These are now .readable() etc. 
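Editorial sketch (not part of the original thread): the draft says FileIO can detect random access by probing with a seek system call, and the is_* predicates are now readable(), writable() and seekable(). The class below is a hypothetical minimal raw-I/O object over an OS file descriptor, assuming POSIX-style lseek() behavior; RawFd and its constructor arguments are invented for illustration. It suggests why seekable() fits a method better than a plain attribute: answering it may require a system call.

```python
import os


class RawFd:
    """Hypothetical minimal raw-I/O object wrapping an OS file descriptor."""

    def __init__(self, fd, readable, writable):
        self._fd = fd
        self._readable = readable
        self._writable = writable
        self._seekable = None  # unknown until the first seekable() call

    def fileno(self):
        return self._fd

    def readable(self):
        return self._readable

    def writable(self):
        return self._writable

    def seekable(self):
        # Probe once with a no-op lseek(); pipes and sockets fail (ESPIPE).
        if self._seekable is None:
            try:
                os.lseek(self._fd, 0, os.SEEK_CUR)
            except OSError:
                self._seekable = False
            else:
                self._seekable = True
        return self._seekable

    def read(self, n):
        if not self._readable:
            raise IOError("object was not opened for reading")
        return os.read(self._fd, n)

    def write(self, data):
        if not self._writable:
            raise IOError("object was not opened for writing")
        return os.write(self._fd, data)

    def tell(self):
        if not self.seekable():
            raise IOError("object does not support random access")
        return os.lseek(self._fd, 0, os.SEEK_CUR)
```

Caching the probe result is a design choice of this sketch: the answer cannot change for a given descriptor, so later calls are free.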
> > Additionally, the abstract base class provides one member variable: > > > > .raw > [snip] > > I gather that the reason for methods instead of attributes is that > it's easier to delegate to a method than it is to an attribute? That > is:: > > def is_readable(self): > return self.raw.is_readable() > > is easier to write than:: > > @property > def readable(self): > return self.raw.readable > > If that's the motivation, I'd assume that we'd want a ``get_raw()`` > method instead of the ``.raw`` attribute. FWLIW, as a user, I'd > rather just work with attributes. No, the difference in API styles has more to do with the fact that readable() etc. *may* require actual work to be done to come up with a value (especially seekable() may require one to try an lseek() syscall to see if it works). > > TextIOBase class implementations additionally provide the following methods: > > > > .readline(self) > > Read until newline or EOF and return the line. > > > > .readlinesiter() > > Returns an iterator that returns lines from the file (which > > happens to be 'self'). > > > > .next() > > Same as readline() > > > > .__iter__() > > Same as readlinesiter() > > If they do the same thing, why do we want them? I gather that the > next()/readline() duplication is for backwards compatibility, but why > the __iter__()/readlinesiter() duplication? Right. readlinesiter() is gone. > > Another way to do it is as follows (we should pick one or the other): > > > > .__init__(self, buffer, encoding=None, newline=None) > > > > Same as above but if newline is not None use that as the > > newline pattern (for reading and writing), and if newline is not set > > attempt to find the newline pattern from the file and if we can't for > > some reason use the system default newline pattern. > > I like this API better, but I'm not certain I understand the proposal. Me neither. I'll think about this some more.
> If I call:: > > TextIOWrapper(buffer, newline='\n') > > does that mean that any '\r\n' strings in the file will appear as > '\n'? Likewise, if I call:: > > TextIOWrapper(buffer, newline='\r\n') > > does that mean that any bare '\n' strings will appear as '\r\n'? If > not, how do I get universal newline support with this API? (FWLIW, > I'd be happy with the you-only-see-newlines-like-you-asked-for-them > semantics above.) > > > Another implementation, StringIO, creates a file-like TextIO > > implementation without an underlying Buffer I/O object. While similar > > functionality could be provided by wrapping a BytesIO object in a > > Buffered I/O object in a TextIOWrapper, the String I/O object allows > > for much greater efficiency as it does not need to actually performing > > encoding and decoding. > > Sorry, I didn't understand this part. The StringIO won't have to do > encoding/decoding when ``.next()`` is called? The idea is that this should work like StringIO.py in Python 2.x when you only write unicode strings to it. It will then store everything as Unicode strings and the seek positions count characters, not bytes. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Tue Feb 27 00:59:36 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 27 Feb 2007 12:59:36 +1300 Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer) In-Reply-To: References: Message-ID: <45E37468.9060400@canterbury.ac.nz> Travis Oliphant wrote: > If they can't do it easily, then they don't have to define the > release-function and Python will never call it. The case I'm worried about is where the data can move, so it really *needs* to be locked, yet the object has no way of ensuring that. It would be impossible for the object to correctly implement this kind of buffer protocol. > If you want shape information you are going to have to allocate memory. 
But only when the shape changes, not every time you want a pointer to the memory. I like Guido's idea of separating the shape/type info from getting the memory pointer. -- Greg From guido at python.org Tue Feb 27 01:16:28 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 26 Feb 2007 18:16:28 -0600 Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer) In-Reply-To: <45E37468.9060400@canterbury.ac.nz> References: <45E37468.9060400@canterbury.ac.nz> Message-ID: On 2/26/07, Greg Ewing wrote: > Travis Oliphant wrote: > > > If they can't do it easily, then they don't have to define the > > release-function and Python will never call it. > > The case I'm worried about is where the data can move, > so it really *needs* to be locked, yet the object has > no way of ensuring that. It would be impossible for > the object to correctly implement this kind of buffer > protocol. Are you aware of an object that has such a requirement? I would think that the object is in charge of moving its own buffer. If it doesn't have control over when the buffer moves it shouldn't claim to implement the buffer protocol. > > If you want shape information you are going to have to allocate memory. > > But only when the shape changes, not every time you > want a pointer to the memory. > > I like Guido's idea of separating the shape/type info > from getting the memory pointer. Well it's not my area of expertise. I thought that in order to describe a generalized 3d slice of a 3d array you might need offsets for at least some of the dimensions, but I could be wrong. 
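Editorial sketch: Guido wonders whether describing a generalized 3-D slice needs offsets for some dimensions. For a plain strided layout (no pointer indirection), a single byte offset into the buffer plus per-dimension shape and strides appears to be enough, as the hypothetical helpers below illustrate; arrays stored as arrays of pointers would need extra information. The function names are invented for illustration and are not part of any proposed C API.

```python
def c_strides(shape, itemsize):
    """Byte strides of a C-contiguous (row-major) array."""
    strides = []
    step = itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def slice_view(offset, shape, strides, slices):
    """Describe a basic slice of a strided view as (offset, shape, strides)."""
    new_shape, new_strides = [], []
    for dim, stride, s in zip(shape, strides, slices):
        start, stop, step = s.indices(dim)
        offset += start * stride          # move the base to the first element
        new_shape.append(len(range(start, stop, step)))
        new_strides.append(stride * step)
    return offset, tuple(new_shape), tuple(new_strides)


def element_offset(offset, strides, index):
    """Byte offset of the element at `index` within the buffer."""
    return offset + sum(i * st for i, st in zip(index, strides))


# A 4x5x6 array of 8-byte items, sliced as a[1:3, 2:5, ::2].
base_strides = c_strides((4, 5, 6), 8)
view = slice_view(0, (4, 5, 6), base_strides,
                  (slice(1, 3), slice(2, 5), slice(None, None, 2)))
print(base_strides, view)   # (240, 48, 8) (336, (2, 3, 3), (240, 48, 16))
```

Element (1, 2, 2) of the slice lands on the same byte as element (2, 4, 4) of the original array, so the slice needs no per-dimension offsets in this layout.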
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Feb 27 01:30:40 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 26 Feb 2007 18:30:40 -0600 Subject: [Python-3000] Pre-PEP: Simple input built-in in Python 3000 In-Reply-To: References: <7528bcdd0612220545u147f07a4gb476dd43733dfe46@mail.gmail.com> <7528bcdd0702191032n4347e8c8p6987553deb9be445@mail.gmail.com> <45DAFEDE.4030109@gmail.com> <7528bcdd0702200901r62f8cc4fu7ea7f1e59725e4b6@mail.gmail.com> Message-ID: We implemented this at today's sprint. Andre wrote the transformations for the 2to3 tools, I copied the raw_input() implementation from the trunk back into the p3yk branch. Thanks Andre for your efforts in writing the PEP, pushing for its implementation, and writing the transformations! --Guido On 2/20/07, Guido van Rossum wrote: > Consider the PEP accepted. > > Regarding the conversion, please do use the sandbox/2to3 framework. > Write me if you have trouble understanding the many examples already > in fixes/. > > On 2/20/07, Andre Roberge wrote: > > On 2/20/07, Guido van Rossum wrote: > > > Why do you want this *before* PyCon? It would be much easier to do > > > this as part of the Py3k sprint. > > > > > > > My main interest was to have, prior to Pycon, the PEP recorded as > > such; it had been close to 2 months since the last post on this issue > > on the list. > > > > As for the actual work, I'd be willing to volunteer to write the > > required code (with test cases) that could be use to do the conversion > > input(...) -> eval(input(...)) > > raw_input(...) -> input(...) > > > > Unfortunately, I will not be participating in any sprints. > > > > André > > > > > > > > > On 2/20/07, Nick Coghlan wrote: > > > > Andre Roberge wrote: > > > > > Any possibility that (some of) the following can be done before Pycon? > > > > > Respectfully yours, > > > > > André Roberge > > > > > > > > I've added the PEP as 3111.
I made a few small modifications (and > > > > committed it directly as Accepted) based on Guido's comments in this thread. > > > > > > > > The actual change still needs to be made, though. > > > > > > > > Cheers, > > > > Nick. > > > > > > > > -- > > > > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > > > --------------------------------------------------------------- > > > > http://www.boredomandlaziness.org > > > > > > > > > > > > > -- > > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Tue Feb 27 01:37:37 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 27 Feb 2007 13:37:37 +1300 Subject: [Python-3000] Weird error message from bytes type In-Reply-To: <45E2B21C.1030101@gmail.com> References: <9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com> <20070226002930.GB3067@python.ca> <45E25736.2010404@canterbury.ac.nz> <45E2B21C.1030101@gmail.com> Message-ID: <45E37D51.4050201@canterbury.ac.nz> Nick Coghlan wrote: > For example, simply > replacing 'index' with 'integer' could lead to a different kind of > confusion: > TypeError: 'float' object cannot be interpreted as an integer Maybe something like TypeError: 'float' object cannot be used as an integer in this context Or maybe require the caller to pass in an error message format, or at least have a version of the call which allows that.
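Editorial sketch of Greg's two suggestions: a more neutral default message, plus an optional caller-supplied message format. as_index(), DEFAULT_MSG and ASSIGN_MSG are hypothetical names chosen for illustration; operator.index() is the real standard-library hook that calls __index__.

```python
import operator

DEFAULT_MSG = "{type!r} object cannot be used as an integer in this context"


def as_index(obj, msg=DEFAULT_MSG):
    """Coerce obj to an int via __index__, with a caller-chosen error text."""
    try:
        return operator.index(obj)
    except TypeError:
        # {type} is replaced by the name of the offending object's type.
        raise TypeError(msg.format(type=type(obj).__name__)) from None


# A bytes item-assignment caller could pass its own wording instead:
ASSIGN_MSG = "Cannot assign {type!r} object to 'bytes' element"
```

With the default, as_index(1.5) raises "'float' object cannot be used as an integer in this context"; passing ASSIGN_MSG yields messages in the style Tim Delaney proposes later in the thread.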
-- Greg From greg.ewing at canterbury.ac.nz Tue Feb 27 01:39:49 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 27 Feb 2007 13:39:49 +1300 Subject: [Python-3000] Thoughts on new I/O library and bytecode In-Reply-To: <20070225194742.AE4B.JCARLSON@uci.edu> References: <9e804ac0702251429i36623d1bg7ea69d0ad7945429@mail.gmail.com> <45E25025.6020107@canterbury.ac.nz> <20070225194742.AE4B.JCARLSON@uci.edu> Message-ID: <45E37DD5.2050809@canterbury.ac.nz> Josiah Carlson wrote: > Greg Ewing wrote: > >> $[68656C6C6F] > > I think it's a bad idea to choose a representation with any format that > isn't able to do the eval(repr(obj)) loop. The intention was for that to be a valid literal syntax as well. > It may be the case > that b"stuff" is the most concise and reasonable repr form... I can only see it being the most concise when most of the bytes can be meaningfully interpreted as characters. Otherwise it's full of \xyy escapes, making it up to twice as long as necessary and harder to read. I can't help feeling the people arguing for b"..." as the repr format haven't really accepted the fact that text and binary data will be distinct things in py3k, and are thinking of bytes as being a replacement for the old string type. But that's not true -- most of the time, *unicode* will be the replacement for str when it is used to represent characters, and bytes will mostly be used only for non-text. I know that there will be exceptions, such as when writing code to deal with raw SMTP connections and such like. But how often do people write code like that? Usually it's written once and put in a library. I think these cases will be in the minority. Guido wrote: > If you want all hex, use the .hex() method described in the PEP. That seems back-to-front to me. The default repr should not be making assumptions about the meaning of the bytes. It would make more sense to have a .chars() method or something for when you want it interpreted that way. 
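Editorial sketch: Greg's "$[...]" notation was only a proposal. The hypothetical helpers below render and parse it using the bytes.hex() and bytes.fromhex() methods of current Python 3, and compare lengths. For non-ASCII data, the escaped b'...' form spends four characters per byte against two for plain hex, which is where "up to twice as long" comes from.

```python
def hex_repr(data):
    """Render bytes in the hypothetical $[...] hex literal style."""
    return "$[" + data.hex().upper() + "]"


def parse_hex_repr(text):
    """Inverse of hex_repr(), so an eval(repr(obj))-style round trip is possible."""
    if not (text.startswith("$[") and text.endswith("]")):
        raise ValueError("not a $[...] literal: %r" % text)
    return bytes.fromhex(text[2:-1])


binary = bytes(range(0x80, 0x90))                  # 16 non-ASCII bytes
print(len(repr(binary)), len(hex_repr(binary)))    # 67 35
```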
-- Greg From tdelaney at avaya.com Tue Feb 27 01:55:39 2007 From: tdelaney at avaya.com (Delaney, Timothy (Tim)) Date: Tue, 27 Feb 2007 11:55:39 +1100 Subject: [Python-3000] Weird error message from bytes type Message-ID: <2773CAC687FD5F4689F526998C7E4E5F07446D@au3010avexu1.global.avaya.com> Greg Ewing wrote: > Nick Coghlan wrote: >> For example, simply >> replacing 'index' with 'integer' could lead to a different kind of >> confusion: TypeError: 'float' object cannot be interpreted as an >> integer > > Maybe something like > > TypeError: 'float' object cannot be used as an integer in this > context Going back to the original proposal: >>> x = b'a' >>> x[0] = b'a' Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'bytes' object cannot be used as an integer in this context Not too bad. +1 as a default. > Or maybe require the caller to pass in an error message > format, or at least have a version of the call which > allows that. +1 >>> x = b'a' >>> x[0] = b'a' Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: Cannot assign 'bytes' object to 'bytes' element >>> x = b'a' >>> x[0] = [1] Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: Cannot assign 'list' object to 'bytes' element Tim Delaney From p.f.moore at gmail.com Tue Feb 27 11:57:27 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 27 Feb 2007 10:57:27 +0000 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> Message-ID: <79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com> On 26/02/07, Mike Verdone wrote: > Daniel Stutzbach and I have prepared a draft PEP for the new IO system > for Python 3000. This document is, hopefully, true to the info that > Guido wrote on the whiteboards here at PyCon. This is still a draft > and there's quite a few decisions that need to be made. Feedback is > welcomed.
Generally, this looks nice. A couple of minor points: > The new IO spec is intended to be similar to the Java IO libraries, > but generally less confusing. Programmers who don't want to muck about > in the new IO world can expect that the open() factory method will > produce an object backwards-compatible with old-style file objects. Documenting the revised open() factory in this PEP would be useful. It needs to address encoding issues, so it's not a simple copy of the existing open(). Also, should there be a factory method for opening raw byte streams? Once we start down this route, we open the can of worms, of course (does socket.socket need to be specified in terms of the new IO layers? what about the mmap module, the gzip/zipfile/tarfile modules, etc?) These should probably be noted in an "open issues" section, and otherwise deferred for now. > The BufferedReader implementation is for sequential-access read-only > objects. It does not provide a .flush() method, since there is no > sensible circumstance where the user would want to discard the read > buffer. It's not something I've done personally, but programs sometimes flush a read buffer before (eg) reading a password from stdin, to avoid typeahead problems. I don't know if that would be relevant here. > .readlinesiter() > .__iter__() I was going to object to the name readlinesiter, but I see it's gone already :-) > Another way to do it is as follows (we should pick one or the other): > > .__init__(self, buffer, encoding=None, newline=None) > > Same as above but if newline is not None use that as the > newline pattern (for reading and writing), and if newline is not set > attempt to find the newline pattern from the file and if we can't for > some reason use the system default newline pattern. I'm not sure that can work - the point of universal newlines is that *any* of \n, \r or \r\n count as a newline, so there's no one pattern.
So I think that explicitly specifying universal newlines is necessary (even though it's clunky). Regards, Paul. From rhamph at gmail.com Tue Feb 27 16:00:37 2007 From: rhamph at gmail.com (Adam Olsen) Date: Tue, 27 Feb 2007 08:00:37 -0700 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> Message-ID: On 2/26/07, Mike Verdone wrote: > Text I/O > The text I/O layer provides functions to read and write strings from > streams. Some new features include universal newlines and character > set encoding and decoding. The Text I/O layer is defined by a > TextIOBase abstract base class. It provides several methods that are > similar to the BufferIOBase methods, but operate on a per-character > basis instead of a per-byte basis. These methods are: "per-character" needs some clarification. I'm guessing this will only return entire code points, but the unicode type will expose them as code units, so it could be seen as both per-code-point and per-code-unit. To be really pedantic, neither of them are truly "per-character" in unicode parlance, despite the fact that they store "character data". 
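Editorial sketch of the distinction Adam (and later Jim) draw, shown in current Python 3, where str length has counted code points since 3.3; the narrow builds of that era counted UTF-16 code units, so len() of a non-BMP character was 2 there. code_units() is a hypothetical helper. The second string also shows a combining sequence that is two code points but one user-perceived character.

```python
def code_units(text, encoding):
    """Count code units of `text` in a UTF encoding (little-endian, no BOM)."""
    unit_size = {"utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}[encoding]
    return len(text.encode(encoding)) // unit_size


smiley = "\U0001F600"      # one code point outside the BMP
accented = "e\u0301"       # 'e' + COMBINING ACUTE ACCENT: one visible character

for s in (smiley, accented):
    # code points, UTF-16 code units, UTF-8 code units
    print(len(s), code_units(s, "utf-16-le"), code_units(s, "utf-8"))
# prints "1 2 4", then "2 2 3"
```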
-- Adam Olsen, aka Rhamphoryncus From steven.bethard at gmail.com Tue Feb 27 16:09:21 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Tue, 27 Feb 2007 08:09:21 -0700 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: <79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com> References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> <79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com> Message-ID: On 2/27/07, Paul Moore wrote: > > .__init__(self, buffer, encoding=None, newline=None) > > > > Same as above but if newline is not None use that as the > > newline pattern (for reading and writing), and if newline is not set > > attempt to find the newline pattern from the file and if we can't for > > some reason use the system default newline pattern. > > I'm not sure that can work - the point of universal newlines is that > *any* of \n, \r or \r\n count as a newline, so there's no one pattern. > So I think that explicitly specifying universal newlines is necessary > (even though it's clunky). Maybe there could be a special UNIVERSAL constant, so you'd write something like:: TextIOWrapper(buffer, newline=UNIVERSAL) ? Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From guido at python.org Tue Feb 27 17:38:23 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Feb 2007 10:38:23 -0600 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: <79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com> References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> <79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com> Message-ID: On 2/27/07, Paul Moore wrote: [...] > Documenting the revised open() factory in this PEP would be useful. It > needs to address encoding issues, so it's not a simple copy of the > existing open(). Check the doc again. I added one at the end.
It could use some review. I also added an elaboration into the p3yk branch in svn; that could use some review as well. > Also, should there be a factory method for opening raw byte streams? The open() I added returns a raw byte stream when you specify binary mode with buffering=0. > Once we start down this route, we open the can of worms, of course > (does socket.socket need to be specified in terms of the new IO > layers? No, but check the io.py in svn; it has a SocketIO class that wraps a socket. Sockets themselves are much lower level than this; they have all sorts of other APIs. The SocketIO class only works for stream sockets (e.g., TCP/IP). > what about the mmap module, the gzip/zipfile/tarfile modules, > etc?) These sould probably be noted in an "open issues" section, and > otherwise deferred for now. Agreed that we should add these to the open issues section. I don't think we should mess with mmap, but *perhaps* an mmap wrapper could be provided (by the mmap module). gzip, bzip2 etc. should probably be redefined in terms of the buffered (bytes) reader/writer protocol. zipfile and tarfile should take bytes readers/writers; the API they *provide* should be defined in terms of bytes and perhaps (when appropriate, I don't recall if they have read/write methods) in terms of buffered byte streams. It *may* even be useful if many of these would support non-blocking I/O; we're currently considering adding a standard API for returning "EWOULDBLOCK" errors (e.g. return None from read() and write()) -- though we won't be providing an API to turn that on (since it depends on the underlying implementation, e.g. sockets vs. files). > > The BufferedReader implementation is for sequential-access read-only > > objects. It does not provide a .flush() method, since there is no > > sensible circumstance where the user would want to discard the read > > buffer.
> > It's not something I've done personally, but programs sometimes flush > a read buffer before (eg) reading a password from stdin, to avoid > typeahead problems. I don't know if that would be relevant here. We discussed this briefly at the sprint and came to the conclusion that this is outside the scope of the PEP; you can do this by (somehow) enabling non-blocking mode and then reading until you get None. > > Another way to do it is as follows (we should pick one or the other): > > > > .__init__(self, buffer, encoding=None, newline=None) > > > > Same as above but if newline is not None use that as the > > newline pattern (for reading and writing), and if newline is not set > > attempt to find the newline pattern from the file and if we can't for > > some reason use the system default newline pattern. > > I'm not sure that can work - the point of universal newlines is that > *any* of \n, \r or \r\n count as a newline, so there's no one pattern. > So I think that explicitly specifying universal newlines is necessary > (even though it's clunky). I think for input we should always accept all three line endings so you never need to specify anything; for output, we should pick a platform default (\r\n on Windows, \n everywhere else) and have an API to override it. So the API you quote above sounds about right: .__init__(self, buffer, encoding=None, newline=None) I'd like to constrain newline to be either \n or \r\n for writing; for reading IMO it should not be specified. 
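Editorial sketch of the newline policy Guido describes: always accept all three line endings when reading, and allow only "\n" or "\r\n" (defaulting to the platform convention) when writing. read_translate() and write_translate() are hypothetical helpers that operate on already-decoded strings; a real streaming text layer must also handle a "\r\n" pair split across two reads, which this sketch ignores.

```python
import os


def read_translate(text):
    """Input side: accept CRLF, bare CR and LF; hand the caller only LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def write_translate(text, newline=None):
    """Output side: map LF to the chosen line ending (LF or CRLF only)."""
    if newline is None:
        newline = "\r\n" if os.name == "nt" else "\n"   # platform default
    if newline not in ("\n", "\r\n"):
        raise ValueError("newline must be '\\n' or '\\r\\n' for writing")
    return text.replace("\n", newline)
```

Replacing "\r\n" before bare "\r" matters: doing it the other way round would turn each CRLF into two newlines.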
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From p.f.moore at gmail.com Tue Feb 27 17:59:24 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 27 Feb 2007 16:59:24 +0000 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> <79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com> Message-ID: <79990c6b0702270859s55ba98can384f45dc2cd47778@mail.gmail.com> On 27/02/07, Guido van Rossum wrote: > On 2/27/07, Paul Moore wrote: > [...] > > Documenting the revised open() factory in this PEP would be useful. It > > needs to address encoding issues, so it's not a simple copy of the > > existing open(). > > Check the doc again. I added on at the end. It could use some review. > I also added an elaboration into the p3yk branch in svn; that could > use some review as well. Sorry, I hadn't checked the updated version. I'll take a look. [...] > I think for input we should always accept all three line endings so > you never need to specify anything; for output, we should pick a > platform default (\r\n on Windows, \n everywhere else) and have an API > to override it. So the API you quote above sounds about right: > > .__init__(self, buffer, encoding=None, newline=None) > > I'd like to constrain newline to be either \n or \r\n for writing; for > reading IMO it should not be specified. Ah. If that's the intent, I agree - in effect universal newlines is always on, and output uses platform semantics unless you force it to be overridden. Forcing only \n or \r\n sounds fine to me. Paul. 
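Editorial sketch of the typeahead-draining approach Guido mentioned earlier in the thread (enable non-blocking mode, then read until None), written against the io module that eventually shipped, where a non-blocking io.FileIO.read() returns None when no data is available. drain_pending() is a hypothetical helper, a pipe stands in for a terminal, and os.set_blocking() is only available on Unix before Python 3.12.

```python
import io
import os


def drain_pending(raw, chunk_size=4096):
    """Collect whatever is already available on a non-blocking raw stream.

    Relies on the convention that a non-blocking raw read() returns None
    when no data is available (EWOULDBLOCK) and b"" at end of file.
    """
    chunks = []
    while True:
        data = raw.read(chunk_size)
        if not data:            # None (would block) or b"" (EOF): stop
            break
        chunks.append(data)
    return b"".join(chunks)


# Usage: simulate typeahead on a pipe, then discard it before prompting.
r, w = os.pipe()
os.write(w, b"typed ahead")
os.set_blocking(r, False)
with io.FileIO(r, "r") as stdin_like:
    print(drain_pending(stdin_like))   # b'typed ahead'
os.close(w)
```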
From jimjjewett at gmail.com Tue Feb 27 19:22:48 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 27 Feb 2007 13:22:48 -0500 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> <79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com> Message-ID: On 2/27/07, Guido van Rossum wrote: > On 2/27/07, Paul Moore wrote: > It *may* even be useful if many of these would support non-blocking > I/O; we're currently considering adding a standard API for returning > "EWOULDBLOCK" errors (e.g. return None from read() and write()) -- > though we won't be providing an API to turn that on (since it depends > on the underlying implementation, e.g. sockets vs. files). I thought the point of the IO subsystem was to abstract away those differences. Trying to set (non-)blocking may raise an exception on some streams, but that still seems better than having to know the internal details before you can even ask. > > > The BufferedReader implementation is for sequential-access read-only > > > objects. It does not provide a .flush() method, since there is no > > > sensible circumstance where the user would want to discard the read > > > buffer. > > ... typeahead problems. > ... outside the scope of the PEP; you can do this by > (somehow) enabling non-blocking mode and then reading until you get > None. That does sound like a use case, and flush() is the obvious method. Are you concerned that having the (rarely needed) method available may be an attractive nuisance or source of confusion? > I think for input we should always accept all three line endings so > you never need to specify anything; for output, we should pick ... So saving a text file can cause (whitespace) changes all over? That might be OK, but it should at least be called out, so that editors wanting minimal change will know that they have to implement their own Text layer. 
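The typeahead workaround mentioned above (put the stream in non-blocking mode, then read until read() returns None) might look like this on a POSIX system with Python 3. os.set_blocking and io.FileIO are real APIs. discard_pending is a hypothetical helper, and the pipe only stands in for a terminal. This is a sketch, not what the PEP specifies.

```python
import io
import os

def discard_pending(raw: io.RawIOBase, chunk_size: int = 4096) -> bytes:
    """Read and return everything currently available on a non-blocking raw stream.

    Stops when read() returns None (would block: nothing available right now)
    or b"" (end of file).
    """
    discarded = bytearray()
    while True:
        data = raw.read(chunk_size)
        if not data:              # None -> EWOULDBLOCK, b"" -> EOF
            break
        discarded += data
    return bytes(discarded)

if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)        # the "(somehow) enable non-blocking" step
    os.write(write_fd, b"typed ahead\n")   # simulate keystrokes already waiting
    with io.FileIO(read_fd, "r") as raw:
        print(discard_pending(raw))        # b'typed ahead\n'
        print(raw.read(10))                # None: nothing left, writer still open
    os.close(write_fd)
```

The loop never sleeps or polls. It only drains what is already buffered, so it cannot busy-wait.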
-jJ From jimjjewett at gmail.com Tue Feb 27 19:41:46 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 27 Feb 2007 13:41:46 -0500 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> Message-ID: On 2/27/07, Adam Olsen wrote: > On 2/26/07, Mike Verdone wrote: > > Text I/O > > ... operate on a per-character basis instead of a per-byte basis. > "per-character" needs some clarification. I'm guessing this will only > return entire code points, but the unicode type will expose them as > code units, so it could be seen as both per-code-point and > per-code-unit. Does this just mean that you assume (1) UTF32 (2) surrogate pairs will show up as two characters (3) diacritics may (or may not) show up separately from their base characters? This does suggest that error-correction should be specified (or at least explicitly not specified). If the underlying input byte-stream contains an invalid sequence, will the TextIO raise a UnicodeDecodeError? Or will its error/replace/delete behavior be settable? Does the Text class promise to catch things like an invalid combination of surrogates? -jJ From guido at python.org Tue Feb 27 19:51:47 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Feb 2007 12:51:47 -0600 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> <79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com> Message-ID: On 2/27/07, Jim Jewett wrote: > On 2/27/07, Guido van Rossum wrote: > > On 2/27/07, Paul Moore wrote: > > > It *may* even be useful if many of these would support non-blocking > > I/O; we're currently considering adding a standard API for returning > > "EWOULDBLOCK" errors (e.g. return None from read() and write()) -- > > though we won't be providing an API to turn that on (since it depends > > on the underlying implementation, e.g. sockets vs. files). 
> > I thought the point of the IO subsystem was to abstract away those differences. We will abstract away the differences of how you *use* a stream that's in non-blocking (or timeout) mode, but we can't abstract away the APIs used to *request* those modes since the API depends on the abilities of the system object -- sockets, pipes and disk files all have different semantics here. > Trying to set (non-)blocking may raise an exception on some streams, > but that still seems better than having to know the internal details > before you can even ask. I doubt it -- non-blocking mode is pretty specialized. I want it to be *possible* to use the new I/O library with file descriptors that can return EWOULDBLOCK; I don't necessarily want to make it *easy*. > > > > The BufferedReader implementation is for sequential-access read-only > > > > objects. It does not provide a .flush() method, since there is no > > > > sensible circumstance where the user would want to discard the read > > > > buffer. > > > > ... typeahead problems. > > > ... outside the scope of the PEP; you can do this by > > (somehow) enabling non-blocking mode and then reading until you get > > None. > > That does sound like a use case, and flush() is the obvious method. No it isn't. Calling flush() for writing has no semantics at the highest-level abstraction: you can insert flush() calls whenever you want or omit them and the data will still be written; the only time you care is when the abstraction is broken and you lose a buffer due to a segfault etc. The semantics of this use case are very different; perhaps we can add a reset() or discard() method which throws away the buffer contents but that's as far as I want to go. The passwd-reading example ought to be hidden in the getpass module. > Are you concerned that having the (rarely needed) method available may > be an attractive nuisance or source of confusion?
Perhaps; people will latch on to a name and call it; or they will mindlessly copy code that happens to contain it and a new voodoo religion or superstition is easily born. Also whether this makes sense or not depends a lot on what kind of device you are reading; I can't imagine a socket use case for example. > > I think for input we should always accept all three line endings so > > you never need to specify anything; for output, we should pick ... > > So saving a text file can cause (whitespace) changes all over? It would only normalize line endings, but yeah. > That might be OK, but it should at least be called out, so that > editors wanting minimal change will know that they have to implement > their own Text layer. I expect them to do that anyway. But I would not be against being able to specify newline="\n" on input and have it mean that \r\n line endings remain in the data where present. I'm not sure that I would like newline="\r\n" to mean that a lone \n should not be considered a line ending, even if some stupid Windows apps behave that way. A compromise would be to support what "U" mode currently does -- it makes the line endings actually encountered available as an attribute on the file. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Feb 27 20:02:20 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Feb 2007 13:02:20 -0600 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> Message-ID: The encoding/decoding behavior should be no different from that of the encode() and decode() methods on unicode strings and byte arrays. Certainly no normalization of diacritics will be done; surrogate handling depends on the encoding and whether the unicode string implementation uses 16 or 32 bits per character. I agree that we need to be able to specify the error handling as well. UnicodeErrors may be raised. 
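One way the "specify the error handling" requirement can be expressed is shown by Python 3's io.TextIOWrapper, whose errors argument is passed through to the underlying codec. This is a sketch of that later API, not the draft PEP's. decode_stream and BAD_UTF8 are illustrative names.

```python
import io

BAD_UTF8 = b"ok \xff bad byte"   # 0xFF can never start a UTF-8 sequence

def decode_stream(data: bytes, errors: str) -> str:
    """Decode data as UTF-8 through a text layer using the given error handler."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", errors=errors)
    return stream.read()

try:
    decode_stream(BAD_UTF8, "strict")        # the default: a UnicodeError is raised
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)             # strict: invalid start byte

replaced = decode_stream(BAD_UTF8, "replace")   # returns "ok \ufffd bad byte"
ignored = decode_stream(BAD_UTF8, "ignore")     # returns "ok  bad byte"
```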
--Guido On 2/27/07, Jim Jewett wrote: > On 2/27/07, Adam Olsen wrote: > > On 2/26/07, Mike Verdone wrote: > > > Text I/O > > > ... operate on a per-character basis instead of a per-byte basis. > > > "per-character" needs some clarification. I'm guessing this will only > > return entire code points, but the unicode type will expose them as > > code units, so it could be seen as both per-code-point and > > per-code-unit. > > Does this just mean that you assume > (1) UTF32 > (2) surrogate pairs will show up as two characters > (3) diacritics may (or may not) show up separately from their base characters? > > This does suggest that error-correction should be specified (or at > least explicitly not specified). If the underlying input byte-stream > contains an invalid sequence, will the TextIO raise a > UnicodeDecodeError? Or will its error/replace/delete behavior be > settable? > > Does the Text class promise to catch things like an invalid > combination of surrogates? > > -jJ > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jimjjewett at gmail.com Tue Feb 27 20:18:31 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 27 Feb 2007 14:18:31 -0500 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> <79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com> Message-ID: On 2/27/07, Guido van Rossum wrote: > On 2/27/07, Jim Jewett wrote: > > Trying to set (non-)blocking may raise an exception on some streams, > > but that still seems better than having to know the internal details > > before you can even ask. > I doubt it -- non-blocking mode is pretty specialized. 
I want it to be > *possible* to use the new I/O library with file descriptors that can > return EWOULDBLOCK; I don't necessarily want to make it *easy*. Rewording to see if I understand: source.read() will always block, unless something out-of-band has changed it. *If* it has been changed out-of-band, then None is used to indicate this. Therefore, normal code can ignore the possibility, or (to be really robust against someone else messing with the input stream) add an "if result is None: continue" clause to its loops. > No it isn't. Calling flush() for writing has no semantics at the > highest-level abstraction: Are you saying that flush() need not be a blocking operation? That makes it a bit hard to force interaction. >> So saving a text file can cause (whitespace) changes all over? > It would only normalize line endings, but yeah. > > That might be OK, but it should at least be called out, so that > > editors wanting minimal change will know that they have to implement > > their own Text layer. > I expect them to do that anyway. I don't. Wanting to minimize diffs doesn't imply any interest in unicode. > But I would not be against being able > to specify newline="\n" on input and have it mean that \r\n line > endings remain in the data where present. That sort of passthrough mode is enough for me. Thank you. -jJ From guido at python.org Tue Feb 27 21:39:25 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Feb 2007 14:39:25 -0600 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> <79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com> Message-ID: On 2/27/07, Jim Jewett wrote: > On 2/27/07, Guido van Rossum wrote: > > On 2/27/07, Jim Jewett wrote: > > > > Trying to set (non-)blocking may raise an exception on some streams, > > > but that still seems better than having to know the internal details > > > before you can even ask. 
> > > I doubt it -- non-blocking mode is pretty specialized. I want it to be > > *possible* to use the new I/O library with file descriptors that can > > return EWOULDBLOCK; I don't necessarily want to make it *easy*. > > Rewording to see if I understand: > > source.read() will always block, unless something out-of-band has changed it. > > *If* it has been changed out-of-band, then None is used to indicate this. Imprecise language, but I understand what you mean. More exactly: None is returned instead of raising an IOError with errno set to EWOULDBLOCK (or whatever its equivalent is on Windows). > Therefore, normal code can ignore the possibility, or (to be really > robust against someone else messing with the input stream) add an "if > result is None: continue" clause to its loops. No, since that would mean busy-waiting while the I/O isn't ready, unless there's a select or similar at the top of the loop, in which case you're not "normal code". Better raise an exception if you get this. Better even not to check for this at all if you're not prepared to handle it -- attempting to use None as a string will raise an exception for you. You could also treat it as EOF. > > No it isn't. Calling flush() for writing has no semantics at the > > highest-level abstraction: > > Are you saying that flush() need not be a blocking operation? > That makes it a bit hard to force interaction. I didn't intend to say that. Depending on whether and how often you call flush(), the other side could see your bytes at different times, but it should see the same data in the same order regardless (except if you never flush your final writes). FWIW we just discovered that the buffered writers need a __del__ method that calls flush()... > >> So saving a text file can cause (whitespace) changes all over? > > > It would only normalize line endings, but yeah.
> > > > That might be OK, but it should at least be called out, so that > > > editors wanting minimal change will know that they have to implement > > > their own Text layer. > > > I expect them to do that anyway. > > I don't. Wanting to minimize diffs doesn't imply any interest in unicode. > > > But I would not be against being able > > to specify newline="\n" on input and have it mean that \r\n line > > endings remain in the data where present. > > That sort of passthrough mode is enough for me. Thank you. OK, I'll update the PEP text. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From walter at livinglogic.de Tue Feb 27 21:39:48 2007 From: walter at livinglogic.de (Walter Dörwald) Date: Tue, 27 Feb 2007 21:39:48 +0100 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> Message-ID: <45E49714.9060003@livinglogic.de> Guido van Rossum wrote: > The encoding/decoding behavior should be no different from that of the > encode() and decode() methods on unicode strings and byte arrays. Except that it must work in incremental mode. The new (in 2.5) incremental codecs should be usable for that. > Certainly no normalization of diacritics will be done; surrogate > handling depends on the encoding and whether the unicode string > implementation uses 16 or 32 bits per character. > > I agree that we need to be able to specify the error handling as well. Should it be possible to change the error handling during the lifetime of a stream? Then this change would have to be passed through to the underlying codec. > UnicodeErrors may be raised. Servus, Walter > On 2/27/07, Jim Jewett wrote: >> On 2/27/07, Adam Olsen wrote: >>> On 2/26/07, Mike Verdone wrote: >>>> Text I/O >>>> ... operate on a per-character basis instead of a per-byte basis. >>> "per-character" needs some clarification.
I'm guessing this will only >>> return entire code points, but the unicode type will expose them as >>> code units, so it could be seen as both per-code-point and >>> per-code-unit. >> Does this just mean that you assume >> (1) UTF32 >> (2) surrogate pairs will show up as two characters >> (3) diacritics may (or may not) show up separately from their base characters? >> >> This does suggest that error-correction should be specified (or at >> least explicitly not specified). If the underlying input byte-stream >> contains an invalid sequence, will the TextIO raise a >> UnicodeDecodeError? Or will its error/replace/delete behavior be >> settable? >> >> Does the Text class promise to catch things like an invalid >> combination of surrogates? >> >> -jJ >> _______________________________________________ >> Python-3000 mailing list >> Python-3000 at python.org >> http://mail.python.org/mailman/listinfo/python-3000 >> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org >> > > From guido at python.org Tue Feb 27 21:44:25 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Feb 2007 14:44:25 -0600 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: <45E49714.9060003@livinglogic.de> References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> <45E49714.9060003@livinglogic.de> Message-ID: On 2/27/07, Walter Dörwald wrote: > Guido van Rossum wrote: > > > The encoding/decoding behavior should be no different from that of the > > encode() and decode() methods on unicode strings and byte arrays. > > Except that it must work in incremental mode. The new (in 2.5) > incremental codecs should be usable for that. Thanks for reminding! Do the incremental codecs have internal state? I wonder how this interacts with non-blocking reads. (I know next-to-nothing about incremental codecs beyond that they exist.
:-) > > Certainly no normalization of diacritics will be done; surrogate > > handling depends on the encoding and whether the unicode string > > implementation uses 16 or 32 bits per character. > > > > I agree that we need to be able to specify the error handling as well. > > Should it be possible to change the error handling during the lifetime > of a stream? Then this change would have to be passed through to the > underlying codec. Not unless you have a really good use case handy... -- --Guido van Rossum (home page: http://www.python.org/~guido/) From oliphant.travis at ieee.org Tue Feb 27 22:14:11 2007 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Tue, 27 Feb 2007 14:14:11 -0700 Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer) In-Reply-To: References: Message-ID: Guido van Rossum wrote: > On 2/26/07, Travis Oliphant wrote: >> Guido van Rossum wrote: >> Great. I have no problem with this. Is your idea of locking the same >> as mine (i.e. a function in the API for release?) > > Right. My understanding of this locking mechanism would require objects that wish to use it to keep track of how many views they have "exported" and refuse to re-allocate memory until the views have all been released. In my understanding this would require the addition of at least one integer to the object structure. So, for example, the bytesobject would need to at least add int ob_views to its C-structure: /* Object layout */ typedef struct { PyObject_VAR_HEAD Py_ssize_t ob_alloc; /* How many bytes allocated */ int ob_views; /* Number of views to these bytes */ char *ob_bytes; } PyBytesObject; On creation, ob_views would be initialized to 0 and whenever getbuffer was called it would increase this number and whenever releasebuffer was called it would decrease this number. Am I missing something here?
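The view-counting scheme Travis describes can be modelled in a few lines of Python to make the invariant explicit: getbuffer increments the counter, releasebuffer decrements it, and anything that might re-allocate refuses while the counter is non-zero. This is a hypothetical model, not C code from the PEP. ViewCountedBytes, get_buffer, release_buffer and BufferLockedError are illustrative names, and ob_views mirrors the proposed C field.

```python
class BufferLockedError(RuntimeError):
    """Raised when a resize is attempted while views are outstanding (illustrative)."""

class ViewCountedBytes:
    """Model of a bytes-like object that counts how many views it has exported."""

    def __init__(self, data: bytes = b""):
        self._storage = bytearray(data)
        self.ob_views = 0                  # initialised to 0 on creation

    def get_buffer(self) -> memoryview:
        """Stand-in for getbuffer: hand out a view and bump the counter."""
        self.ob_views += 1
        return memoryview(self._storage)

    def release_buffer(self, view: memoryview) -> None:
        """Stand-in for releasebuffer: drop the view and decrement the counter."""
        view.release()
        self.ob_views -= 1

    def append(self, data: bytes) -> None:
        """Grow the storage; this may re-allocate, so refuse while views exist."""
        if self.ob_views:
            raise BufferLockedError(f"{self.ob_views} view(s) still exported")
        self._storage += data

buf = ViewCountedBytes(b"abc")
view = buf.get_buffer()
try:
    buf.append(b"d")
except BufferLockedError as exc:
    print("refused:", exc)                 # refused: 1 view(s) still exported
buf.release_buffer(view)
buf.append(b"d")                           # allowed now that ob_views == 0
```

CPython's bytearray ended up with a comparable export counter (the ob_exports field of PyByteArrayObject).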
-Travis From guido at python.org Tue Feb 27 22:18:33 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Feb 2007 15:18:33 -0600 Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer) In-Reply-To: References: Message-ID: On 2/27/07, Travis E. Oliphant wrote: > My understanding of this locking mechanism would require objects that > wish to use it to keep track of how many views they have "exported" and > refuse to re-allocate memory until the views have all been released. Right. > In my understanding this would require the addition of at least one > integer to the object structure. Right. > So, for example, the bytesobject would need to at least add > > int ob_views > > to its C-structure: > > /* Object layout */ > typedef struct { > PyObject_VAR_HEAD > Py_ssize_t ob_alloc; /* How many bytes allocated */ > int ob_views; /* Number of views to these bytes */ > char *ob_bytes; > } PyBytesObject; > > On creation, ob_views would be initialized to 0 and whenever getbuffer > was called it would increase this number and whenever releasebuffer was > called it would decrease this number. > > Am I missing something here? I don't think so -- this is exactly what I was thinking of. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From walter at livinglogic.de Tue Feb 27 22:27:02 2007 From: walter at livinglogic.de (Walter Dörwald) Date: Tue, 27 Feb 2007 22:27:02 +0100 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> <45E49714.9060003@livinglogic.de> Message-ID: <45E4A226.9010908@livinglogic.de> Guido van Rossum wrote: > On 2/27/07, Walter Dörwald wrote: >> Guido van Rossum wrote: >> >> > The encoding/decoding behavior should be no different from that of the >> > encode() and decode() methods on unicode strings and byte arrays. >> >> Except that it must work in incremental mode.
The new (in 2.5) >> incremental codecs should be usable for that. > > Thanks for reminding! Do the incremental codecs have internal state? They might have; however, in all *decoding* cases (except the CJK codecs, which I know nothing about) this is just undecoded input. E.g. if the UTF-16-LE incremental decoder (which is a BufferedIncrementalDecoder) gets passed an odd number of bytes in the decode() call, it decodes as much as possible and keeps the last byte in a buffer, which will be reused on the next call to decode(). AFAICR the only *encoder* that keeps state is the UTF-16 encoder: it has to remember whether a BOM has been output. I don't know whether the CJK codecs do keep any state besides undecoded input for decoding. (E.g. a greedy UTF-7 incremental decoder might have to). > I > wonder how this interacts with non-blocking reads. Non-blocking reads were the reason for implementing the incremental codecs: The codec decodes as much of the available input as possible and keeps the undecoded rest until the next decode() call. > (I know > next-to-nothing about incremental codecs beyond that they exist. :-) The basic principle is that these codecs can encode strings and decode bytes in multiple chunks. If you want to encode a unicode string u in UTF-16 you can do it in one go: s = u.encode("utf-16") or character by character: encoder = codecs.lookup("utf-16").incrementalencoder() s = "".join(encoder.encode(c) for c in u) + encoder.encode(u"", True) The incremental encoder makes sure that the result contains only one BOM. Decoding works in the same way: decoder = codecs.lookup("utf-16").incrementaldecoder() u = u"".join(decoder.decode(c) for c in s) + decoder.decode("", True) >> > Certainly no normalization of diacritics will be done; surrogate >> > handling depends on the encoding and whether the unicode string >> > implementation uses 16 or 32 bits per character. >> > >> > I agree that we need to be able to specify the error handling as well.
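Walter's chunked encode/decode examples above use Python 2.5 syntax. Here is a minimal Python 3 translation built on codecs.getincrementalencoder and codecs.getincrementaldecoder, which are real APIs. encode_in_chunks and decode_in_chunks are illustrative helper names. The last two lines show the "keep the odd byte until the next call" behaviour he describes for UTF-16-LE.

```python
import codecs

def encode_in_chunks(text: str, encoding: str, size: int = 1) -> bytes:
    """Encode text piece by piece with an incremental encoder."""
    encoder = codecs.getincrementalencoder(encoding)()
    parts = [encoder.encode(text[i:i + size]) for i in range(0, len(text), size)]
    parts.append(encoder.encode("", final=True))   # flush any pending state
    return b"".join(parts)

def decode_in_chunks(data: bytes, encoding: str, size: int = 1) -> str:
    """Decode data piece by piece; incomplete sequences are buffered between calls."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = [decoder.decode(data[i:i + size]) for i in range(0, len(data), size)]
    parts.append(decoder.decode(b"", final=True))  # raises if input ends mid-sequence
    return "".join(parts)

# The encoder emits the BOM only once, so chunked output matches one-shot output.
one_go = "h\xe9llo".encode("utf-16")
chunked = encode_in_chunks("h\xe9llo", "utf-16")

# A lone byte is held back until the rest of the code unit arrives.
decoder = codecs.getincrementaldecoder("utf-16-le")()
print(repr(decoder.decode(b"a")))       # ''
print(repr(decoder.decode(b"\x00b")))   # 'a'  (b"b" is now buffered)
```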
> >> >> Should it be possible to change the error handling during the lifetime >> of a stream? Then this change would have to be passed through to the >> underlying codec. > > Not unless you have a really good use case handy... Not for decoding, but for encoding: If you're outputting XML and use an encoding that can't encode all unicode characters, then it makes sense to switch to "xmlcharrefreplace" error handling during the output of text nodes (and back to "strict" for element names etc.). Servus, Walter From guido at python.org Tue Feb 27 22:37:29 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Feb 2007 15:37:29 -0600 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: <45E4A226.9010908@livinglogic.de> References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> <45E49714.9060003@livinglogic.de> <45E4A226.9010908@livinglogic.de> Message-ID: On 2/27/07, Walter Dörwald wrote: > Guido van Rossum wrote: > > > On 2/27/07, Walter Dörwald wrote: > >> Guido van Rossum wrote: > >> > >> > The encoding/decoding behavior should be no different from that of the > >> > encode() and decode() methods on unicode strings and byte arrays. > >> > >> Except that it must work in incremental mode. The new (in 2.5) > >> incremental codecs should be usable for that. > > > > Thanks for reminding! Do the incremental codecs have internal state? > > They might have, however in all *decoding* cases (except the CJK codecs, > which I know nothing about) this is just undecoded input. E.g. if the > UTF-16-LE incremental decoder (which is a BufferedIncrementalDecoder) > gets passed an odd number of bytes in the decode() call, it decodes as > much as possible and keeps the last byte in a buffer, which will be > reused on the next call to decode(). > > AFAICR the only *encoder* that keeps state is the UTF-16 encoder: it has > to remember whether a BOM has been output.
> > I don't know whether the CJK codecs do keep any state besides undecoded > input for decoding. (E.g. a greedy UTF-7 incremental decoder might have to). > > > I > > wonder how this interacts with non-blocking reads. > > Non-blocking reads where the reason for implementing the incremental > codecs: The codec decodes as much of the available input as possible and > keeps the undecoded rest until the next decode() call. > > > (I know > > next-to-nothing about incremental codecs beyond that they exist. :-) > > The basic principle is that these codecs can encode strings and decode > bytes in multiple chunks. If you want to encode a unicode string u in > UTF-16 you can do it in one go: > s = u.encode("utf-16") > or character by character: > encoder = codecs.lookup("utf-16").incrementalencoder() > s = "".join(encoder.encode(c) for c in u) + encoder.encode(u"", True) > The incremental encoder makes sure, that the result contains only one BOM. > > Decoding works in the same way: > decoder = codecs.lookup("utf-16").incrementaldecoder() > u = u"".join(decoder.decode(c) for c in s) + decoder.decode("", True) Thanks for the explanations, it is a little bit clearer now! > >> > Certainly no normalization of diacritics will be done; surrogate > >> > handling depends on the encoding and whether the unicode string > >> > implementation uses 16 or 32 bits per character. > >> > > >> > I agree that we need to be able to specify the error handling as well. > >> > >> Should it be possible to change the error handling during the lifetime > >> of a stream? Then this change would have to be passed through to the > >> underlying codec. > > > > Not unless you have a really good use case handy... > > Not for decoding, but for encoding: If you're outputting XML and use an > encoding that can't encode all unicode characters, then it makes sense > to switch to "xmlcharrefreplace" error handling during the output of > text nodes (and back to "strict" for element names etc.). 
So do the incremental codecs allow this switching? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From walter at livinglogic.de Tue Feb 27 22:47:26 2007 From: walter at livinglogic.de (Walter Dörwald) Date: Tue, 27 Feb 2007 22:47:26 +0100 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> <45E49714.9060003@livinglogic.de> <45E4A226.9010908@livinglogic.de> Message-ID: <45E4A6EE.6010903@livinglogic.de> Guido van Rossum wrote: > On 2/27/07, Walter Dörwald wrote: > [...] >> The basic principle is that these codecs can encode strings and decode >> bytes in multiple chunks. If you want to encode a unicode string u in >> UTF-16 you can do it in one go: >> s = u.encode("utf-16") >> or character by character: >> encoder = codecs.lookup("utf-16").incrementalencoder() >> s = "".join(encoder.encode(c) for c in u) + encoder.encode(u"", True) >> The incremental encoder makes sure, that the result contains only one >> BOM. >> >> Decoding works in the same way: >> decoder = codecs.lookup("utf-16").incrementaldecoder() >> u = u"".join(decoder.decode(c) for c in s) + decoder.decode("", True) > > Thanks for the explanations, it is a little bit clearer now! > > [...] >> >> Should it be possible to change the error handling during the lifetime >> >> of a stream? Then this change would have to be passed through to the >> >> underlying codec. >> > >> > Not unless you have a really good use case handy... >> >> Not for decoding, but for encoding: If you're outputting XML and use an >> encoding that can't encode all unicode characters, then it makes sense >> to switch to "xmlcharrefreplace" error handling during the output of >> text nodes (and back to "strict" for element names etc.). > > So do the incremental codecs allow this switching?
Yes:
>>> import codecs
>>> ci = codecs.lookup("ascii")
>>> enc = ci.incrementalencoder(errors="xmlcharrefreplace")
>>> enc.encode(u"\xff")
'&#255;'
>>> enc.errors = "strict"
>>> enc.encode(u"\xff")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.5/encodings/ascii.py", line 22, in encode
    return codecs.ascii_encode(input, self.errors)[0]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 0: ordinal not in range(128)
And it's documented that changing the errors attribute is allowed: http://docs.python.org/lib/incremental-encoder-objects.html http://docs.python.org/lib/incremental-decoder-objects.html Servus, Walter From jimjjewett at gmail.com Tue Feb 27 23:17:50 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 27 Feb 2007 17:17:50 -0500 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> <79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com> Message-ID: On 2/27/07, Guido van Rossum wrote: > On 2/27/07, Jim Jewett wrote: > > Therefore, normal code can ignore the possibility, or (to be really > > robust against someone else messing with the input stream) add an "if > > result is None: continue" clause to its loops. > No, since that would mean busy-waiting while the I/O isn't ready, Then should I assume that: (1) Read with a timeout is in the "better know your concrete object" category. (2) Dealing with possibly unready objects in a library/framework (yield the timeslot?) should generally be framework specific. > FWIW we just discovered that the buffered writers need a __del__ > method that calls flush()... All they really need is a __close__ method -- you don't want it to cause gc cycles, and it is OK if the flush happens more than once. (I'll stop for now, as the __del__ semantics are a different long thread.)
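Walter's XML use case (switch to "xmlcharrefreplace" for text nodes and back to "strict" for markup on the same encoder) can be sketched in Python 3 by assigning to the incremental encoder's errors attribute, which the codecs documentation permits. encode_xml_fragments and the (kind, text) fragment format are hypothetical, chosen only for illustration.

```python
import codecs

def encode_xml_fragments(fragments, encoding="ascii"):
    """Encode (kind, text) pairs with one encoder, switching error handling per fragment.

    Text nodes get character references for unencodable characters; markup
    (element names etc.) must encode cleanly or UnicodeEncodeError is raised.
    """
    encoder = codecs.getincrementalencoder(encoding)()
    out = []
    for kind, text in fragments:
        encoder.errors = "xmlcharrefreplace" if kind == "text" else "strict"
        out.append(encoder.encode(text))
    out.append(encoder.encode("", final=True))
    return b"".join(out)

doc = [("markup", "<p>"), ("text", "caf\xe9"), ("markup", "</p>")]
print(encode_xml_fragments(doc))    # b'<p>caf&#233;</p>'
```

Passing [("markup", "<\xe9>")] instead raises UnicodeEncodeError, because markup is encoded with "strict".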
-jJ From greg.ewing at canterbury.ac.nz Tue Feb 27 23:17:11 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 28 Feb 2007 11:17:11 +1300 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> <79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com> Message-ID: <45E4ADE7.7040709@canterbury.ac.nz> Guido van Rossum wrote: > I'd like to constrain newline to be either \n or \r\n for writing; What about \r? -- Greg From guido at python.org Tue Feb 27 23:37:18 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Feb 2007 16:37:18 -0600 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: <45E4ADE7.7040709@canterbury.ac.nz> References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> <79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com> <45E4ADE7.7040709@canterbury.ac.nz> Message-ID: On 2/27/07, Greg Ewing wrote: > Guido van Rossum wrote: > > > I'd like to constrain newline to be either \n or \r\n for writing; > > What about \r? Mac OS 9 has been dead and unsupported for many years now. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Feb 27 23:39:12 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Feb 2007 16:39:12 -0600 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> <79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com> Message-ID: On 2/27/07, Jim Jewett wrote: > On 2/27/07, Guido van Rossum wrote: > > On 2/27/07, Jim Jewett wrote: > > > > Therefore, normal code can ignore the possibility, or (to be really > > > robust against someone else messing with the input stream) add an "if > > > result is None: continue" clause to its loops. 
> > > No, since that would mean busy-waiting while the I/O isn't ready, > > Then should I assume that: > > (1) Read with a timeout is in the "better know your concrete object" category. Using these shouldn't necessarily need to be (but you *should* know to expect EWOULDBLOCK); but setting the timeout should be, yes. > (2) Dealing with possibly unready objects in a library/framework > (yield the timeslot?) should generally be framework specific. Yeah, event loop business typically is. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Wed Feb 28 01:20:24 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 28 Feb 2007 13:20:24 +1300 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> <79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com> <45E4ADE7.7040709@canterbury.ac.nz> Message-ID: <45E4CAC8.1070902@canterbury.ac.nz> Guido van Rossum wrote: > On 2/27/07, Greg Ewing wrote: > > > What about \r? > > Mac OS 9 has been dead and unsupported for many years now. Even if the Python code isn't running on MacOS 9, it might want to write a file that will be read by a MacOS 9 system. -- Greg From oliphant.travis at ieee.org Wed Feb 28 02:10:23 2007 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Tue, 27 Feb 2007 18:10:23 -0700 Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol Message-ID: Attached is my current draft of the enhanced buffer protocol for Python 3000. It is basically what has been discussed except for some issues with non single-segment memory areas (such as a sub-array). Comments are welcome. -Travis Oliphant -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: pep_buffer.txt Url: http://mail.python.org/pipermail/python-3000/attachments/20070227/6a49daa4/attachment-0001.txt From daniel at stutzbachenterprises.com Wed Feb 28 03:00:12 2007 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Tue, 27 Feb 2007 20:00:12 -0600 Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol In-Reply-To: References: Message-ID: I know I'm joining this discussion late in the game, so I apologize if my look through the list archives was not sufficiently exhaustive and this has been proposed and shot down before... What if the locking mechanism were put into the array's memory instead of the container's memory? If the array-memory is a PyObject, then the existing reference counting mechanism can be used, instead of inventing a new one. We can introduce a new type, PyArray, that is pretty much opaque (bare minimum of methods). A PyArray is just a PyObject_HEAD (one type pointer plus the reference counter) followed by the data that would normally be there. When an array-like container allocates memory, it allocates a PyArray to store the actual data in. When a caller request a view, the container increments the PyArray's reference counter and returns a pointer to the PyArray. The caller is responsible for decrementing the reference counter when it is done with the view, so bf_releasebuffer becomes unnecessary. The container cannot reallocate the memory unless the reference counter on the PyArray is exactly 1. Basically, I'm wondering if it makes sense to move the new reference counter into the buffered memory rather than putting it in the container, so that there is only one reference counter implementation. Different question: what is a container supposed to do if a view is locking its memory and it needs to reallocate to complete some operation? I assume it would raise an exception, but it would be nice to spell this out in the PEP. -- Daniel Stutzbach, Ph.D. 
President, Stutzbach Enterprises LLC From ncoghlan at gmail.com Wed Feb 28 03:19:30 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 28 Feb 2007 12:19:30 +1000 Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol In-Reply-To: References: Message-ID: <45E4E6B2.8090506@gmail.com> Daniel Stutzbach wrote: > When an array-like container allocates memory, it allocates a PyArray > to store the actual data in. When a caller request a view, the > container increments the PyArray's reference counter and returns a > pointer to the PyArray. The caller is responsible for decrementing > the reference counter when it is done with the view, so > bf_releasebuffer becomes unnecessary. > > The container cannot reallocate the memory unless the reference > counter on the PyArray is exactly 1. > > Basically, I'm wondering if it makes sense to move the new reference > counter into the buffered memory rather than putting it in the > container, so that there is only one reference counter implementation. An object can use a similar approach (by calling Py_INCREF/DECREF in the get/release methods), but there is no need for it to be the *only* approach (TOOWTDI is given significantly less emphasis in the C API, while speed & memory efficiency concerns are higher on the priority list). > Different question: what is a container supposed to do if a view is > locking its memory and it needs to reallocate to complete some > operation? I assume it would raise an exception, but it would be nice > to spell this out in the PEP. > I was wondering this, too. I'd also like to know what should happen if the object's Python refcount drops to zero, but the view count is still greater than 0. Regards, Nick. 
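In today's CPython, the answer to the question of what a container should do when it must reallocate while a view is outstanding is to raise an exception. A small sketch against the built-in `bytearray` and `memoryview` (the helper `try_append` is hypothetical, just for illustration):

```python
def try_append(buf, value):
    """Append one byte; report False instead of raising if the buffer is locked."""
    try:
        buf.append(value)
        return True
    except BufferError:      # raised while any buffer export is still alive
        return False


data = bytearray(b"abc")
view = memoryview(data)      # exporting a buffer "locks" the current allocation

locked = try_append(data, ord("d"))    # resize refused: False
view.release()                         # drop the export explicitly
unlocked = try_append(data, ord("d"))  # now the bytearray may reallocate: True

print(locked, unlocked, bytes(data))   # False True b'abcd'
```

Writing `with memoryview(data) as view:` releases the export automatically at the end of the block, so the lock never outlives its use.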
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From jcarlson at uci.edu Wed Feb 28 03:48:03 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Tue, 27 Feb 2007 18:48:03 -0800 Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol In-Reply-To: References: Message-ID: <20070227182352.AE67.JCARLSON@uci.edu> "Daniel Stutzbach" wrote: [snip] > The container cannot reallocate the memory unless the reference > counter on the PyArray is exactly 1. Alternatively, some objects could opt to create a new PyArray of sufficient size, copy data as necessary, and leave all previous views to point to the old data. If done periodically, this could lead to an interesting versioning mechanism (especially if we could teach Python to virtualize itself and pull memory from a specific buffer), but that is a different discussion for a different day :) About the only issue I can see with implementing the mechanism as you describe is that everything that wants to offer the buffer interface would need to store its data in a PyArray structure. Bytes, unicode, array.array, mmap, etc. Most of the difference will essentially be a call to PyArray_New() rather than PyMalloc(), and an indirection via macro of PyArray_ASSTRINGANDSIZE() to get the pointer and length of the buffer. I would suspect that such overhead would be minimal, but without implementing and testing it on something that is used often (maybe Python 2.x strings as the simplest example?), it would be hard to say. The benefit to implementing the interface as described by Travis is that if an object is read-only (like unicode), the acquire/release is (as in the PyArray version) an incref/decref, and no other structural changes are necessary. Then again, after switching to PyArrays, all views are more or less an incref or decref and the allocation of a "view" object to describe memory layout. 
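Josiah's alternative (allocate fresh storage, copy, and leave existing views pointing at the old data) can be sketched at the Python level. `VersionedBuffer` is a hypothetical class, not an existing API; it assumes `bytearray`'s `BufferError` is a good enough signal that views of the current storage may still be alive.

```python
class VersionedBuffer:
    """Hypothetical container illustrating copy-on-resize.

    Resizing never fails: if views of the current storage may still be
    alive, the data is copied into fresh storage and the old views keep
    seeing the old version.
    """

    def __init__(self, data=b""):
        self._storage = bytearray(data)

    def view(self):
        return memoryview(self._storage)

    def append(self, byte):
        try:
            self._storage.append(byte)       # fast path: no live views
        except BufferError:
            # Views still reference this storage: copy instead of resizing.
            self._storage = bytearray(self._storage)
            self._storage.append(byte)

    def getvalue(self):
        return bytes(self._storage)


buf = VersionedBuffer(b"ab")
old = buf.view()
buf.append(ord("c"))
print(buf.getvalue(), old.tobytes())     # b'abc' b'ab'
```

The cost is memory: each old version stays alive for as long as some view references it, which is the versioning effect mentioned above.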
- Josiah From daniel at stutzbachenterprises.com Wed Feb 28 04:38:39 2007 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Tue, 27 Feb 2007 21:38:39 -0600 Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol In-Reply-To: <20070227182352.AE67.JCARLSON@uci.edu> References: <20070227182352.AE67.JCARLSON@uci.edu> Message-ID: On 2/27/07, Josiah Carlson wrote: > About the only issue I can see with implementing the mechanism as you > describe is that everything that wants to offer the buffer interface > would need to store its data in a PyArray structure. Bytes, unicode, > array.array, mmap, etc. Most of the difference will essentially be a > call to PyArray_New() rather than PyMalloc(), and an indirection via > macro of PyArray_ASSTRINGANDSIZE() to get the pointer and length of the > buffer. I would suspect that such overhead would be minimal, but without > implementing and testing it on something that is used often (maybe > Python 2.x strings as the simplest example?), it would be hard to say. Each type can implement its own PyArray subtype, so there'd be no need for a macro/function to do the indirection. For example, if we wanted to build a C integer-based array type for some reason, we could create its PyArray subtype as follows:

    typedef struct {
        PyObject_HEAD
        int ival[1];
    } PyIntArray;

The data can then be accessed cleanly like this:

    PyIntArray *my_array = allocate_some_memory();
    my_array->ival[some_index] = v;

Possibly on some architectures accessing the data will be very slightly slower because ival isn't at the top of the structure. I wrote a short test program just now and didn't see a difference on my architecture (Intel Duo). > The benefit to implementing the interface as described by Travis is that > if an object is read-only (like unicode), the acquire/release is (as in > the PyArray version) an incref/decref, and no other structural changes > are necessary.
If I read the source right, the current Unicode implementation converts the unicode string to a regular string using the default encoding when a buffer is requested. Presumably this will need to be re-thought for Python 3000 since non-unicode strings are going away. However, for certain read-only types (like 2.5-style strings), the implementation is already a PyObject with an array tacked on to the end. These could be subtypes of the PyArray type with very little trouble, and it would only be necessary to maintain one reference counter for them instead of two. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises LLC From oliphant.travis at ieee.org Wed Feb 28 05:05:23 2007 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Tue, 27 Feb 2007 21:05:23 -0700 Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol In-Reply-To: References: Message-ID: Daniel Stutzbach wrote: > I know I'm joining this discussion late in the game, so I apologize if > my look through the list archives was not sufficiently exhaustive and > this has been proposed and shot down before... No, I don't think you are late. But this discussion has been going on off and on for at least 10 years :-) We don't all remember all the issues, though. > > What if the locking mechanism were put into the array's memory instead > of the container's memory? Basically, my first proposal was to have a single view object and you would get at the memory through it. But having a light-weight API that returns a pointer to memory like the current one does is desirable. > If the array-memory is a PyObject, then > the existing reference counting mechanism can be used, instead of > inventing a new one. We can introduce a new type, PyArray, that is > pretty much opaque (bare minimum of methods). A PyArray is just a > PyObject_HEAD (one type pointer plus the reference counter) followed > by the data that would normally be there.
> The original object still needs to distinguish between normal references and "view-based references." Thus, even with your proposal it seems you will need another counter on the objects that wish to track buffer-interface views. > When an array-like container allocates memory, it allocates a PyArray > to store the actual data in. When a caller request a view, the > container increments the PyArray's reference counter and returns a > pointer to the PyArray. The caller is responsible for decrementing > the reference counter when it is done with the view, so > bf_releasebuffer becomes unnecessary. Maybe I'm not understanding you correctly. Perhaps what you are saying is that we should have all memory allocation go through a light-weight memory-object. Then, you would get this object + an offset when you wanted a pointer into memory. This way, the memory would never be deallocated until nothing was referencing it. I think this approach would work. However, you could still have the case where an object reallocated memory while another object which thought it had a view of that object ended up with an "outdated" view. You just wouldn't segfault in that case. You could check the reference count on the memory object before reallocating, I suppose. But I've heard that the reference counts on Python objects can be larger than 1 in some cases (even though there isn't really anything "viewing" the memory). > > The container cannot reallocate the memory unless the reference > counter on the PyArray is exactly 1. I'm not sure we can guarantee this would work. It seems like for various reasons depending on the state of the interpreter, reference counts increase. > > Basically, I'm wondering if it makes sense to move the new reference > counter into the buffered memory rather than putting it in the > container, so that there is only one reference counter implementation.
> This is an idea I've thought of too, but we would be enforcing a "use-python for all shared memory allocations" restriction. > Different question: what is a container supposed to do if a view is > locking its memory and it needs to reallocate to complete some > operation? I assume it would raise an exception, but it would be nice > to spell this out in the PEP. It would raise an exception. -Travis From greg.ewing at canterbury.ac.nz Wed Feb 28 04:58:07 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 28 Feb 2007 16:58:07 +1300 Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol In-Reply-To: <45E4E6B2.8090506@gmail.com> References: <45E4E6B2.8090506@gmail.com> Message-ID: <45E4FDCF.6060408@canterbury.ac.nz> Nick Coghlan wrote: > I'd also like to know what should happen if > the object's Python refcount drops to zero, but the view count is still > greater than 0. That shouldn't happen, because the code using the view ought to be responsible for holding a reference to the containing object as long as it's using the view. -- Greg From ncoghlan at gmail.com Wed Feb 28 05:09:34 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 28 Feb 2007 14:09:34 +1000 Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol In-Reply-To: <45E4FDCF.6060408@canterbury.ac.nz> References: <45E4E6B2.8090506@gmail.com> <45E4FDCF.6060408@canterbury.ac.nz> Message-ID: <45E5007E.3000203@gmail.com> Greg Ewing wrote: > Nick Coghlan wrote: >> I'd also like to know what should happen if >> the object's Python refcount drops to zero, but the view count is still >> greater than 0. > > That shouldn't happen, because the code using the view > ought to be responsible for holding a reference to the > containing object as long as it's using the view. That's what I thought, but it should probably be mentioned explicitly in the PEP (and the eventual docs) that the memory reference needs to be in addition to a normal object reference, rather than instead of. 
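Greg's rule (the consumer of a view also holds a normal reference to the exporting object) is what today's `memoryview` does: it keeps the exporter alive through its `obj` attribute, so dropping the last name bound to the container does not invalidate the view. A minimal check:

```python
data = bytearray(b"xyz")
view = memoryview(data)

exporter_is_data = view.obj is data   # the view holds a normal reference to its exporter
del data                              # drop our own name for the bytearray
contents = view.tobytes()             # still valid: the exporter is kept alive by the view
view.release()                        # end the export explicitly

print(exporter_is_data, contents)     # True b'xyz'
```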
Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From greg.ewing at canterbury.ac.nz Wed Feb 28 05:20:07 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 28 Feb 2007 17:20:07 +1300 Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol In-Reply-To: References: Message-ID: <45E502F7.4070603@canterbury.ac.nz> Travis E. Oliphant wrote: > typedef char *(*formatbufferproc)(PyObject *view, int *itemsize) > > Get the format-string of the memory using the struct-module > string syntax I'm not sure whether a struct-format string would be the most convenient form for use by C-level code, as it could require some tedious parsing to extract useful information from it. > typedef PyObject *(*shapebufferproc)(PyObject *view) > > Return a 2-tuple of lists containing shape information: (shape, > strides). I'm also not sure about using Python data structures to represent this, as it will force C-level code to use Python API calls to pull it apart. What would be wrong with C array of structs containing two integers each? The buffer API is for the use of C code, and it should be designed with the convenience of C code in mind. Using Python data structures unnecessarily seems like the wrong way to go about that. The following alternative would seem to provide most of the things that Travis's proposal does without involving Python objects: struct pybuffer_shape { Py_ssize_t length; Py_ssize_t stride; }; typedef int (*getbufferproc)(PyObject *obj, void **buf, Py_ssize_t *len, char **format, struct pybuffer_shape **shape, int *ndim); /* Any of buf, format and shape may be NULL if you're not interested in them. 
*/ typedef int (*releasebufferproc)(PyObject *obj); -- Greg From greg.ewing at canterbury.ac.nz Wed Feb 28 05:21:50 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 28 Feb 2007 17:21:50 +1300 Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol In-Reply-To: <20070227182352.AE67.JCARLSON@uci.edu> References: <20070227182352.AE67.JCARLSON@uci.edu> Message-ID: <45E5035E.1050602@canterbury.ac.nz> Josiah Carlson wrote: > About the only issue I can see with implementing the mechanism as you > describe is that everything that wants to offer the buffer interface > would need to store its data in a PyArray structure. And that wouldn't be acceptable, because the point of the buffer interface is to provide access to memory which is *not* kept in any standard kind of container. Often the memory is allocated and managed by an external library, and we have no control over it. -- Greg From oliphant.travis at ieee.org Wed Feb 28 05:53:42 2007 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Tue, 27 Feb 2007 21:53:42 -0700 Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol In-Reply-To: <45E502F7.4070603@canterbury.ac.nz> References: <45E502F7.4070603@canterbury.ac.nz> Message-ID: Greg Ewing wrote: > Travis E. Oliphant wrote: > >> typedef char *(*formatbufferproc)(PyObject *view, int *itemsize) >> >> Get the format-string of the memory using the struct-module >> string syntax > > I'm not sure whether a struct-format string would be > the most convenient form for use by C-level code, as > it could require some tedious parsing to extract > useful information from it. Yes, this was the reason for my dtype object. But, I think that folks felt it was too much, especially since the struct-style syntax is already there in Python. Do you have any other suggestions? > >> typedef PyObject *(*shapebufferproc)(PyObject *view) >> >> Return a 2-tuple of lists containing shape information: (shape, >> strides). 
> > I'm also not sure about using Python data structures > to represent this, as it will force C-level code to > use Python API calls to pull it apart. What would be > wrong with C array of structs containing two integers > each? Nothing except memory management. Now, you have to worry about allocating and deallocating memory. > > The buffer API is for the use of C code, and it should > be designed with the convenience of C code in mind. I agree. I would like to use something besides Python objects, but handling the memory allocation is non-trivial. On the other hand, Python tuples are pretty simple wrappers around integers. > Using Python data structures unnecessarily seems like > the wrong way to go about that. > > The following alternative would seem to provide most of > the things that Travis's proposal does without involving > Python objects: > > struct pybuffer_shape { > Py_ssize_t length; > Py_ssize_t stride; > }; > > typedef int (*getbufferproc)(PyObject *obj, > void **buf, Py_ssize_t *len, > char **format, > struct pybuffer_shape **shape, int *ndim); > > /* Any of buf, format and shape may be NULL if you're > not interested in them. */ > Besides not allowing for the request of a "contiguous" buffer from the object or a writeable one you are also not describing how allocation for this array of structs will be handled. I'm not opposed in principle. In fact, I would like to get rid of the Python objects in the protocol (in the array_struct interface for NumPy we have the shape and strides in an array of integers). The memory management is the only issue. -Travis From oliphant.travis at ieee.org Wed Feb 28 06:00:19 2007 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Tue, 27 Feb 2007 22:00:19 -0700 Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol In-Reply-To: References: <45E502F7.4070603@canterbury.ac.nz> Message-ID: Travis E. Oliphant wrote: > > The memory management is the only issue. 
In fact, the PEP still has the issue of who manages the memory for the format-description string when it is returned. The easiest thing to do is to return a Python string and let reference counting handle the memory management. What if we were also to return from the shape call a Python C-Object that loosely wrapped the shape and strides C arrays? Then, it would free the memory on deallocation. A C-API call that created such a C-Object from two arrays of integers could be provided to make it easy. -Travis From oliphant.travis at ieee.org Wed Feb 28 06:34:28 2007 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Tue, 27 Feb 2007 22:34:28 -0700 Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol In-Reply-To: <45E502F7.4070603@canterbury.ac.nz> References: <45E502F7.4070603@canterbury.ac.nz> Message-ID: Greg Ewing wrote: > > The buffer API is for the use of C code, and it should > be designed with the convenience of C code in mind. > Using Python data structures unnecessarily seems like > the wrong way to go about that. > > The following alternative would seem to provide most of > the things that Travis's proposal does without involving > Python objects: > In my latest version of the PEP, I suggest using Python CObjects as loose wrappers around C structures for both the char * format string and the structure { int ndim; Py_ssize_t *shape; Py_ssize_t *strides; }. This way, we get the benefit of Python reference counting for memory management but easy access to the relevant C objects. I've also added simple functions to the proposed C-API to construct these C-objects. -Travis From oliphant.travis at ieee.org Wed Feb 28 06:35:54 2007 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Tue, 27 Feb 2007 22:35:54 -0700 Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol In-Reply-To: References: Message-ID: Travis E. Oliphant wrote: > > Attached is my current draft of the enhanced buffer protocol for Python > 3000.
It is basically what has been discussed except for some issues > with non single-segment memory areas (such as a sub-array). > > Comments are welcome. > The latest version of the PEP is always available here: http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/doc/pep_buffer.txt -Travis From rasky at develer.com Wed Feb 28 09:20:01 2007 From: rasky at develer.com (Giovanni Bajo) Date: Wed, 28 Feb 2007 09:20:01 +0100 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> Message-ID: [reposting since the first time it didn't get through...] On 26/02/2007 22.35, Mike Verdone wrote: > Daniel Stutzbach and I have prepared a draft PEP for the new IO system > for Python 3000. This document is, hopefully, true to the info that > Guido wrote on the whiteboards here at PyCon. This is still a draft > and there's quite a few decisions that need to be made. Feedback is > welcomed. Thanks for this! > Raw I/O > The abstract base class for raw I/O is RawIOBase. It has several > methods which are wrappers around the appropriate operating system > call. If one of these functions would not make sense on the object, > the implementation must raise an IOError exception. For example, if a > file is opened read-only, the .write() method will raise an IOError. > As another example, if the object represents a socket, then .seek(), > .tell(), and .truncate() will raise an IOError. > > .read(n: int) -> bytes > .readinto(b: bytes) -> int > .write(b: bytes) -> int

What are the requirements here?

- Can read()/readinto() return *fewer* bytes than specified?
- Can read() return a 0-sized byte object (=no data available)?
- Can read() return *more* bytes than specified (think of a datagram socket or a decompressing stream)?
- Can readinto() read *fewer* bytes than specified?
- Can readinto() read zero bytes?
- Should read()/readinto() raise EOFError?
- Can write() write fewer bytes than specified?
- Can write() write zero bytes?

Please see also the examples at the end of the mail before providing an answer :) > .seek(pos: int, whence: int = 0) -> None > .tell() -> int > .truncate(n: int = None) -> None > .close() -> None Why should this very low-level basic type define *two* read methods? Assuming that readinto() is the most primitive, can we have the ABC RawIOBase provide a default read() method that calls readinto? Consider providing more ABC/mixins to help implementations. ReadIOBase/WriteIOBase are pretty obvious:

    class RawIOBase:
        def readable(self): return False
        def writeable(self): return False
        def seekable(self): return False
        def read(self, n): raise IOError
        def readinto(self, b): raise IOError
        def write(self, b): raise IOError
        def seek(self, pos, wh): raise IOError
        def tell(self): raise IOError
        def truncate(self, n=None): raise IOError

    class ReadIOBase(RawIOBase):
        def readable(self): return True
        def read(self, n):
            b = bytes(n)  # whatever
            count = self.readinto(b)
            return b[:count]

    class WriteIOBase(RawIOBase):  # analogous to ReadIOBase
        def writeable(self): return True

    class MySpecialReader(ReadIOBase):
        def readinto(self, b):
            ...  # must implement only this and nothing else

    class MySpecialReaderWriter(ReadIOBase, WriteIOBase):
        def readinto(self, b):
            ...
        def write(self, b):
            ...

> (should these "is_" functions be attributes instead? > "file.readable == True") Yes, I think readable/writeable/seekable/fileno *perfectly* match the good usage of attributes/properties. They all provide a value without any side-effect and that can be computed without doing O(n)-style computations. > Buffered I/O > The next layer is the Buffer I/O layer which provides more efficient > access to file-like objects. The abstract base class for all Buffered I think you probably want the buffer size to be optionally specified by the user, for the four standard implementations. > Q: Do we want to mandate in the specification that switching between > reading to writing on a read-write object implies a .flush()?
Or is > that an implementation convenience that users should not rely on? I'd be glad if using flush() wasn't a requirement for users of the class. It always strikes me as an abstraction leak. > TextIOBase class implementations additionally provide the following methods: > > .readline(self) > > Read until newline or EOF and return the line. > > .readlinesiter() > > Returns an iterator that returns lines from the file (which > happens to be 'self'). > > .next() > > Same as readline() > > .__iter__() > > Same as readlinesiter() Not sure why you need "readlinesiter()" at all. I thought Py3k was disposing of most of the "fooiter()" functions (thinking of dicts...). > Another way to do it is as follows (we should pick one or the other): > > .__init__(self, buffer, encoding=None, newline=None) I think this is clearer. I can't find a good real-world use case for requiring the two-parameter version.

==========================================================================

Now for a real example. Let's say I'm given a readable RawIOBase object. I'm told that it's a foobar-compressed UTF-8 text file. I have this API available:

    class Foobar:
        # initialize decompressor
        __init__()
        # feed compressed bytes and get uncompressed bytes.
        # The uncompressed data can be smaller, equal or larger
        # than the compressed data
        decompress(bytes) -> bytes
        # finish decompression and get tail
        flush() -> bytes

This is basically similar to the way the decompress()/flush() methods of zlib decompression objects work. I would like to wrap the readable RawIOBase object in a way that I obtain a textual file-like object with readline() etc. This is pretty hard to do with the current I/O library (you need to write a lot of code). It'd be good if the new I/O library made it easier to achieve. Let's see.
I start with a raw I/O reader:

    class FoobarRaw(RawIOBase):
        def __init__(self, raw):
            self.raw = raw
            self._d = Foobar()
            self._buf = bytes()

        def readable(self):
            return True

        # I assume RawIOBase.read() must return the
        # exact number of bytes (unless at the end).
        # I assume RawIOBase.read() raises EOFError when done
        # I assume readinto() does not exist...
        def read(self, n):
            try:
                while len(self._buf) < n:
                    b = self.raw.read(n)
                    self._buf += self._d.decompress(b)
            except EOFError:
                self._buf += self._d.flush()
            d = self._buf[:n]
            del self._buf[:n]
            if not d:
                raise EOFError
            return d

and complete the job:

    def foobar_open(raw):
        return TextIOWrapper(BufferedReader(FoobarRaw(raw)), encoding="utf-8")

    for L in foobar_open(sock):
        print(L)

Uhm, looks great!

==========================================================================

Now, it might be interesting to play with the different semantics of RawIOBase.read() which I proposed above, and see how the implementation of FoobarRaw.read() changes. For instance (now being radical): why don't we drop the "n" argument altogether? We could just define it like this:

    # Returns a block of data, whose size is implementation-defined
    # and may vary between calls. It never returns a zero-sized block.
    # Raises EOFError when done.
    read() -> bytes

After all, there's a BufferedIO layer to handle buffering and exact-size reads/writes. If we go this way, the above example is even easier:

    def read(self):
        try:
            b = self.raw.read()  # any size!
            return self._d.decompress(b)
        except EOFError:
            b = self._d.flush()
            if not b:
                raise EOFError
            return b

It would also work well for sockets, since they would return exactly the buffer of data that arrived from the network, and simply block once if there's no data available.
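The raw, buffered and text layering described here maps directly onto today's `io` module. Below is a minimal sketch, assuming `zlib.decompressobj()` stands in for the hypothetical `Foobar` API. `ZlibRawReader` and `zlib_text_open` are made-up names. Note that the modern `io` classes signal end of file by returning `b""` / `0` rather than raising `EOFError`, and a raw reader only has to implement `readinto()`, because `RawIOBase.read()` is derived from it.

```python
import io
import zlib


class ZlibRawReader(io.RawIOBase):
    """Raw (unbuffered) reader that inflates a zlib stream from another binary file.

    Each readinto() call may return fewer bytes than requested; the
    BufferedReader layered on top takes care of exact-size reads.
    """

    def __init__(self, source, chunk_size=8192):
        super().__init__()
        self._source = source
        self._inflater = zlib.decompressobj()
        self._pending = b""          # inflated bytes not yet handed out
        self._eof = False
        self._chunk_size = chunk_size

    def readable(self):
        return True

    def readinto(self, b):
        # Pull compressed chunks until some output is available or input ends.
        while not self._pending and not self._eof:
            chunk = self._source.read(self._chunk_size)
            if chunk:
                self._pending = self._inflater.decompress(chunk)
            else:
                self._pending = self._inflater.flush()
                self._eof = True
        n = min(len(b), len(self._pending))
        b[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n                     # 0 only at end of stream


def zlib_text_open(source, encoding="utf-8"):
    """Stack raw -> buffered -> text, like the foobar_open() function above."""
    return io.TextIOWrapper(io.BufferedReader(ZlibRawReader(source)), encoding=encoding)


compressed = io.BytesIO(zlib.compress("first line\nsecond line\nthird\n".encode("utf-8")))
for line in zlib_text_open(compressed):
    print(line, end="")              # prints the three original lines
```

Because the raw layer may return short reads, it never needs its own buffering for exact sizes. That job belongs to `BufferedReader`, much as proposed above.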
-- Giovanni Bajo From theller at ctypes.org Wed Feb 28 13:07:53 2007 From: theller at ctypes.org (Thomas Heller) Date: Wed, 28 Feb 2007 13:07:53 +0100 Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol In-Reply-To: References: Message-ID: Travis E. Oliphant wrote: > Attached is my current draft of the enhanced buffer protocol for Python > 3000. It is basically what has been discussed except for some issues > with non single-segment memory areas (such as a sub-array). > > Comments are welcome. > > -Travis Oliphant > Additions to the struct string-syntax > > The struct string-syntax is missing some characters to fully > implement data-format descriptions already available elsewhere (in > ctypes and NumPy for example). Here are the proposed additions:
>
> Character          Description
> ==================================
> '1'                bit (number before states how many bits)
> '?'                platform _Bool type

In SVN trunk (2.6), the struct module already supports _Bool, but the format character used is 't'. Not a big issue, though, and I like '?' better.

> 'g'                long double
> 'F'                complex float
> 'D'                complex double
> 'G'                complex long double

IIUC, in the latest PEP draft you have apparently changed to two-letter codes for complex types, which is inconsistent with previous conventions in struct.

> 'c'                ucs-1 (latin-1) encoding
> 'u'                ucs-2
> 'w'                ucs-4
> 'O'                pointer to Python Object
> 'T{}'              structure (detailed layout inside {})
> '(k1,k2,...,kn)'   multi-dimensional array of whatever follows
> ':name:'           optional name of the preceding element
> '&'                specific pointer (prefix before another character)
> 'X{}'              pointer to a function (optional function
>                    signature inside {})
>
> The struct module will be changed to understand these as well and
> return appropriate Python objects on unpacking. Un-packing a
> long-double will return a ctypes long_double.

This is probably because there is no way for current Python to support the long double datatype.
The question for ctypes is: How should ctypes support that? Should the .value attribute of a c_longdouble have two components, should it expose the value as decimal, should Python itself switch to using long double internally, or are there other possibilities? > Unpacking 'u' or > 'w' will return Python unicode. Unpacking a multi-dimensional > array will return a list of lists. Un-packing a pointer will > return a ctypes pointer object. ctypes does not support pointer objects of non-native byte order; should they be forbidden? > Un-packing a bit will return a > Python Bool. > > Endian-specification ('=','>','<') is also allowed inside the > string so that it can change if needed. The previously-specified > endian string is enforced at all times. The default endian is '='. > > According to the struct-module, a number can precede a character > code to specify how many of that type there are. The > (k1,k2,...,kn) extension also allows specifying if the data is > supposed to be viewed as a (C-style contiguous, last-dimension > varies the fastest) multi-dimensional array of a particular format. > > Functions should be added to ctypes to create a ctypes object from > a struct description, and add long-double, and ucs-2 to ctypes. Well, ucs-4 should probably be added to ctypes as well. The current ctypes.c_wchar type corresponds to the C WCHAR type, whose size is configuration-dependent. Thomas From daniel at stutzbachenterprises.com Wed Feb 28 14:39:33 2007 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Wed, 28 Feb 2007 07:39:33 -0600 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> Message-ID: Note: to make my answers true, I had to change the Non-blocking I/O part of the PEP so that .read(), .write(), and .readinto() all return None if no data is available from a non-blocking object.
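Under this convention a caller has to tell three outcomes apart: data, an empty bytes object at end of file, or None when a non-blocking object has nothing ready. Below is a minimal sketch written against today's io module. There, a RawIOBase subclass that implements only readinto() inherits a read() built on top of it, and that inherited read() passes a None result from readinto() straight through. ScriptedRaw is a hypothetical stand-in for a non-blocking pipe or socket that replays canned results, and drain() is an illustrative helper, not part of any library.

```python
import io


class ScriptedRaw(io.RawIOBase):
    """Hypothetical non-blocking raw stream that replays a fixed script.

    Each entry is either bytes (data ready) or None (would block).
    Once the script is used up, the stream is at end of file.
    """

    def __init__(self, script):
        self._script = list(script)

    def readable(self):
        return True

    def readinto(self, b):
        if not self._script:
            return 0                  # end of file
        item = self._script.pop(0)
        if item is None:
            return None               # nothing available right now
        n = len(item)                 # assumes each chunk fits in b
        b[:n] = item
        return n


def drain(raw, size=4096):
    """Collect the bytes available now; report whether EOF was reached."""
    chunks = []
    while True:
        data = raw.read(size)         # inherited read(), built on readinto()
        if data is None:              # would block: stop for now, not EOF
            return b"".join(chunks), False
        if not data:                  # b"": end of file
            return b"".join(chunks), True
        chunks.append(data)


raw = ScriptedRaw([b"ab", b"cd", None, b"ef"])
print(drain(raw))   # (b'abcd', False)
print(drain(raw))   # (b'ef', True)
```

Using None rather than a zero-length result for "would block" is what keeps the empty object free to mean end of file.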
Previously it had specified that .readinto() would return 0, but I realized this would be ambiguous with an EOF condition. I'll work on fleshing out the PEP with answers to these questions within a couple of hours.

On 2/28/07, Giovanni Bajo wrote: > > Raw I/O > > > > .read(n: int) -> bytes > > .readinto(b: bytes) -> int > > .write(b: bytes) -> int > > What are the requirements here?

> - Can read()/readinto() return *less* bytes than specified?
Yes.

> - Can read() return a 0-sized byte object (=no data available)?
A 0-sized byte object indicates end-of-file.

> - Can read() return *more* bytes than specified (think of a datagram socket or > a decompressing stream)?
No. For a Raw I/O object, any such extra bytes are either buffered in the kernel or lost. For a Buffered I/O object, extra bytes are buffered.

> - Can readinto() read *less* bytes than specified?
For a Raw I/O object, yes. For a Buffered I/O object in non-blocking mode, yes. For a Buffered I/O object in blocking mode, no.

> - Can readinto() read zero bytes?
Only on end-of-file.

> - Should read()/readinto() raise EOFError?
On EOF, they return a length-0 object or 0 instead. If the user tries to read again *after* hitting EOF, then an EOFError is raised.

> - Can write() write less bytes than specified?
For a Raw I/O or non-blocking Buffered I/O object, yes. For a blocking Buffered I/O object, no.

> - Can write() write zero bytes?
Only if requested by the user. ;)

One exception to several of the answers above: a zero-byte read/readinto/write can occur on a non-blocking object, but the functions return None to distinguish this case from an EOF condition.

> Please, see also the examples at the end of the mail before providing an answer :) > > > .seek(pos: int, whence: int = 0) -> None > > .tell() -> int > > .truncate(n: int = None) -> None > > .close() -> None > > Why should this very low-level basic type define *two* read methods?
Assuming > that readinto() is the most primitive, can we have the ABC RawIOBase provide a > default read() method that calls readinto? > Yes, I think readable/writeable/seekable/fileno *perfectly* match the good > usage of attributes/properties. They all provide a value without any > side-effect and that can be computed without doing O(n)-style computations. Unfortunately, seekable() may need to call .seek() to figure it out. I favor calling .seek() (or using stat()) once when constructing the object and storing the value (since we'll almost certainly need to do this anyway to figure out what kind of Buffered I/O object to use). If we do that, then we can make these attributes. > Now for some real example. Let's say I'm given a readable RawIOBase object. > I'm told that it's a foobar-compressed utf-8 text-file. I have this API available:
>
>     class Foobar:
>         # initialize decompressor
>         __init__()
>
>         # feed compressed bytes and get uncompressed bytes.
>         # The uncompressed data can be smaller, equal or larger
>         # than the compressed data
>         decompress(bytes) -> bytes
>
>         # finish decompression and get tail
>         flush() -> bytes
>
> This is basically similar to the way zlib.decompress/flush works. I would like > to wrap the readable RawIOBase object in a way that I obtain a textual > file-like with readline() etc.

The easy way to do this is for the zlib decompressor to wrap the RawIOBase object in an appropriate BufferedIOBase object first. Then read() can be called with no argument and return as many bytes as are available. It sounds like you want to force RawIOBase objects to have a buffer, too, which defeats the point of having layers. Most use-cases will want to use a BufferedIOBase object to buffer the bytes coming out of the raw object. In a few cases, though, it really is useful to get down to the system-call level. Part of the motivation for reworking the I/O interface is to make this possible. -- Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises LLC From exarkun at divmod.com Wed Feb 28 15:10:46 2007 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Wed, 28 Feb 2007 09:10:46 -0500 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: Message-ID: <20070228141046.17852.1877073320.divmod.quotient.899@ohm> On Wed, 28 Feb 2007 07:39:33 -0600, Daniel Stutzbach wrote: > > [snip] > >> - Should read()/readinto() raise EOFError? > >On EOF, they return a length-0 object or 0 instead. If the user tries >to read again *after* hitting EOF, then an EOFError is raised. > What is the motivation for having two different ways to signal EOF? How is this case handled?

    >>> f = file('name', 'w')
    >>> g = file('name', 'r')
    >>> g.read(10)
    ''
    >>> f.write('bytes')
    >>> f.flush()
    >>> g.read(10)
    'bytes'
    >>>

Jean-Paul From agthorr at barsoom.org Wed Feb 28 05:52:54 2007 From: agthorr at barsoom.org (Daniel Stutzbach) Date: Tue, 27 Feb 2007 22:52:54 -0600 Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol In-Reply-To: References: Message-ID: On 2/27/07, Travis E. Oliphant wrote: > Maybe I'm not understanding you correctly. Perhaps what you are saying > is that we should have all memory allocation go through a light-weight > memory-object. Then, you would get this object + an offset when you > wanted a pointer into memory. > > This way, the memory would never be deallocated until nothing was > referencing it. I think this approach would work. However, you could > still have the case where an object reallocated memory while another > object which thought it had a view of that object ended up with an > "out-dated" view. You just wouldn't segfault in that case. > > You could check the reference count on the memory object, before > reallocating, I suppose. You have understood me correctly.
(though I see that Greg Ewing has raised a good objection, so it's a moot point) > But I've heard that the reference counts on > Python objects can be larger than 1 in some cases (even though there > isn't really anything "viewing" the memory). Is that true? I'm writing an extension module (for my own use and to scratch an itch) that relies on the following notion: If a C module never exposes an object to the user, then the object's reference counter is only incremented/decremented by the module. Does the garbage collector sometimes temporarily increment reference counters in the course of its operation? I looked through the code, but didn't see anything to that effect (except with regard to weak reference objects). I can't see how anything other than the garbage collector would even find such an object. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises LLC From shredwheat at gmail.com Wed Feb 28 06:50:16 2007 From: shredwheat at gmail.com (Pete Shinners) Date: Tue, 27 Feb 2007 21:50:16 -0800 Subject: [Python-3000] unit test for advanced formatting Message-ID: I've gone over PEP3101 to create an initial unittest for the advanced formatting. Based on this intro to the formatting syntax, I thought I'd also share my thoughts. I've also experimented with this against the python prototype of the formatting. I have commented out the tests where that implementation fails but should work (by my interpretation). If anything, these tests will provide a preview of the way the formatting looks. 1. The early python implementation does not allow "reusing" an argument either by index or by keyword name. The PEP has not defined this behavior. I think it is important to be allowed to reuse any of the argument objects given to format. 2. The implementation we have always requires a "fill" argument in the format, if a width is specified. It would be a big improvement if space characters were the default. 3. The specification is deep.
It will take an intense amount of unit testing of corner cases to make sure this is actually doing what is correct. It may be too complex, but it is hard to know what might be yagni. 4. The PEP still leaves a bit of wiggle room in the design, but since an implementation is underway, I think more experimentation would be better before locking down the design. 5. The "strict mode" activation through a global state on the string object is a bad idea. I would prefer some sort of "flags" argument passed to each function. I would prefer the "strict" mode where exceptions are raised by default. But I do not want the strict behavior of requiring all arguments to be used. 6. Security on the attribute lookups is probably an unending topic. A simple minimum would be to not allow attribute lookups on names starting with an underscore. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070227/47ef8085/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: test_format.py Type: text/x-python Size: 5990 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070227/47ef8085/attachment-0001.py From agthorr at barsoom.org Wed Feb 28 15:24:21 2007 From: agthorr at barsoom.org (Daniel Stutzbach) Date: Wed, 28 Feb 2007 08:24:21 -0600 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: <20070228141046.17852.1877073320.divmod.quotient.899@ohm> References: <20070228141046.17852.1877073320.divmod.quotient.899@ohm> Message-ID: On 2/28/07, Jean-Paul Calderone wrote: > >On EOF, they return a length-0 object or 0 instead. If the user tries > >to read again *after* hitting EOF, then an EOFError is raised. > > What is the motivation for having two different ways to signal EOF? How > is this case handled? I checked how Python 2.5 handles this, and you're right. 
Read operations should continue to return 0 bytes if the user keeps trying to read at EOF. Not sure what I was thinking. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises LLC From mike.verdone at gmail.com Wed Feb 28 17:20:08 2007 From: mike.verdone at gmail.com (Mike Verdone) Date: Wed, 28 Feb 2007 10:20:08 -0600 Subject: [Python-3000] unit test for advanced formatting In-Reply-To: References: Message-ID: <5487f95e0702280820y17bdb171i2f4cb50fb62a54f0@mail.gmail.com> Hi Pete, These look very good. My comments to your comments below, > 1. The early python implementation does not allow "reusing" an argument > either by index or by keyword name. The PEP has not defined this behavior. I > think it is important to be allowed to reuse any of the argument objects > given to format. I just sort of assumed it would be possible to do. Hopefully someone can add it to the spec officially. > 5. The "strict mode" activation through a global state on the string object > is a bad idea. I would prefer some sort of "flags" argument passed to each > function. I would prefer the "strict" mode where exceptions are raised by > default. But I do not want the strict behavior of requiring all arguments to > be used. I agree. It feels kind of Perl-like. I have nightmares of someone setting strict mode on the string and having unrelated modules start blowing up. Could the strict formatting string be a subclass of string? strictformat("my format string {0}").format(...) Alternately maybe strings should be strict by default and you'd have a lenientformat type for lenient mode. Old-style formatting would blow up when you were missing arguments. New format should be just as strict unless you ask it to be nice. Just my 2c. Mike. On 2/27/07, Pete Shinners wrote: > I've gone over PEP3101 to create an initial unittest for the advanced > formatting. Based on this intro to the formatting syntax, I thought I'd also > share my thoughts. 
I've also experimented with this against the python > prototype of the formatting. > > I have commented out the tests where that implementation fails, but should > work (by my interpretation). If anything these tests will provide a preview > look at the way the formatting looks. > > 1. The early python implementation does not allow "reusing" an argument > either by index or by keyword name. The PEP has not defined this behavior. I > think it is important to be allowed to reuse any of the argument objects > given to format. > > 2. The implementation we have always requires a "fill" argument in the > format, if a width is specified. It would be a big improvement if space > characters were default. > > 3. The specification is deep. It will take an intense amount of unit testing > of corner cases to make sure this is actually doing what is correct. It may > be too complex, but it is hard to know what might be yagni. > > 4. The PEP still leaves a bit of wiggle room in the design, but since an > implementation is underway, I think more experimentation would be better > before locking down the design. > > 5. The "strict mode" activation through a global state on the string object > is a bad idea. I would prefer some sort of "flags" argument passed to each > function. I would prefer the "strict" mode where exceptions are raised by > default. But I do not want the strict behavior of requiring all arguments to > be used. > > 6. Security on the attribute lookups is probably an unending topic. A simple > minimum would be to not allow attribute lookups on names starting with an > underscore. 
> > > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/mike.verdone%40gmail.com > > > From daniel at stutzbachenterprises.com Wed Feb 28 18:13:39 2007 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Wed, 28 Feb 2007 11:13:39 -0600 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> Message-ID: Should FileIO objects define the following methods and properties that the Python 2 file object defines?

    mode
    name
    closed
    isatty

Secondly, should any of these be bumped up to the Raw I/O ABC? -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises LLC From oliphant.travis at ieee.org Wed Feb 28 18:56:16 2007 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 28 Feb 2007 10:56:16 -0700 Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol In-Reply-To: References: Message-ID: Thomas Heller wrote: > >>Additions to the struct string-syntax >> >> The struct string-syntax is missing some characters to fully >> implement data-format descriptions already available elsewhere (in >> ctypes and NumPy for example). Here are the proposed additions:
>>
>>     Character          Description
>>     ==================================
>>     '1'                bit (number before states how many bits)
>>     '?'                platform _Bool type
>
> In SVN trunk (2.6), the struct module already supports _Bool, but the > format character used is 't'. Not a big issue, though, and I like '?' > better. >

I think 't' should be used for the bit type also (because '1' is confusing when you have something like '71b', which looks like 71 signed chars but is actually 7 bits + 1 signed char). I've changed this in the current PEP.
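Much of this proposal later became PEP 3118 and the memoryview type. As a quick, illustrative check of how parts of it look in today's Python: struct settled on '?' for _Bool, memoryview reports its element format in struct syntax and can present C-contiguous memory as a multi-dimensional view, and a bytearray with a live buffer export refuses to be resized, which addresses the "out-dated view" concern raised earlier in the thread. The sketch uses only the standard struct, array and memoryview APIs.

```python
import array
import struct

# '?' is the struct code for C _Bool; its standard size is one byte.
print(struct.calcsize("<?"))            # 1
packed = struct.pack("<?h", True, -2)   # _Bool followed by a little-endian short
print(packed)                           # b'\x01\xfe\xff'

# memoryview exposes the element format using struct syntax.
m = memoryview(array.array("d", [1.0, 2.0, 3.0]))
print(m.format, m.itemsize, m.shape)    # d 8 (3,)

# A multi-dimensional view (C-contiguous, last dimension varies fastest).
grid = memoryview(bytearray(range(6))).cast("B", shape=[2, 3])
print(grid.tolist())                    # [[0, 1, 2], [3, 4, 5]]

# A bytearray with a live export cannot be reallocated underneath the view.
buf = bytearray(b"abc")
view = memoryview(buf)
try:
    buf.append(0x64)
except BufferError as exc:
    print("resize refused:", exc)
view.release()
buf.append(0x64)                        # allowed once the view is released
print(buf)                              # bytearray(b'abcd')
```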
> >> 'g' long double >> 'F' complex float >> 'D' complex double >> 'G' complex long double > > > IIUC, in the latest PEP draft you have apparently changed to two-letter codes > for complex types; which is inconsistent with previous conventions in struct. Yeah, I've introduced two-letter codes for pointers as well. But, there is a certain logic to it because 'Zd' would be similar to 'dd' except you would know that the two are supposed to be treated as a complex number. > > >> 'c' ucs-1 (latin-1) encoding >> 'u' ucs-2 >> 'w' ucs-4 >> 'O' pointer to Python Object >> 'T{}' structure (detailed layout inside {}) >> '(k1,k2,...,kn)' multi-dimensional array of whatever follows >> ':name:' optional name of the preceeding element >> '&' specific pointer (prefix before another charater) >> 'X{}' pointer to a function (optional function >> signature inside {}) >> >> The struct module will be changed to understand these as well and >> return appropriate Python objects on unpacking. Un-packing a >> long-double will return a c-types long_double. > > > This is probably because there is no way for current Python to support > the long double datatype. Right. On some platforms there is no difference between double and long double. I guess returning a decimal object might actually be the easiest solution. > The question for ctypes is: How should ctypes > support that? Should the .value attribute of a c_longdouble have two > components, should it expose the value as decimal, should Python itself > switch to using long double internally, or are there other possibilities? > I think I like the decimal object solution better. > >> Unpacking 'u' or >> 'w' will return Python unicode. Unpacking a multi-dimensional >> array will return a list of lists. Un-packing a pointer will >> return a ctypes pointer object. > > > ctypes does not support pointer objects of non-native byte order; > should they be forbidden? Yes, I'm fine with them being forbidden. 
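On the long double question, ctypes did later gain a c_longdouble type. Its .value is converted to an ordinary Python float, so any precision beyond double is lost on the Python side, and its size depends on the platform (on some platforms it is the same as double). The sketch below only inspects these properties and does not assume any particular size. It also prints the width of c_wchar, which follows the platform's C wchar_t: typically 2 bytes on Windows and 4 bytes on most Unix-like systems.

```python
import ctypes

# .value comes back as a Python float, not as an extended-precision object.
ld = ctypes.c_longdouble(1.5)
print(type(ld.value).__name__, ld.value)   # float 1.5

# Sizes are platform-dependent; long double is never smaller than double.
print(ctypes.sizeof(ctypes.c_double), ctypes.sizeof(ctypes.c_longdouble))

# Width of c_wchar depends on the platform's wchar_t.
print(ctypes.sizeof(ctypes.c_wchar))
```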
> > > >> >> Functions should be added to ctypes to create a ctypes object from >> a struct description, and add long-double, and ucs-2 to ctypes. > > > Well, ucs-4 should probably be added to ctypes as well. The current ctypes.c_wchar > type corresponds to the C WCHAR type, whose size is configuration-dependent. I think you are right. In the discussions for unifying string/unicode I really like the proposals that are leaning toward having a unicode object be an immutable string of either ucs-1, ucs-2, or ucs-4 depending on what is in the string. This does create some conversion issues that must be handled, but I think it is the best option. In the Python 3.0 version of NumPy, I think that's what we are going to have (three different string types: ucs-1, ucs-2, ucs-4). -Travis From jcarlson at uci.edu Wed Feb 28 19:55:21 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 28 Feb 2007 10:55:21 -0800 Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol In-Reply-To: References: Message-ID: <20070228104438.AE6E.JCARLSON@uci.edu> Travis Oliphant wrote: > I think you are right. In the discussions for unifying string/unicode I > really like the proposals that are leaning toward having a unicode > object be an immutable string of either ucs-1, ucs-2, or ucs-4 depending > on what is in the string. Except that it's not going to happen. The width of the unicode representation is going to be fixed at compile time, generally utf-16 or ucs-4. I say utf-16 because the representation allows for surrogate pairs, etc., but each value of the pair is considered a "character", whereas (according to my potentially flawed memory of reading the spec) ucs-2 doesn't allow for surrogates.
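To make the surrogate-pair point concrete: in UTF-16 a code point above U+FFFF is stored as two 16-bit code units, so indexing by code unit can land on half of a character. The sketch below splits code points into surrogate pairs by hand and, as one possible side structure (illustrative only, not code from the thread), keeps a sorted list of the k positions that needed a pair so that a code-point index can be mapped to a code-unit offset with bisect in O(log k) time and O(k) extra space. to_utf16_units and Utf16Index are hypothetical names.

```python
import bisect


def to_utf16_units(text):
    """Return the UTF-16 code units for text, splitting non-BMP code points."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            units.append(cp)
        else:
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))    # high (lead) surrogate
            units.append(0xDC00 + (cp & 0x3FF))  # low (trail) surrogate
    return units


class Utf16Index:
    """Code-point indexing over UTF-16 units using a sorted list of pair positions."""

    def __init__(self, text):
        self.units = to_utf16_units(text)
        # Code-point positions of characters stored as two units, in order.
        self.wide = [i for i, ch in enumerate(text) if ord(ch) >= 0x10000]

    def offset(self, i):
        # Every wide character before position i shifts it by one unit.
        return i + bisect.bisect_left(self.wide, i)

    def char_at(self, i):
        off = self.offset(i)
        unit = self.units[off]
        if 0xD800 <= unit < 0xDC00:              # start of a surrogate pair
            low = self.units[off + 1]
            return chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
        return chr(unit)


text = "a\U0001F600b"
idx = Utf16Index(text)
print([hex(u) for u in idx.units])    # ['0x61', '0xd83d', '0xde00', '0x62']
print(idx.offset(2), idx.char_at(2))  # 3 b
```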
Note that I previously offered an overlay structure that could support O(log n) time access of arbitrary full characters regardless of encoding (utf-8, utf-16 or ucs-4) using O(log n) space, but it was decided by Guido that Python should return a partial character (half of a surrogate pair) rather than offer non-constant character access time.* - Josiah * As a side note, the space and time are really a function of how often surrogates (or their equivalent in utf-8, etc.) occur. Both are O(log n) in the worst case, but they actually depend on the structure of the occurrences of the non-constant character lengths. From jackdied at jackdied.com Wed Feb 28 21:52:12 2007 From: jackdied at jackdied.com (Jack Diederich) Date: Wed, 28 Feb 2007 15:52:12 -0500 Subject: [Python-3000] PEP Draft: Class Decorators Message-ID: <20070228205212.GD5537@performancedrivers.com> Greetings from PyCon! I read hundreds of emails in the dozens of threads about class decorators and there was surprisingly little content (most of the arguments were about syntax, which is no longer up for debate). As a result this PEP is quite plain. If any IronPython or Jython folks could throw in their two bits it would be appreciated.

PEP: 3XXX
Title: Class Decorators
Version: 1
Last-Modified: 28-Feb-2007
Authors: Jack Diederich
Implementation: SF#1671208
Status: Draft
Type: Standards Track
Created: 26-Feb-2007

Abstract
========

Extending the decorator syntax to allow the decoration of classes.

Rationale
=========

Allowing classes to be decorated serves many of the same purposes as the ability to decorate functions. Decorators move factory registration and class manipulation to the top of the class definition instead of the current alternate methods of post-processing or the action-at-a-distance of metaclasses.
    import myfactory

    @myfactory.register
    class MyClass:
        pass

History and Implementation
==========================

Class decorators were originally proposed in PEP318 [1]_ and were rejected by Guido [2]_ for lack of use cases. Two years later he saw a use case he liked and gave the go-ahead for a PEP and patch [3]_.

The current patch is loosely based on a pre-2.4 patch [4]_ and has been updated to use the new AST.

Grammar/Grammar is changed from

    funcdef: [decorators] 'def' NAME parameters ['->' test] ':' suite

to

    decorated_thing: decorators (classdef | funcdef)
    funcdef: 'def' NAME parameters ['->' test] ':' suite

References
==========

If you enjoyed this PEP you might also enjoy:

.. [1] PEP 318, "Decorators for Functions and Methods"
   http://www.python.org/dev/peps/pep-0318/

.. [2] Class decorators rejection
   http://mail.python.org/pipermail/python-dev/2004-March/043458.html

.. [3] Class decorator go-ahead
   http://mail.python.org/pipermail/python-dev/2006-March/062942.html

.. [4] 2.4 class decorator patch
   http://python.org/sf/1007991

.. [5] 3.x class decorator patch
   http://python.org/sf/1671208

From oliphant at ee.byu.edu Wed Feb 28 21:28:12 2007 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed, 28 Feb 2007 13:28:12 -0700 Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol In-Reply-To: <20070228104438.AE6E.JCARLSON@uci.edu> References: <20070228104438.AE6E.JCARLSON@uci.edu> Message-ID: <45E5E5DC.9040103@ee.byu.edu> Josiah Carlson wrote: >Travis Oliphant wrote: > > >>I think you are right. In the discussions for unifying string/unicode I >>really like the proposals that are leaning toward having a unicode >>object be an immutable string of either ucs-1, ucs-2, or ucs-4 depending >>on what is in the string. >> >> > >Except that its not going to happen. The width of the unicode >representation is going to be fixed at compile time, generally utf-16 or >ucs-4. > Are you sure about this?
Guido was still talking about the multiple-version representation at PyCon a few days ago. -Travis From jcarlson at uci.edu Wed Feb 28 22:40:06 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 28 Feb 2007 13:40:06 -0800 Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol In-Reply-To: <45E5E5DC.9040103@ee.byu.edu> References: <20070228104438.AE6E.JCARLSON@uci.edu> <45E5E5DC.9040103@ee.byu.edu> Message-ID: <20070228132631.AE7A.JCARLSON@uci.edu> Travis Oliphant wrote: > Josiah Carlson wrote: > >Travis Oliphant wrote: > >>I think you are right. In the discussions for unifying string/unicode I > >>really like the proposals that are leaning toward having a unicode > >>object be an immutable string of either ucs-1, ucs-2, or ucs-4 depending > >>on what is in the string. > > > >Except that its not going to happen. The width of the unicode > >representation is going to be fixed at compile time, generally utf-16 or > >ucs-4. > > Are you sure about this? Guido was still talking about the > multiple-version representation at PyCon a few days ago. I was thinking of Guido's message from August 31, 2006 with the subject of "Re: [Python-3000] UTF-16", in that message he states that he would like it to be a configure (presumably during compilation) option. If he's talking about different runtime representations, then there's an entire thread discussing it with the subject of "How will unicode get used?" in September of 2006, and an earlier thread prior to that. While I was an early proponent of 'represent minimally', I'm not terribly worried about it either way at this point, and was merely attempting to state what had been expressed in the past. 
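As a historical footnote, the variable-width representation did eventually arrive. Since Python 3.3 (PEP 393), CPython stores each str using 1, 2 or 4 bytes per code point, depending on the widest character it contains, and indexing always works in whole code points. Here is a quick way to observe this in CPython. The exact sizes are implementation details that vary by version and platform, so only their ordering is meaningful.

```python
import sys

n = 1000
samples = {
    "ASCII (1 byte/char)": "a" * n,
    "BMP (2 bytes/char)": "\u0100" * n,
    "astral (4 bytes/char)": "\U0001F600" * n,
}
for label, s in samples.items():
    # len() counts code points in every case; storage grows with the widest char.
    print(f"{label}: len={len(s)}, getsizeof={sys.getsizeof(s)}")

# Indexing yields whole code points, never half of a surrogate pair.
s = "a\U0001F600b"
print(len(s), s[1] == "\U0001F600")   # 3 True
```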
- Josiah From collinw at gmail.com Wed Feb 28 23:15:30 2007 From: collinw at gmail.com (Collin Winter) Date: Wed, 28 Feb 2007 16:15:30 -0600 Subject: [Python-3000] PEP Draft: Class Decorators In-Reply-To: <20070228205212.GD5537@performancedrivers.com> References: <20070228205212.GD5537@performancedrivers.com> Message-ID: <43aa6ff70702281415o2c7ccd75n7fb3db167506abfe@mail.gmail.com> On 2/28/07, Jack Diederich wrote: [snip] > History and Implementation > ========================== > > Class decorators were originally proposed in PEP318 [1]_ and were rejected > by Guido [2]_ for lack of use cases. Two years later he saw a use case > he liked and gave the go-ahead for a PEP and patch [3]_. While I can look up the use-case that prompted Guido to change his mind via the footnote, I'd appreciate having a sampling of use-cases listed in the PEP itself. [snip] > Grammar/Grammar is changed from > > funcdef: [decorators] 'def' NAME parameters ['->' test] ':' suite > > to > > decorated_thing: decorators (classdef | funcdef) > funcdef: 'def' NAME parameters ['->' test] ':' suite The PEP should show how 'decorated_thing' fits into the existing grammar. Thanks, Collin Winter From talin at acm.org Wed Feb 28 21:37:13 2007 From: talin at acm.org (Talin) Date: Wed, 28 Feb 2007 12:37:13 -0800 Subject: [Python-3000] unit test for advanced formatting In-Reply-To: References: Message-ID: <45E5E7F9.2060004@acm.org> Pete Shinners wrote: > I've gone over PEP3101 to create an initial unittest for the advanced > formatting. Based on this intro to the formatting syntax, I thought I'd > also > share my thoughts. I've also experimented with this against the python > prototype of the formatting. > > I have commented out the tests where that implementation fails, but should > work (by my interpretation). If anything these tests will provide a preview > look at the way the formatting looks. > > 1. 
The early python implementation does not allow "reusing" an argument > either by index or by keyword name. The PEP has not defined this > behavior. I > think it is important to be allowed to reuse any of the argument objects > given to format. Sounds good to me. I think that may have been a side effect of trying to ensure that all arguments were used at least once. > 2. The implementation we have always requires a "fill" argument in the > format, if a width is specified. It would be a big improvement if space > characters were default. Concur. > 3. The specification is deep. It will take an intense amount of unit > testing > of corner cases to make sure this is actually doing what is correct. It may > be too complex, but it is hard to know what might be yagni. Well, all I can say is that it could have been a lot deeper. I had to limit the scope as it was, as there are a lot of related issues that weren't covered. > 4. The PEP still leaves a bit of wiggle room in the design, but since an > implementation is underway, I think more experimentation would be better > before locking down the design. > > 5. The "strict mode" activation through a global state on the string object > is a bad idea. I would prefer some sort of "flags" argument passed to each > function. I would prefer the "strict" mode where exceptions are raised by > default. But I do not want the strict behavior of requiring all > arguments to > be used. Here's my primary issue with this: I wanted a way to enable strictness on an application-wide level, without having to go and individually revise the many (typically hundreds) of individual calls to the string formatting function. A typical example of what I am talking about here is something like a web application server, where you have a "development" mode and a "production" mode. In the development mode, you want to find errors as quickly as possible, so you enable strict formatting.
In production, however, you want the server to be as fault-tolerant as possible, so you would enable lenient mode. Moreover, I would want this strict/lenient decision to apply to all of the code modules that I am using, including libraries that I didn't write. The problem with a flags argument is that most people aren't going to bother using it, since it destroys some of the convenience and simplicity of the string formatting function. Thus, if I am calling a library function, and that library wasn't written using the flag argument, then I have no way to control the setting except through some kind of global. A more ideal solution would be to allow some kind of 'security context', i.e. apply strictness to all code which is running within a given context. However, I don't know of a way to do that in Python. If someone can think of a better way to accomplish this, I'd love to hear it. > 6. Security on the attribute lookups is probably an unending topic. A > simple > minimum would be to not allow attribute lookups on names starting with an > underscore. From mike.verdone at gmail.com Wed Feb 28 23:57:31 2007 From: mike.verdone at gmail.com (Mike Verdone) Date: Wed, 28 Feb 2007 16:57:31 -0600 Subject: [Python-3000] unit test for advanced formatting In-Reply-To: <45E5E7F9.2060004@acm.org> References: <45E5E7F9.2060004@acm.org> Message-ID: <5487f95e0702281457t7ff68deq73eeed7df5b58482@mail.gmail.com> Hi Talin, Some more thoughts...
> Here's my primary issue with this: I wanted a way to enable strictness > on an application-wide level, without having to go and individually > revise the many (typically hundreds) of individual calls to the string > formatting function. > > A typical example of what I am talking about here is something like a > web application server, where you have a "development" mode and a > "production" mode. In the development mode, you want to find errors as > quickly as possible, so you enable strict formatting. In production, > however, you want the server to be as fault-tolerant as possible, so you > would enable lenient mode. Personally, I think this is something that application writers should worry about. Like this:

    def fmtString(string):
        return lenientformat(string) if applicationInProduction else string

    fmtString("my format {0}").format(...)

Now the application can turn strict mode on and off from one place for all strings that pass through fmtString. I don't think I'd ever want to change between lenient and strict mode across the board. What if a module writer writes code that depends upon format strings being strict and raising exceptions? When you switch your application to lenient mode, the entire functioning of the module could change. With an across-the-board switch, module writers will have to assume two completely different failure modes for every use of format(). Mike. On 2/28/07, Talin wrote: > Pete Shinners wrote: > > I've gone over PEP3101 to create an initial unittest for the advanced > > formatting. Based on this intro to the formatting syntax, I thought I'd > > also > > share my thoughts. I've also experimented with this against the python > > prototype of the formatting. > > > > I have commented out the tests where that implementation fails, but should > > work (by my interpretation). If anything these tests will provide a preview > > look at the way the formatting looks. > > > > 1.
The early python implementation does not allow "reusing" an argument > > either by index or by keyword name. The PEP has not defined this > > behavior. I > > think it is important to be allowed to reuse any of the argument objects > > given to format. > > Sounds good to me. I think that may have been a side effect of trying to > insure that all arguments were used at least once. > > > 2. The implementation we have always requires a "fill" argument in the > > format, if a width is specified. It would be a big improvement if space > > characters were default. > > Concur. > > > 3. The specification is deep. It will take an intense amount of unit > > testing > > of corner cases to make sure this is actually doing what is correct. It may > > be too complex, but it is hard to know what might be yagni. > > Well, all I can say is - it could have been a lot deeper. I had to > restrict myself to limiting the scope as it was, as there are a lot of > related issues that weren't covered. > > > 4. The PEP still leaves a bit of wiggle room in the design, but since an > > implementation is underway, I think more experimentation would be better > > before locking down the design. > > > > 5. The "strict mode" activation through a global state on the string object > > is a bad idea. I would prefer some sort of "flags" argument passed to each > > function. I would prefer the "strict" mode where exceptions are raised by > > default. But I do not want the strict behavior of requiring all > > arguments to > > be used. > > Here's my primary issue with this: I wanted a way to enable strictness > on an application-wide level, without having to go and individually > revise the many (typically hundreds) of individual calls to the string > formatting function. > > A typical example of what I am talking about here is something like a > web application server, where you have a "development" mode and a > "production" mode. 
In the development mode, you want to find errors as > quickly as possible, so you enable strict formatting. In production, > however, you want the server to be as fault-tolerant as possible, so you > would enable lenient mode. > > Moreover, I would want this strict/lenient decision to apply to all of > the code modules that I am using, including libraries that I didn't write. > > The problem with a flags argument is that most people aren't going to > bother using it, since it destroys some of the convenience and > simplicity of the string formatting function. Thus, if I am calling a > library function, and that library wasn't written using the flag > argument, then I have no way to control the setting except through some > kind of global. > > A more ideal solution would be to allow some kind of 'security context', > i.e. apply strictness to all code which is running within a given > context. However, I don't know of a way to do that in Python. > > If someone can think of a better way to accomplish this, I'd love to > hear it. > > > 6. Security on the attribute lookups is probably an unending topic. A > > simple > > minimum would be to not allow attribute lookups on names starting with an > > underscore. 
From brett at python.org Wed Feb 28 23:29:28 2007 From: brett at python.org (Brett Cannon) Date: Wed, 28 Feb 2007 16:29:28 -0600 Subject: [Python-3000] unit test for advanced formatting In-Reply-To: <45E5E7F9.2060004@acm.org> References: <45E5E7F9.2060004@acm.org> Message-ID: On 2/28/07, Talin wrote: [SNIP] > > > > 5. The "strict mode" activation through a global state on the string object > > is a bad idea. I would prefer some sort of "flags" argument passed to each > > function. I would prefer the "strict" mode where exceptions are raised by > > default. But I do not want the strict behavior of requiring all > > arguments to > > be used. > > Here's my primary issue with this: I wanted a way to enable strictness > on an application-wide level, without having to go and individually > revise the many (typically hundreds) of individual calls to the string > formatting function. > > A typical example of what I am talking about here is something like a > web application server, where you have a "development" mode and a > "production" mode. In the development mode, you want to find errors as > quickly as possible, so you enable strict formatting. In production, > however, you want the server to be as fault-tolerant as possible, so you > would enable lenient mode.
> > Moreover, I would want this strict/lenient decision to apply to all of > the code modules that I am using, including libraries that I didn't write. > > The problem with a flags argument is that most people aren't going to > bother using it, since it destroys some of the convenience and > simplicity of the string formatting function. Thus, if I am calling a > library function, and that library wasn't written using the flag > argument, then I have no way to control the setting except through some > kind of global. > > A more ideal solution would be to allow some kind of 'security context', > i.e. apply strictness to all code which is running within a given > context. However, I don't know of a way to do that in Python. > > If someone can think of a better way to accomplish this, I'd love to > hear it. Insert a value into __builtin__ and reference it in all format calls. That way it is global to the application if you want it, but it does not force this level of granularity on people who want a more fine-grained configuration at a per-method level. -Brett From daniel.stutzbach at gmail.com Wed Feb 28 17:00:58 2007 From: daniel.stutzbach at gmail.com (Daniel Stutzbach) Date: Wed, 28 Feb 2007 10:00:58 -0600 Subject: [Python-3000] Draft PEP for New IO system In-Reply-To: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com> Message-ID: What should Buffered I/O .write() do for a non-blocking object? It seems like .write() should write as much as it can to the Raw I/O object and buffer the rest, but then how do we tell the Buffered I/O object to "write more data from the buffer but still don't block"? Along the same lines, for a non-blocking Buffered I/O object, how do we specify "Okay, I know I've been writing only one byte at a time, so you probably haven't bothered writing it to the raw object. Write as much data as you can now, but don't block"?
Option #1: On a non-blocking object, .flush() writes as much as it can without blocking. It would then need a return value to indicate whether the flush completed. Option #2: Calling .write() with no arguments causes the Buffered I/O object to flush as much write data as it can to the raw object without blocking. (For a blocking object, it would block until all data is written to the raw object.) I prefer option #2 because a .flush() that doesn't fully flush would be more surprising. The goal of supporting non-blocking file-like objects is to be able to use select() with buffered I/O objects (and other things like a compressed socket stream). -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises LLC
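Daniel's Option #2 can be sketched in Python as follows. This is a hypothetical illustration, not the draft PEP's design. `NonBlockingBufferedWriter` and `FakeNonBlockingRaw` are invented names. The sketch assumes a raw object whose `write()` returns the number of bytes accepted, or `None` when it would block (the convention Python 3's `io.RawIOBase` later adopted for non-blocking raw streams). Having the no-argument `write()` return the count of still-buffered bytes is also an assumption, since the post leaves the return value open.

```python
class NonBlockingBufferedWriter:
    """Buffered writer where write() with no argument drains without blocking."""

    def __init__(self, raw, buffer_size=8192):
        self.raw = raw
        self.buffer_size = buffer_size
        self._pending = bytearray()  # sketch only: grows without bound

    def write(self, data=None):
        if data is None:
            # Option #2: push as much buffered data as the raw object accepts.
            self._drain()
            return len(self._pending)  # 0 means everything reached the raw object
        self._pending += data
        if len(self._pending) >= self.buffer_size:
            self._drain()
        # Every byte of `data` has been either written or buffered.
        return len(data)

    def _drain(self):
        while self._pending:
            n = self.raw.write(bytes(self._pending))
            if not n:  # None (would block) or 0: stop instead of blocking
                break
            del self._pending[:n]


class FakeNonBlockingRaw:
    """Stand-in raw object: accepts at most `per_call` bytes per write and
    returns None (would block) once `budget` bytes have been accepted."""

    def __init__(self, per_call, budget):
        self.per_call = per_call
        self.budget = budget
        self.received = bytearray()

    def write(self, data):
        if self.budget <= 0:
            return None
        n = min(len(data), self.per_call, self.budget)
        self.received += data[:n]
        self.budget -= n
        return n
```

In a select() loop, a nonzero return from `write()` would tell the caller to wait until the underlying descriptor is reported writable and then call `write()` again.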