From ericsnowcurrently at gmail.com  Fri Mar  1 00:11:04 2013
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Thu, 28 Feb 2013 16:11:04 -0700
Subject: [Python-ideas] OrderedDict for kwargs and class statement namespace
In-Reply-To: <32fd3c22d72c258ae8e49c7c3fa10f98@chopin.edu.pl>
References: <32fd3c22d72c258ae8e49c7c3fa10f98@chopin.edu.pl>
Message-ID:

On Thu, Feb 28, 2013 at 3:51 PM, Jan Kaliszewski wrote:
> While having class namespace ordered sounds very nice, ordered **kwargs
> sometimes may not be a desirable feature -- especially when you want to keep
> a cooperatively used interface as simple as possible (because of the risk
> that keyword argument order could then be taken into consideration by *some*
> actors while others would still operate with the assumption that argument
> order cannot be known...).

You mean like we had with dicts? Now that they are randomized, things
like doctests started to break unexpectedly.

With **kwargs the OrderedDict is created by the interpreter and passed
to the called function. So the writer of the function is the only one
in control of how the ordering is interpreted. Granted, an existing
function might, as currently written, expose the ordering or even the
kwargs. So that aspect has to be considered. However, it would still
remain in the complete control of the function how the ordering of
**kwargs is exposed.

> Regarding issue #16991: will the present semantics of inheriting from
> OrderedDict be kept untouched? (I mean: which methods call other methods
> => which methods you need to override etc.)
>
> If not we'll have a backward compatibility issue (maybe not very serious
> but still...).

Any OrderedDict written in C must have identical semantics, including
regarding subclassing. I've gone through my implementation on several
occasions to check this and I'll probably do so again. Keep in mind
that the unit tests for OrderedDict will be run against both the pure
Python and the C version (see PEP 399). That includes some tests
regarding subclassing, though there could probably be a few more of
those. Bottom line: if it doesn't quack, it's not a duck we want.

-eric

From ethan at stoneleaf.us  Fri Mar  1 00:20:32 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 28 Feb 2013 15:20:32 -0800
Subject: [Python-ideas] OrderedDict for kwargs and class statement namespace
In-Reply-To: <512FD16E.30503@nedbatchelder.com>
References: <20130228112742.09f5500b@pitrou.net>
	<20130228164810.74f2d367@pitrou.net> <512F88AD.30306@stoneleaf.us>
	<512FD16E.30503@nedbatchelder.com>
Message-ID: <512FE640.10801@stoneleaf.us>

On 02/28/2013 01:51 PM, Ned Batchelder wrote:
> On 2/28/2013 11:41 AM, Ethan Furman wrote:
>> On 02/28/2013 08:14 AM, Don Spaulding wrote:
>>>
>>> On Thu, Feb 28, 2013 at 9:48 AM, Antoine Pitrou > wrote:
>>>
>>>     On Thu, 28 Feb 2013 09:30:50 -0600, Don Spaulding > wrote:
>>>     >
>>>     > For an example of the "recommended" way to get the ordering of your
>>>     > class attributes:
>>>     > http://stackoverflow.com/questions/3288107/how-can-i-get-fields-in-an-original-order
>>>
>>>     This is already possible with the __prepare__ magic method.
>>>     http://docs.python.org/3.4/reference/datamodel.html#preparing-the-class-namespace
>>>
>>> Sure.  Case in point: Django has been working around it since at least
>>> python 2.4.
>>>
>>> > It seems to me that the "right thing" for python to do when given an
>>> > ordered list of key=value pairs in a function call or class
>>> > definition, is to retain the order.
>>> > So what's an acceptable level of performance regression for the sake
>>> > of doing things the "right way" here?
>>>
>>> Or, rather, what is the benefit of doing things "the right way"? There
>>> are incredibly few cases for relying on the order of key=value pairs
>>> in function calls.
>>>
>>> "If you build it, they will come..."
>>>
>>> When I originally encountered the need for python to retain the order
>>> of kwargs that my caller specified, it surprised me that there wasn't
>>> more clamoring for kwargs being an OrderedDict. However, since my
>>> development timeline didn't allow for holding the project up while a
>>> patch was pushed through python-dev and out into a real python
>>> release, I sucked it up, forced my calling code to send in
>>> hand-crafted OrderedDicts and called it a day. I think most developers
>>> don't even stop to think that the language *could* be different, they
>>> just put in the workaround and move on.
>>>
>>> I think if python stopped dropping the order of kwargs on the floor
>>> today, you'd see people start to rely on the order of kwargs tomorrow.
>>
>> +1
>>
>> I'd already be relying on it if it were there.
>
> Could you advance the discussion by elaborating your use case? I've
> never had need for ordered kwargs, so I'm having a hard time seeing how
> they would be useful.

I no longer remember my original use-case, but currently I'm working on
a command-line parser (I know, there are already plenty -- it's a
learning experience) with multiple subcommands, and the order of the
subcommands can make a difference.

--
~Ethan~

From christian at python.org  Fri Mar  1 00:31:28 2013
From: christian at python.org (Christian Heimes)
Date: Fri, 01 Mar 2013 00:31:28 +0100
Subject: [Python-ideas] OrderedDict for kwargs and class statement namespace
In-Reply-To:
References:
Message-ID: <512FE8D0.2010001@python.org>

On 28.02.2013 22:53, Eric Snow wrote:
> Good point.  My own use of **kwargs rarely sees the object leave the
> function or get very big, and this aspect of it just hadn't come up to
> broaden my point of view.  I'm glad we're having this discussion.  My
> intuition is that such a use case would be pretty rare, but even
> then...

I guess most function calls don't need the feature of ordered kwargs.
Could we implement yet another prefix that turns unordered keyword
arguments into ordered keyword arguments, e.g. ***ordkwargs (3 *) and
METH_VARARGS|METH_KEYWORDS|METH_ORDERED PyMethodDef.ml_flags? That
would allow ordered keyword arguments while keeping backward
compatibility with existing programs. Only functions that ask for
ordered kwargs would have to pay the minor performance penalty, too.

I don't know if it's feasible or even possible. The interpreter would
have to check the function's flags for each method call in order to
decide if it has to create an ordinary dict or an OrderedDict.

Christian

From ericsnowcurrently at gmail.com  Fri Mar  1 00:41:28 2013
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Thu, 28 Feb 2013 16:41:28 -0700
Subject: [Python-ideas] OrderedDict for kwargs and class statement namespace
In-Reply-To: <20130228161437.57f15090@pitrou.net>
References: <20130228112742.09f5500b@pitrou.net>
	<20130228161437.57f15090@pitrou.net>
Message-ID:

On Thu, Feb 28, 2013 at 8:14 AM, Antoine Pitrou wrote:
> And it also has to be reviewed in depth.
> To quote you: "The memory-related issues are pushing well past my
> experience".

Agreed, and I appreciate your concern, genuinely.
I won't apologize for not having experience in various areas, though I
recognize the extra caution it requires. However, I will continue to
take opportunities to expand my experience--particularly working on
things that others have considered to be good ideas but which no one
has advanced. I will continue to do this even if it's slow going and
even if my effort eventually bears no fruit other than the experience
of having walked that path. Ultimately my goal is to be confident that
my fellow stewards feel my contributions are helping Python get better.

With OrderedDict, I have no illusions of getting everything done
quickly, but do feel that the bulk of the coding is wrapping up. I
suppose that in more experienced hands it would be done quickly, but
I'm not asking for that. Rather, I want to get a sense of the
applicability of OrderedDict to Python's internals since it would be
available as a built-in type. I've presented what I consider as two
useful internal applications but would certainly like to know what you
think.

-eric

From solipsis at pitrou.net  Fri Mar  1 10:26:54 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 1 Mar 2013 10:26:54 +0100
Subject: [Python-ideas] OrderedDict for kwargs and class statement namespace
References: <512FE8D0.2010001@python.org>
Message-ID: <20130301102654.5ee2e055@pitrou.net>

On Fri, 01 Mar 2013 00:31:28 +0100, Christian Heimes wrote:
> On 28.02.2013 22:53, Eric Snow wrote:
> > Good point.  My own use of **kwargs rarely sees the object leave the
> > function or get very big, and this aspect of it just hadn't come up
> > to broaden my point of view.  I'm glad we're having this
> > discussion.  My intuition is that such a use case would be pretty
> > rare, but even then...
>
> I guess most function calls don't need the feature of ordered kwargs.
> Could we implement yet another prefix that turns unordered keyword
> arguments into ordered keyword arguments, e.g. ***ordkwargs (3 *) and
> METH_VARARGS|METH_KEYWORDS|METH_ORDERED PyMethodDef.ml_flags? That
> would allow ordered keyword arguments while keeping backward
> compatibility with existing programs. Only functions that ask for
> ordered kwargs would have to pay the minor performance penalty, too.

Well, you know, the performance concern also applies to pure Python
functions, not just C ones ;)

Regards

Antoine.

From solipsis at pitrou.net  Fri Mar  1 10:31:10 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 1 Mar 2013 10:31:10 +0100
Subject: [Python-ideas] OrderedDict for kwargs and class statement namespace
References: <20130228112742.09f5500b@pitrou.net>
	<20130228161437.57f15090@pitrou.net>
Message-ID: <20130301103110.6e7ba997@pitrou.net>

On Thu, 28 Feb 2013 16:41:28 -0700, Eric Snow wrote:
>
> With OrderedDict, I have no illusions of getting everything done
> quickly, but do feel that the bulk of the coding is wrapping up.
> I suppose that in more experienced hands it would be done quickly, but
> I'm not asking for that. Rather, I want to get a sense of the
> applicability of OrderedDict to Python's internals since it would be
> available as a built-in type. I've presented what I consider as two
> useful internal applications but would certainly like to know what you
> think.

Well, the OrderedDict constructor is quite a strong use case, as you
pointed out (and the only one I can think of :-)). Still, in an
aesthetic sense, I like the idea of the Python dict being a pure
unordered hash table.
Ordered dicts are good for some use cases (I do use them too), but it
sounds a bit wrong to make them the first-class mapping type; perhaps
because it would feel like PHP :-)

Regards

Antoine.

From kybinz at gmail.com  Fri Mar  1 17:55:22 2013
From: kybinz at gmail.com (=?EUC-KR?B?sei/67rz?=)
Date: Sat, 2 Mar 2013 01:55:22 +0900
Subject: [Python-ideas] string.format() default variable assignment
Message-ID:

why we bother with '{variable}'.format(variable=variable) ?
can we just '{variable}.format()' ?

if variable is exist, then assign it.
if variable is not exist, then raise error

I am not language expert. so sorry if this is not a good idea, or
already discussed.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dustin at v.igoro.us  Fri Mar  1 18:08:56 2013
From: dustin at v.igoro.us (Dustin J. Mitchell)
Date: Fri, 1 Mar 2013 12:08:56 -0500
Subject: [Python-ideas] string.format() default variable assignment
In-Reply-To:
References:
Message-ID:

On Fri, Mar 1, 2013 at 11:55 AM, ??? wrote:
> why we bother with '{variable}'.format(variable=variable) ?
> can we just '{variable}.format()' ?
>
> if variable is exist, then assign it.
> if variable is not exist, then raise error
>
> I am not language expert. so sorry if this is not a good idea, or
> already discussed.

Explicit is better than implicit. There are also security issues with
automatically making all local variables available to a format string,
where that format string might come from an untrusted source.

If you want the behavior you suggest, you can use

    '{variable}'.format(**locals())

unless you don't intend to include global variables. I don't recall the
behavior of locals() with regard to non-global variables from enclosing
scopes.

Dustin

From random832 at fastmail.us  Fri Mar  1 18:11:31 2013
From: random832 at fastmail.us (random832 at fastmail.us)
Date: Fri, 01 Mar 2013 12:11:31 -0500
Subject: [Python-ideas] string.format() default variable assignment
In-Reply-To:
References:
Message-ID: <1362157891.17231.140661198734709.40FDBC36@webmail.messagingengine.com>

On Fri, Mar 1, 2013, at 11:55, ??? wrote:
> why we bother with '{variable}'.format(variable=variable) ?
> can we just '{variable}.format()' ?
>
> if variable is exist, then assign it.
> if variable is not exist, then raise error

If you don't want to repeat a name multiple times, just use
'{0}'.format(variable). The format function doesn't (i think?) have a
way to see your local variables to look up the name.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From brett at python.org  Fri Mar  1 19:51:24 2013
From: brett at python.org (Brett Cannon)
Date: Fri, 1 Mar 2013 13:51:24 -0500
Subject: [Python-ideas] string.format() default variable assignment
In-Reply-To:
References:
Message-ID:

On Fri, Mar 1, 2013 at 11:55 AM, ??? wrote:

> why we bother with '{variable}'.format(variable=variable) ?
> can we just '{variable}.format()' ?
>

variable = "Hello, World!"
print('{variable}'.format_map(locals()))

> if variable is exist, then assign it.
> if variable is not exist, then raise error
>
> I am not language expert. so sorry if this is not a good idea, or
> already discussed.
> As Dustin said, Explicit is Better Than Implicit; you don't want a variable to accidentally make your string formatting work by luck because you forgot to pass in an argument you meant to but just so happened to have a variable with the "right" name. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeff at jeffreyjenkins.ca Fri Mar 1 20:29:59 2013 From: jeff at jeffreyjenkins.ca (Jeff Jenkins) Date: Fri, 1 Mar 2013 14:29:59 -0500 Subject: [Python-ideas] string.format() default variable assignment In-Reply-To: References: Message-ID: A friend of mine wrote a library which does this: https://pypi.python.org/pypi/ScopeFormatter It's super handy when doing scripting/debugging. On Fri, Mar 1, 2013 at 1:51 PM, Brett Cannon wrote: > > > > On Fri, Mar 1, 2013 at 11:55 AM, ??? wrote: > >> why we bother with '{variable}'.format(variable=variable) ? >> can we just '{variable}.format()' ? >> > > variable = "Hello, World!" > print('{variable}'.format_map(locals())) > > >> >> if variable is exist, then assign it. >> if variable is not exist, then raise error >> >> I am not language expert. so sorry if this is not a good idea, or already >> discussed. >> > > As Dustin said, Explicit is Better Than Implicit; you don't want a > variable to accidentally make your string formatting work by luck because > you forgot to pass in an argument you meant to but just so happened to have > a variable with the "right" name. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Sat Mar 2 08:16:08 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 2 Mar 2013 00:16:08 -0700 Subject: [Python-ideas] OrderedDict for kwargs and class statement namespace In-Reply-To: References: Message-ID: On Thu, Feb 28, 2013 at 2:28 PM, Guido van Rossum wrote: > On Thu, Feb 28, 2013 at 1:16 PM, Eric Snow wrote: >> There were a few other reasonable use cases mentioned in other threads >> a while back. I'll dig them up if that would help. > > It would. Other than OrderedDict, I've only found one other thread that had meaningful references: http://mail.python.org/pipermail/python-dev/2012-December/123105.html I know there were at least a couple more. I'll keep digging. -eric From dimaqq at gmail.com Sat Mar 2 14:53:15 2013 From: dimaqq at gmail.com (Dima Tisnek) Date: Sat, 2 Mar 2013 15:53:15 +0200 Subject: [Python-ideas] pep8 clarification, conditional top-level class/function leading newlines Message-ID: Hi, I'm trying to figure out how to space following code according to pep-8: try: import x class A: def foo(self): # magic using x pass except ImportError: import y # different magic, using y typical conditions are try/except and if/elif/else, though I can imagine a true hacker to wrap top-level definitions in with x, possibly even for/else, while/else as well ;-) PEP-8 states to separate top-level class and functions by 2 blank lines and methods in a class by 1 blank line. This case falls into the crack, it's neither strictly top-level, nor a sub-level. option1: semantical, 2 lines before conditional top-levels option2: legalist, 1 line before any indented top-level pep8 tool only accepts option2 I think I would prefer option1; or explicitly leave it up user, then I can call option1 pep-8 compliant and someone else call option2 pep-8 compliant as well. 
What do you think or prefer? Perhaps this was discussed ages ago and I can't find the traces? Thanks, d. From breamoreboy at yahoo.co.uk Sat Mar 2 15:06:09 2013 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sat, 02 Mar 2013 14:06:09 +0000 Subject: [Python-ideas] pep8 clarification, conditional top-level class/function leading newlines In-Reply-To: References: Message-ID: On 02/03/2013 13:53, Dima Tisnek wrote: > Hi, > I'm trying to figure out how to space following code according to pep-8: I ignore PEP 8 whenever I feel like it, it's not written in stone, it's simply a guide. > > try: > import x > > > class A: > def foo(self): > # magic using x > pass > except ImportError: > import y > # different magic, using y > > typical conditions are try/except and if/elif/else, though I can > imagine a true hacker to wrap top-level definitions in with x, > possibly even for/else, while/else as well ;-) > > PEP-8 states to separate top-level class and functions by 2 blank > lines and methods in a class by 1 blank line. This case falls into the > crack, it's neither strictly top-level, nor a sub-level. > > option1: semantical, 2 lines before conditional top-levels > > option2: legalist, 1 line before any indented top-level > > pep8 tool only accepts option2 Which tool? Any configuration option that you could set to change the behaviour? > > I think I would prefer option1; or explicitly leave it up user, then I > can call option1 pep-8 compliant and someone else call option2 pep-8 > compliant as well. > > What do you think or prefer? An irrelevance as far as I'm concerned. I know that others have different opinions so I'd better du... > Perhaps this was discussed ages ago and I can't find the traces? > > Thanks, d. > -- Cheers. Mark Lawrence From ncoghlan at gmail.com Sat Mar 2 15:46:34 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 3 Mar 2013 00:46:34 +1000 Subject: [Python-ideas] OrderedDict for kwargs and class statement namespace In-Reply-To: References: Message-ID: On Fri, Mar 1, 2013 at 7:37 AM, Eric Snow wrote: > On Thu, Feb 28, 2013 at 11:02 AM, Nick Coghlan wrote: >> And PEP 422 is designed to make it easier to share a common __prepare__ >> method with different post processing. > > A major use case for __prepare__() is to have the class definition use > an OrderedDict. It's even a point of discussion in PEP 3115. While I > agree with the the conclusion of the PEP that using only OrderedDict > is inferior to __prepare__(), I also think defaulting to OrderedDict > is viable and useful. > > Using __prepare__() necessitates the use of a metaclass, which most > people consider black magic. Even when you can inherit from a class > that has a custom metaclass (like collections.abc.ABC), it still > necessitates inheritance and the chance for metaclass conflicts. While > I'm on board with PEP 422, I'm not clear on how it helps here. (catching up after moving house, haven't read the whole thread) PEP 422 would make it useful to also add a "namespace" meta argument to type.__prepare__ to give it a namespace instance to return. Then, all uses of such a OrderedDict based metaclass can be replaced by: class MyClass(namespace=OrderedDict()): @classmethod def __init_class__(cls): # The class namespace is the one we passed in! 
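To make the contrast concrete, here's roughly what getting an ordered
class namespace costs today, with only the existing __prepare__
machinery (a sketch; the "members" attribute is a made-up name for
illustration):

    from collections import OrderedDict

    class OrderedMeta(type):
        @classmethod
        def __prepare__(mcl, name, bases, **kwds):
            # Called before the class body executes; the mapping we
            # return is the namespace the body populates.
            return OrderedDict()

        def __new__(mcl, name, bases, namespace):
            cls = super().__new__(mcl, name, bases, dict(namespace))
            # Record definition order (this also captures dunders like
            # __module__ and __qualname__ that are set implicitly).
            cls.members = tuple(namespace)
            return cls

    class MyClass(metaclass=OrderedMeta):
        x = 1
        y = 2

With a "namespace" argument on __prepare__, all of that metaclass
boilerplate collapses into the single class header above.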
You could pass in a factory function instead, but I think that's a net
loss for readability (you would lose the trailing "()" from the empty
namespace case, but have to add "lambda:" or "functools.partial" to the
prepopulated namespace case).

Even if type wasn't modified, you could create your own metaclass that
accepted a namespace and returned it:

    class UseNamespace(type):
        def __prepare__(cls, namespace):
            return namespace

    class MyClass(metaclass=UseNamespace, namespace=OrderedDict()):
        @classmethod
        def __init_class__(cls):
            # The class namespace is the one we passed in!

I prefer the approach of adding the "namespace" argument to PEP 422,
though, since it makes __init_class__ a far more powerful and
compelling idea, and between them the two ideas should cover every
metaclass use case that *only* customises creation rather than ongoing
behaviour.

I actually had this idea a week or so ago, but packaging discussions
and moving meant I had postponed writing it up and posting it.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com  Sat Mar  2 15:51:48 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 3 Mar 2013 00:51:48 +1000
Subject: [Python-ideas] OrderedDict for kwargs and class statement namespace
In-Reply-To:
References:
Message-ID:

On Sun, Mar 3, 2013 at 12:46 AM, Nick Coghlan wrote:
> class UseNamespace(type):
>     def __prepare__(cls, namespace):
>         return namespace

Oops, that signature is incorrect. Assume it's tweaked appropriately to
accept the normal __prepare__ arguments and still retrieve the
"namespace" setting from the class header.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From jbvsmo at gmail.com  Sat Mar  2 22:04:30 2013
From: jbvsmo at gmail.com (João Bernardo)
Date: Sat, 2 Mar 2013 18:04:30 -0300
Subject: [Python-ideas] Experimental package
Message-ID:

I was thinking about "yield from" and how many times I wanted to use
the feature, but because of backward compatibilities, I couldn't. It's
sad that it may take possibly 5 or 10 years to see this statement being
used in real programs... PEP 380 was proposed while Python 3.1 was
still alpha or beta and it took two and a half years to be accepted,
but once it was, Python 3.2 was already released.

I don't know if this was proposed before, but why can't Python have an
"__experimental__" magic package for this kind of stuff? The __future__
thing already adds features that are sure to be included on a future
release, but why do those that *may* become official have to suffer
that much?

What if an experimental feature becomes rejected?

    from __experimental__ import foo
    DeprecationWarning: "foo" will be removed on Python 4.0

Note that *the idea is not for people to use experimental features*
when they appear, but to have a fallback for when the feature becomes
official! As a side effect, the feature will be more tested and become
more mature when released.

--
João Bernardo

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dwblas at gmail.com  Sat Mar  2 22:21:25 2013
From: dwblas at gmail.com (David Blaschke)
Date: Sat, 2 Mar 2013 13:21:25 -0800
Subject: [Python-ideas] pep8 clarification, conditional top-level class/function leading newlines
In-Reply-To:
References:
Message-ID:

I generally use something like (assuming either x or y will always
exist)

    try:
        import x as value
    except ImportError:
        import y as value

    class A:
        def foo(self):
            # magic using value
            pass

A real world example

    try:
        import Tkinter as tk    ## Python 2.x
    except ImportError:
        import tkinter as tk    ## Python 3.x

From tjreedy at udel.edu  Sat Mar  2 23:23:44 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 02 Mar 2013 17:23:44 -0500
Subject: [Python-ideas] Experimental package
In-Reply-To:
References:
Message-ID:

On 3/2/2013 4:04 PM, João Bernardo wrote:
> I was thinking about "yield from" and how many times I wanted to use
> the feature, but because of backward compatibilities, I couldn't.

Load 3.3 and use it. If you need an external 3.x library that will not
run on 3.3 yet, 6 months after release, bug the author. If you want
your code to run on earlier releases, select new and old versions with
'if version....'.

> It's sad that it may take possibly 5 or 10 years to see this statement
> being used in real programs...

But it will not take that long. 'yield from' plays an essential role in
Guido's new async package (code name: Tulip), which he hopes will be
ready for 3.4. It will probably also run in 3.3, but that will be it.
People who want to use it will have to upgrade.

> I don't know if this was proposed before, but why can't Python have an
> "__experimental__" magic package for this kind of stuff?

Experimental modules and packages are usually available on PyPI. I
think experimental syntax that might be removed is a BAD idea. Getting
rid of things that were not experimental is bad enough.

> The __future__ thing already adds features that are sure to be
> included on a future release,

Actually, the feature is included in the release that adds the
__future__ option. It is just not included *by default*. This is only
done when the new feature changes the meaning of legal existing code.
(Take a look at the 2.x examples*.) So the __future__ import allows the
old deprecated meaning to still be used while simultaneously making the
great new meaning available *immediately* to those willing to
explicitly disable the old meaning.

When 'yield from' was ready to be added, it was just added, because
'yield' and 'from' were already keywords and 'yield from' was
previously a syntax error. We have not yet added any new future imports
in 3.x. The existing future imports from 2.x have no effect since 3.0
incorporated all existing __future__ changes.

* There are only 7 future features. Generators required a future
transition period because making 'yield' a keyword invalidated uses of
'yield' as an identifier (as would be common in finance software).

> Note that *the idea is not for people to use experimental features*
> when they appear, but to have a fallback for when the feature becomes
> official!

The details of a new feature are not fixed until they are fixed, when
it becomes official by being added. Even future imports are not
backported, as bugfix releases do not get new features.
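For instance, this is exactly how the with statement shipped in 2.5 (a
sketch in 2.x syntax): the compiler already contained the feature, but
each module had to opt in, because making 'with' and 'as' keywords
could break existing code.

    # Python 2.5: the with statement exists in the compiler, but is
    # only enabled in modules that explicitly ask for it.
    # In 2.6+ the feature is on by default and this import is a no-op.
    from __future__ import with_statement

    with open('data.txt') as f:
        print f.read()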
--
Terry Jan Reedy

From alan at breakrs.com  Sat Mar  2 23:45:17 2013
From: alan at breakrs.com (Alan Johnson)
Date: Sat, 2 Mar 2013 17:45:17 -0500
Subject: [Python-ideas] One-line "try with" statement
Message-ID: <15135910-8FB4-4B82-A93E-41CA48086A1B@breakrs.com>

It seems to me that one of the intended uses of the with statement was
to automate simple initialization and deinitialization, which often
accompanies a try block. It wouldn't be a game changing thing by any
means, but has anybody ever thought about allowing a "try with"
statement on one line? So instead of:

    try:
        with context_manager():
            ... bunch of code ...
    except:
        ... exception handler ...

you would have:

    try with context_manager():
        ... bunch of code ...
    except:
        ... exception handler ...

I envision the two examples being equivalent, the principal benefits
being readability and one less indentation level for the with block
code. So a similar justification to the "lower < x < upper" idiom. With
standard 4 space indentation, existing with statements at the top of
try blocks wouldn't even be any closer to the right margin.

I'm no expert in Python interpreters, but it seems like a simple
one-step internal conversion whenever this proposed syntax is
encountered. But obviously, it would involve a change to every
interpreter in existence with no actual new functionality, so I'm
sensitive to that. Anyway, just a thought.

--
Alan Johnson
Cofounder | Breakrs.com
347-630-2036 | alan at breakrs.com

From solipsis at pitrou.net  Sun Mar  3 00:51:29 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 3 Mar 2013 00:51:29 +0100
Subject: [Python-ideas] Updated PEP 428 (pathlib)
Message-ID: <20130303005129.26eb0e00@pitrou.net>

Hello,

I've updated PEP 428 following the previous discussion. Highlights:

- the operator for combining paths is now `/`:

    >>> p / PurePosixPath('bar')
    PurePosixPath('foo/bar')
    >>> 'bar' / p
    PurePosixPath('bar/foo')

- the method for combining paths is now named `joinpath`

- new as_uri() method to represent a path as a `file` URI

http://www.python.org/dev/peps/pep-0428/

Regards

Antoine.

From steve at pearwood.info  Sun Mar  3 01:01:31 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 03 Mar 2013 11:01:31 +1100
Subject: [Python-ideas] pep8 clarification, conditional top-level class/function leading newlines
In-Reply-To:
References:
Message-ID: <513292DB.6060705@pearwood.info>

On 03/03/13 01:06, Mark Lawrence wrote:
> On 02/03/2013 13:53, Dima Tisnek wrote:
> > Hi,
> > I'm trying to figure out how to space following code according to pep-8:
>
> I ignore PEP 8 whenever I feel like it, it's not written in stone, it's
> simply a guide.

Technically you don't, since PEP 8 states to break the rules when
needed, so even when you break it you are obeying it :-)

> > try:
> >     import x
> >
> >
> >     class A:
> >         def foo(self):
> >             # magic using x
> >             pass
> > except ImportError:
> >     import y
> >     # different magic, using y

Where possible, I would write that as:

    try:
        import x
    except ImportError:
        import y as x

    class A:
        # unconditional magic using x

Another variation:

    try:
        from x import A
    except ImportError:
        import y

        class A:
            # magic using y
            ...

I must admit I've never come across your variation, but if I did:

    try:
        import x

        class A:  # why is this in the try block?
            pass
    except ImportError:
        import y

        class B:
            pass
    ...

[...]

> > PEP-8 states to separate top-level class and functions by 2 blank
> > lines and methods in a class by 1 blank line. This case falls into the
> > crack, it's neither strictly top-level, nor a sub-level.
Note that PEP 8 states "top level", not "global". That means: no
leading indentation. So a conditional class definition falls into the
"1 line between indented classes and functions" bucket.

But frankly, I would make a final judgement only after actually typing
up the code and looking at it.

--
Steven

From jbvsmo at gmail.com  Sun Mar  3 02:00:06 2013
From: jbvsmo at gmail.com (João Bernardo)
Date: Sat, 2 Mar 2013 22:00:06 -0300
Subject: [Python-ideas] Experimental package
In-Reply-To:
References:
Message-ID:

João Bernardo

2013/3/2 Terry Reedy

> On 3/2/2013 4:04 PM, João Bernardo wrote:
>
>> I was thinking about "yield from" and how many times I wanted to use
>> the feature, but because of backward compatibilities, I couldn't.
>
> Load 3.3 and use it. If you need an external 3.x library that will not
> run on 3.3 yet, 6 months after release, bug the author. If you want
> your code to run on earlier releases, select new and old versions with
> 'if version....'.
>
>> It's sad that it may take possibly 5 or 10 years to see this statement
>> being used in real programs...
>
> But it will not take that long. 'yield from' plays an essential role in
> Guido's new async package (code name: Tulip), which he hopes will be
> ready for 3.4. It will probably also run in 3.3, but that will be it.
> People who want to use it will have to upgrade.
>

Writing new stuff for the stdlib doesn't need to be compatible with
older python versions... I develop on 3.3 but need to support 3.1 or
3.0.

>> I don't know if this was proposed before, but why can't Python have an
>> "__experimental__" magic package for this kind of stuff?
>
> Experimental modules and packages are usually available on PyPI. I
> think experimental syntax that might be removed is a BAD idea. Getting
> rid of things that were not experimental is bad enough.
>

If something is experimental, people are advised against using it for
production. The idea here is to make forward compatibility possible.

>> Note that *the idea is not for people to use experimental features*
>> when they appear, but to have a fallback for when the feature becomes
>> official!
>
> The details of a new feature are not fixed until they are fixed, when
> it becomes official by being added. Even future imports are not
> backported, as bugfix releases do not get new features.
>

It is not about backporting features. Think about the "with" statement:

It was added on version 2.5 (not as default), then it changed on
version 2.7 to allow multiple contexts. You can write code compatible
with 2.5, 2.6 and 2.7 by just restricting yourself to use only one
context per block.

Now, if you had a partial version of "yield from" syntax on 3.1 that
could solve 50% of the problems of the current syntax, it would be used
a lot by now. The current problem is that on older versions, using it
gives SyntaxError, which cannot be bypassed by a try...except...
statement!

Having the syntax available makes it possible to do:

    if sys.version_info >= (3,3):
        yield from foo()
    else:
        bar()

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cf.natali at gmail.com  Sun Mar  3 09:46:16 2013
From: cf.natali at gmail.com (Charles-François Natali)
Date: Sun, 3 Mar 2013 09:46:16 +0100
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: <20130303005129.26eb0e00@pitrou.net>
References: <20130303005129.26eb0e00@pitrou.net>
Message-ID:

> Hello,

Hi,

> I've updated PEP 428 following the previous discussion.
I really look forward to PEP 428 acceptance :-)
Just a couple remarks:

>>> p = PureNTPath('c:/Downloads/pathlib.tar.gz')
>>> p.name
'pathlib.tar.gz'
>>> p.basename
'pathlib.tar'
>>> p.suffix
'.gz'

I find the 'p.basename' name confusing: following POSIX conventions,
'basename' should be 'pathlib.tar.gz'. I don't have another 'name' to
propose for the stripped name, though.

> match() matches the path against a glob pattern:

I think it could be interesting to add an optional argument to
mitigate glob-based DoS. Whether this should be made default is left
as an exercise to the reader :-)

cf

From tomasz.rybak at post.pl  Sun Mar  3 11:32:16 2013
From: tomasz.rybak at post.pl (Tomasz Rybak)
Date: Sun, 03 Mar 2013 11:32:16 +0100
Subject: [Python-ideas] One-line "try with" statement
In-Reply-To: <15135910-8FB4-4B82-A93E-41CA48086A1B@breakrs.com>
References: <15135910-8FB4-4B82-A93E-41CA48086A1B@breakrs.com>
Message-ID: <1362306736.5045.3.camel@rogue.dyndns.info>

On Sat, 2013-03-02 at 17:45 -0500, Alan Johnson wrote:
> It seems to me that one of the intended uses of the with statement was
> to automate simple initialization and deinitialization, which often
> accompanies a try block. [...] So instead of:
>
>     try:
>         with context_manager():
>             ... bunch of code ...
>     except:
>         ... exception handler ...
>
> you would have:
>
>     try with context_manager():
>         ... bunch of code ...
>     except:
>         ... exception handler ...
> [...]

Isn't a context manager supposed to deal with exceptions by itself?
If I understand things correctly, with a context manager you
do not need try/except - the context manager will deal with
exceptions in __exit__.

Regards.

--
Tomasz Rybak
GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A  488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: This is a digitally signed message part
URL:

From solipsis at pitrou.net  Sun Mar  3 11:41:12 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 3 Mar 2013 11:41:12 +0100
Subject: [Python-ideas] Updated PEP 428 (pathlib)
References: <20130303005129.26eb0e00@pitrou.net>
Message-ID: <20130303114112.403019c6@pitrou.net>

On Sun, 3 Mar 2013 09:46:16 +0100, Charles-François Natali wrote:
>
> >>> p = PureNTPath('c:/Downloads/pathlib.tar.gz')
> >>> p.name
> 'pathlib.tar.gz'
> >>> p.basename
> 'pathlib.tar'
> >>> p.suffix
> '.gz'
>
> I find the 'p.basename' name confusing: following POSIX conventions,
> 'basename' should be 'pathlib.tar.gz'. I don't have another 'name' to
> propose for the stripped name, though.

Yes. We could call it
"root" (http://en.wikipedia.org/wiki/Root_%28linguistics%29) but in
this context it would be confusing.
Also, it's not exactly the root since, as you point out, there can
still be a remaining suffix.

There's "stem", too (http://en.wikipedia.org/wiki/Word_stem). With the
same provision about not being the actual stem.

> > match() matches the path against a glob pattern:
>
> I think it could be interesting to add an optional argument to
> mitigate glob-based DoS. Whether this should be made default is left
> as an exercise to the reader :-)

You mean for glob() (match() is just a regex-like matcher, it doesn't
do any I/O). Yes, I think we could add an `allow_recursive` argument.
Is there any other DoS issue?

Regards

Antoine.

From robertc at robertcollins.net  Sun Mar  3 12:19:36 2013
From: robertc at robertcollins.net (Robert Collins)
Date: Mon, 4 Mar 2013 00:19:36 +1300
Subject: [Python-ideas] BufferedIO and detach
Message-ID:

There doesn't seem to be a way to safely use detach() on stdin - I'd
like to get down to the raw stream, but after calling detach(), the
initial BufferedIOReader is unusable - so you cannot retrieve any
buffered content - and unless you detach(), you can't guarantee that
the buffer will ever be empty.

I presume I'm missing something, but if there was a
read([n], buffered_only=False) call, which you could invoke with
buffered_only=True, then it would be possible to get out of this
situation.

-Rob

--
Robert Collins
Distinguished Technologist
HP Cloud Services

From stefan_ml at behnel.de  Sun Mar  3 13:15:54 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 03 Mar 2013 13:15:54 +0100
Subject: [Python-ideas] One-line "try with" statement
In-Reply-To: <1362306736.5045.3.camel@rogue.dyndns.info>
References: <15135910-8FB4-4B82-A93E-41CA48086A1B@breakrs.com>
	<1362306736.5045.3.camel@rogue.dyndns.info>
Message-ID:

Tomasz Rybak, 03.03.2013 11:32:
> On Sat, 2013-03-02 at 17:45 -0500, Alan Johnson wrote:
>> It seems to me that one of the intended uses of the with statement was
>> to automate simple initialization and deinitialization, which often
>> accompanies a try block. [...] So instead of:
>>
>>     try:
>>         with context_manager():
>>             ... bunch of code ...
>>     except:
>>         ... exception handler ...
>>
>> you would have:
>>
>>     try with context_manager():
>>         ... bunch of code ...
>>     except:
>>         ... exception handler ...
>> [...]
>
> Isn't a context manager supposed to deal with exceptions by itself?
> If I understand things correctly, with a context manager you
> do not need try/except - the context manager will deal with
> exceptions in __exit__.

Yes, that's the main idea. The above example therefore strikes me as
useless. If you need a try-except around a with block, then your
context manager is doing something wrong.
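A minimal sketch of what "deal with it in __exit__" means (the
handled() name and the print are made up for illustration):

    class handled:
        def __enter__(self):
            return self

        def __exit__(self, exc_type, exc, tb):
            if exc_type is not None:
                print('cleaned up after: %r' % (exc,))
            # Returning a true value tells Python the exception has
            # been dealt with, so it is not re-raised.
            return True

    with handled():
        1 / 0
    # Execution continues here -- no try/except at the call site.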
Stefan

From ncoghlan at gmail.com  Sun Mar  3 14:13:49 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 3 Mar 2013 23:13:49 +1000
Subject: [Python-ideas] Experimental package
In-Reply-To:
References:
Message-ID:

"yield from" is just syntactic sugar for stuff you can already
effectively do with a few appropriate helper functions and an event
loop - otherwise Twisted's inline deferreds wouldn't work.

The exclusion of PEP 380 from 3.2 was due to the moratorium imposed on
language changes in 3.2, to give the broader Python community a chance
to catch up with the Python 3 transition.

Cheers,
Nick.

On 3 Mar 2013 11:01, "João Bernardo" wrote:
> 2013/3/2 Terry Reedy
>> On 3/2/2013 4:04 PM, João Bernardo wrote:
>>> I was thinking about "yield from" and how many times I wanted to use
>>> the feature, but because of backward compatibilities, I couldn't.
> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From abarnert at yahoo.com  Sun Mar  3 14:24:06 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 3 Mar 2013 05:24:06 -0800
Subject: [Python-ideas] One-line "try with" statement
In-Reply-To:
References: <15135910-8FB4-4B82-A93E-41CA48086A1B@breakrs.com>
	<1362306736.5045.3.camel@rogue.dyndns.info>
Message-ID: <04F2F204-22CC-406D-A033-613BEE889854@yahoo.com>

On Mar 3, 2013, at 4:15, Stefan Behnel wrote:
> Tomasz Rybak, 03.03.2013 11:32:
>> [...]
>> Isn't a context manager supposed to deal with exceptions by itself?
>> If I understand things correctly, with a context manager you
>> do not need try/except - the context manager will deal with
>> exceptions in __exit__.
>
> Yes, that's the main idea. The above example therefore strikes me as
> useless. If you need a try-except around a with block, then your
> context manager is doing something wrong.

A try-finally, sure, but a try-except is perfectly reasonable,
idiomatic, and common. For example, if you do "with open(path) as f:"
the context manager doesn't (and shouldn't) do anything to protect you
from a FileNotFoundError in the open, or an IOError reading inside the
block. If you want to, say, log, or try a backup file, how else would
you handle that but a with inside a try?

That being said, I'm not sure this is necessary. For something you do
repeatedly, you can always write a wrapper function. For something you
only do once, I'm not sure the extra indent is that terrible.

From abarnert at yahoo.com  Sun Mar  3 14:45:23 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 3 Mar 2013 05:45:23 -0800
Subject: [Python-ideas] Experimental package
In-Reply-To:
References:
Message-ID:

On Mar 2, 2013, at 17:00, João Bernardo wrote:
> 2013/3/2 Terry Reedy
>>
>> Load 3.3 and use it. If you need an external 3.x library that will
>> not run on 3.3 yet, 6 months after release, bug the author. If you
>> want your code to run on earlier releases, select new and old
>> versions with 'if version....'.
>>
>>> It's sad that it may take possibly 5 or 10 years to see this
>>> statement being used in real programs...
>>
>> But it will not take that long. 'yield from' plays an essential role
>> in Guido's new async package (code name: Tulip), which he hopes will
>> be ready for 3.4. It will probably also run in 3.3, but that will be
>> it.
People who want to use it will have to upgrade. > > Writing new stuff for the stdlib doesn't need to be compatible with older python versions... I develop on 3.3 but need to support 3.1 or 3.0. Is there really that much need to support 3.0? I write stuff all the time that requires 2.6, 2.7, or 3.2 or later, and I've had many people asking for 2.5, but not a single request for 3.1 or 3.0. Is that not typical? > Now, if you had a partial version of "yield from" syntax on 3.1 that could solve 50% of the problems of the current syntax, it would be used a lot by now. But there wasn't a working version, partial or otherwise, to add at the time. And even if the usual rule of "no backports to bug fix releases" we're suspended, do you really have users who would gladly upgrade to a later 3.1, but can't upgrade to a later 3.x? In my experience, people who stick with an old version are doing it because "that's the version that comes with CentOS x.y" or similar. Are there any important OS/distro extended service releases that come with 3.1? Meanwhile, if you need a workaround, it's not that hard. I've got code that does this today: I have my nifty 3.3 module, and the fallback is in a separate module, so I can "import foo33 as foo" or "import foo26 as foo" as appropriate. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cf.natali at gmail.com Sun Mar 3 15:12:09 2013 From: cf.natali at gmail.com (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=) Date: Sun, 3 Mar 2013 15:12:09 +0100 Subject: [Python-ideas] Updated PEP 428 (pathlib) In-Reply-To: <20130303114112.403019c6@pitrou.net> References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> Message-ID: > Yes. We could call it > "root" (http://en.wikipedia.org/wiki/Root_%28linguistics%29) but in > this context it would be confusing. Also, it's not exactly the root > since as you point there can still be a remaining suffix. Indeed, "root" would be even more confusing. > There's "stem", too (http://en.wikipedia.org/wiki/Word_stem). With the > same provision about not being the actual stem. Also, it doesn't sound familiar (at least to me). How about "rootname", or "stripped_name" (the last one is a little too long)? > You mean for glob() (match() is just a regex-like matcher, it doesn't > do any I/O). Yes, I meant glob() (fnmatch() implementations can also be subject to DoS through stack exhaustion, but Python's implementation is based on regex). > Yes, I think we could add a `allow_recursive` argument. > Is there any other DoS issue? If by recursive you mean the '**' pattern (cross-directory match), then I'm afraid that's not enough. 
For example, a pattern like '*/../*/../*/../*/../*' would have the same problem: """ $ mkdir -p /tmp/foo/a /tmp/foo/b $ ~/python/cpython/python -c "from pathlib import *; p = Path('/tmp/foo'); print(list(p.glob('*/../*/../*/../*')))" [PosixPath('/tmp/foo/a/../a/../a/../a'), PosixPath('/tmp/foo/a/../a/../a/../b'), PosixPath('/tmp/foo/a/../a/../b/../a'), PosixPath('/tmp/foo/a/../a/../b/../b'), PosixPath('/tmp/foo/a/../b/../a/../a'), PosixPath('/tmp/foo/a/../b/../a/../b'), PosixPath('/tmp/foo/a/../b/../b/../a'), PosixPath('/tmp/foo/a/../b/../b/../b'), PosixPath('/tmp/foo/b/../a/../a/../a'), PosixPath('/tmp/foo/b/../a/../a/../b'), PosixPath('/tmp/foo/b/../a/../b/../a'), PosixPath('/tmp/foo/b/../a/../b/../b'), PosixPath('/tmp/foo/b/../b/../a/../a'), PosixPath('/tmp/foo/b/../b/../a/../b'), PosixPath('/tmp/foo/b/../b/../b/../a'), PosixPath('/tmp/foo/b/../b/../b/../b')] """ cf From jbvsmo at gmail.com Sun Mar 3 16:42:56 2013 From: jbvsmo at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Bernardo?=) Date: Sun, 3 Mar 2013 12:42:56 -0300 Subject: [Python-ideas] Experimental package In-Reply-To: References: Message-ID: > > Is there really that much need to support 3.0? I write stuff all the time > that requires 2.6, 2.7, or 3.2 or later, and I've had many people asking > for 2.5, but not a single request for 3.1 or 3.0. Is that not typical? > > Ubuntu LTS. Also when CentOS/RedHat start using Py3k, they will probably choose the oldest possible release like they always do with everything... > Now, if you had a partial version of "yield from" syntax on 3.1 that could > solve 50% of the problems of the current syntax, it would be used a lot by > now. > > > But there wasn't a working version, partial or otherwise, to add at the > time. > > Because the PEP hasn't been accepted at the time and there was no way to add experimental stuff to the language. > And even if the usual rule of "no backports to bug fix releases" we're > suspended, do you really have users who would gladly upgrade to a later > 3.1, but can't upgrade to a later 3.x? In my experience, people who stick > with an old version are doing it because "that's the version that comes > with CentOS x.y" or similar. Are there any important OS/distro extended > service releases that come with 3.1? > > Meanwhile, if you need a workaround, it's not that hard. I've got code > that does this today: I have my nifty 3.3 module, and the fallback is in a > separate module, so I can "import foo33 as foo" or "import foo26 as foo" as > appropriate. > If I wanted to write a lot of boring duplicated code, I ought to use Java instead. If I can't write it in a single code base (either using "2to3" or "six" or similar) I don't write it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin at python.org Sun Mar 3 17:16:34 2013 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 3 Mar 2013 16:16:34 +0000 (UTC) Subject: [Python-ideas] BufferedIO and detach References: Message-ID: Robert Collins writes: > > There doesn't seem to be a way to safely use detach() on stdin - I'd > like to get down to the raw stream, but after calling detach(), the > initial BufferedIOReader is unusable - so you cannot retrieve any > buffered content) - and unless you detach(), you can't guarantee that > the buffer will ever be empty. Presumably if you call it before anyone else has had a chance to read from it, you should be okay. 
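For instance, this should be safe at the top of a program (a sketch; it
assumes nothing has touched sys.stdin yet, so the buffer is still
empty):

    import sys

    # Detach before any buffered read happens, so no data can be
    # stranded in the BufferedReader's internal buffer.
    raw = sys.stdin.buffer.detach()
    chunk = raw.read(4096)  # reads straight from the raw stream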
From abarnert at yahoo.com  Sun Mar  3 17:53:31 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 3 Mar 2013 08:53:31 -0800
Subject: [Python-ideas] Experimental package
In-Reply-To:
References:
Message-ID:

On Mar 3, 2013, at 7:42, João Bernardo wrote:
>> Meanwhile, if you need a workaround, it's not that hard. I've got
>> code that does this today: I have my nifty 3.3 module, and the
>> fallback is in a separate module, so I can "import foo33 as foo" or
>> "import foo26 as foo" as appropriate.
>
> If I wanted to write a lot of boring duplicated code, I ought to use
> Java instead. If I can't write it in a single code base (either using
> "2to3" or "six" or similar) I don't write it.

Most of the time, you can just use "for x in foo: yield x", so you
really don't need a workaround. But when that's not appropriate (e.g.,
for performance reasons), you generally have to implement things pretty
differently anyway. You're already writing the code twice; fooling
yourself into thinking otherwise doesn't help.

More importantly, you don't have to duplicate the entire module. Factor
out the part that's version dependent, and create two tiny modules for
the two different implementations. It ends up being two more lines of
code, one level less indented, and more readable. Where's the harm?

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cf.natali at gmail.com  Sun Mar  3 18:02:57 2013
From: cf.natali at gmail.com (Charles-François Natali)
Date: Sun, 3 Mar 2013 18:02:57 +0100
Subject: [Python-ideas] speeding up shutil.copy*()
Message-ID:

shutil.copy*() use copyfileobj():

"""
while 1:
    buf = fsrc.read(length)
    if not buf:
        break
    fdst.write(buf)
"""

This allocates and frees a lot of buffers, and could be optimized with
readinto().
Unfortunately, I don't think we can change copyfileobj(), because it
might be passed objects that don't implement readinto().

By implementing it directly in copyfile() (it would probably be better
to expose it in shutil to make it available to tarfile & Co), there's
a modest improvement:

$ dd if=/dev/zero of=/tmp/foo bs=1M count=100

Without patch:
$ ./python -m timeit -s "import shutil" "shutil.copyfile('/tmp/foo', '/dev/null')"
10 loops, best of 3: 218 msec per loop

With readinto():
$ ./python -m timeit -s "import shutil" "shutil.copyfile('/tmp/foo', '/dev/null')"
10 loops, best of 3: 202 msec per loop

(I'm using /dev/null as target because my hdd is really slow: other
benchmarks are welcome, just beware that /tmp might be tmpfs).

I've also written a dirty patch to use sendfile(). Here, the
improvement is really significant:

With sendfile():
$ ./python -m timeit -s "import shutil" "shutil.copyfile('/tmp/foo', '/dev/null')"
100 loops, best of 3: 5.39 msec per loop

Thoughts?

cf

From christian at python.org  Sun Mar  3 19:03:55 2013
From: christian at python.org (Christian Heimes)
Date: Sun, 03 Mar 2013 19:03:55 +0100
Subject: [Python-ideas] speeding up shutil.copy*()
In-Reply-To:
References:
Message-ID:

On 03.03.2013 18:02, Charles-François Natali wrote:
> I've also written a dirty patch to use sendfile(). Here, the
> improvement is really significant:
>
> With sendfile():
> $ ./python -m timeit -s "import shutil" "shutil.copyfile('/tmp/foo', '/dev/null')"
> 100 loops, best of 3: 5.39 msec per loop
>
> Thoughts?

sendfile() is a Linux-only syscall. It's also limited to certain kinds
of file descriptors. The limitations have been lifted in recent kernel
versions.
http://linux.die.net/man/2/sendfile

TL;DR: the input fd must support mmap. The output fd used to be limited
to socket fds; since 2.6.33, sendfile() supports any fd as output fd.

From tjreedy at udel.edu Sun Mar 3 19:27:46 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 03 Mar 2013 13:27:46 -0500
Subject: [Python-ideas] One-line "try with" statement
In-Reply-To: <15135910-8FB4-4B82-A93E-41CA48086A1B@breakrs.com>
References: <15135910-8FB4-4B82-A93E-41CA48086A1B@breakrs.com>
Message-ID: 

On 3/2/2013 5:45 PM, Alan Johnson wrote:
> It seems to me that one of the intended uses of the with statement
> was to automate simple initialization and deinitialization, which
> often accompanies a try block. It wouldn't be a game changing thing
> by any means, but has anybody ever thought about allowing a "try
> with" statement on one line? So instead of:
>
> try:
>     with context_manager():
>         ... bunch of code ...
> except:
>     ... exception handler ...
>
> you would have:
>
> try with context_manager():
>     ... bunch of code ...
> except:
>     ... exception handler ...
>
> I envision the two examples being equivalent, the principal benefits
> being readability

To me it is less readable. And it only works when the with statement is
the entire suite for the try: part.

> and one less indention level for the with block

There is no end of possible combinations of statements; others have been
proposed with the same justification -- saving an indent level -- and
rejected. If indent level is really a problem, use fewer spaces per
indent, or pull highly indented blocks into a separate function.

-- 
Terry Jan Reedy

From cf.natali at gmail.com Sun Mar 3 19:40:04 2013
From: cf.natali at gmail.com (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=)
Date: Sun, 3 Mar 2013 19:40:04 +0100
Subject: [Python-ideas] speeding up shutil.copy*()
In-Reply-To: 
References: 
Message-ID: 

> This allocates and frees a lot of buffers, and could be optimized with
> readinto().
> Unfortunately, I don't think we can change copyfileobj(), because it
> might be passed objects that don't implement readinto().

Or we could just use:
if hasattr(fileobj, 'readinto'):
hoping that readinto() is really a readinto() implementation and not
an unrelated method :-)

> sendfile() is a Linux-only syscall. It's also limited to certain kinds
> of file descriptors. The limitations have been lifted in recent kernel
> versions.

No, it's not Linux-only: many BSDs also have it, although not all of them
support an arbitrary output file descriptor (Solaris does allow regular
files too). It would be possible to catch EINVAL/EBADF, and fall back to
a regular copy loop.

Note that the above benchmark is really biased by writing the data to
/dev/null: with a real target file, the zero-copy wouldn't bring such
a large gain, because the bottleneck will really be the I/O devices
(also a read()/write() loop is more expensive in Python than in C).
But I see at least two cases where it could be interesting: when
reading/writing from/to a tmpfs partition, or when the source and
target files are on different disks.

I'm not sure it's worth it though, that's why I'm asking here :-) (but
I do think readinto() is interesting).
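For the record, the readinto() loop I have in mind looks roughly like
this (a sketch, not the exact patch; the helper name is made up):

def _copyfileobj_readinto(fsrc, fdst, length=16*1024):
    # Reuse a single buffer instead of allocating a new bytes object
    # on every iteration.
    buf = bytearray(length)
    view = memoryview(buf)
    while True:
        n = fsrc.readinto(buf)
        if not n:
            break
        # Slicing the memoryview avoids a copy on a short read.
        fdst.write(view[:n])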
From python at mrabarnett.plus.com Sun Mar 3 19:40:34 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Sun, 03 Mar 2013 18:40:34 +0000
Subject: [Python-ideas] Experimental package
In-Reply-To: 
References: 
Message-ID: <51339922.6060309@mrabarnett.plus.com>

On 2013-03-03 13:45, Andrew Barnert wrote:
> On Mar 2, 2013, at 17:00, João Bernardo  wrote:
>
>> 2013/3/2 Terry Reedy
>>
>>     Load 3.3 and use it. If you need an external 3.x library that will
>>     not run on 3.3 yet, 6 months after release, bug the author. If you
>>     want your code to run on earlier releases, select new and old
>>     versions with 'if version....'/
>>
>> It's sad that it may take possibly 5 or 10 years to see this statement
>> being used in real programs...
>>
>>
>> But it will not take that long. 'yield from' plays an essential
>> role in Guido's new asynch package (code name: Tulip), which he
>> hopes will be ready for 3.4. It will probably also run in 3.3, but
>> that will be it. People who want to use it will have to upgrade.
>>
>> Writing new stuff for the stdlib doesn't need to be compatible with
>> older python versions... I develop on 3.3 but need to support 3.1 or
>> 3.0.
>
> Is there really that much need to support 3.0? I write stuff all the
> time that requires 2.6, 2.7, or 3.2 or later, and I've had many people
> asking for 2.5, but not a single request for 3.1 or 3.0. Is that not
> typical?
>
[snip]
3.0 was the first of the Python 3 series, but it had a few issues, and
was relatively short-lived. The recommendation is not to use it.

From dholth at gmail.com Sun Mar 3 19:50:23 2013
From: dholth at gmail.com (Daniel Holth)
Date: Sun, 3 Mar 2013 13:50:23 -0500
Subject: [Python-ideas] speeding up shutil.copy*()
In-Reply-To: 
References: 
Message-ID: 

Great idea. I would also appreciate being able to simply specify the
block size in more places. This is probably the kind of change that you
could get in as a patch.

On Mar 3, 2013 1:40 PM, "Charles-François Natali" wrote:

> > This allocates and frees a lot of buffers, and could be optimized with
> > readinto().
> > Unfortunately, I don't think we can change copyfileobj(), because it
> > might be passed objects that don't implement readinto().
>
> Or we could just use:
> if hasattr(fileobj, 'readinto'):
>
> hoping that readinto() is really a readinto() implementation and not
> an unrelated method :-)
>
> > sendfile() is a Linux-only syscall. It's also limited to certain kinds
> > of file descriptors. The limitations have been lifted in recent kernel
> > versions.
>
> No, it's not Linux-only: many BSDs also have it, although not all of them
> support an arbitrary output file descriptor (Solaris does allow regular
> files too). It would be possible to catch EINVAL/EBADF, and fall back to
> a regular copy loop.
>
> Note that the above benchmark is really biased by writing the data to
> /dev/null: with a real target file, the zero-copy wouldn't bring such
> a large gain, because the bottleneck will really be the I/O devices
> (also a read()/write() loop is more expensive in Python than in C).
> But I see at least two cases where it could be interesting: when
> reading/writing from/to a tmpfs partition, or when the source and
> target files are on different disks.
>
> I'm not sure it's worth it though, that's why I'm asking here :-) (but
> I do think readinto() is interesting).
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From solipsis at pitrou.net Sun Mar 3 20:00:46 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 3 Mar 2013 20:00:46 +0100
Subject: [Python-ideas] speeding up shutil.copy*()
References: 
Message-ID: <20130303200046.3de4d3a0@pitrou.net>

On Sun, 3 Mar 2013 19:40:04 +0100
Charles-François Natali  wrote:
>
> > sendfile() is a Linux-only syscall. It's also limited to certain kinds
> > of file descriptors. The limitations have been lifted in recent kernel
> > versions.
>
> No, it's not Linux-only: many BSDs also have it, although not all of
> them support an arbitrary output file descriptor (Solaris does allow
> regular files too). It would be possible to catch EINVAL/EBADF, and
> fall back to a regular copy loop.
>
> Note that the above benchmark is really biased by writing the data to
> /dev/null: with a real target file, the zero-copy wouldn't bring such
> a large gain, because the bottleneck will really be the I/O devices
> (also a read()/write() loop is more expensive in Python than in C).

Can you post your benchmark's code? I could time it on an SSD.

> But I see at least two cases where it could be interesting: when
> reading/writing from/to a tmpfs partition, or when the source and
> target files are on different disks.

That's already nice.

Regards

Antoine.

From jbvsmo at gmail.com Sun Mar 3 20:06:35 2013
From: jbvsmo at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Bernardo?=)
Date: Sun, 3 Mar 2013 16:06:35 -0300
Subject: [Python-ideas] One-line "try with" statement
In-Reply-To: <15135910-8FB4-4B82-A93E-41CA48086A1B@breakrs.com>
References: <15135910-8FB4-4B82-A93E-41CA48086A1B@breakrs.com>
Message-ID: 

When dealing with files it would be nice to avoid an extra block. This is
not the same as the __exit__ method, because you would have to write a
function just to catch an error.

BTW, why not "with...except"? Just like "for" and "while" loops have
"else" clauses.

with open('foo') as f:
    print(f.read())
except IOError:
    print('Problem with the file!')

-- 
João Bernardo
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From stefan_ml at behnel.de Sun Mar 3 20:50:28 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 03 Mar 2013 20:50:28 +0100
Subject: [Python-ideas] One-line "try with" statement
In-Reply-To: <04F2F204-22CC-406D-A033-613BEE889854@yahoo.com>
References: <15135910-8FB4-4B82-A93E-41CA48086A1B@breakrs.com>
	<1362306736.5045.3.camel@rogue.dyndns.info>
	<04F2F204-22CC-406D-A033-613BEE889854@yahoo.com>
Message-ID: 

Andrew Barnert, 03.03.2013 14:24:
> For example, if you do "with open(path) as f:" the context manager
> doesn't (and shouldn't) do anything to protect you from a
> FileNotFoundError in the open, or an IOError reading inside the block.
> If you want to, say, log, or try a backup file, how else would you
> handle that but a with inside a try?

If you really care about errors when opening the file (i.e. when creating
the context manager), then the correct way to do this is to only wrap the
creation of the context manager in a try-except clause, i.e.
try: f = open("somefile.txt") except FileNotFoundError: do_stuff() raise # or return, or whatever with f: do_other_stuff() Otherwise, you risk accidentally catching (and potentially shadowing) exceptions that originated from the body of the with statement instead of just the context manager creation. This may look a bit overly complicated for a file, but it quickly becomes more obvious for more complex context managers, e.g. those that may raise more common errors like ValueError or AttributeError. Stefan From cf.natali at gmail.com Sun Mar 3 20:55:15 2013 From: cf.natali at gmail.com (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=) Date: Sun, 3 Mar 2013 20:55:15 +0100 Subject: [Python-ideas] speeding up shutil.copy*() In-Reply-To: <20130303200046.3de4d3a0@pitrou.net> References: <20130303200046.3de4d3a0@pitrou.net> Message-ID: > Can you post your benchmark's code? I could time it on a SSD. Attached (for readinto() and sendfile()). cf -------------- next part -------------- A non-text attachment was scrubbed... Name: copyfile_into.diff Type: text/x-patch Size: 762 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: copyfile_sendfile.diff Type: text/x-patch Size: 981 bytes Desc: not available URL: From solipsis at pitrou.net Sun Mar 3 21:12:01 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 3 Mar 2013 21:12:01 +0100 Subject: [Python-ideas] speeding up shutil.copy*() References: <20130303200046.3de4d3a0@pitrou.net> Message-ID: <20130303211201.02835ba5@pitrou.net> On Sun, 3 Mar 2013 20:55:15 +0100 Charles-Fran?ois Natali wrote: > > Can you post your benchmark's code? I could time it on a SSD. > > Attached (for readinto() and sendfile()). Ok, the readinto() version doesn't seem to make a difference here, only the sendfile() version is beneficial (and the benefits are mostly noticeable from tmpfs to /dev/null, as you point out :-)). Regards Antoine. From greg.ewing at canterbury.ac.nz Sun Mar 3 22:36:52 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 04 Mar 2013 10:36:52 +1300 Subject: [Python-ideas] One-line "try with" statement In-Reply-To: References: <15135910-8FB4-4B82-A93E-41CA48086A1B@breakrs.com> <1362306736.5045.3.camel@rogue.dyndns.info> Message-ID: <5133C274.1030506@canterbury.ac.nz> Stefan Behnel wrote: > The above example therefore strikes me as > useless. If you need a try-except around a with block, then your context > manager is doing something wrong. A with statement is equivalent to try-finally, not try-except, so if you want to catch the exception you still need to put a try-except somewhere. However, I can't see why you're any more likely to want to put try-except directly around a with statement than any other kind of statement. So why ask for try-with in particular, and not try-if, try-while, try-for, ...? -- Greg From greg at krypto.org Sun Mar 3 22:38:05 2013 From: greg at krypto.org (Gregory P. Smith) Date: Sun, 3 Mar 2013 13:38:05 -0800 Subject: [Python-ideas] speeding up shutil.copy*() In-Reply-To: <20130303200046.3de4d3a0@pitrou.net> References: <20130303200046.3de4d3a0@pitrou.net> Message-ID: IMNSHO the *time* is less relevant than the fact that it uses less memory by not repeatedly making copies. In general we should use the more recent non-copying APIs when possible within the standard library but most of that code is pretty old and has not been look at for conversion. Any such changes are welcome in 3.4+. 
On Sun, Mar 3, 2013 at 11:00 AM, Antoine Pitrou  wrote:

> On Sun, 3 Mar 2013 19:40:04 +0100
> Charles-François Natali  wrote:
> >
> > > sendfile() is a Linux-only syscall. It's also limited to certain
> > > kinds of file descriptors. The limitations have been lifted in
> > > recent kernel versions.
> >
> > No, it's not Linux-only: many BSDs also have it, although not all of
> > them support an arbitrary output file descriptor (Solaris does allow
> > regular files too). It would be possible to catch EINVAL/EBADF, and
> > fall back to a regular copy loop.
> >
> > Note that the above benchmark is really biased by writing the data to
> > /dev/null: with a real target file, the zero-copy wouldn't bring such
> > a large gain, because the bottleneck will really be the I/O devices
> > (also a read()/write() loop is more expensive in Python than in C).
>
> Can you post your benchmark's code? I could time it on an SSD.
>
> > But I see at least two cases where it could be interesting: when
> > reading/writing from/to a tmpfs partition, or when the source and
> > target files are on different disks.
>
> That's already nice.
>
> Regards
>
> Antoine.
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From solipsis at pitrou.net Sun Mar 3 23:02:57 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 3 Mar 2013 23:02:57 +0100
Subject: [Python-ideas] speeding up shutil.copy*()
References: <20130303200046.3de4d3a0@pitrou.net>
Message-ID: <20130303230257.7994e798@pitrou.net>

On Sun, 3 Mar 2013 13:38:05 -0800
"Gregory P. Smith"  wrote:
> IMNSHO the *time* is less relevant than the fact that it uses less
> memory by not repeatedly making copies.

Well, it doesn't repeatedly make copies, it just allocates a new buffer
every loop. At best, it will consume 16 KB instead of 32 KB.

Regards

Antoine.

From bruce at leapyear.org Sun Mar 3 23:31:29 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Sun, 3 Mar 2013 14:31:29 -0800
Subject: [Python-ideas] One-line "try with" statement
In-Reply-To: <15135910-8FB4-4B82-A93E-41CA48086A1B@breakrs.com>
References: <15135910-8FB4-4B82-A93E-41CA48086A1B@breakrs.com>
Message-ID: 

On Sat, Mar 2, 2013 at 2:45 PM, Alan Johnson  wrote:
>
> try with context_manager():
>     ... bunch of code ...
> except:
>     ... exception handler ...

This optimization saves a colon and some white space, and mixes two
unrelated concepts. The try/except pattern I want to optimize is

try:
    x = expr1
except ValueError:
    x = expr2

For example:

    expr1 except ValueError else expr2
or
    try expr1 except ValueError else expr2

This is particularly useful in cases like this:

a = ((try t.x except AttributeError else 0)
     + (try t.y except AttributeError else 0)
     + (try t.z except AttributeError else 0))

where standard try/except requires 13 lines and is much harder to read.

Yes, this can be done with a function and two lambdas (and I've done it
this way):

try_except(lambda: expr1, ValueError, lambda: expr2)

def try_except(value, exceptions, otherwise):
    try:
        return value()
    except exceptions or Exception:
        return otherwise()

--- Bruce
Learn how hackers think: http://j.mp/gruyere-security
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From cf.natali at gmail.com Sun Mar 3 23:34:42 2013
From: cf.natali at gmail.com (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=)
Date: Sun, 3 Mar 2013 23:34:42 +0100
Subject: [Python-ideas] speeding up shutil.copy*()
In-Reply-To: <20130303211201.02835ba5@pitrou.net>
References:  <20130303200046.3de4d3a0@pitrou.net>
	<20130303211201.02835ba5@pitrou.net>
Message-ID: 

> Ok, the readinto() version doesn't seem to make a difference here, only
> the sendfile() version is beneficial (and the benefits are mostly
> noticeable from tmpfs to /dev/null, as you point out :-)).

OK, in that case I don't think it's worth it for copy. Thanks for testing!

From steve at pearwood.info Mon Mar 4 01:31:59 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 04 Mar 2013 11:31:59 +1100
Subject: [Python-ideas] One-line "try with" statement
In-Reply-To: 
References: <15135910-8FB4-4B82-A93E-41CA48086A1B@breakrs.com>
Message-ID: <5133EB7F.90509@pearwood.info>

On 04/03/13 09:31, Bruce Leban wrote:
> The try/except pattern I want to optimize is
>
> try:
>     x = expr1
> except ValueError:
>     x = expr2
>
> For example:
>
>     expr1 except ValueError else expr2
> or
>     try expr1 except ValueError else expr2

That syntax gets a big NO from me, due to confusion with the
non-one-line try...except...else statement. Written out in full, try
blocks look something like this:

try:
    block
except ValueError:
    block
else:
    block
finally:
    block

where the else clause runs if no exception occurred. Inserting "else"
into the one-liner form, when the "else" doesn't have the same meaning
as "else" in the multiline form, is just confusing.

Also, I vote -1 on a one-line *statement* (as per the subject line).
What's the point of saving one lousy line? We can already do a two-line
form:

try: statement1
except ValueError: statement2

which is plenty compact enough.

But a try...except *expression*, I'm cautiously interested in that idea.
It could be analogous to the if...else ternary operator:

y = x + (expr1 if condition else expr2)

Something like this perhaps?

y = x + (try expr1 except Exception: expr2)

If you want to catch multiple exceptions, you can use a tuple:

y = x + (try expr1 except (Exception, AnotherException): expr2)

If you need to refer to the exception:

y = x + (try expr1 except Exception as name: expr2)

Supporting multiple except clauses would soon get out of hand, so I
suggest we restrict the expression form to only a single except clause.
Likewise, the else and finally clauses don't really make sense in an
expression. This is what I expect the full syntax should be:

try_expr ::= "try" expression "except" [expression ["as" target]] ":" expression

I'm conflicted about the bare except form. If I had the keys to the time
machine, I'd remove bare exceptions from the language. But since they're
already supported, I guess we should support it here too.

If the "as target" form is used, the name only exists inside the except
clause and does not otherwise become visible in the local scope. But of
course you can return the exception object should you so choose.
> This is particularly useful in cases like this:
>
> a = ((try t.x except AttributeError else 0)
>      + (try t.y except AttributeError else 0)
>      + (try t.z except AttributeError else 0))

This example is not terribly convincing, since it can so easily be
re-written:

a = sum(getattr(t, name, 0) for name in "xyz")

-- 
Steven

From robertc at robertcollins.net Mon Mar 4 03:31:00 2013
From: robertc at robertcollins.net (Robert Collins)
Date: Mon, 4 Mar 2013 15:31:00 +1300
Subject: [Python-ideas] BufferedIO and detach
In-Reply-To: 
References: 
Message-ID: 

On 4 March 2013 05:16, Benjamin Peterson  wrote:
> Robert Collins  writes:
>
>>
>> There doesn't seem to be a way to safely use detach() on stdin - I'd
>> like to get down to the raw stream, but after calling detach(), the
>> initial BufferedIOReader is unusable (so you cannot retrieve any
>> buffered content) - and unless you detach(), you can't guarantee that
>> the buffer will ever be empty.
>
> Presumably if you call it before anyone else has had a chance to read
> from it, you should be okay.

That's hard to guarantee in the general case: consider a library
utility that accepts an input stream. To make it concrete, consider
dispatching to different processors based on the first few bytes of a
stream: you'd have to force raw IO handling everywhere, rather than
just the portion of code that needs it...

-Rob

-- 
Robert Collins
Distinguished Technologist
HP Cloud Services

From guido at python.org Mon Mar 4 06:50:52 2013
From: guido at python.org (Guido van Rossum)
Date: Sun, 3 Mar 2013 21:50:52 -0800
Subject: [Python-ideas] BufferedIO and detach
In-Reply-To: 
References: 
Message-ID: 

On Sun, Mar 3, 2013 at 6:31 PM, Robert Collins  wrote:
> On 4 March 2013 05:16, Benjamin Peterson  wrote:
>> Robert Collins  writes:
>>
>>>
>>> There doesn't seem to be a way to safely use detach() on stdin - I'd
>>> like to get down to the raw stream, but after calling detach(), the
>>> initial BufferedIOReader is unusable (so you cannot retrieve any
>>> buffered content) - and unless you detach(), you can't guarantee that
>>> the buffer will ever be empty.
>>
>> Presumably if you call it before anyone else has had a chance to read
>> from it, you should be okay.
>
> That's hard to guarantee in the general case: consider a library
> utility that accepts an input stream. To make it concrete, consider
> dispatching to different processors based on the first few bytes of a
> stream: you'd have to force raw IO handling everywhere, rather than
> just the portion of code that needs it...

The solution would seem obvious: detach before reading anything from the
stream.

But apparently you're trying to come up with a reason why that's not
enough. I think you're concerned about the situation where you have a
stream of uncertain origin, and you want to switch to raw, unbuffered
I/O. You realize that some of the bytes you are interested in might
already have been read into the buffer. So you want access to the
contents of the buffer.

When the io module was originally designed, this was actually one of
the (implied) use cases -- one reason I wanted to stop using C stdio
was that I didn't like that there is no standard way to get at the
data in the buffer, in similar use cases as you're trying to present.
(A use case I could think of would be an http server that forks a
subprocess after reading e.g. the first line of the http request, or
perhaps after the headers.)
It seems that when the io module was rewritten in C for speed (and I am
very grateful that it was, the Python version was way too slow) this use
case, being pretty rare, was forgotten. In specific use cases it's
usually easy enough to just open the file unbuffered, or detach before
reading anything.

Can you write C code? If so, perhaps you can come up with a patch.
Personally, I'm not sure that your proposed API (a buffered_only flag
to read()) is the best way to go about it. Maybe detach() should
return the remaining buffered data? (Perhaps only if a new flag is
given.)

FWIW I think it's also possible that some of the data has made it into
the text wrapper already, so you'll have to be able to extract it from
there as well. (Good luck.)

-- 
--Guido van Rossum (python.org/~guido)

From benjamin at python.org Mon Mar 4 07:01:28 2013
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 4 Mar 2013 06:01:28 +0000 (UTC)
Subject: [Python-ideas] BufferedIO and detach
References: 
Message-ID: 

Guido van Rossum  writes:
> When the io module was originally designed, this was actually one of
> the (implied) use cases -- one reason I wanted to stop using C stdio
> was that I didn't like that there is no standard way to get at the
> data in the buffer, in similar use cases as you're trying to present.
> (A use case I could think of would be an http server that forks a
> subprocess after reading e.g. the first line of the http request, or
> perhaps after the headers.)

What was the API that provided this in the Python version of the io
module? (Note it still mostly lives as Lib/_pyio.py)

From guido at python.org Mon Mar 4 07:12:26 2013
From: guido at python.org (Guido van Rossum)
Date: Sun, 3 Mar 2013 22:12:26 -0800
Subject: [Python-ideas] BufferedIO and detach
In-Reply-To: 
References: 
Message-ID: 

On Sunday, March 3, 2013, Benjamin Peterson wrote:

> Guido van Rossum  writes:
> > When the io module was originally designed, this was actually one of
> > the (implied) use cases -- one reason I wanted to stop using C stdio
> > was that I didn't like that there is no standard way to get at the
> > data in the buffer, in similar use cases as you're trying to present.
> > (A use case I could think of would be an http server that forks a
> > subprocess after reading e.g. the first line of the http request, or
> > perhaps after the headers.)
>
> What was the API that provided this in the Python version of the io
> module?

I think it may not have been more than accessing private instance
variables. :-)

> (Note it still mostly lives as Lib/_pyio.py)

That won't help a concrete use case though, will it?

--Guido

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From robertc at robertcollins.net Mon Mar 4 07:44:27 2013
From: robertc at robertcollins.net (Robert Collins)
Date: Mon, 4 Mar 2013 19:44:27 +1300
Subject: [Python-ideas] BufferedIO and detach
In-Reply-To: 
References: 
Message-ID: 

On 4 March 2013 18:50, Guido van Rossum  wrote:
> On Sun, Mar 3, 2013 at 6:31 PM, Robert Collins  wrote:
>> On 4 March 2013 05:16, Benjamin Peterson  wrote:
>>> Robert Collins  writes:
>>>
>>>>
>>>> There doesn't seem to be a way to safely use detach() on stdin - I'd
>>>> like to get down to the raw stream, but after calling detach(), the
>>>> initial BufferedIOReader is unusable (so you cannot retrieve any
>>>> buffered content) - and unless you detach(), you can't guarantee that
>>>> the buffer will ever be empty.
>>>
>>> Presumably if you call it before anyone else has had a chance to read
>>> from it, you should be okay.
>>
>> That's hard to guarantee in the general case: consider a library
>> utility that accepts an input stream. To make it concrete, consider
>> dispatching to different processors based on the first few bytes of a
>> stream: you'd have to force raw IO handling everywhere, rather than
>> just the portion of code that needs it...
>
> The solution would seem obvious: detach before reading anything from
> the stream.
>
> But apparently you're trying to come up with a reason why that's not
> enough. I think you're concerned about the situation where you have a
> stream of uncertain origin, and you want to switch to raw, unbuffered
> I/O. You realize that some of the bytes you are interested in might
> already have been read into the buffer. So you want access to the
> contents of the buffer.

Yes exactly. A little more context on how I came to ask the question.
I wanted to accumulate all input on an arbitrary stream within 5ms,
without blocking for longer. Using raw IO + select, it's possible to
loop, reading one byte at a time. The io module doesn't have an API
(that I could find) for putting an existing stream into non-blocking
mode, so reading a larger amount and taking what is returned isn't
viable. However, without raw I/O, select() will time out because it
consults the underlying file descriptor, bypassing the buffer. So - the
only reason to want raw I/O is to be able to use select reliably. An
alternative would be being able to drain the buffer with no underlying
I/O calls at all, then use select + read1, then rinse and repeat.

> When the io module was originally designed, this was actually one of
> the (implied) use cases -- one reason I wanted to stop using C stdio
> was that I didn't like that there is no standard way to get at the
> data in the buffer, in similar use cases as you're trying to present.
> (A use case I could think of would be an http server that forks a
> subprocess after reading e.g. the first line of the http request, or
> perhaps after the headers.)

That's a very similar case, as it happens - protocol handling is present
in my use case too.

> It seems that when the io module was rewritten in C for speed (and
> I am very grateful that it was, the Python version was way too slow)
> this use case, being pretty rare, was forgotten. In specific use cases
> it's usually easy enough to just open the file unbuffered, or detach
> before reading anything.
>
> Can you write C code? If so, perhaps you can come up with a patch.
> Personally, I'm not sure that your proposed API (a buffered_only flag
> to read()) is the best way to go about it. Maybe detach() should
> return the remaining buffered data? (Perhaps only if a new flag is
> given.)
>
> FWIW I think it's also possible that some of the data has made it into
> the text wrapper already, so you'll have to be able to extract it from
> there as well. (Good luck.)

I can write C code, and if evolving the API is acceptable (it sounds
like it is) I'll be more than happy to make a patch.

Some variations I can think of...

The buffer_only flag I suggested, on read_into, read1, read etc.

Have detach return the buffered data as you suggest - that would be
incompatible unless we stash it on the raw object somewhere, or do
something along those lines.

A read0 - analogous to read1, returns data from the buffer, but
guarantees no underlying calls.
I think exposing the buffer more explicitly is a good principle,
independent of whether we change detach or not.

> --
> --Guido van Rossum (python.org/~guido)

-- 
Robert Collins
Distinguished Technologist
HP Cloud Services

From ncoghlan at gmail.com Mon Mar 4 09:22:26 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 4 Mar 2013 18:22:26 +1000
Subject: [Python-ideas] One-line "try with" statement
In-Reply-To: 
References: <15135910-8FB4-4B82-A93E-41CA48086A1B@breakrs.com>
	<1362306736.5045.3.camel@rogue.dyndns.info>
	<04F2F204-22CC-406D-A033-613BEE889854@yahoo.com>
Message-ID: 

On Mon, Mar 4, 2013 at 5:50 AM, Stefan Behnel  wrote:
> Andrew Barnert, 03.03.2013 14:24:
>> For example, if you do "with open(path) as f:" the context manager
>> doesn't (and shouldn't) do anything to protect you from a
>> FileNotFoundError in the open, or an IOError reading inside the block.
>> If you want to, say, log, or try a backup file, how else would you
>> handle that but a with inside a try?
>
> If you really care about errors when opening the file (i.e. when
> creating the context manager), then the correct way to do this is to
> only wrap the creation of the context manager in a try-except clause,
> i.e.
>
> try:
>     f = open("somefile.txt")
> except FileNotFoundError:
>     do_stuff()
>     raise # or return, or whatever
>
> with f:
>     do_other_stuff()
>
> Otherwise, you risk accidentally catching (and potentially shadowing)
> exceptions that originated from the body of the with statement instead
> of just the context manager creation.
>
> This may look a bit overly complicated for a file, but it quickly
> becomes more obvious for more complex context managers, e.g. those that
> may raise more common errors like ValueError or AttributeError.

Indeed - complex try blocks in general are a red flag when paired with
exception handlers for common exceptions.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Mon Mar 4 10:12:03 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 4 Mar 2013 19:12:03 +1000
Subject: [Python-ideas] BufferedIO and detach
In-Reply-To: 
References: 
Message-ID: 

On Mon, Mar 4, 2013 at 4:44 PM, Robert Collins  wrote:
> Some variations I can think of...
>
> The buffer_only flag I suggested, on read_into, read1, read etc.
>
> Have detach return the buffered data as you suggest - that would be
> incompatible unless we stash it on the raw object somewhere, or do
> something along those lines.
>
> A read0 - analogous to read1, returns data from the buffer, but
> guarantees no underlying calls.
>
> I think exposing the buffer more explicitly is a good principle,
> independent of whether we change detach or not.

As Guido noted, you actually have multiple layers of buffering to
contend with - for a text stream, you may have already decoded
characters and partially decoded data in the codec's internal buffer,
in addition to any data in the IO buffer. That's actually one of the
interesting problems with supporting a "set_encoding()" method on IO
streams (see http://bugs.python.org/issue15216).

How does the following API sound for your purposes? (this is based on
what set_encoding() effectively has to do under the hood):

BufferedReader:

    def push_data(binary_data):
        """Prepends contents of 'binary_data' to the internal buffer"""

    def clear_buffer():
        """Clears the internal buffer and returns the previous
        content as a bytes object"""

TextIOWrapper:

    def push_data(char_data, binary_data=b""):
        """Prepends contents of 'char_data' to the internal buffer.
        If binary_data is provided, it is pushed into the underlying IO
        buffered reader. Raises UnsupportedOperation if the underlying
        stream has no "push_data" method."""

    def clear_buffer():
        """Clears the internal buffers and returns the previous
        content as a (char_data, binary_data) pair. The binary data
        includes any data that was queued inside the codec, as well as
        the contents of the underlying IO buffer"""

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From robertc at robertcollins.net Mon Mar 4 10:19:06 2013
From: robertc at robertcollins.net (Robert Collins)
Date: Mon, 4 Mar 2013 22:19:06 +1300
Subject: [Python-ideas] BufferedIO and detach
In-Reply-To: 
References: 
Message-ID: 

On 4 March 2013 22:12, Nick Coghlan  wrote:
> On Mon, Mar 4, 2013 at 4:44 PM, Robert Collins  wrote:
>> Some variations I can think of...
>>
>> The buffer_only flag I suggested, on read_into, read1, read etc.
>>
>> Have detach return the buffered data as you suggest - that would be
>> incompatible unless we stash it on the raw object somewhere, or do
>> something along those lines.
>>
>> A read0 - analogous to read1, returns data from the buffer, but
>> guarantees no underlying calls.
>>
>> I think exposing the buffer more explicitly is a good principle,
>> independent of whether we change detach or not.
>
> As Guido noted, you actually have multiple layers of buffering to
> contend with - for a text stream, you may have already decoded
> characters and partially decoded data in the codec's internal buffer,
> in addition to any data in the IO buffer. That's actually one of the
> interesting problems with supporting a "set_encoding()" method on IO
> streams (see http://bugs.python.org/issue15216).

Indeed. Fun! Caches are useful but add complexity :)

> How does the following API sound for your purposes? (this is based on
> what set_encoding() effectively has to do under the hood):
>
> BufferedReader:
>
>     def push_data(binary_data):
>         """Prepends contents of 'binary_data' to the internal buffer"""
>
>     def clear_buffer():
>         """Clears the internal buffer and returns the previous
>         content as a bytes object"""
>
> TextIOWrapper:
>
>     def push_data(char_data, binary_data=b""):
>         """Prepends contents of 'char_data' to the internal buffer. If
>         binary_data is provided, it is pushed into the underlying IO
>         buffered reader. Raises UnsupportedOperation if the underlying
>         stream has no "push_data" method."""
>
>     def clear_buffer():
>         """Clears the internal buffers and returns the previous
>         content as a (char_data, binary_data) pair. The binary data
>         includes any data that was queued inside the codec, as well as
>         the contents of the underlying IO buffer"""

That would make the story of 'get me back to raw IO' straightforward,
though the TextIOWrapper's clear_buffer semantics are a little unclear
to me from just the docstring. I think having TextIOWrapper only
return bytes from clear_buffer and only accept bytes in push_data
would be simpler to reason about, if a little more complex on the
internals.

Now, one could implement 'read0' manually using read1 + clear_buffer +
push_data:

# first, unwrap back to a bytes layer
buffer = textstream.buffer
buffer.push_data(textstream.clear_buffer()[1])

def read0(n):
    data = buffer.clear_buffer()
    result = data[:n]
    buffer.push_data(data[n:])
    return result

But it might be more efficient to define read0 directly on
BufferedIOReader.
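To show where I'd use it (hypothetical, since read0/clear_buffer/
push_data don't exist yet - this reuses the 'buffer' and 'read0' from
the sketch above):

import select

def accumulate(timeout=0.005):
    # Drain the Python-level buffer without touching the fd; after
    # that, select() on the fd is reliable again.
    chunks = [read0(8192)]
    while select.select([buffer], [], [], timeout)[0]:
        chunks.append(buffer.read1(8192))  # at most one OS read
        chunks.append(read0(8192))         # plus any buffered leftovers
    return b"".join(chunks)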
-Rob

-- 
Robert Collins
Distinguished Technologist
HP Cloud Services

From ncoghlan at gmail.com Mon Mar 4 10:52:37 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 4 Mar 2013 19:52:37 +1000
Subject: [Python-ideas] BufferedIO and detach
In-Reply-To: 
References: 
Message-ID: 

On 4 Mar 2013 19:19, "Robert Collins"  wrote:
>
> On 4 March 2013 22:12, Nick Coghlan  wrote:
> > On Mon, Mar 4, 2013 at 4:44 PM, Robert Collins  wrote:
> >> Some variations I can think of...
> >>
> >> The buffer_only flag I suggested, on read_into, read1, read etc.
> >>
> >> Have detach return the buffered data as you suggest - that would be
> >> incompatible unless we stash it on the raw object somewhere, or do
> >> something along those lines.
> >>
> >> A read0 - analogous to read1, returns data from the buffer, but
> >> guarantees no underlying calls.
> >>
> >> I think exposing the buffer more explicitly is a good principle,
> >> independent of whether we change detach or not.
> >
> > As Guido noted, you actually have multiple layers of buffering to
> > contend with - for a text stream, you may have already decoded
> > characters and partially decoded data in the codec's internal buffer,
> > in addition to any data in the IO buffer. That's actually one of the
> > interesting problems with supporting a "set_encoding()" method on IO
> > streams (see http://bugs.python.org/issue15216).
>
> Indeed. Fun! Caches are useful but add complexity :)
>
> > How does the following API sound for your purposes? (this is based on
> > what set_encoding() effectively has to do under the hood):
> >
> > BufferedReader:
> >
> >     def push_data(binary_data):
> >         """Prepends contents of 'binary_data' to the internal buffer"""
> >
> >     def clear_buffer():
> >         """Clears the internal buffer and returns the previous
> >         content as a bytes object"""
> >
> > TextIOWrapper:
> >
> >     def push_data(char_data, binary_data=b""):
> >         """Prepends contents of 'char_data' to the internal buffer. If
> >         binary_data is provided, it is pushed into the underlying IO
> >         buffered reader. Raises UnsupportedOperation if the underlying
> >         stream has no "push_data" method."""
> >
> >     def clear_buffer():
> >         """Clears the internal buffers and returns the previous
> >         content as a (char_data, binary_data) pair. The binary data
> >         includes any data that was queued inside the codec, as well as
> >         the contents of the underlying IO buffer"""
>
> That would make the story of 'get me back to raw IO' straightforward,
> though the TextIOWrapper's clear_buffer semantics are a little unclear
> to me from just the docstring. I think having TextIOWrapper only
> return bytes from clear_buffer and only accept bytes in push_data
> would be simpler to reason about, if a little more complex on the
> internals.

I originally had it defined that way, but as Victor points out in the
set_encoding issue, decoding is potentially lossy in the general case, so
we can't reliably convert already decoded characters back to bytes. The
appropriate way to handle that is going to be application specific, so I
changed the proposed API to produce a (str, bytes) 2-tuple.

Cheers,
Nick.

>
> Now, one could implement 'read0' manually using read1 + clear_buffer +
> push_data:
>
> # first, unwrap back to a bytes layer
> buffer = textstream.buffer
> buffer.push_data(textstream.clear_buffer()[1])
>
> def read0(n):
>     data = buffer.clear_buffer()
>     result = data[:n]
>     buffer.push_data(data[n:])
>     return result
>
> But it might be more efficient to define read0 directly on
> BufferedIOReader.
>
> -Rob
>
> --
> Robert Collins
> Distinguished Technologist
> HP Cloud Services
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From solipsis at pitrou.net Mon Mar 4 10:59:27 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 4 Mar 2013 10:59:27 +0100
Subject: [Python-ideas] BufferedIO and detach
References: 
Message-ID: <20130304105927.37331a4c@pitrou.net>

Le Mon, 4 Mar 2013 19:44:27 +1300,
Robert Collins  a écrit :
>
> Yes exactly. A little more context on how I came to ask the question.
> I wanted to accumulate all input on an arbitrary stream within 5ms,
> without blocking for longer. Using raw IO + select, it's possible to
> loop, reading one byte at a time. The io module doesn't have an API
> (that I could find) for putting an existing stream into non-blocking
> mode, so reading a larger amount and taking what is returned isn't
> viable.

What do you mean exactly by that?

> However, without raw I/O, select() will time out because it consults
> the underlying file descriptor, bypassing the buffer. So - the only
> reason to want raw I/O is to be able to use select reliably.

That's a pretty good reason actually. Raw I/O is exactly for those
cases. Non-blocking buffered I/O is a hard conceptual problem:
http://bugs.python.org/issue13322

> An
> alternative would be being able to drain the buffer with no underlying
> I/O calls at all, then use select + read1, then rinse and repeat.

Have you tried peek()?

Regards

Antoine.

From robertc at robertcollins.net Mon Mar 4 11:15:36 2013
From: robertc at robertcollins.net (Robert Collins)
Date: Mon, 4 Mar 2013 23:15:36 +1300
Subject: [Python-ideas] BufferedIO and detach
In-Reply-To: <20130304105927.37331a4c@pitrou.net>
References:  <20130304105927.37331a4c@pitrou.net>
Message-ID: 

On 4 March 2013 22:59, Antoine Pitrou  wrote:
> Le Mon, 4 Mar 2013 19:44:27 +1300,
> Robert Collins  a écrit :
>>
>> Yes exactly. A little more context on how I came to ask the question.
>> I wanted to accumulate all input on an arbitrary stream within 5ms,
>> without blocking for longer. Using raw IO + select, it's possible to
>> loop, reading one byte at a time. The io module doesn't have an API
>> (that I could find) for putting an existing stream into non-blocking
>> mode, so reading a larger amount and taking what is returned isn't
>> viable.
>
> What do you mean exactly by that?

Just what I said. I'll happily try to rephrase. What bit was unclear?

>> However, without raw I/O, select() will time out because it consults
>> the underlying file descriptor, bypassing the buffer. So - the only
>> reason to want raw I/O is to be able to use select reliably.
>
> That's a pretty good reason actually. Raw I/O is exactly for those
> cases. Non-blocking buffered I/O is a hard conceptual problem:
> http://bugs.python.org/issue13322

Sure, it can get tricky to reason about. But - the whole point of
libraries like io is to encapsulate common solutions to tricky things -
so that we don't have a hundred incompatible not-quite-the-same layers
sitting on top. Right now select + BufferedIOReader is plain buggy,
non-blocking or not; I'd like to fix that - for instance, if select
consulted the buffer somehow and returned immediately if the buffer had
data, that would be an improvement (as select doesn't say *how much*
data can be read).

>> An
>> alternative would be being able to drain the buffer with no underlying
>> I/O calls at all, then use select + read1, then rinse and repeat.
>
> Have you tried peek()?
Per http://docs.python.org/3.2/library/io.html#io.BufferedReader.peek
peek may cause I/O. Only one call, but still you cannot control it.

-Rob

-- 
Robert Collins
Distinguished Technologist
HP Cloud Services

From robertc at robertcollins.net Mon Mar 4 11:24:56 2013
From: robertc at robertcollins.net (Robert Collins)
Date: Mon, 4 Mar 2013 23:24:56 +1300
Subject: [Python-ideas] BufferedIO and detach
In-Reply-To: 
References: 
Message-ID: 

On 4 March 2013 22:52, Nick Coghlan  wrote:
>
> I originally had it defined that way, but as Victor points out in the
> set_encoding issue, decoding is potentially lossy in the general case,
> so we can't reliably convert already decoded characters back to bytes.
> The appropriate way to handle that is going to be application specific,
> so I changed the proposed API to produce a (str, bytes) 2-tuple.

I don't quite follow - why would we need to convert decoded characters
to bytes? While it is lossy, we know the original bytes. If we keep the
original bytes around until their characters are out of the buffer,
there is no loss window - and the buffer size in TextIOWrapper is quite
small by default, isn't it? If we need to be strictly minimal then yes,
I can see why your tweaked API would be better.

However - two bits of feedback: it should say more clearly that there is
no overlap between the text and binary segments: any bytes that have been
decoded are in the text segment and only in the text segment.

push_data has a wart though. Consider a TextIOWrapper with the following
buffer:

    text="foo" binary=b"bar"

When you call push_data("quux", b"baz"), should you end up with

    text="quuxfoo" binary=b"bazbar"

or

    text="quux" + b"baz".decode(self.encoding) + "foo" binary=b"bar"

The latter is clearly the intent, but the docstring implies the former
behaviour. (The latter case does depend on the bytestring being decodable
on its own when there is content in the text buffer - but even a complex
buffer that is a sequence of text or byte regions would still have that
requirement, due to not being able to recode reliably).

-Rob

-- 
Robert Collins
Distinguished Technologist
HP Cloud Services

From solipsis at pitrou.net Mon Mar 4 11:45:49 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 4 Mar 2013 11:45:49 +0100
Subject: [Python-ideas] BufferedIO and detach
References:  <20130304105927.37331a4c@pitrou.net>
Message-ID: <20130304114549.30fb772f@pitrou.net>

Le Mon, 4 Mar 2013 23:15:36 +1300,
Robert Collins  a écrit :
> On 4 March 2013 22:59, Antoine Pitrou  wrote:
> > Le Mon, 4 Mar 2013 19:44:27 +1300,
> > Robert Collins  a écrit :
> >>
> >> Yes exactly. A little more context on how I came to ask the
> >> question. I wanted to accumulate all input on an arbitrary stream
> >> within 5ms, without blocking for longer. Using raw IO + select,
> >> it's possible to loop, reading one byte at a time. The io module
> >> doesn't have an API (that I could find) for putting an existing
> >> stream into non-blocking mode, so reading a larger amount and
> >> taking what is returned isn't viable.
> >
> > What do you mean exactly by that?
>
> Just what I said. I'll happily try to rephrase. What bit was unclear?

I don't understand what you mean by "putting an existing stream into
non-blocking mode"? What stream exactly is it? And why is reading a
larger amount not viable?

Regards

Antoine.
From solipsis at pitrou.net Mon Mar 4 11:47:43 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 4 Mar 2013 11:47:43 +0100
Subject: [Python-ideas] BufferedIO and detach
References: 
Message-ID: <20130304114743.3e8c222f@pitrou.net>

Le Mon, 4 Mar 2013 19:12:03 +1000,
Nick Coghlan  a écrit :
> On Mon, Mar 4, 2013 at 4:44 PM, Robert Collins  wrote:
> > Some variations I can think of...
> >
> > The buffer_only flag I suggested, on read_into, read1, read etc.
> >
> > Have detach return the buffered data as you suggest - that would be
> > incompatible unless we stash it on the raw object somewhere, or do
> > something along those lines.
> >
> > A read0 - analogous to read1, returns data from the buffer, but
> > guarantees no underlying calls.
> >
> > I think exposing the buffer more explicitly is a good principle,
> > independent of whether we change detach or not.
>
> As Guido noted, you actually have multiple layers of buffering to
> contend with - for a text stream, you may have already decoded
> characters and partially decoded data in the codec's internal buffer,
> in addition to any data in the IO buffer.

I'd prefer if TextIOWrapper was totally unsupported in that context.

Regards

Antoine.

From akshit.jiit at gmail.com Mon Mar 4 13:06:33 2013
From: akshit.jiit at gmail.com (Akshit Agarwal)
Date: Mon, 4 Mar 2013 07:06:33 -0500
Subject: [Python-ideas] Proposal for Algorithms Library
Message-ID: 

I am new to the Python community, but I have been using Python for around
a year and I love coding in Python.

Now I want to introduce an idea that I think should be in Python: I want
to start working on an *"Algorithms Library"*, which would contain all
basic algorithms in its initial phase; then we could include all the
algorithms listed in Introduction to Algorithms by CLRS, further
extending to all algorithms that ought to be included.

Implementing this would be very good for Python, as algorithms are used
everywhere and developers spend a lot of time implementing the common
ones; that time would be saved if they could simply import them from
Python, which would also grow Python's user base.

I have just started contributing to open source and have contributed to
SymPy's Symbols function, but now I want to do the above stated work as a
Project in "*Google Summer of Code 2013 in Python*".

I need help from the community on how I should start working on this for
GSoC 2013.

Akshit Agarwal
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ubershmekel at gmail.com Mon Mar 4 15:00:10 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Mon, 4 Mar 2013 16:00:10 +0200
Subject: [Python-ideas] Proposal for Algorithms Library
In-Reply-To: 
References: 
Message-ID: 

Sounds like a great idea for a package on PyPI. Once it's useful and
popular you can propose it for inclusion in the standard library.

From a quick google I found this:

https://pypi.python.org/pypi/algorithms/0.1

Though I don't really know which algorithms would go in an "Algorithms
Library". You might want to be more specific about that.

Good luck,

Yuval

On Mon, Mar 4, 2013 at 2:06 PM, Akshit Agarwal  wrote:

> I am new to the Python community, but I have been using Python for around
> a year and I love coding in Python.
>
> Now I want to introduce an idea that I think should be in Python: I want
> to start working on an *"Algorithms Library"*, which would contain all
> basic algorithms in its initial phase; then we could include all the
> algorithms listed in Introduction to Algorithms by CLRS, further
> extending to all algorithms that ought to be included.
>
> Implementing this would be very good for Python, as algorithms are used
> everywhere and developers spend a lot of time implementing the common
> ones; that time would be saved if they could simply import them from
> Python, which would also grow Python's user base.
>
> I have just started contributing to open source and have contributed to
> SymPy's Symbols function, but now I want to do the above stated work as
> a Project in "*Google Summer of Code 2013 in Python*".
>
> I need help from the community on how I should start working on this for
> GSoC 2013.
>
> Akshit Agarwal
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From storchaka at gmail.com Mon Mar 4 17:18:17 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Mon, 04 Mar 2013 18:18:17 +0200
Subject: [Python-ideas] speeding up shutil.copy*()
In-Reply-To: 
References: 
Message-ID: 

On 03.03.13 19:02, Charles-François Natali wrote:
> Without patch:
> $ ./python -m timeit -s "import shutil" "shutil.copyfile('/tmp/foo',
> '/dev/null')"
> 10 loops, best of 3: 218 msec per loop
>
> With readinto():
> $ ./python -m timeit -s "import shutil" "shutil.copyfile('/tmp/foo',
> '/dev/null')"
> 10 loops, best of 3: 202 msec per loop

8%. Note that in real cases the difference will be significantly less.
First, output to a real file requires more time than output to /dev/null.
Second, you are unlikely to copy the same input file 30 times in a row:
only the first time in the test do you read from disk; the other 29 times
you read from cache. Third, sources such as tarfile have several levels
between user code and the disk file: BufferedIO, GzipFile, the internal
tarfile wrapper. Every level adds some overhead, and in sum this will be
many times larger than creating one bytes object.

> With sendfile():
> $ ./python -m timeit -s "import shutil" "shutil.copyfile('/tmp/foo',
> '/dev/null')"
> 100 loops, best of 3: 5.39 msec per loop

This looks more interesting. There are other ideas to speed up tarfile
extraction. Use the dir_fd parameter (if it is available) for opening
target files: it can speed up extraction of a large number of small,
deeply nested files. sendfile() should speed up extraction of large
files only.

From phd at phdru.name Sun Mar 3 10:14:36 2013
From: phd at phdru.name (Oleg Broytman)
Date: Sun, 3 Mar 2013 13:14:36 +0400
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: 
References: <20130303005129.26eb0e00@pitrou.net>
Message-ID: <20130303091436.GB27811@iskra.aviel.ru>

Hi!

On Sun, Mar 03, 2013 at 09:46:16AM +0100, Charles-François Natali wrote:
> >>> p = PureNTPath('c:/Downloads/pathlib.tar.gz')
> >>> p.name
> 'pathlib.tar.gz'
> >>> p.basename
> 'pathlib.tar'
> >>> p.suffix
> '.gz'
>
> I find the 'p.basename' name confusing: following POSIX conventions,
> 'basename' should be 'pathlib.tar.gz'.

Yes, and ntpath.py/posixpath.py follow this terminology.

> I don't have another 'name' to
> propose for the stripped name, though.
ntpath.py/posixpath.py name these parts "root" and "extension". "Root",
of course, has its own share of different connotations.

Oleg.

-- 
Oleg Broytman http://phdru.name/ phd at phdru.name
Programmers don't die, they just GOSUB without RETURN.

From cf.natali at gmail.com Mon Mar 4 20:21:35 2013
From: cf.natali at gmail.com (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=)
Date: Mon, 4 Mar 2013 20:21:35 +0100
Subject: [Python-ideas] speeding up shutil.copy*()
In-Reply-To: 
References: 
Message-ID: 

> 8%. Note that in real cases the difference will be significantly less.
> First, output to a real file requires more time than output to
> /dev/null. Second, you are unlikely to copy the same input file 30 times
> in a row: only the first time in the test do you read from disk; the
> other 29 times you read from cache. Third, sources such as tarfile have
> several levels between user code and the disk file: BufferedIO,
> GzipFile, the internal tarfile wrapper. Every level adds some overhead,
> and in sum this will be many times larger than creating one bytes
> object.

I know, I said it was really biased :-) The proper way to perform a cold
cache benchmark would be "echo 3 > /proc/sys/vm/drop_caches" before
reading the file. The goal was to highlight the reallocation cost (which
can vary depending on the implementation).

>> With sendfile():
>> $ ./python -m timeit -s "import shutil" "shutil.copyfile('/tmp/foo',
>> '/dev/null')"
>> 100 loops, best of 3: 5.39 msec per loop
>
> This looks more interesting.

Not really, because, like above, the extra syscalls and copy loops aren't
really the bottleneck: it's still the I/O (try replacing /dev/null with
an on-disk file and the gain plummets; it might be different if the
source and target files are on different disks, though). Zero-copy really
shines when writing data to a socket: a more interesting usage would be
in ftplib & Co.

cf

From robertc at robertcollins.net Mon Mar 4 22:11:12 2013
From: robertc at robertcollins.net (Robert Collins)
Date: Tue, 5 Mar 2013 10:11:12 +1300
Subject: [Python-ideas] BufferedIO and detach
In-Reply-To: <20130304114743.3e8c222f@pitrou.net>
References:  <20130304114743.3e8c222f@pitrou.net>
Message-ID: 

On 4 March 2013 23:47, Antoine Pitrou  wrote:

>> As Guido noted, you actually have multiple layers of buffering to
>> contend with - for a text stream, you may have already decoded
>> characters and partially decoded data in the codec's internal buffer,
>> in addition to any data in the IO buffer.
>
> I'd prefer if TextIOWrapper was totally unsupported in that context.

The problem is that sys.stdin and sys.stdout default to TextIOWrappers,
and handling protocols requires bytes, so having a way to drop down to
bytes is very convenient. Doing it by command line arguments to Python
works as long as a command is always byte oriented (or never) - but
that's a very big hammer.

-Rob

-- 
Robert Collins
Distinguished Technologist
HP Cloud Services

From greg.ewing at canterbury.ac.nz Mon Mar 4 22:14:08 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 05 Mar 2013 10:14:08 +1300
Subject: [Python-ideas] BufferedIO and detach
In-Reply-To: 
References: 
Message-ID: <51350EA0.2020301@canterbury.ac.nz>

Guido van Rossum wrote:
> Personally, I'm not sure that your proposed API (a buffered_only flag
> to read()) is the best way to go about it. Maybe detach() should
> return the remaining buffered data?

Maybe you could be allowed to read() from the buffered stream after
detaching the underlying source, which would then return any data
remaining in the buffer.
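Semantically, something like this (a toy self-contained sketch of the
post-detach behaviour I mean; the class name is made up):

class DetachedBuffer:
    """Hypothetical: what read() could serve after detach()."""

    def __init__(self, leftover):
        self._leftover = leftover  # bytes left in the old buffer

    def read(self, n=-1):
        if n < 0 or n >= len(self._leftover):
            data, self._leftover = self._leftover, b""
        else:
            data = self._leftover[:n]
            self._leftover = self._leftover[n:]
        return data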
-- Greg From solipsis at pitrou.net Mon Mar 4 22:12:01 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 4 Mar 2013 22:12:01 +0100 Subject: [Python-ideas] BufferedIO and detach References: <20130304114743.3e8c222f@pitrou.net> Message-ID: <20130304221201.1175cb24@pitrou.net> On Tue, 5 Mar 2013 10:11:12 +1300 Robert Collins wrote: > On 4 March 2013 23:47, Antoine Pitrou wrote: > > >> As Guido noted, you actually have multiple layers of buffering to > >> contend with - for a text stream, you may have already decoded > >> characters and partially decoded data in the codec's internal buffer, > >> in addition to any data in the IO buffer. > > > > I'd prefer if TextIOWrapper was totally unsupported in that context. > > The problem is that sys.stdin and sys.stdout default to > TextIOWrappers, and handling protocols requires bytes, so having a way > to drop down to bytes is very convenient. Why do you want to drop to bytes *after* having already buffered stuff in sys.{stdin,stdout}? Regards Antoine. From robertc at robertcollins.net Mon Mar 4 22:17:11 2013 From: robertc at robertcollins.net (Robert Collins) Date: Tue, 5 Mar 2013 10:17:11 +1300 Subject: [Python-ideas] BufferedIO and detach In-Reply-To: <20130304114549.30fb772f@pitrou.net> References: <20130304105927.37331a4c@pitrou.net> <20130304114549.30fb772f@pitrou.net> Message-ID: On 4 March 2013 23:45, Antoine Pitrou wrote: > Le Mon, 4 Mar 2013 23:15:36 +1300, > Robert Collins > a ?crit : >> On 4 March 2013 22:59, Antoine Pitrou >> wrote: >> > Le Mon, 4 Mar 2013 19:44:27 +1300, >> > Robert Collins >> > a >> > ?crit : >> >> >> >> Yes exactly. A little more context on how I came to ask the >> >> question. I wanted to accumulate all input on an arbitrary stream >> >> within 5ms, without blocking for longer. Using raw IO + select, >> >> its possible to loop, reading one byte at a time. The io module >> >> doesn't have an API (that I could find) for putting an existing >> >> stream into non-blocking mode, so reading a larger amount and >> >> taking what is returned isn't viable. >> > >> > What do you mean exactly by that? >> >> Just what I said. I'll happily try to rephrase. What bit was unclear? > > I don't understand what you mean by "putting an existing stream into > non-blocking mode"? What stream exactly is it? And why is reading a > larger amount not viable? sys.stdin - starts in blocking mode. How do you convert it to non-blocking mode? Portably? Now, how do you convert it to non-blocking mode when you don't know that it is fd 1, and instead you just have a stream (TextIOWrapper or BufferedReader or even a RawIO instance) ? If you have an fd in blocking mode, and select indicates it is readable, reading one byte won't block. reading two bytes may block. In non-blocking mode, reading will never block, and select tells you whether you can expect any content at all to be available. So reading more than one byte isn't viable when: - the fd is in blocking mode - you don't want to block in your program The reason I run into this is that I have a program that deals with both interactive and bulk traffic on the same file descriptor, and there doesn't seem to be a portable way (where portable means Linux/BSD/MacOSX/Windows) to flip a stream to non-blocking mode (in Python, going by the io module docs). 
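For what it's worth, the POSIX side of this is short - a minimal sketch
(assuming `stream` wraps a real OS-level file descriptor):

import fcntl
import os

def set_nonblocking(stream):
    # POSIX-only: flip the underlying descriptor to non-blocking mode,
    # after which reads return immediately instead of blocking.
    fd = stream.fileno()
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

It's the Windows side, and the buffered layers sitting above the fd,
that have no equivalent story.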
-Rob

--
Robert Collins
Distinguished Technologist
HP Cloud Services

From solipsis at pitrou.net  Mon Mar  4 22:15:38 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 4 Mar 2013 22:15:38 +0100
Subject: [Python-ideas] BufferedIO and detach
References: <20130304105927.37331a4c@pitrou.net>
	<20130304114549.30fb772f@pitrou.net>
Message-ID: <20130304221538.357d89af@pitrou.net>

On Tue, 5 Mar 2013 10:17:11 +1300
Robert Collins wrote:
>
> sys.stdin - starts in blocking mode. How do you convert it to
> non-blocking mode? Portably? Now, how do you convert it to
> non-blocking mode when you don't know that it is fd 1, and instead you
> just have a stream (TextIOWrapper or BufferedReader or even a RawIO
> instance) ?

How about the fileno() method?

From robertc at robertcollins.net  Mon Mar  4 22:21:52 2013
From: robertc at robertcollins.net (Robert Collins)
Date: Tue, 5 Mar 2013 10:21:52 +1300
Subject: [Python-ideas] BufferedIO and detach
In-Reply-To: <20130304221201.1175cb24@pitrou.net>
References: <20130304114743.3e8c222f@pitrou.net>
	<20130304221201.1175cb24@pitrou.net>
Message-ID:

On 5 March 2013 10:12, Antoine Pitrou wrote:
>> The problem is that sys.stdin and sys.stdout default to
>> TextIOWrappers, and handling protocols requires bytes, so having a way
>> to drop down to bytes is very convenient.
>
> Why do you want to drop to bytes *after* having already buffered stuff
> in sys.{stdin,stdout}?

I don't (when reading), and for my purposes having the drop-down process
error when reads have been done at the text layer would be fine. Writing
is more ambiguous (for me, not as a problem statement), but also works
fine today, so nothing is needed from my perspective.

-Rob

--
Robert Collins
Distinguished Technologist
HP Cloud Services

From greg.ewing at canterbury.ac.nz  Mon Mar  4 22:23:46 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 05 Mar 2013 10:23:46 +1300
Subject: [Python-ideas] BufferedIO and detach
In-Reply-To: <20130304105927.37331a4c@pitrou.net>
References: <20130304105927.37331a4c@pitrou.net>
Message-ID: <513510E2.2070106@canterbury.ac.nz>

Antoine Pitrou wrote:
> Raw I/O is exactly for those
> cases. Non-blocking buffered I/O is a hard conceptual problem:

I don't think it needs to be all that hard as long as you're
willing to give each layer of the protocol stack its own
non-blocking I/O calls. Trying to take shortcuts by skipping
layers of the stack is asking for pain, though.

--
Greg

From robertc at robertcollins.net  Mon Mar  4 22:29:35 2013
From: robertc at robertcollins.net (Robert Collins)
Date: Tue, 5 Mar 2013 10:29:35 +1300
Subject: [Python-ideas] BufferedIO and detach
In-Reply-To: <20130304221538.357d89af@pitrou.net>
References: <20130304105927.37331a4c@pitrou.net>
	<20130304114549.30fb772f@pitrou.net>
	<20130304221538.357d89af@pitrou.net>
Message-ID:

On 5 March 2013 10:15, Antoine Pitrou wrote:
> On Tue, 5 Mar 2013 10:17:11 +1300
> Robert Collins wrote:
>>
>> sys.stdin - starts in blocking mode. How do you convert it to
>> non-blocking mode? Portably? Now, how do you convert it to
>> non-blocking mode when you don't know that it is fd 1, and instead you
>> just have a stream (TextIOWrapper or BufferedReader or even a RawIO
>> instance) ?
>
> How about the fileno() method?

What about it? Do you mean 'non-blocking mode is entirely defined by
the OS-level read() behaviour and there is no tracking of that state
higher up'? If so, cool (and we should document that somewhere).
I'll need to go look up the Windows equivalent of FCNTL, and I still
think the current hidden buffer status is problematic.

-Rob

--
Robert Collins
Distinguished Technologist
HP Cloud Services

From solipsis at pitrou.net  Mon Mar  4 22:32:51 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 4 Mar 2013 22:32:51 +0100
Subject: [Python-ideas] BufferedIO and detach
References: <20130304105927.37331a4c@pitrou.net>
	<20130304114549.30fb772f@pitrou.net>
	<20130304221538.357d89af@pitrou.net>
Message-ID: <20130304223251.4c2476b0@pitrou.net>

On Tue, 5 Mar 2013 10:29:35 +1300
Robert Collins wrote:
>>
>> How about the fileno() method?
>
> What about it? Do you mean 'non-blocking mode is entirely defined by
> the OS-level read() behaviour and there is no tracking of that state
> higher up'? If so, cool (and we should document that somewhere).

Yes, I mean that :-) You're right, it should be documented.

> I'll need to go look up the Windows equivalent of FCNTL, and I still
> think the current hidden buffer status is problematic.

Windows has no notion of non-blocking streams, except for sockets.

Regards

Antoine.

From benjamin at python.org  Mon Mar  4 22:41:25 2013
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 4 Mar 2013 21:41:25 +0000 (UTC)
Subject: [Python-ideas] BufferedIO and detach
References:
Message-ID:

Guido van Rossum writes:
>
> On Sunday, March 3, 2013, Benjamin Peterson wrote:
>> What was the API that provided this in the Python version of the io
>> module?
>
> I think it may not have been more than accessing private instance
> variables.

It's a bit hard to claim that was ever a "supported" use case then.

> That won't help a concrete use case though, will it?

No, I was just pointing that out in case you wanted to reference it.

From robertc at robertcollins.net  Mon Mar  4 22:50:01 2013
From: robertc at robertcollins.net (Robert Collins)
Date: Tue, 5 Mar 2013 10:50:01 +1300
Subject: [Python-ideas] BufferedIO and detach
In-Reply-To: <20130304223251.4c2476b0@pitrou.net>
References: <20130304105927.37331a4c@pitrou.net>
	<20130304114549.30fb772f@pitrou.net>
	<20130304221538.357d89af@pitrou.net>
	<20130304223251.4c2476b0@pitrou.net>
Message-ID:

On 5 March 2013 10:32, Antoine Pitrou wrote:
> On Tue, 5 Mar 2013 10:29:35 +1300
> Robert Collins wrote:
>>>
>>> How about the fileno() method?
>>
>> What about it? Do you mean 'non-blocking mode is entirely defined by
>> the OS-level read() behaviour and there is no tracking of that state
>> higher up'? If so, cool (and we should document that somewhere).
>
> Yes, I mean that :-) You're right, it should be documented.
>
>> I'll need to go look up the Windows equivalent of FCNTL, and I still
>> think the current hidden buffer status is problematic.
>
> Windows has no notion of non-blocking streams, except for sockets.

Hmm, I know the libc emulation layer doesn't - but
http://msdn.microsoft.com/en-us/library/ms684961%28VS.85%29.aspx does
non-blocking IO (for stdin specifically) - we should be able to hook
that in, in principle... and disk files can do nonblocking with
overlapped IO (though that is a wholly different beast and clearly
offtopic :)).
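For console stdin specifically, the stdlib already has a tiny polling
primitive in msvcrt - a Windows-only sketch (console input only, and
useless once stdin is redirected to a pipe or file):

import msvcrt

def poll_console_char():
    # kbhit() reports whether a keypress is waiting in the console
    # input buffer, so the getwch() below will not block.
    if msvcrt.kbhit():
        return msvcrt.getwch()
    return None

So some of the machinery exists; it just isn't wired into the io stack.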
-Rob

--
Robert Collins
Distinguished Technologist
HP Cloud Services

From guido at python.org  Mon Mar  4 22:55:46 2013
From: guido at python.org (Guido van Rossum)
Date: Mon, 4 Mar 2013 13:55:46 -0800
Subject: [Python-ideas] BufferedIO and detach
In-Reply-To:
References:
Message-ID:

On Mon, Mar 4, 2013 at 1:41 PM, Benjamin Peterson wrote:
> Guido van Rossum writes:
>> On Sunday, March 3, 2013, Benjamin Peterson wrote:
>>> What was the API that provided this in the Python version of the io
>>> module?
>>
>> I think it may not have been more than accessing private instance
>> variables.
>
> It's a bit hard to claim that was ever a "supported" use case then.

True, it was not supported, but it was *possible* (and I had *meant*) to
support it by adding a new API to read what's in the buffer in a
completely portable way. This was still a step forward compared to using
stdio, where the hacks needed to access the buffer would vary by platform
and libc version. And that's all I meant by that comment.

--
--Guido van Rossum (python.org/~guido)

From tjreedy at udel.edu  Mon Mar  4 22:59:28 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 04 Mar 2013 16:59:28 -0500
Subject: [Python-ideas] Proposal for Algorithms Library
In-Reply-To:
References:
Message-ID:

On 3/4/2013 7:06 AM, Akshit Agarwal wrote:
> I am new to the Python community, but I have been using Python for
> around a year and I love coding in Python.

So do I.

> Now I want to introduce an idea that I think should be there in Python:
> I want to start working on an *"Algorithms Library"* which would contain
> all basic algorithms in its initial phase, and then we can

There is no agreed-on set of 'basic algorithms'. Anyway, Python already
includes most basic algorithms either built-in or in the stdlib. And the
implementation may be *better* than found in any book. An example is
timsort, available as both list.sort and sorted(iterable). hash() has a
carefully designed hash algorithm that now takes into account
denial-of-service attacks. Python dicts are sophisticated hash tables.
The itertools module has basic algorithms for iterables, including
.product and .combinations. Beyond this, there are thousands of
third-party packages that are nothing but more and more algorithms.

> include all algorithms listed in Introduction to Algorithms by CLRS,
> further extending to all possible algorithms which should be included.

There is no finite set of 'possible algorithms'. Every function is an
algorithm, or if you prefer, implements an algorithm. A typical
algorithms text has a grab-bag of algorithms selected for particular
didactic purposes. They usually do not form a coherent module or package.

Python versions of the algorithms in a particular popular book that does
not use Python might be a useful package to put on PyPI, but I would be
careful about copyright and intellectual property issues.

> Implementing this will be very good for Python, as algorithms are used
> everywhere and developers have to spend a lot of their time implementing
> the common algorithms

Do you have any particular examples in mind?

--
Terry Jan Reedy

From zuo at chopin.edu.pl  Tue Mar  5 00:33:48 2013
From: zuo at chopin.edu.pl (Jan Kaliszewski)
Date: Tue, 05 Mar 2013 00:33:48 +0100
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To:
References: <20130303005129.26eb0e00@pitrou.net>
	<20130303114112.403019c6@pitrou.net>
Message-ID:

Hello,

1. Ad:
>>> PurePosixPath('/usr/bin/python').relative('/etc')
Traceback (most recent call last):
...
ValueError: ...

Shouldn't this particular operation return
"PurePosixPath('/etc/../usr/bin/python')"?

2.

03.03.2013 15:12, Charles-François Natali wrote:

>> Yes. We could call it
>> "root" (http://en.wikipedia.org/wiki/Root_%28linguistics%29) but in
>> this context it would be confusing. Also, it's not exactly the root
>> since, as you point out, there can still be a remaining suffix.
>
> Indeed, "root" would be even more confusing.
>
>> There's "stem", too (http://en.wikipedia.org/wiki/Word_stem). With the
>> same provision about not being the actual stem.
>
> Also, it doesn't sound familiar (at least to me).
>
> How about "rootname", or "stripped_name" (the last one is a little
> too long)?

Maybe simply "stripped"? Or "stemname"? Or "unsuffixed"?...

Anyway, "basename" is IMHO a bad idea, because in os.path it is already
used for something completely different.

Cheers.
*j

From zuo at chopin.edu.pl  Tue Mar  5 00:39:50 2013
From: zuo at chopin.edu.pl (Jan Kaliszewski)
Date: Tue, 05 Mar 2013 00:39:50 +0100
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To:
References: <20130303005129.26eb0e00@pitrou.net>
	<20130303114112.403019c6@pitrou.net>
Message-ID:

Me wrote:
> Shouldn't this particular operation return
> "PurePosixPath('/etc/../usr/bin/python')"?

Pardon, I meant: "PurePosixPath('../usr/bin/python')".

*j

From solipsis at pitrou.net  Tue Mar  5 08:23:50 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 5 Mar 2013 08:23:50 +0100
Subject: [Python-ideas] Updated PEP 428 (pathlib)
References: <20130303005129.26eb0e00@pitrou.net>
	<20130303114112.403019c6@pitrou.net>
Message-ID: <20130305082350.37b18c22@pitrou.net>

On Tue, 05 Mar 2013 00:33:48 +0100
Jan Kaliszewski wrote:
> Hello,
>
> 1. Ad:
> >>> PurePosixPath('/usr/bin/python').relative('/etc')
> Traceback (most recent call last):
> ...
> ValueError: ...
>
> Shouldn't this particular operation return
> "PurePosixPath('/etc/../usr/bin/python')"?

Think what happens if /etc is a symlink to /var/etc.
(not very likely to happen for /etc, but likely to happen in the
general case)

Regards

Antoine.

From solipsis at pitrou.net  Tue Mar  5 08:31:22 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 5 Mar 2013 08:31:22 +0100
Subject: [Python-ideas] BufferedIO and detach
References: <20130304105927.37331a4c@pitrou.net>
	<20130304114549.30fb772f@pitrou.net>
	<20130304221538.357d89af@pitrou.net>
	<20130304223251.4c2476b0@pitrou.net>
Message-ID: <20130305083122.61ff329c@pitrou.net>

On Tue, 5 Mar 2013 10:50:01 +1300
Robert Collins wrote:
> On 5 March 2013 10:32, Antoine Pitrou wrote:
> > On Tue, 5 Mar 2013 10:29:35 +1300
> > Robert Collins wrote:
> >>
> >> > How about the fileno() method?
> >>
> >> What about it? Do you mean 'non-blocking mode is entirely defined by
> >> the OS-level read() behaviour and there is no tracking of that state
> >> higher up'? If so, cool (and we should document that somewhere).
> >
> > Yes, I mean that :-) You're right, it should be documented.
> >
> >> I'll need to go look up the Windows equivalent of FCNTL, and I still
> >> think the current hidden buffer status is problematic.
> >
> > Windows has no notion of non-blocking streams, except for sockets.
>
> Hmm, I know the libc emulation layer doesn't - but
> http://msdn.microsoft.com/en-us/library/ms684961%28VS.85%29.aspx does
> non-blocking IO (for stdin specifically) - we should be able to hook
> that in, in principle...

I didn't know about that. I wonder, what happens if the standard input
is redirected?
Also, is it able to read actual raw bytes? INPUT_RECORD looks rather
specialized:
http://msdn.microsoft.com/en-us/library/ms683499%28v=vs.85%29.aspx

> and disk files can do nonblocking with
> overlapped IO (though that is a wholly different beast and clearly
> offtopic :)).

It's not non-blocking then, it's asynchronous (it's blocking, but in
another thread ;-)).

Regards

Antoine.

From robertc at robertcollins.net  Tue Mar  5 08:39:55 2013
From: robertc at robertcollins.net (Robert Collins)
Date: Tue, 5 Mar 2013 20:39:55 +1300
Subject: [Python-ideas] BufferedIO and detach
In-Reply-To: <20130305083122.61ff329c@pitrou.net>
References: <20130304105927.37331a4c@pitrou.net>
	<20130304114549.30fb772f@pitrou.net>
	<20130304221538.357d89af@pitrou.net>
	<20130304223251.4c2476b0@pitrou.net>
	<20130305083122.61ff329c@pitrou.net>
Message-ID:

On 5 March 2013 20:31, Antoine Pitrou wrote:
> On Tue, 5 Mar 2013 10:50:01 +1300
>
> I didn't know about that. I wonder, what happens if the standard
> input is redirected?
> Also, is it able to read actual raw bytes? INPUT_RECORD looks rather
> specialized:
> http://msdn.microsoft.com/en-us/library/ms683499%28v=vs.85%29.aspx

I don't know; cygwin's source may, or we could get someone with a
Windows machine to do some testing.

>> and disk files can do nonblocking with
>> overlapped IO (though that is a wholly different beast and clearly
>> offtopic :)).
>
> It's not non-blocking then, it's asynchronous (it's blocking, but in
> another thread ;-)).

Well... it's not in another userspace thread - it's near-identical in
implementation to Linux AIO: the kernel takes care of it. The delivery
mechanism is however very different (you sleep and the kernel calls
you back).

-Rob

From solipsis at pitrou.net  Tue Mar  5 10:16:38 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 5 Mar 2013 10:16:38 +0100
Subject: [Python-ideas] Updated PEP 428 (pathlib)
References: <20130303005129.26eb0e00@pitrou.net>
	<20130303114112.403019c6@pitrou.net>
Message-ID: <20130305101638.2f30d4b3@pitrou.net>

Le Sun, 3 Mar 2013 15:12:09 +0100,
Charles-François Natali a écrit :
> > Yes. We could call it
> > "root" (http://en.wikipedia.org/wiki/Root_%28linguistics%29) but in
> > this context it would be confusing. Also, it's not exactly the root
> > since, as you point out, there can still be a remaining suffix.
>
> Indeed, "root" would be even more confusing.
>
> > There's "stem", too (http://en.wikipedia.org/wiki/Word_stem). With
> > the same provision about not being the actual stem.
>
> Also, it doesn't sound familiar (at least to me).
>
> How about "rootname", or "stripped_name" (the last one is a little
> too long)?

"rootname" is confusing because of filesystem roots, and the second is
too long (not to mention it's not obvious what has been stripped).
I really prefer "basename" or, if people are hostile, "stem".

> > Yes, I think we could add an `allow_recursive` argument.
> > Is there any other DoS issue?
>
> If by recursive you mean the '**' pattern (cross-directory match),
> then I'm afraid that's not enough.
> For example, a pattern like '*/../*/../*/../*/../*' would have the
> same problem:

Mmmh, I don't know how to guard against that. Perhaps by disallowing
".." in glob patterns? But the problem could still appear with symlinks.

To be honest, I don't think allowing untrusted users to specify a glob
pattern is a very good idea. On the other hand, for the common use cases
such as configuration files, the user should be trustable.
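A guard of that sort would at least be short - a minimal sketch, where
check_pattern is a hypothetical helper rather than anything proposed for
the PEP:

def check_pattern(pattern):
    # Hypothetical guard: reject parent-directory references so that
    # patterns like '*/../*/../*' cannot multiply the search space.
    if any(part == '..' for part in pattern.split('/')):
        raise ValueError("%r: '..' not allowed in glob patterns"
                         % (pattern,))
    return pattern

Regards

Antoine.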
From solipsis at pitrou.net  Tue Mar  5 10:22:13 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 5 Mar 2013 10:22:13 +0100
Subject: [Python-ideas] BufferedIO and detach
References: <20130304105927.37331a4c@pitrou.net>
	<20130304114549.30fb772f@pitrou.net>
	<20130304221538.357d89af@pitrou.net>
	<20130304223251.4c2476b0@pitrou.net>
	<20130305083122.61ff329c@pitrou.net>
Message-ID: <20130305102213.4a3a5dc6@pitrou.net>

Le Tue, 5 Mar 2013 20:39:55 +1300,
Robert Collins a écrit :
> On 5 March 2013 20:31, Antoine Pitrou wrote:
> > On Tue, 5 Mar 2013 10:50:01 +1300
> >
> > I didn't know about that. I wonder, what happens if the standard
> > input is redirected?
> > Also, is it able to read actual raw bytes? INPUT_RECORD looks rather
> > specialized:
> > http://msdn.microsoft.com/en-us/library/ms683499%28v=vs.85%29.aspx
>
> I don't know; cygwin's source may, or we could get someone with a
> Windows machine to do some testing.

Apparently you need ReadConsole to read bytes, not ReadConsoleInput:
http://msdn.microsoft.com/en-us/library/ms684958%28v=vs.85%29.aspx

However, none of those functions is technically non-blocking. You can
poll the console using one of the wait functions, but there is an
important caveat for ReadConsole:

“If the input buffer contains input events other than keyboard events
(such as mouse events or window-resizing events), they are discarded.
Those events can only be read by using the ReadConsoleInput function.”

So it seems ReadConsole can block even though you think some data is
available.

> > It's not non-blocking then, it's asynchronous (it's blocking, but in
> > another thread ;-)).
>
> Well... it's not in another userspace thread - it's near-identical in
> implementation to Linux AIO: the kernel takes care of it. The delivery
> mechanism is however very different (you sleep and the kernel calls
> you back).

It's still not non-blocking. On a non-blocking stream, a read fails when
no data is available; you have to try reading again later. With
asynchronous I/O, the blocking read is scheduled in the background, and
it will call you back when finished. It's a different mode of operation.

Regards

Antoine.

From solipsis at pitrou.net  Tue Mar  5 10:22:54 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 5 Mar 2013 10:22:54 +0100
Subject: [Python-ideas] BufferedIO and detach
References: <51350EA0.2020301@canterbury.ac.nz>
Message-ID: <20130305102254.7b686018@pitrou.net>

Le Tue, 05 Mar 2013 10:14:08 +1300,
Greg Ewing a écrit :
> Guido van Rossum wrote:
> > Personally, I'm not sure that your proposed API (a buffered_only
> > flag to read()) is the best way to go about it. Maybe detach()
> > should return the remaining buffered data?
>
> Maybe you could be allowed to read() from the buffered
> stream after detatching the underlying source, which
> would then return any data remaining in the buffer.

Perhaps detach() can take an optional argument for that indeed.
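For context, a small demonstration of why the buffered data matters -
with the current io module, the bytes sitting in the buffer are simply
lost:

import io

raw = io.BytesIO(b"line1\nline2\n")
buf = io.BufferedReader(raw)
buf.readline()     # b'line1\n' -- but the reader has pulled *all* of
                   # raw into its internal buffer
detached = buf.detach()
detached.read()    # b'' -- 'line2\n' existed only in the buffer and
                   # is now unreachable

Regards

Antoine.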
From robertc at robertcollins.net  Tue Mar  5 11:03:07 2013
From: robertc at robertcollins.net (Robert Collins)
Date: Tue, 5 Mar 2013 23:03:07 +1300
Subject: [Python-ideas] BufferedIO and detach
In-Reply-To: <20130305102213.4a3a5dc6@pitrou.net>
References: <20130304105927.37331a4c@pitrou.net>
	<20130304114549.30fb772f@pitrou.net>
	<20130304221538.357d89af@pitrou.net>
	<20130304223251.4c2476b0@pitrou.net>
	<20130305083122.61ff329c@pitrou.net>
	<20130305102213.4a3a5dc6@pitrou.net>
Message-ID:

On 5 March 2013 22:22, Antoine Pitrou wrote:
> Le Tue, 5 Mar 2013 20:39:55 +1300,
> Robert Collins a écrit :
>> On 5 March 2013 20:31, Antoine Pitrou wrote:
>> > On Tue, 5 Mar 2013 10:50:01 +1300
>> >
>> > I didn't know about that. I wonder, what happens if the standard
>> > input is redirected?
>> > Also, is it able to read actual raw bytes? INPUT_RECORD looks rather
>> > specialized:
>> > http://msdn.microsoft.com/en-us/library/ms683499%28v=vs.85%29.aspx
>>
>> I don't know; cygwin's source may, or we could get someone with a
>> Windows machine to do some testing.
>
> Apparently you need ReadConsole to read bytes, not ReadConsoleInput:
> http://msdn.microsoft.com/en-us/library/ms684958%28v=vs.85%29.aspx
>
> However, none of those functions is technically non-blocking. You can
> poll the console using one of the wait functions, but there is an
> important caveat for ReadConsole:
>
> “If the input buffer contains input events other than keyboard events
> (such as mouse events or window-resizing events), they are discarded.
> Those events can only be read by using the ReadConsoleInput function.”
>
> So it seems ReadConsole can block even though you think some data is
> available.

http://msdn.microsoft.com/en-us/library/ms685035%28v=vs.85%29.aspx
suggests you can indeed get key events from ReadConsoleInput. I don't
know what redirected input does in that case. Any which way, it's some
future work that doesn't affect what can be done now.

Thanks for the extended discussion, I think the next stage is for me to
make a timeslice to put a patch together.

-Rob

--
Robert Collins
Distinguished Technologist
HP Cloud Services

From storchaka at gmail.com  Tue Mar  5 15:58:57 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Tue, 05 Mar 2013 16:58:57 +0200
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: <20130305082350.37b18c22@pitrou.net>
References: <20130303005129.26eb0e00@pitrou.net>
	<20130303114112.403019c6@pitrou.net>
	<20130305082350.37b18c22@pitrou.net>
Message-ID:

On 05.03.13 09:23, Antoine Pitrou wrote:
> On Tue, 05 Mar 2013 00:33:48 +0100
> Jan Kaliszewski wrote:
>> 1. Ad:
>> >>> PurePosixPath('/usr/bin/python').relative('/etc')
>> Traceback (most recent call last):
>> ...
>> ValueError: ...
>>
>> Shouldn't this particular operation return
>> "PurePosixPath('/etc/../usr/bin/python')"?
>
> Think what happens if /etc is a symlink to /var/etc.
> (not very likely to happen for /etc, but likely to happen in the
> general case)

posixpath.relpath('/usr/bin/python', '/etc') returns
'../usr/bin/python'. Perhaps pathlib should have an option to provide
such compatible behavior.
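For reference, the behaviour in question:

>>> import posixpath
>>> posixpath.relpath('/usr/bin/python', '/etc')
'../usr/bin/python'

Note that the result is computed purely lexically, which is exactly why
it can be wrong when '/etc' is a symlink, as Antoine points out.

P.S. The pathlib implementation has a relative_to() method. A relative()
method exists too, but looks unrelated.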
From ethan at stoneleaf.us Tue Mar 5 16:23:46 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 05 Mar 2013 07:23:46 -0800 Subject: [Python-ideas] Updated PEP 428 (pathlib) In-Reply-To: <20130305101638.2f30d4b3@pitrou.net> References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <20130305101638.2f30d4b3@pitrou.net> Message-ID: <51360E02.7050101@stoneleaf.us> On 03/05/2013 01:16 AM, Antoine Pitrou wrote: > Le Sun, 3 Mar 2013 15:12:09 +0100, > Charles-Fran?ois Natali a ?crit : >> >> How about "rootname", or "stripped_name" (the last one is a little >> too long)? > > I really prefer "basename" [...] +1 -- ~Ethan~ From oscar.j.benjamin at gmail.com Tue Mar 5 16:56:13 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 5 Mar 2013 15:56:13 +0000 Subject: [Python-ideas] One-line "try with" statement In-Reply-To: <5133C274.1030506@canterbury.ac.nz> References: <15135910-8FB4-4B82-A93E-41CA48086A1B@breakrs.com> <1362306736.5045.3.camel@rogue.dyndns.info> <5133C274.1030506@canterbury.ac.nz> Message-ID: On 3 March 2013 21:36, Greg Ewing wrote: > Stefan Behnel wrote: >> >> The above example therefore strikes me as >> useless. If you need a try-except around a with block, then your context >> manager is doing something wrong. > > A with statement is equivalent to try-finally, not try-except, > so if you want to catch the exception you still need to put > a try-except somewhere. With statements can be used for any kind of exception handling you like, not just try/finally. For example: import contextlib @contextlib.contextmanager def ignore(errorcls): try: yield except errorcls: pass with ignore(ValueError): a = int('a') Oscar From christian at python.org Tue Mar 5 18:05:44 2013 From: christian at python.org (Christian Heimes) Date: Tue, 05 Mar 2013 18:05:44 +0100 Subject: [Python-ideas] Length hinting and preallocation for container types Message-ID: <513625E8.4060201@python.org> Hello, today I came across this slides https://speakerdeck.com/alex/why-python-ruby-and-javascript-are-slow by Alex Gaynor. The slides aren't some random rants on Python. Alex makes some valid points. Coincidentally he is a PyPy developer, too. ;) One of his assertions is about memory (re)allocation. C is faster in some cases because C code usually does fewer allocations and reallocations. Python has no API to easily reallocate a list with 100 items. Code like lst = [] for i in range(100): lst.append(i*i) has to resize the list multiple times. PyPy has a feature to create a preallocated list. https://bitbucket.org/pypy/pypy/commits/2ff5e3c765ef/ Internally CPython already distinguishes between the length of object and the allocation size of an object for some types like list, dict and bytearray. For example PyListObject has `ob_size` for __len__ and `allocated` for the amount of available `ob_item` slots. I suggest that we add two new functions to container types like list, bytearray and dict. obj.__preallocate__(size) increases the internal buffer by size elements. obj.__shrink__() dwindles the internal buffer to len(obj) elements, maybe a bit more. 
A new context manager aids users with preallocation and shrinking: class LengthHint: def __init__(self, container, hint): self.container = container self.hint = hint self.instance = None def __enter__(self): self.instance = self.container() self.instance.__preallocate__(self.hint) return self.instance def __exit__(self, exc_type, exc_val, exc_tb): self.instance.__shrink__() with LengthHint(list, 200) as lst: # lst has 200 ob_item slots but len(lst) == 0 for i in range(100): lst.append(i*i) # __exit__ shrinks ob_item to 100 The C implementation is trivial as the three types already have all features. Christian From solipsis at pitrou.net Tue Mar 5 18:10:46 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 5 Mar 2013 18:10:46 +0100 Subject: [Python-ideas] Length hinting and preallocation for container types References: <513625E8.4060201@python.org> Message-ID: <20130305181046.55b7f4b7@pitrou.net> Le Tue, 05 Mar 2013 18:05:44 +0100, Christian Heimes a ?crit : > today I came across this slides > https://speakerdeck.com/alex/why-python-ruby-and-javascript-are-slow > by Alex Gaynor. The slides aren't some random rants on Python. Alex > makes some valid points. Coincidentally he is a PyPy developer, > too. ;) Please see http://bugs.python.org/issue17338 > with LengthHint(list, 200) as lst: > # lst has 200 ob_item slots but len(lst) == 0 > for i in range(100): > lst.append(i*i) > # __exit__ shrinks ob_item to 100 So how about benchmark numbers? Regards Antoine. From eliben at gmail.com Tue Mar 5 18:15:37 2013 From: eliben at gmail.com (Eli Bendersky) Date: Tue, 5 Mar 2013 09:15:37 -0800 Subject: [Python-ideas] Length hinting and preallocation for container types In-Reply-To: <513625E8.4060201@python.org> References: <513625E8.4060201@python.org> Message-ID: On Tue, Mar 5, 2013 at 9:05 AM, Christian Heimes wrote: > Hello, > > today I came across this slides > https://speakerdeck.com/alex/why-python-ruby-and-javascript-are-slow by > Alex Gaynor. The slides aren't some random rants on Python. Alex makes > some valid points. Coincidentally he is a PyPy developer, too. ;) > > One of his assertions is about memory (re)allocation. C is faster in > some cases because C code usually does fewer allocations and > reallocations. Python has no API to easily reallocate a list with 100 > items. Code like > > lst = [] > for i in range(100): > lst.append(i*i) > > has to resize the list multiple times. PyPy has a feature to create a > preallocated list. https://bitbucket.org/pypy/pypy/commits/2ff5e3c765ef/ > > Internally CPython already distinguishes between the length of object > and the allocation size of an object for some types like list, dict and > bytearray. For example PyListObject has `ob_size` for __len__ and > `allocated` for the amount of available `ob_item` slots. > > I suggest that we add two new functions to container types like list, > bytearray and dict. obj.__preallocate__(size) increases the internal > buffer by size elements. obj.__shrink__() dwindles the internal buffer > to len(obj) elements, maybe a bit more. 
> > A new context manager aids users with preallocation and shrinking: > > class LengthHint: > def __init__(self, container, hint): > self.container = container > self.hint = hint > self.instance = None > > def __enter__(self): > self.instance = self.container() > self.instance.__preallocate__(self.hint) > return self.instance > > def __exit__(self, exc_type, exc_val, exc_tb): > self.instance.__shrink__() > > > with LengthHint(list, 200) as lst: > # lst has 200 ob_item slots but len(lst) == 0 > for i in range(100): > lst.append(i*i) > # __exit__ shrinks ob_item to 100 > The real problem is that this code is not idiomatic Python, especially if you want it to be reasonably fast: lst = [] for i in range(100): lst.append(i*i) Why not: lst = [i*i for i in range(100)] If the "append" pattern is complex, just "preallocate" like this: lst = [0] * 100 And then fill it. Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian at python.org Tue Mar 5 19:55:32 2013 From: christian at python.org (Christian Heimes) Date: Tue, 05 Mar 2013 19:55:32 +0100 Subject: [Python-ideas] Length hinting and preallocation for container types In-Reply-To: <20130305181046.55b7f4b7@pitrou.net> References: <513625E8.4060201@python.org> <20130305181046.55b7f4b7@pitrou.net> Message-ID: <51363FA4.7090901@python.org> Am 05.03.2013 18:10, schrieb Antoine Pitrou: > So how about benchmark numbers? The speedup is smallish: $ ./python -m timeit -n1000 "l = []" "l.__preallocate__(10000)" "app = l.append" "for i in range(10000): app(i)" "l.__shrink__()" 1000 loops, best of 3: 3.68 msec per loop $ ./python -m timeit -n1000 "l = []" "app = l.append" "for i in range(10000): app(i)" 1000 loops, best of 3: 3.75 msec per loop From eliben at gmail.com Tue Mar 5 20:44:15 2013 From: eliben at gmail.com (Eli Bendersky) Date: Tue, 5 Mar 2013 11:44:15 -0800 Subject: [Python-ideas] Length hinting and preallocation for container types In-Reply-To: <20130305193031.GA3176@untibox.unti> References: <513625E8.4060201@python.org> <20130305193031.GA3176@untibox.unti> Message-ID: > > The real problem is that this code is not idiomatic Python, > especially if > > you want it to be reasonably fast: > > > > ? lst = [] > > ? for i in range(100): > > ? ? ? lst.append(i*i) > > > > Why not: > > > > lst = [i*i for i in range(100)] > > > > If the "append" pattern is complex, just "preallocate" like this: > > > > lst = [0] * 100 > > > > And then fill it. > > > > Eli > > >How would you replicate the behavior of __exit__ in Christian's example? > > Why would I want to replicate it? Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Tue Mar 5 20:50:46 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 05 Mar 2013 11:50:46 -0800 Subject: [Python-ideas] Length hinting and preallocation for container types In-Reply-To: References: <513625E8.4060201@python.org> <20130305193031.GA3176@untibox.unti> Message-ID: <51364C96.4090509@stoneleaf.us> On 03/05/2013 11:44 AM, Eli Bendersky wrote: > > > The real problem is that this code is not idiomatic Python, especially if > > you want it to be reasonably fast: > > > > ? lst = [] > > ? for i in range(100): > > ? ? ? lst.append(i*i) > > > > Why not: > > > > lst = [i*i for i in range(100)] > > > > If the "append" pattern is complex, just "preallocate" like this: > > > > lst = [0] * 100 > > > > And then fill it. > > > > Eli > > >How would you replicate the behavior of __exit__ in Christian's example? 
> > > Why would I want to replicate it? I suspect the new behavior would be most useful when you don't know precisely how large the final list will be: overallocate (possibly by a large margin), then __exit__ returns the unused portion back to the pool). -- ~Ethan~ From abarnert at yahoo.com Tue Mar 5 21:29:13 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 5 Mar 2013 12:29:13 -0800 Subject: [Python-ideas] Updated PEP 428 (pathlib) In-Reply-To: <51360E02.7050101@stoneleaf.us> References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <20130305101638.2f30d4b3@pitrou.net> <51360E02.7050101@stoneleaf.us> Message-ID: <81CDA154-B979-4F5E-BC07-19A9B9A8D821@yahoo.com> Correct me if I'm remembering wrong, but the posix basename(1) tool can strip both dirnames _and_ extensions. So, any confusion here has a solid precedent. Sent from a random iPhone On Mar 5, 2013, at 7:23, Ethan Furman wrote: > On 03/05/2013 01:16 AM, Antoine Pitrou wrote: >> Le Sun, 3 Mar 2013 15:12:09 +0100, >> Charles-Fran?ois Natali a ?crit : >>> >>> How about "rootname", or "stripped_name" (the last one is a little >>> too long)? >> >> I really prefer "basename" [...] > > +1 > > -- > ~Ethan~ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From abarnert at yahoo.com Tue Mar 5 21:42:03 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 5 Mar 2013 12:42:03 -0800 (PST) Subject: [Python-ideas] Length hinting and preallocation for container types In-Reply-To: <51363FA4.7090901@python.org> References: <513625E8.4060201@python.org> <20130305181046.55b7f4b7@pitrou.net> <51363FA4.7090901@python.org> Message-ID: <1362516123.52007.YahooMailNeo@web184704.mail.ne1.yahoo.com> > From: Christian Heimes > > Am 05.03.2013 18:10, schrieb Antoine Pitrou: >> So how about benchmark numbers? > > The speedup is smallish: > > $ ./python -m timeit -n1000 "l = []" > "l.__preallocate__(10000)" "app = > l.append" "for i in range(10000): app(i)" > "l.__shrink__()" > 1000 loops, best of 3: 3.68 msec per loop > > $ ./python -m timeit -n1000 "l = []" "app = l.append" > "for i in > range(10000): app(i)" > 1000 loops, best of 3: 3.75 msec per loop So, that's a 1.8% speedup. While doing things right gives a 20% speedup: $ python3.3 -m timeit -n1000 "l=[]" "app = l.append" "for i in range(10000): app(i)" 1000 loops, best of 3: 557 usec per loop $ python3.3 -m timeit -n1000 "l = [i for i in range(10000)]" 1000 loops, best of 3: 447 usec per loop Or (but obviously this isn't generally applicable): $ python3.3 -m timeit -n1000 "l = list(range(10000))" 1000 loops, best of 3: 236 usec per loop From greg.ewing at canterbury.ac.nz Tue Mar 5 22:49:30 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 06 Mar 2013 10:49:30 +1300 Subject: [Python-ideas] BufferedIO and detach In-Reply-To: <20130305102254.7b686018@pitrou.net> References: <51350EA0.2020301@canterbury.ac.nz> <20130305102254.7b686018@pitrou.net> Message-ID: <5136686A.8040002@canterbury.ac.nz> Antoine Pitrou wrote: > Le Tue, 05 Mar 2013 10:14:08 +1300, > Greg Ewing a > ?crit : >>Maybe you could be allowed to read() from the buffered >>stream after detatching the underlying source, which >>would then return any data remaining in the buffer. > > Perhaps detach() can take an optional argument for that indeed. Does it need to be optional? Is there likely to be any code around that relies on read() *not* working on a detached stream? 
--
Greg

From solipsis at pitrou.net  Tue Mar  5 22:50:37 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 5 Mar 2013 22:50:37 +0100
Subject: [Python-ideas] BufferedIO and detach
References: <51350EA0.2020301@canterbury.ac.nz>
	<20130305102254.7b686018@pitrou.net>
	<5136686A.8040002@canterbury.ac.nz>
Message-ID: <20130305225037.44c430ee@pitrou.net>

On Wed, 06 Mar 2013 10:49:30 +1300
Greg Ewing wrote:
> Antoine Pitrou wrote:
> > Le Tue, 05 Mar 2013 10:14:08 +1300,
> > Greg Ewing a écrit :
> >
> >> Maybe you could be allowed to read() from the buffered
> >> stream after detatching the underlying source, which
> >> would then return any data remaining in the buffer.
> >
> > Perhaps detach() can take an optional argument for that indeed.
>
> Does it need to be optional? Is there likely to be any
> code around that relies on read() *not* working on a
> detached stream?

detach() closes the stream by default, which is a piece of behaviour you
can't change willy-nilly.

Regards

Antoine.

From foogod at gmail.com  Tue Mar  5 23:03:20 2013
From: foogod at gmail.com (Alex Stewart)
Date: Tue, 5 Mar 2013 14:03:20 -0800
Subject: [Python-ideas] Length hinting and preallocation for container types
In-Reply-To: <1362516123.52007.YahooMailNeo@web184704.mail.ne1.yahoo.com>
References: <513625E8.4060201@python.org>
	<20130305181046.55b7f4b7@pitrou.net>
	<51363FA4.7090901@python.org>
	<1362516123.52007.YahooMailNeo@web184704.mail.ne1.yahoo.com>
Message-ID:

On Tue, Mar 5, 2013 at 12:42 PM, Andrew Barnert wrote:
> So, that's a 1.8% speedup. While doing things right gives a 20% speedup:
>
> $ python3.3 -m timeit -n1000 "l=[]" "app = l.append" "for i in
> range(10000): app(i)"
> 1000 loops, best of 3: 557 usec per loop
> $ python3.3 -m timeit -n1000 "l = [i for i in range(10000)]"
> 1000 loops, best of 3: 447 usec per loop

Yeah, I think it's pretty likely that any case where the proposed change
would be applicable is probably going to be dominated by other factors
more than by allocation overhead anyway.

The above does beg the question, though: Could we perhaps apply some of
this thinking to list comprehensions, and speed them up even more by
automatically taking hints from the inputs and preallocating the output
list? (Or do we already?)

Not sure how much work that would be, or whether it would be worth it..
just a random thought..

--Alex

From tjreedy at udel.edu  Wed Mar  6 01:15:01 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 05 Mar 2013 19:15:01 -0500
Subject: [Python-ideas] Updated PEP 434: IDLE Enhancement Exception
Message-ID:

I rewrote the PEP to better focus on the specific proposal and
motivation. I also refined the formatting and added something about
backwards compatibility with extensions. Is there any more discussion
here before we post on pydev?

--------------------------------------------------------------------
PEP: 434
Title: IDLE Enhancement Exception for All Branches
Version: $Revision$
Last-Modified: $Date$
Author: Todd Rovito, Terry Reedy
BDFL-Delegate: Nick Coghlan
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 16-Feb-2013
Post-History: 16-Feb-2013

Abstract
========

Most CPython tracker issues are classified as behavior or enhancement.
Most behavior patches are backported to branches for existing versions.
Enhancement patches are restricted to the default branch that becomes
the next Python version.
This PEP proposes that the restriction on applying enhancements be
relaxed for IDLE code, residing in .../Lib/idlelib/. In practice, this
would mean that IDLE developers would not have to classify or agree on
the classification of a patch but could instead focus on what is best
for IDLE users and future IDLE development. It would also mean that
IDLE patches would not necessarily have to be split into 'bugfix'
changes and enhancement changes.

The PEP would apply to changes in existing features and addition of
small features, such as would require a new menu entry, but not
necessarily to possible major re-writes such as switching to themed
widgets or tabbed windows.

Motivation
==========

This PEP was prompted by controversy on both the tracker and pydev list
over adding Cut, Copy, and Paste to right-click context menus (Issue
1207589, opened in 2005 [1]_; pydev thread [2]_). The features were
available as keyboard shortcuts but not on the context menu. It is
standard, at least on Windows, that they should be when applicable (a
read-only window would only have Copy), so users do not have to shift
to the keyboard after selecting text for cutting or copying or a slice
point for pasting.

The context menu was not documented until 10 days before the new
options were added (Issue 10405 [3]_). Normally, behavior is called a
bug if it conflicts with documentation judged to be correct. But if
there is no doc, what is the standard? If the code is its own
documentation, most IDLE issues on the tracker are enhancement issues.
If we substitute reasonable user expectation (which can, of course, be
its own subject of disagreement), many more issues are behavior issues.

For context menus, people disagreed on the status of the additions --
bugfix or enhancement. Even people who called it an enhancement
disagreed as to whether the patch should be backported. This PEP
proposes to make the status disagreement irrelevant by explicitly
allowing more liberal backporting than for other stdlib modules.

Rationale
=========

People primarily use IDLE by running the gui application, rather than
by directly importing the effectively private (undocumented)
implementation modules in idlelib. Whether they use the shell, the
editor, or both, we believe they will benefit more from consistency
across the latest releases of current Python versions than from
consistency within the bugfix releases for one Python version. This is
especially true when existing behavior is clearly unsatisfactory.

When people use the standard interpreter, the OS-provided frame works
pretty much the same for all Python versions. If, for instance,
Microsoft were to upgrade the Command Prompt gui, the improvements
would be present regardless of which Python were running within it.
Similarly, if one edits Python code with editor X, behaviors such as
the right-click context menu and the search-replace box do not depend
on the version of Python being edited or even the language being
edited.

The benefit for IDLE developers is mixed. On the one hand, testing more
versions and possibly having to adjust a patch, especially for 2.7, is
more work. (There is, of course, the option of not backporting
everything. For issue 12510, some changes to calltips for classes were
not included in the 2.7 patch because of issues with old-style classes
[4]_.) On the other hand, bike-shedding can be an energy drain. If the
obvious fix for a bug looks like an enhancement, writing a separate
bugfix-only patch is more work.
And making the code diverge between versions makes future multi-version
patches more difficult.

These issues are illustrated by the search-and-replace dialog box. It
used to raise an exception for certain user entries [5]_. The uncaught
exception caused IDLE to exit. At least on Windows, the exit was silent
(no visible traceback) and looked like a crash if IDLE was started
normally, from an icon.

Was this a bug? IDLE Help (on the current Help submenu) just says
"Replace... Open a search-and-replace dialog box", and a box *was*
opened. It is not, in general, a bug for a library method to raise an
exception. And it is not, in general, a bug for a library method to
ignore an exception raised by functions it calls. So if we were to
adopt the 'code = doc' philosophy in the absence of detailed docs, one
might say 'No'.

However, IDLE exiting when it does not need to is definitely obnoxious.
So four of us agreed that it should be prevented. But there was still
the question of what to do instead. Catch the exception? Just not raise
the exception? Beep? Display an error message box? Or try to do
something useful with the user's entry? Would replacing a 'crash' with
useful behavior be an enhancement, limited to future Python releases?
Should IDLE developers have to ask that?

Backwards Compatibility
=======================

For IDLE, there are three types of users who might be concerned about
back compatibility. First are people who run IDLE as an application. We
have already discussed them above.

Second are people who import one of the idlelib modules. As far as we
know, this is only done to start the IDLE application, and we do not
propose breaking such use. Otherwise, the modules are undocumented and
effectively private implementations. If an IDLE module were defined as
public, documented, and perhaps moved to the tkinter package, it would
then follow the normal rules. (Documenting the private interfaces for
the benefit of people working on the IDLE code is a separate issue.)

Third are people who write IDLE extensions. The guaranteed extension
interface is given in idlelib/extend.txt. This should be respected at
least in existing versions, and not frivolously changed in future
versions. But there is a warning that "The extension cannot assume much
about this [EditorWindow] argument." This guarantee should rarely be an
issue with patches, and the issue is not specific to 'enhancement'
versus 'bugfix' patches.

As it happens, after the context menu patch was applied, it came up
that extensions that added items to the context menu (rare) would be
broken because the patch a) added a new item to standard rmenu_specs
and b) expected every rmenu_spec to be lengthened. It is not clear
whether this violates the guarantee, but there is a second patch that
fixes assumption b). It should be applied when it is clear that the
first patch will not have to be reverted.

References
==========

.. [1] IDLE: Right Click Context Menu, Foord, Michael
   (http://bugs.python.org/issue1207589)

.. [2] Cut/Copy/Paste items in IDLE right click context menu
   (http://mail.python.org/pipermail/python-dev/2012-November/122514.html)

.. [3] IDLE breakpoint facility undocumented, Deily, Ned
   (http://bugs.python.org/issue10405)

.. [4] IDLE: calltips mishandle raw strings and other examples, Reedy,
   Terry (http://bugs.python.org/issue12510)

.. [5] IDLE: replace ending with '\' causes crash, Reedy, Terry
   (http://bugs.python.org/issue13052)

Copyright
=========

This document has been placed in the public domain.
-- Terry Jan Reedy From pjenvey at underboss.org Wed Mar 6 02:06:25 2013 From: pjenvey at underboss.org (Philip Jenvey) Date: Tue, 5 Mar 2013 17:06:25 -0800 Subject: [Python-ideas] Length hinting and preallocation for container types In-Reply-To: References: <513625E8.4060201@python.org> <20130305181046.55b7f4b7@pitrou.net> <51363FA4.7090901@python.org> <1362516123.52007.YahooMailNeo@web184704.mail.ne1.yahoo.com> Message-ID: On Tue, Mar 5, 2013 at 2:03 PM, Alex Stewart wrote: > On Tue, Mar 5, 2013 at 12:42 PM, Andrew Barnert wrote: > >> So, that's a 1.8% speedup. While doing things right gives a 20% speedup: >> >> $ python3.3 -m timeit -n1000 "l=[]" "app = l.append" "for i in >> range(10000): app(i)" >> 1000 loops, best of 3: 557 usec per loop >> $ python3.3 -m timeit -n1000 "l = [i for i in range(10000)]" >> 1000 loops, best of 3: 447 usec per loop >> > > Yeah, I think it's pretty likely that any case where the proposed change > would be applicable is probably going to be dominated by other factors more > than by allocation overhead anyway.. > > The above does beg the question, though: Could we perhaps apply some of > this thinking to list comprehensions, and speed them up even more by > automatically taking hints from the inputs and preallocating the output > list? (or do we already?) > > Not sure how much work that would be, or whether it would be worth it.. > just a random thought.. > This was proposed here: http://bugs.python.org/issue14126 -- Philip Jenvey -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Wed Mar 6 03:23:26 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 06 Mar 2013 11:23:26 +0900 Subject: [Python-ideas] Length hinting and preallocation for container types In-Reply-To: References: <513625E8.4060201@python.org> Message-ID: <876215jmg1.fsf@uwakimon.sk.tsukuba.ac.jp> tl;dr - I'm glad that PyPy exists. But IMHO Python shouldn't grow APIs to manage memory. Rather, compilers should take advantage of existing internal APIs to do a better job of automatic management, and compiler writers should suggest internal, not public, APIs where they would help the compiler writers. Eli Bendersky writes: > On Tue, Mar 5, 2013 at 9:05 AM, Christian Heimes wrote: > > > Hello, > > > > today I came across this slides > > https://speakerdeck.com/alex/why-python-ruby-and-javascript-are-slow by > > Alex Gaynor. The slides aren't some random rants on Python. Alex makes > > some valid points. Coincidentally he is a PyPy developer, > > too. ;) Not at all coincidentally. As a compiler writer (which he refers to in the slides several times) he is offended by poor performance, as measured in CPU/invocation. Good for him! I'm glad he's working on PyPy! But when compiler writers start talking language design for performance, we inevitably end up with C<0.5 wink/>. > > One of his assertions is about memory (re)allocation. C is faster > > in some cases because C code usually does fewer allocations and > > reallocations. Python has no API to easily reallocate a list with > > 100 items. And it shouldn't. But why do you think allocation is slow in the general case? Sure, it involves system calls which indeed do slow things down. But there are ways to arrange that those calls are amortized (eg, preallocation of arenas, exponential reallocation for fast-growing objects). Yes, the few instructions it takes to grab a slab of memory off the heap add up. 
But is it worth programmer time to actually estimate correctly the size of the list in advance? What happens if the code is reused in an application where the lists are two orders of magnitude larger? Won't performance go in the tank? (According to your benchmark, I don't think anybody would ever notice.) > > Code like > > > > lst = [] > > for i in range(100): > > lst.append(i*i) shouldn't be written. As Eli writes: > The real problem is that this code is not idiomatic Python[...]. +as-many-as-I've-got-in-my-pocket. It's "for s in strings: s0 += s" in another guise. "We gots idioms fo' dat!" > > Internally CPython already distinguishes between the length of > > object and the allocation size of an object for some types like > > list, dict and bytearray. For example PyListObject has `ob_size` > > for __len__ and `allocated` for the amount of available `ob_item` > > slots. Good. Are we done now? Python has the necessary features, let the compiler writers take advantage of them. But exposing them to the user and requiring that the user do ugly and fragile things to get a minor speedup isn't Pythonic. > > with LengthHint(list, 200) as lst: > > # lst has 200 ob_item slots but len(lst) == 0 > > for i in range(100): > > lst.append(i*i) > > # __exit__ shrinks ob_item to 100 But that's ugly, especially the literal "200". The fact that Python doesn't splat explicit memory management in our T-shirts is one of the reasons why we use Python. And Python does grow extensions which optimize such patterns, beautiful features at that. Back to Eli: > [If] you want it to be reasonably fast [...] why not: > > lst = [i*i for i in range(100)] Again: Alex complains that "objects may be dicts, but dicts aren't objects". The performance point is that dicts are slow. True. But again Python (not all implementations?) has an optimization (though not pretty): objects can have slots. It also has a (potentially horrible!) pessimization: properties. The point is that object member access is often the pretty way to code. If it needs to be fast, we can do that. If it needs to be small or obey DRY, we can do that, too. From tjreedy at udel.edu Wed Mar 6 05:06:23 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 05 Mar 2013 23:06:23 -0500 Subject: [Python-ideas] Length hinting and preallocation for container types In-Reply-To: <51364C96.4090509@stoneleaf.us> References: <513625E8.4060201@python.org> <20130305193031.GA3176@untibox.unti> <51364C96.4090509@stoneleaf.us> Message-ID: On 3/5/2013 2:50 PM, Ethan Furman wrote: > I suspect the new behavior would be most useful when you don't know > precisely how large the final list will be: overallocate (possibly by a > large margin), then __exit__ returns the unused portion back to the pool). If one counts the items as they are added, it is easy to delete extras. ll = [None]*200 for i in range(195): ll[i] = i del ll[195:] ll[190:] # [190, 191, 192, 193, 194] del ll[ll.index(None):] would do the same thing and be faster than counting one by one. 
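One caveat with the index() version (a sketch of the guard): if the list
happens to be filled exactly, there is no None left to find and index()
raises ValueError, so

try:
    del ll[ll.index(None):]
except ValueError:
    pass  # list was filled completely; nothing to trim

covers that corner.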
-- 
Terry Jan Reedy

From pjenvey at underboss.org  Wed Mar  6 05:46:48 2013
From: pjenvey at underboss.org (Philip Jenvey)
Date: Tue, 5 Mar 2013 20:46:48 -0800
Subject: [Python-ideas] Length hinting and preallocation for container types
In-Reply-To: 
References: <513625E8.4060201@python.org> <20130305193031.GA3176@untibox.unti> <51364C96.4090509@stoneleaf.us>
Message-ID: 

On Tue, Mar 5, 2013 at 8:06 PM, Terry Reedy wrote:

> On 3/5/2013 2:50 PM, Ethan Furman wrote:
>
> I suspect the new behavior would be most useful when you don't know
>> precisely how large the final list will be: overallocate (possibly by a
>> large margin), then __exit__ returns the unused portion back to the pool).
>>
>
> If one counts the items as they are added, it is easy to delete extras.
>
> ll = [None]*200
> for i in range(195):
>     ll[i] = i
> del ll[195:]
> ll[190:]
> # [190, 191, 192, 193, 194]
>
> del ll[ll.index(None):]
> would do the same thing and be faster than counting one by one.
>

Length hints can lie, so there's a possibility of underallocation. So you need an additional check to use ll.append instead.

-- 
Philip Jenvey

From christian at python.org  Wed Mar  6 12:47:20 2013
From: christian at python.org (Christian Heimes)
Date: Wed, 06 Mar 2013 12:47:20 +0100
Subject: [Python-ideas] Length hinting and preallocation for container types
In-Reply-To: <1362516123.52007.YahooMailNeo@web184704.mail.ne1.yahoo.com>
References: <513625E8.4060201@python.org> <20130305181046.55b7f4b7@pitrou.net> <51363FA4.7090901@python.org> <1362516123.52007.YahooMailNeo@web184704.mail.ne1.yahoo.com>
Message-ID: 

Am 05.03.2013 21:42, schrieb Andrew Barnert:
> So, that's a 1.8% speedup. While doing things right gives a 20% speedup:
>
> $ python3.3 -m timeit -n1000 "l=[]" "app = l.append" "for i in range(10000): app(i)"
> 1000 loops, best of 3: 557 usec per loop
> $ python3.3 -m timeit -n1000 "l = [i for i in range(10000)]"
> 1000 loops, best of 3: 447 usec per loop
>
> Or (but obviously this isn't generally applicable):

Obviously a list comprehension can't be used in all cases, either. AFAIK the list comprehension already pre-allocates 10000 slots because a range object has a length.

From random832 at fastmail.us  Wed Mar  6 16:19:31 2013
From: random832 at fastmail.us (random832 at fastmail.us)
Date: Wed, 06 Mar 2013 10:19:31 -0500
Subject: [Python-ideas] BufferedIO and detach
In-Reply-To: <20130305102213.4a3a5dc6@pitrou.net>
References: <20130304105927.37331a4c@pitrou.net> <20130304114549.30fb772f@pitrou.net> <20130304221538.357d89af@pitrou.net> <20130304223251.4c2476b0@pitrou.net> <20130305083122.61ff329c@pitrou.net> <20130305102213.4a3a5dc6@pitrou.net>
Message-ID: <1362583171.16730.140661200776617.640352F5@webmail.messagingengine.com>

On Tue, Mar 5, 2013, at 4:22, Antoine Pitrou wrote:
> Apparently you need ReadConsole to read bytes, not ReadConsoleInput:
> http://msdn.microsoft.com/en-us/library/ms684958%28v=vs.85%29.aspx

ReadConsole reads characters. Using ReadConsoleA to get bytes is almost certainly not what you want 90% of the time. Unfortunately, Python does it (or, more likely, uses ReadFile which does the same thing) now, at least in version 2.7.

I may post to this list later this week suggesting improvements to the console streams on win32.
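For reference, the sort of thing I have in mind, via ctypes (a Windows-only sketch, untested; error handling and the redirected-pipe fallback are omitted):

import ctypes
import msvcrt
import sys
from ctypes import wintypes

kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)

def is_console(stream):
    # GetConsoleMode only succeeds on real console handles, so it also
    # serves as a "has this stream been redirected?" test.
    handle = msvcrt.get_osfhandle(stream.fileno())
    mode = wintypes.DWORD()
    return bool(kernel32.GetConsoleMode(handle, ctypes.byref(mode)))

def read_console(stream=sys.stdin, nchars=1024):
    # ReadConsoleW hands back characters (UTF-16), not ANSI bytes.
    handle = msvcrt.get_osfhandle(stream.fileno())
    buf = ctypes.create_unicode_buffer(nchars)
    nread = wintypes.DWORD()
    if not kernel32.ReadConsoleW(handle, buf, nchars,
                                 ctypes.byref(nread), None):
        raise ctypes.WinError(ctypes.get_last_error())
    return buf[:nread.value]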
From random832 at fastmail.us  Wed Mar  6 16:25:54 2013
From: random832 at fastmail.us (random832 at fastmail.us)
Date: Wed, 06 Mar 2013 10:25:54 -0500
Subject: [Python-ideas] BufferedIO and detach
In-Reply-To: 
References: <20130304105927.37331a4c@pitrou.net> <20130304114549.30fb772f@pitrou.net> <20130304221538.357d89af@pitrou.net> <20130304223251.4c2476b0@pitrou.net> <20130305083122.61ff329c@pitrou.net> <20130305102213.4a3a5dc6@pitrou.net>
Message-ID: <1362583554.19742.140661200778321.3EEC35BF@webmail.messagingengine.com>

On Tue, Mar 5, 2013, at 5:03, Robert Collins wrote:
> Suggests you can indeed get key events from ReadConsoleInput. I don't
> know what redirected input does in that case.

Redirected I/O does not, in general, work with console functions. (I haven't tried ReadConsoleInput, but even ReadConsole [which returns characters] and WriteConsole don't work). You would have to detect whether the standard input(/output/etc) handle is a console and behave differently depending on whether it is or not.

From random832 at fastmail.us  Wed Mar  6 16:45:28 2013
From: random832 at fastmail.us (random832 at fastmail.us)
Date: Wed, 06 Mar 2013 10:45:28 -0500
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: <81CDA154-B979-4F5E-BC07-19A9B9A8D821@yahoo.com>
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <20130305101638.2f30d4b3@pitrou.net> <51360E02.7050101@stoneleaf.us> <81CDA154-B979-4F5E-BC07-19A9B9A8D821@yahoo.com>
Message-ID: <1362584728.25097.140661200789829.3588A7CD@webmail.messagingengine.com>

On Tue, Mar 5, 2013, at 15:29, Andrew Barnert wrote:
> Correct me if I'm remembering wrong, but the posix basename(1) tool can
> strip both dirnames _and_ extensions.
>
> So, any confusion here has a solid precedent.

Yes, but it requires you to pass in the extension.

So what about

p = '/foo/pathlib.tar.gz'
p.basename() == 'pathlib.tar.gz'
p.basename('.gz') == 'pathlib.tar'
p.basename('.tar.gz') == 'pathlib'
p.basename('.a') exception? 'pathlib.tar.gz'?
p.basename(True) == 'pathlib.tar'
p.basename(1) == 'pathlib.tar'
p.basename(2) == 'pathlib' ?

From jeanpierreda at gmail.com  Wed Mar  6 17:31:21 2013
From: jeanpierreda at gmail.com (Devin Jeanpierre)
Date: Wed, 6 Mar 2013 11:31:21 -0500
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: 
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net>
Message-ID: 

On Sun, Mar 3, 2013 at 9:12 AM, Charles-François Natali wrote:
> Yes, I meant glob() (fnmatch() implementations can also be subject to
> DoS through stack exhaustion, but Python's implementation is based on
> regex).

I don't know about stack exhaustion, but Python's regular expression implementation is agonizingly slow in the worst case, and fnmatch inherits this.

>>> fnmatch.fnmatch('a'*50, '*a*'*50) # weird how the pattern/string order is reversed from re.match

That will take about 200 years to complete with CPython. Maybe a little less, if you're running a particularly fast computer. ;)

Is that the sort of DoS issue you are looking for?
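(fnmatch just translates the pattern to a regex and hands it to re, which is where the backtracking comes from -- the exact output format varies by version:)

import fnmatch
print(fnmatch.translate('*a*' * 3))
# -> something like '.*a.*.*a.*.*a.*\Z(?s)' -- a pile of stacked '.*'s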
-- 
Devin

From cf.natali at gmail.com  Wed Mar  6 18:08:52 2013
From: cf.natali at gmail.com (Charles-François Natali)
Date: Wed, 6 Mar 2013 18:08:52 +0100
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: 
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net>
Message-ID: 

>>>> fnmatch.fnmatch('a'*50, '*a*'*50) # weird how the pattern/string order is reversed from re.match
>
> That will take about 200 years to complete with CPython. Maybe a
> little less, if you're running a particularly fast computer. ;)
>
> Is that the sort of DoS issue you are looking for?

Exactly (the complexity of a typical ad-hoc fnmatch() implementation is the reason some servers like vsftpd use their own version, and it's even worse with a regex-based implementation as you notice).

Now, the question is whether we want to try to mitigate this or not...

From python at mrabarnett.plus.com  Wed Mar  6 18:51:33 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Wed, 06 Mar 2013 17:51:33 +0000
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: 
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net>
Message-ID: <51378225.9040901@mrabarnett.plus.com>

On 2013-03-06 17:08, Charles-François Natali wrote:
>>>>> fnmatch.fnmatch('a'*50, '*a*'*50) # weird how the pattern/string order is reversed from re.match
>>
>> That will take about 200 years to complete with CPython. Maybe a
>> little less, if you're running a particularly fast computer. ;)
>>
>> Is that the sort of DoS issue you are looking for?
>
> Exactly (the complexity of a typical ad-hoc fnmatch() implementation
> is the reason some servers like vsftpd use their own version, and it's
> even worse with a regex-based implementation as you notice).
>
> Now, the question is whether we want to try to mitigate this or not...
>
It's not something I've ever used, but it doesn't look that difficult compared to regex if all it has is "*", "?", "[...]" and "[!...]".

From abarnert at yahoo.com  Wed Mar  6 19:28:59 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 6 Mar 2013 10:28:59 -0800
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: <1362584728.25097.140661200789829.3588A7CD@webmail.messagingengine.com>
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <20130305101638.2f30d4b3@pitrou.net> <51360E02.7050101@stoneleaf.us> <81CDA154-B979-4F5E-BC07-19A9B9A8D821@yahoo.com> <1362584728.25097.140661200789829.3588A7CD@webmail.messagingengine.com>
Message-ID: 

On Mar 6, 2013, at 7:45, random832 at fastmail.us wrote:

> On Tue, Mar 5, 2013, at 15:29, Andrew Barnert wrote:
>> Correct me if I'm remembering wrong, but the posix basename(1) tool can
>> strip both dirnames _and_ extensions.
>>
>> So, any confusion here has a solid precedent.
>
> Yes, but it requires you to pass in the extension.
>
> So what about
>
> p = '/foo/pathlib.tar.gz'
> p.basename() == 'pathlib.tar.gz'
> p.basename('.gz') == 'pathlib.tar'
> p.basename('.tar.gz') == 'pathlib'
> p.basename('.a') exception? 'pathlib.tar.gz'?
> p.basename(True) == 'pathlib.tar'
> p.basename(1) == 'pathlib.tar'
> p.basename(2) == 'pathlib' ?
>
I was just pointing out that the name already does double duty, and therefore it's a few decades too late for people to complain about confusion, not suggesting that we emulate it. But now that I see your examples, it's not a bad idea.
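For comparison, the two jobs as os.path spells them today:

import os.path
os.path.basename('/foo/pathlib.tar.gz')   # 'pathlib.tar.gz'
os.path.splitext('pathlib.tar.gz')        # ('pathlib.tar', '.gz')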
Anyone familiar with unix will immediately understand what basename('.gz') is doing and appreciate the shortcut, and the alternate forms you provided aren't surprising.

The problem is that I think people will still want a stripext/stem/root/whatever function that removes the extension without also removing the dirname. In fact, I've got a script here where one of my coworkers has written $(dirname $z)/$(basename $z .zip) to get around it.

And I don't think p.basename(True, False) is a viable answer. Besides, what names would you give the two flags? If there were good names for those, there would be good names for the two separate functions, right?

Personally, I don't have a problem with the stem or stripext suggestions (although to me, the former implies stripping all extensions, the latter just one), but I guess they haven't gotten much traction.

From random832 at fastmail.us  Wed Mar  6 20:15:57 2013
From: random832 at fastmail.us (random832 at fastmail.us)
Date: Wed, 06 Mar 2013 14:15:57 -0500
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: 
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <20130305101638.2f30d4b3@pitrou.net> <51360E02.7050101@stoneleaf.us> <81CDA154-B979-4F5E-BC07-19A9B9A8D821@yahoo.com> <1362584728.25097.140661200789829.3588A7CD@webmail.messagingengine.com>
Message-ID: <1362597357.18592.140661200879961.661220A3@webmail.messagingengine.com>

On Wed, Mar 6, 2013, at 13:28, Andrew Barnert wrote:
> The problem is that I think people will still want a
> stripext/stem/root/whatever function that removes the extension without
> also removing the dirname. In fact, I've got a script here where one of
> my coworkers has written $(dirname $z)/$(basename $z .zip) to get around
> it.
>
> And I don't think p.basename(True, False) is a viable answer. Besides,
> what names would you give the two flags? If there were good names for
> those, there would be good names for the two separate functions, right?
>
> Personally, I don't have a problem with the stem or stripext suggestions
> (although to me, the former implies stripping all extensions, the latter
> just one), but I guess they haven't gotten much traction.

Well, whatever you do is competing with p[:-len('.gz')] (and in shell, your co-worker could have done ${z%.zip} in ksh/bash/POSIX) - unless you have an example of an OS where the 'extension' component doesn't simply append at the end (well, I suppose they're not case-sensitive on windows - and how much weird long/short filename stuff does pathlib do on windows?), I'm not sure how useful a function for this is.

Although, I was actually half-tempted to suggest p.basename - p.extension and propose subtraction as a general "remove matching suffix" operation on strings (or even without that, it could be defined on a class that .basename returns).

I'm not sure what the use case is for doing this without already knowing the extension, to be honest.
This isn't an operation the basename(1) tool supports, either [well, you could do $(basename $x .${x##*.}), but it doesn't support it by itself]

-- 
Random832

From cf.natali at gmail.com  Wed Mar  6 20:21:45 2013
From: cf.natali at gmail.com (Charles-François Natali)
Date: Wed, 6 Mar 2013 20:21:45 +0100
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: <51378225.9040901@mrabarnett.plus.com>
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <51378225.9040901@mrabarnett.plus.com>
Message-ID: 

>> Now, the question is whether we want to try to mitigate this or not...
>>
> It's not something I've ever used, but it doesn't look that difficult
> compared to regex if all it has is "*", "?", "[...]" and "[!...]".

What's not difficult?
Avoiding DoS with arbitrary glob patterns?

If yes, please share your idea :-)

From jeanpierreda at gmail.com  Wed Mar  6 21:01:27 2013
From: jeanpierreda at gmail.com (Devin Jeanpierre)
Date: Wed, 6 Mar 2013 15:01:27 -0500
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: 
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <51378225.9040901@mrabarnett.plus.com>
Message-ID: 

On Wed, Mar 6, 2013 at 2:21 PM, Charles-François Natali wrote:
>>> Now, the question is whether we want to try to mitigate this or not...
>>>
>> It's not something I've ever used, but it doesn't look that difficult
>> compared to regex if all it has is "*", "?", "[...]" and "[!...]".
>
> What's not difficult?
> Avoiding DoS with arbitrary glob patterns?
>
> If yes, please share your idea :-)

Compile the glob pattern to an NFA and simulate the NFA efficiently (using Thompson's algorithm or even backtracking with memoization, rather than plain backtracking). For example, see:

http://swtch.com/~rsc/regexp/regexp1.html

-- 
Devin

From python at mrabarnett.plus.com  Wed Mar  6 21:20:29 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Wed, 06 Mar 2013 20:20:29 +0000
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: 
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <51378225.9040901@mrabarnett.plus.com>
Message-ID: <5137A50D.70504@mrabarnett.plus.com>

On 2013-03-06 19:21, Charles-François Natali wrote:
>>> Now, the question is whether we want to try to mitigate this or
>>> not...
>>>
>> It's not something I've ever used, but it doesn't look that
>> difficult compared to regex if all it has is "*", "?", "[...]" and
>> "[!...]".
>
> What's not difficult? Avoiding DoS with arbitrary glob patterns?
>
> If yes, please share your idea :-)
>
I wrote an alternative regex implementation (it's on PyPI). It's a lot more resistant to catastrophic backtracking. What's needed for fnmatch is a lot simpler than that!
:-)

From abarnert at yahoo.com  Wed Mar  6 22:18:46 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 6 Mar 2013 13:18:46 -0800 (PST)
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: <1362597357.18592.140661200879961.661220A3@webmail.messagingengine.com>
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <20130305101638.2f30d4b3@pitrou.net> <51360E02.7050101@stoneleaf.us> <81CDA154-B979-4F5E-BC07-19A9B9A8D821@yahoo.com> <1362584728.25097.140661200789829.3588A7CD@webmail.messagingengine.com> <1362597357.18592.140661200879961.661220A3@webmail.messagingengine.com>
Message-ID: <1362604726.91308.YahooMailNeo@web184706.mail.ne1.yahoo.com>

From: "random832 at fastmail.us"
Sent: Wednesday, March 6, 2013 11:15 AM

> On Wed, Mar 6, 2013, at 13:28, Andrew Barnert wrote:
>> The problem is that I think people will still want a
>> stripext/stem/root/whatever function that removes the extension without
>> also removing the dirname. In fact, I've got a script here where one of
>> my coworkers has written $(dirname $z)/$(basename $z .zip) to get around
>> it.

> Well, whatever you do is competing with p[:-len('.gz')]

Only the "remove extension by name" is competing with that. The "remove any and all extensions" or "remove 1 extension" or "remove n extensions" cases -- which I think are more common -- are not. Of course they _are_ competing with various uses of partition/rpartition/split/rsplit. But you could make the same argument for everything in a path library -- p.dirname() is competing with p.rpartition(pathsep)[0], and p / 'foo' is competing with p + pathsep + 'foo', and so on.

As I understand it, the reason to have a path API is to provide a "one way to do it" that's obvious, readable, and hard to get wrong. And I think p.stripext('.gz') or p.stripext(2) or p.stripext() are better than p[:-len('.gz')] or p.rsplit('.', 2)[0] or p.partition('.')[0] in that regard.

Except for the name, which nobody's come up with a good suggestion for, and the fact that we probably don't want to cram all three versions of the operation into one function, or even necessarily support all three at all.

> (and in shell,
> your co-worker could have done ${z%.zip} in ksh/bash/POSIX) - unless you
> have an example of an OS where the 'extension' component doesn't
> simply
> append at the end (well, I suppose they're not case-sensitive on windows
> - and how much weird long/short filename stuff does pathlib do on
> windows?)

Funnily enough, according to a comment, this was intended to handle all-caps .ZIP on Windows under cygwin. I don't know if it actually works (or, for that matter, if %.zip would have worked), as I don't have a Windows box with cygwin installed handy. But that's beside the point. The point is that people want a method that removes extensions (whether by name or otherwise) that doesn't also remove dirnames. And if you force them to come up with such a method themselves, they're not necessarily going to come up with a good one. I'm sure if we had your extended basename but no stripext, someone would write p.dirname()/p.basename('.zip').
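To be concrete, the variants I have in mind would look something like this on plain strings (purely illustrative -- none of these names or semantics exist in pathlib yet):

import posixpath

def stripext(path, ext=None, n=1):
    """Remove extensions without touching the dirname (sketch).

    stripext(p, '.gz')   -> drop that exact suffix, if present
    stripext(p, n=2)     -> drop the last two dot-extensions
    stripext(p, n=None)  -> drop every extension
    """
    head, tail = posixpath.split(path)
    if ext is not None:
        if ext and tail.endswith(ext):
            tail = tail[:-len(ext)]
    elif n is None:
        if not tail.startswith('.'):   # leave dotfiles like .bashrc alone
            tail = tail.split('.', 1)[0]
    else:
        for _ in range(n):
            root, dot, _rest = tail.rpartition('.')
            if dot:
                tail = root
    return posixpath.join(head, tail) if head else tail

stripext('/foo/pathlib.tar.gz', '.gz')   # -> '/foo/pathlib.tar'
stripext('/foo/pathlib.tar.gz', n=2)     # -> '/foo/pathlib'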
From cf.natali at gmail.com  Wed Mar  6 23:03:09 2013
From: cf.natali at gmail.com (Charles-François Natali)
Date: Wed, 6 Mar 2013 23:03:09 +0100
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: 
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <51378225.9040901@mrabarnett.plus.com>
Message-ID: 

> Compile the glob pattern to an NFA and simulate the NFA efficiently
> (using Thompson's algorithm or even backtracking with memoization,
> rather than plain backtracking).

Well, I'm sure it would be much better, but that would be a rather large piece of code, which would probably belong to a new regex engine, no?

> I wrote an alternative regex implementation (it's on PyPI).
> It's a lot more resistant to catastrophic backtracking.
> What's needed for fnmatch is a lot simpler than that! :-)

Yes, I know about your regex implementation, but it looks like it's not going in anytime soon. Do you think you could come up with a reasonable - i.e. self-contained - patch for fnmatch?

From python at mrabarnett.plus.com  Wed Mar  6 23:14:57 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Wed, 06 Mar 2013 22:14:57 +0000
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: <1362604726.91308.YahooMailNeo@web184706.mail.ne1.yahoo.com>
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <20130305101638.2f30d4b3@pitrou.net> <51360E02.7050101@stoneleaf.us> <81CDA154-B979-4F5E-BC07-19A9B9A8D821@yahoo.com> <1362584728.25097.140661200789829.3588A7CD@webmail.messagingengine.com> <1362597357.18592.140661200879961.661220A3@webmail.messagingengine.com> <1362604726.91308.YahooMailNeo@web184706.mail.ne1.yahoo.com>
Message-ID: <5137BFE1.6060103@mrabarnett.plus.com>

On 2013-03-06 21:18, Andrew Barnert wrote:
> From: "random832 at fastmail.us"
>
> Sent: Wednesday, March 6, 2013 11:15 AM
>
>
>> On Wed, Mar 6, 2013, at 13:28, Andrew Barnert wrote:
>>> The problem is that I think people will still want a
>>> stripext/stem/root/whatever function that removes the extension
>>> without also removing the dirname. In fact, I've got a script
>>> here where one of my coworkers has written $(dirname
>>> $z)/$(basename $z .zip) to get around it.
>
>
>> Well, whatever you do is competing with p[:-len('.gz')]
>
> Only the "remove extension by name" is competing with that. The
> "remove any and all extensions" or "remove 1 extension" or "remove n
> extensions" cases -- which I think are more common -- are not. Of course
> they _are_ competing with various uses of
> partition/rpartition/split/rsplit. But you could make the same
> argument for everything in a path library -- p.dirname() is competing
> with p.rpartition(pathsep)[0], and p / 'foo' is competing with p +
> pathsep + 'foo', and so on.
>
> As I understand it, the reason to have a path API is to provide a
> "one way to do it" that's obvious, readable, and hard to get wrong.
> And I think p.stripext('.gz') or p.stripext(2) or p.stripext() are
> better than p[:-len('.gz')] or p.rsplit('.', 2)[0] or
> p.partition('.')[0] in that regard.
>
> Except for the name, which nobody's come up with a good suggestion
> for, and the fact that we probably don't want to cram all three
> versions of the operation into one function, or even necessarily
> support all three at all.
> >> (and in shell, your co-worker could have done ${z%.zip} in
>> ksh/bash/POSIX) - unless you have an example of an OS where the
>> 'extension' component doesn't simply append at the end (well, I
>> suppose they're not case-sensitive on windows - and how much weird
>> long/short filename stuff does pathlib do on windows?)
>
> Funnily enough, according to a comment, this was intended to handle
> all-caps .ZIP on Windows under cygwin. I don't know if it actually
> works (or, for that matter, if %.zip would have worked), as I don't
> have a Windows box with cygwin installed handy. But that's beside the
> point. The point is that people want a method that removes extensions
> (whether by name or otherwise) that doesn't also remove dirnames. And
> if you force them to come up with such a method themselves, they're
> not necessarily going to come up with a good one. I'm sure if we had
> your extended basename but no stripext, someone would write
> p.dirname()/p.basename('.zip').
>
For some reason, p.basename('.zip') feels 'wrong' to me, but p.basename() feels OK. Perhaps it's because it's not immediately obvious what the optional argument is for (or, at least, that's how it seems to me!).

On the other hand, p.stripext('.zip') feels OK because it suggests that you're stripping off the '.zip' extension (compare with str.strip), but would that mean that you'd expect p.stripext() also to strip off the extension, whatever it was?

From jeanpierreda at gmail.com  Wed Mar  6 23:22:16 2013
From: jeanpierreda at gmail.com (Devin Jeanpierre)
Date: Wed, 6 Mar 2013 17:22:16 -0500
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: 
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <51378225.9040901@mrabarnett.plus.com>
Message-ID: 

On Wed, Mar 6, 2013 at 5:03 PM, Charles-François Natali wrote:
>> Compile the glob pattern to an NFA and simulate the NFA efficiently
>> (using Thompson's algorithm or even backtracking with memoization,
>> rather than plain backtracking).
>
> Well, I'm sure it would be much better, but that would be a rather
> large piece of code, which would probably belong to a new regex
> engine, no?

It's a weekend's worth of code at most. A simple regex engine is trivial, and glob is even simpler than that. (For example, we can use the glob pattern itself as the NFA during simulation).

-- 
Devin

From python at mrabarnett.plus.com  Wed Mar  6 23:24:44 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Wed, 06 Mar 2013 22:24:44 +0000
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: 
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <51378225.9040901@mrabarnett.plus.com>
Message-ID: <5137C22C.9030007@mrabarnett.plus.com>

On 2013-03-06 22:03, Charles-François Natali wrote:
>> Compile the glob pattern to an NFA and simulate the NFA efficiently
>> (using Thompson's algorithm or even backtracking with memoization,
>> rather than plain backtracking).
>
> Well, I'm sure it would be much better, but that would be a rather
> large piece of code, which would probably belong to a new regex
> engine, no?
>
>> I wrote an alternative regex implementation (it's on PyPI).
>> It's a lot more resistant to catastrophic backtracking.
>> What's needed for fnmatch is a lot simpler than that! :-)
>
> Yes, I know about your regex implementation, but it looks like it's
> not going in anytime soon.
> Do you think you could come up with a reasonable - i.e. self-contained
> - patch for fnmatch?
>
I think it's worth looking at!
From jeanpierreda at gmail.com Wed Mar 6 23:35:05 2013 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Wed, 6 Mar 2013 17:35:05 -0500 Subject: [Python-ideas] Updated PEP 428 (pathlib) In-Reply-To: <5137A50D.70504@mrabarnett.plus.com> References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <51378225.9040901@mrabarnett.plus.com> <5137A50D.70504@mrabarnett.plus.com> Message-ID: On Wed, Mar 6, 2013 at 3:20 PM, MRAB wrote: > I wrote an alternative regex implementation (it's on PyPI). It's a lot > more resistant to catastrophic backtracking. How resistant? If it's possible to have catastrophic backtracking even if groups and backreferences aren't involved, then it wouldn't solve the fnmatch DOS problem. -- Devin From python at mrabarnett.plus.com Wed Mar 6 23:40:54 2013 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 06 Mar 2013 22:40:54 +0000 Subject: [Python-ideas] Updated PEP 428 (pathlib) In-Reply-To: References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <51378225.9040901@mrabarnett.plus.com> <5137A50D.70504@mrabarnett.plus.com> Message-ID: <5137C5F6.3070109@mrabarnett.plus.com> On 2013-03-06 22:35, Devin Jeanpierre wrote: > On Wed, Mar 6, 2013 at 3:20 PM, MRAB wrote: >> I wrote an alternative regex implementation (it's on PyPI). It's a lot >> more resistant to catastrophic backtracking. > > How resistant? If it's possible to have catastrophic backtracking even > if groups and backreferences aren't involved, then it wouldn't solve > the fnmatch DOS problem. > If there aren't capture groups, then you can use a DFA. The re and regex modules use NFA because of the various other features required. From jeanpierreda at gmail.com Wed Mar 6 23:49:00 2013 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Wed, 6 Mar 2013 17:49:00 -0500 Subject: [Python-ideas] Updated PEP 428 (pathlib) In-Reply-To: <5137C5F6.3070109@mrabarnett.plus.com> References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <51378225.9040901@mrabarnett.plus.com> <5137A50D.70504@mrabarnett.plus.com> <5137C5F6.3070109@mrabarnett.plus.com> Message-ID: On Wed, Mar 6, 2013 at 5:40 PM, MRAB wrote: > If there aren't capture groups, then you can use a DFA. You can use a variant of DFA even if there are capture groups, it just takes more effort. > The re and regex modules use NFA because of the various other features > required. I take it you mean that both the re and regex modules use backtracking search. I was asking whether or not it can reach the exponential time worst-case on regexps without capture groups. If the answer is no, as you seem to be implying, then how does it prevent a DOS attack? -- Devin From python at mrabarnett.plus.com Thu Mar 7 00:25:15 2013 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 06 Mar 2013 23:25:15 +0000 Subject: [Python-ideas] Updated PEP 428 (pathlib) In-Reply-To: References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <51378225.9040901@mrabarnett.plus.com> <5137A50D.70504@mrabarnett.plus.com> <5137C5F6.3070109@mrabarnett.plus.com> Message-ID: <5137D05B.6050806@mrabarnett.plus.com> On 2013-03-06 22:49, Devin Jeanpierre wrote: > On Wed, Mar 6, 2013 at 5:40 PM, MRAB wrote: >> If there aren't capture groups, then you can use a DFA. > > You can use a variant of DFA even if there are capture groups, it just > takes more effort. > >> The re and regex modules use NFA because of the various other features >> required. 
>
> I take it you mean that both the re and regex modules use backtracking search.
>
Yes.

> I was asking whether or not it can reach the exponential time
> worst-case on regexps without capture groups. If the answer is no, as
> you seem to be implying, then how does it prevent a DOS attack?
>
You _can_ have catastrophic backtracking without capture groups. You've already seen an example in ".*a.*".

It gets worse when you can have repeated repeats, for example "(?:.*)*".

The difference with fnmatch is that you don't care _where_ various parts match (there are no capture groups), only _whether_ it matches, and then only whether or not _all_ of it matches.

From tjreedy at udel.edu  Thu Mar  7 02:22:55 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 06 Mar 2013 20:22:55 -0500
Subject: [Python-ideas] Updated PEP 434: IDLE Enhancement Exception
In-Reply-To: 
References: 
Message-ID: <5137EBEF.1030009@udel.edu>

On 3/5/2013 7:15 PM, Terry Reedy wrote:

> I rewrote the PEP to better focus on the specific proposal and
> motivation. I also refined the formatting and added something about
> backwards compatibility with extensions.

Below are my answers to the issues raised by Nick on pydev in response to Todd's initial version, posted there before moving to python-ideas.

> be specific about which parts of the code base are covered by the exception

I added this to the summary -- everything in idlelib/*.

> The rationale needs to be fleshed out a bit more along the lines of "IDLE is primarily used as an application that ships with Python, rather than as a library module used to build Python applications, that's why it is OK for a different standard to apply".

Todd added something like this and I reworded to say much the same thing.

> Mentioning the point about Linux distros splitting it out into a separate package would also be useful.

I know nothing about Linux distros and so that is not a motivating point for me. But if given a sentence that you (Nick) consider valid, and a suggestion of where to put it, I will certainly add it.

> no need for extensive cross-OS testing prior to commit, that's a key part of the role of the buildbots

I removed much of the discussion about testing as the PEP does not propose to change the rules about test before commit. I already opened an issue about improving automatic testing.
http://bugs.python.org/issue15392

> [sparse test suite] Perhaps something for the PEP to elaborate on before we declare open season on Idle improvements in bug fix releases.

The alternative to automatic testing is testing by hand rather than no testing. I mentioned that backporting is extra work for developers, that this extra work might mean not backporting, and that both are true regardless of how an improvement might be classified.

IDLE's current tests are all within /idlelib, just as tkinter tests live within /tkinter. If this PEP is approved, new tests can be backported without controversy. My understanding is that that is not true for coverage tests for other modules. Backporting a new test_idle.py in Lib/test, similar to test_tkinter.py, is not covered by this PEP.
-- 
Terry Jan Reedy

From python at mrabarnett.plus.com  Thu Mar  7 03:26:28 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 07 Mar 2013 02:26:28 +0000
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: <51378225.9040901@mrabarnett.plus.com>
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <51378225.9040901@mrabarnett.plus.com>
Message-ID: <5137FAD4.4080608@mrabarnett.plus.com>

On 2013-03-06 17:51, MRAB wrote:
> On 2013-03-06 17:08, Charles-François Natali wrote:
>>>>>> fnmatch.fnmatch('a'*50, '*a*'*50) # weird how the pattern/string order is reversed from re.match
>>>
>>> That will take about 200 years to complete with CPython. Maybe a
>>> little less, if you're running a particularly fast computer. ;)
>>>
>>> Is that the sort of DoS issue you are looking for?
>>
>> Exactly (the complexity of a typical ad-hoc fnmatch() implementation
>> is the reason some servers like vsftpd use their own version, and it's
>> even worse with a regex-based implementation as you notice).
>>
>> Now, the question is whether we want to try to mitigate this or not...
>>
> It's not something I've ever used, but it doesn't look that difficult
> compared to regex if all it has is "*", "?", "[...]" and "[!...]".
>
Here's a very simple, all-Python, implementation I've just cooked up:

def fnmatch(name, pattern):
    positions = [(0, 0)]

    while positions:
        name_pos, pattern_pos = positions.pop()

        if pattern_pos >= len(pattern):
            if name_pos >= len(name):
                return True
        elif pattern[pattern_pos] == '*':
            if pattern_pos == len(pattern) - 1:
                return True
            positions.append((name_pos, pattern_pos + 1))
            if name_pos < len(name):
                positions.append((name_pos + 1, pattern_pos))
        elif pattern[pattern_pos] == '?':
            if name_pos < len(name):
                positions.append((name_pos + 1, pattern_pos + 1))
        elif pattern[pattern_pos] == '[':
            if name_pos < len(name):
                negative = pattern[pattern_pos + 1] == "!"
                pattern_pos += 2 if negative else 1
                close_pos = pattern.find(']', pattern_pos)
                if close_pos >= 0:
                    if (name[name_pos] in pattern[pattern_pos : close_pos]) != negative:
                        positions.append((name_pos + 1, close_pos + 1))
        elif name_pos < len(name) and name[name_pos] == pattern[pattern_pos]:
            positions.append((name_pos + 1, pattern_pos + 1))

    return False

From jeanpierreda at gmail.com  Thu Mar  7 04:23:57 2013
From: jeanpierreda at gmail.com (Devin Jeanpierre)
Date: Wed, 6 Mar 2013 22:23:57 -0500
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: <5137D05B.6050806@mrabarnett.plus.com>
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <51378225.9040901@mrabarnett.plus.com> <5137A50D.70504@mrabarnett.plus.com> <5137C5F6.3070109@mrabarnett.plus.com> <5137D05B.6050806@mrabarnett.plus.com>
Message-ID: 

On Wed, Mar 6, 2013 at 6:25 PM, MRAB wrote:
> You _can_ have catastrophic backtracking without capture groups. You've
> already seen an example in ".*a.*".
>
> It gets worse when you can have repeated repeats, for example "(?:.*)*".
>
> The difference with fnmatch is that you don't care _where_ various
> parts match (there are no capture groups), only _whether_ it matches,
> and then only whether or not _all_ of it matches.

We seem to be talking past each other. I already know all this. I am asking you to justify your claim that if glob was based on regex, instead of re, it would be free of DOS attacks.

Because of your confusion, I expect you didn't really mean to claim that.
I inferred it because when you were asked for an approach that would solve DOS attacks against glob, you replied by saying that you wrote a regex module that is more resistant to such things. I apologize if I misunderstood.

-- 
Devin

From jeanpierreda at gmail.com  Thu Mar  7 04:48:29 2013
From: jeanpierreda at gmail.com (Devin Jeanpierre)
Date: Wed, 6 Mar 2013 22:48:29 -0500
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: <5137FAD4.4080608@mrabarnett.plus.com>
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <51378225.9040901@mrabarnett.plus.com> <5137FAD4.4080608@mrabarnett.plus.com>
Message-ID: 

On Wed, Mar 6, 2013 at 9:26 PM, MRAB wrote:
>> It's not something I've ever used, but it doesn't look that difficult
>> compared to regex if all it has is "*", "?", "[...]" and "[!...]".
>>
> Here's a very simple, all-Python, implementation I've just cooked up:
--snip--

Because positions is never culled of duplicate states, this suffers the exact same problem. In this case, fnmatch('a'*50, '*a*'*50) returns in 6500 years instead of 200.

If you want to solve it, you should either affix it with a memoization cache, or use Thompson's algorithm instead of backtracking search. Since this isn't recursive, memoization is a bit annoying, though, so instead I modified it below to use Thompson's algorithm on NFA (with no error checking though):

def fnmatch(name, pattern):
    positions = {0}
    for char in name:
        new_positions = set()
        for pattern_pos in positions:
            if pattern_pos >= len(pattern):
                continue
            pattern_char = pattern[pattern_pos]
            if pattern_char == '*':
                if pattern_pos == len(pattern) - 1:
                    return True
                new_positions.update([pattern_pos, pattern_pos + 1])
            elif pattern_char == '?':
                new_positions.add(pattern_pos + 1)
            elif pattern[pattern_pos] == '[':
                negative = pattern[pattern_pos + 1] == "!"
                pattern_pos += 2 if negative else 1
                close_pos = pattern.index(']', pattern_pos)
                if (char in pattern[pattern_pos : close_pos]) != negative:
                    new_positions.add(close_pos + 1)
            elif char == pattern_char:
                new_positions.add(pattern_pos + 1)
        positions = new_positions
    return len(pattern) in positions

Backseatingly yours,
-- 
Devin

From ericsnowcurrently at gmail.com  Thu Mar  7 08:10:20 2013
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Thu, 7 Mar 2013 00:10:20 -0700
Subject: [Python-ideas] class-only methods without using metaclasses
Message-ID: 

This tweet from Raymond helped distill something that was already on my mind of late:

https://twitter.com/raymondh/status/309442149588533248

For some uses of @classmethod it makes sense to expose the methods on instances too. On others I just can't see it. In those cases, using @classmethod is justifiably a practical substitute for putting the methods on a metaclass. In my mind "alternate constructors" fall into this category.

Would it be worth trying to get the best of both worlds (not exposed on instances but without metaclasses)? I can imagine providing a classmethod-like decorator that does this and have an implementation below. One benefit to not exposing the class-only methods on instances is that they don't clutter the instance namespace nor run the risk of colliding with instance-specific names.

Thoughts?

-eric

-----------------------------------------------------------------------

import inspect

NOTSET = object()   # sentinel: "attribute not found"


class classonlymethod:
    """Like a classmethod but does not show up on instances.

    This is an alternative to putting the methods on a metaclass.  It
    is especially meaningful for alternate constructors.
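    A usage sketch (Spam and from_list are made-up names):

        class Spam:
            @classonlymethod
            def from_list(cls, items):
                return cls()

        Spam.from_list([1, 2])   # bound to the class, like a classmethod
        Spam().from_list         # raises AttributeError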
""" # XXX or "metamethod" def __init__(self, method): self.method = method self.descr = classmethod(method) def __get__(self, obj, cls): name = self.method.__name__ getattr_static = inspect.getattr_static if obj is not None: # look up the attribute, but skip cls dummy = type(cls.__name__, cls.__bases__, {}) attr = getattr_static(dummy(), name, NOTSET) getter = getattr_static(attr, '__get__', None) # try data descriptors if (getter and getattr_static(attr, '__set__', False)): return getter(attr, obj, cls) # try the instance try: instance_dict = object.__getattribute__(obj, "__dict__") except AttributeError: pass else: try: return dict.__getitem__(instance_dict, name) except KeyError: pass # try non-data descriptors if getter is not None: return getter(attr, obj, cls) raise AttributeError(name) else: descr = vars(self)['descr'] return descr.__get__(obj, cls) From ncoghlan at gmail.com Thu Mar 7 09:21:47 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 7 Mar 2013 18:21:47 +1000 Subject: [Python-ideas] class-only methods without using metaclasses In-Reply-To: References: Message-ID: On Thu, Mar 7, 2013 at 5:10 PM, Eric Snow wrote: > Thoughts? It's too much additional complexity to resolve a largely theoretical problem. Since class methods can be shadowed in instances, the fact they're accessible through the instances really doesn't hurt anything, and the distinction between a class method and a class only method would be too subtle to easily explain to anyone not already steeped in the details of descriptors and metaclasses. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From cf.natali at gmail.com Thu Mar 7 09:23:09 2013 From: cf.natali at gmail.com (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=) Date: Thu, 7 Mar 2013 09:23:09 +0100 Subject: [Python-ideas] Updated PEP 428 (pathlib) In-Reply-To: References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <51378225.9040901@mrabarnett.plus.com> <5137FAD4.4080608@mrabarnett.plus.com> Message-ID: This looks really promising. Could one of you open an issue on the tracker and attach a patch? Note that there's a problem with a current implementation: AssertionError: False is not true : expected 'abc' to match pattern '???*' From jeanpierreda at gmail.com Thu Mar 7 10:01:08 2013 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Thu, 7 Mar 2013 04:01:08 -0500 Subject: [Python-ideas] Updated PEP 428 (pathlib) In-Reply-To: References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <51378225.9040901@mrabarnett.plus.com> <5137FAD4.4080608@mrabarnett.plus.com> Message-ID: On Thu, Mar 7, 2013 at 3:23 AM, Charles-Fran?ois Natali wrote: > This looks really promising. > Could one of you open an issue on the tracker and attach a patch? The hard work is in making this a C extension module, which I don't know much about. I could do it this weekend if there's any chance at all it'd be accepted. > Note that there's a problem with a current implementation: > AssertionError: False is not true : expected 'abc' to match pattern '???*' My bad. I always make that mistake. 
:(

---

def eps_closure(pattern, poses):
    for pos in poses:
        while pos < len(pattern) and pattern[pos] == '*':
            yield pos
            pos += 1
        yield pos

def fnmatch(name, pattern):
    positions = set(eps_closure(pattern, {0}))

    for char in name:
        new_positions = set()
        for pattern_pos in positions:
            if pattern_pos >= len(pattern):
                continue
            pattern_char = pattern[pattern_pos]
            if pattern_char == '*':
                if pattern_pos == len(pattern) - 1:
                    return True
                new_positions.update([pattern_pos, pattern_pos + 1])
            elif pattern_char == '?':
                new_positions.add(pattern_pos + 1)
            elif pattern[pattern_pos] == '[':
                negative = pattern[pattern_pos + 1] == "!"
                pattern_pos += 2 if negative else 1
                close_pos = pattern.index(']', pattern_pos)
                if (char in pattern[pattern_pos : close_pos]) != negative:
                    new_positions.add(close_pos + 1)
            elif char == pattern_char:
                new_positions.add(pattern_pos + 1)
        positions = set(eps_closure(pattern, new_positions))

    return len(pattern) in positions

-- 
Devin

From solipsis at pitrou.net  Thu Mar  7 11:24:33 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 7 Mar 2013 11:24:33 +0100
Subject: [Python-ideas] Updated PEP 428 (pathlib)
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <20130305082350.37b18c22@pitrou.net>
Message-ID: <20130307112433.2fc83b10@pitrou.net>

Le Tue, 05 Mar 2013 16:58:57 +0200, Serhiy Storchaka a écrit :
> On 05.03.13 09:23, Antoine Pitrou wrote:
> > On Tue, 05 Mar 2013 00:33:48 +0100
> > Jan Kaliszewski wrote:
> >> 1. Ad:
> >> >>> PurePosixPath('/usr/bin/python').relative('/etc')
> >> Traceback (most recent call last):
> >> ...
> >> ValueError: ...
> >>
> >> Shouldn't this particular operation return
> >> "PurePosixPath('/etc/../usr/bin/python')"?
> >
> > Think what happens if /etc is a symlink to /var/etc.
> > (not very likely to happen for /etc, but likely to happen in the
> > general case)
>
> posixpath.relpath('/usr/bin/python', '/etc') returns
> '../usr/bin/python'. Perhaps pathlib should have an option to provide
> such compatible behavior.

I don't think so, since the behaviour is broken in the first place.

> P.S. Pathlib implementation has relative_to() method. relative()
> method exists too but looks as unrelated.

Not in the "pep428" branch.

Regards Antoine.

From solipsis at pitrou.net  Thu Mar  7 11:28:17 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 7 Mar 2013 11:28:17 +0100
Subject: [Python-ideas] Length hinting and preallocation for container types
References: <513625E8.4060201@python.org> <876215jmg1.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <20130307112817.211fd496@pitrou.net>

Le Wed, 06 Mar 2013 11:23:26 +0900, "Stephen J. Turnbull" a écrit :
>
> Not at all coincidentally. As a compiler writer (which he refers to
> in the slides several times) he is offended by poor performance, as
> measured in CPU/invocation. Good for him! I'm glad he's working on
> PyPy! But when compiler writers start talking language design for
> performance, we inevitably end up with C<0.5 wink/>.

I think that's a strong argument indeed.

> > > One of his assertions is about memory (re)allocation. C is faster
> > > in some cases because C code usually does fewer allocations and
> > > reallocations. Python has no API to easily reallocate a list with
> > > 100 items.
>
> And it shouldn't.
>
> But why do you think allocation is slow in the general case? Sure, it
> involves system calls which indeed do slow things down.

It depends what one calls a system call. A memory allocation shouldn't always incur a call to the kernel.
Library calls can be quite fast. Moreover, in the CPython case, there's also a custom allocator which handles all allocation requests smaller than 512 bytes. (I'm sure PyPy has their own allocator too)

Regards Antoine.

From python at mrabarnett.plus.com  Thu Mar  7 12:50:31 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 07 Mar 2013 11:50:31 +0000
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: 
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <51378225.9040901@mrabarnett.plus.com> <5137A50D.70504@mrabarnett.plus.com> <5137C5F6.3070109@mrabarnett.plus.com> <5137D05B.6050806@mrabarnett.plus.com>
Message-ID: <51387F07.1010804@mrabarnett.plus.com>

On 07/03/2013 03:23, Devin Jeanpierre wrote:
> On Wed, Mar 6, 2013 at 6:25 PM, MRAB wrote:
>> You _can_ have catastrophic backtracking without capture groups. You've
>> already seen an example in ".*a.*".
>>
>> It gets worse when you can have repeated repeats, for example "(?:.*)*".
>>
>> The difference with fnmatch is that you don't care _where_ various
>> parts match (there are no capture groups), only _whether_ it matches,
>> and then only whether or not _all_ of it matches.
>
> We seem to be talking past each other. I already know all this. I am
> asking you to justify your claim that if glob was based on regex,
> instead of re, it would be free of DOS attacks.
>
> Because of your confusion, I expect you didn't really mean to claim
> that. I inferred it because when you were asked for an approach that
> would solve DOS attacks against glob, you replied by saying that you
> wrote a regex module that is more resistant to such things. I
> apologize if I misunderstood.
>
I didn't say that it should be based on regex. What I meant was that it didn't seem that difficult compared to the regex module.

That module is more resistant to catastrophic backtracking and some of its tricks could be used for the much simpler fnmatch to make a new implementation of _that_ more resistant to the problem. I'm currently thinking about the details.

From solipsis at pitrou.net  Thu Mar  7 12:58:32 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 7 Mar 2013 12:58:32 +0100
Subject: [Python-ideas] Updated PEP 428 (pathlib)
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <51378225.9040901@mrabarnett.plus.com> <5137A50D.70504@mrabarnett.plus.com> <5137C5F6.3070109@mrabarnett.plus.com> <5137D05B.6050806@mrabarnett.plus.com> <51387F07.1010804@mrabarnett.plus.com>
Message-ID: <20130307125832.713fc6d9@pitrou.net>

Le Thu, 07 Mar 2013 11:50:31 +0000, MRAB a écrit :
> I didn't say that it should be based on regex. What I meant was that
> it didn't seem that difficult compared to the regex module.
>
> That module is more resistant to catastrophic backtracking and some of
> its tricks could be used for the much simpler fnmatch to make a new
> implementation of _that_ more resistant to the problem. I'm currently
> thinking about the details.

Keep in mind it shouldn't slow down the general (non-hostile) use case.

Regards Antoine.
From storchaka at gmail.com  Thu Mar  7 15:30:20 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 07 Mar 2013 16:30:20 +0200
Subject: [Python-ideas] Length hinting and preallocation for container types
In-Reply-To: <51364C96.4090509@stoneleaf.us>
References: <513625E8.4060201@python.org> <20130305193031.GA3176@untibox.unti> <51364C96.4090509@stoneleaf.us>
Message-ID: 

On 05.03.13 21:50, Ethan Furman wrote:
> I suspect the new behavior would be most useful when you don't know
> precisely how large the final list will be: overallocate (possibly by a
> large margin), then __exit__ returns the unused portion back to the pool).

A list comprehension can do this.

From python at mrabarnett.plus.com  Thu Mar  7 16:13:38 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 07 Mar 2013 15:13:38 +0000
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: <5137FAD4.4080608@mrabarnett.plus.com>
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <51378225.9040901@mrabarnett.plus.com> <5137FAD4.4080608@mrabarnett.plus.com>
Message-ID: <5138AEA2.6010106@mrabarnett.plus.com>

On 07/03/2013 02:26, MRAB wrote:
> On 2013-03-06 17:51, MRAB wrote:
>> On 2013-03-06 17:08, Charles-François Natali wrote:
>>>>>>> fnmatch.fnmatch('a'*50, '*a*'*50) # weird how the pattern/string order is reversed from re.match
>>>>
>>>> That will take about 200 years to complete with CPython. Maybe a
>>>> little less, if you're running a particularly fast computer. ;)
>>>>
>>>> Is that the sort of DoS issue you are looking for?
>>>
>>> Exactly (the complexity of a typical ad-hoc fnmatch() implementation
>>> is the reason some servers like vsftpd use their own version, and it's
>>> even worse with a regex-based implementation as you notice).
>>>
>>> Now, the question is whether we want to try to mitigate this or not...
>>>
>> It's not something I've ever used, but it doesn't look that difficult
>> compared to regex if all it has is "*", "?", "[...]" and "[!...]".
>>
> Here's a very simple, all-Python, implementation I've just cooked up:
> [snip]

And here's a new implementation:

def fnmatch(name, pattern):
    saved_name_pos, saved_pattern_pos = 1, -1
    name_pos, pattern_pos = 0, 0

    while True:
        retry = False

        if pattern_pos >= len(pattern):
            if name_pos >= len(name):
                return True

            retry = True
        elif pattern[pattern_pos] == '*':
            saved_name_pos, saved_pattern_pos = name_pos + 1, pattern_pos
            pattern_pos += 1
        elif pattern[pattern_pos] == '?':
            if name_pos < len(name):
                name_pos += 1
                pattern_pos += 1
            else:
                retry = True
        elif pattern[pattern_pos] == '[':
            if name_pos < len(name):
                negative = pattern[pattern_pos + 1] == "!"
                pattern_pos += 2 if negative else 1

                close_pos = pattern.find(']', pattern_pos)
                if close_pos >= 0:
                    if (name[name_pos] in pattern[pattern_pos : close_pos]) != negative:
                        name_pos += 1
                        pattern_pos = close_pos + 1
                    else:
                        retry = True
                else:
                    retry = True
            else:
                retry = True
        elif name_pos < len(name) and name[name_pos] == pattern[pattern_pos]:
            name_pos += 1
            pattern_pos += 1
        else:
            retry = True

        if retry:
            if saved_pattern_pos < 0:
                return False

            name_pos, pattern_pos = saved_name_pos, saved_pattern_pos

From cf.natali at gmail.com  Thu Mar  7 16:28:50 2013
From: cf.natali at gmail.com (Charles-François Natali)
Date: Thu, 7 Mar 2013 16:28:50 +0100
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: <5138AEA2.6010106@mrabarnett.plus.com>
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <51378225.9040901@mrabarnett.plus.com> <5137FAD4.4080608@mrabarnett.plus.com> <5138AEA2.6010106@mrabarnett.plus.com>
Message-ID: 

> And here's a new implementation:

Thanks.
Could you post this on the tracker?
Patches sent to mailing lists tend to get lost, and we've already hijacked Antoine's thread long enough :-)

From python at mrabarnett.plus.com  Thu Mar  7 17:32:52 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 07 Mar 2013 16:32:52 +0000
Subject: [Python-ideas] Updated PEP 428 (pathlib)
In-Reply-To: 
References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <51378225.9040901@mrabarnett.plus.com> <5137FAD4.4080608@mrabarnett.plus.com> <5138AEA2.6010106@mrabarnett.plus.com>
Message-ID: <5138C134.7040804@mrabarnett.plus.com>

On 07/03/2013 15:28, Charles-François Natali wrote:
>> And here's a new implementation:
>
> Thanks.
> Could you post this on the tracker?
> Patches sent to mailing lists tend to get lost, and we've already
> hijacked Antoine's thread long enough :-)
>
Done, and apologies to Antoine. :-)

From ericsnowcurrently at gmail.com  Thu Mar  7 20:35:59 2013
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Thu, 7 Mar 2013 12:35:59 -0700
Subject: [Python-ideas] class-only methods without using metaclasses
In-Reply-To: 
References: 
Message-ID: 

On Thu, Mar 7, 2013 at 1:21 AM, Nick Coghlan wrote:
> and the distinction between a class method and a class only method
> would be too subtle to easily explain to anyone not already steeped in
> the details of descriptors and metaclasses.

This is definitely the big reason why it's not worth it over just using a metaclass or even just sticking with classmethods, marginally imperfect as they are for the theoretical use case.

-eric

From jsbfox at gmail.com  Thu Mar  7 21:41:40 2013
From: jsbfox at gmail.com (Thomas Allen)
Date: Thu, 7 Mar 2013 15:41:40 -0500
Subject: [Python-ideas] Official MySQL module
Message-ID: 

Hi, there! Do you plan to add an official module for connecting to MySQL databases? Existing third-party modules are badly documented or no longer maintained... That's kinda strange that such a nice language doesn't have it yet.

From dustin at v.igoro.us  Thu Mar  7 21:59:27 2013
From: dustin at v.igoro.us (Dustin J. Mitchell)
Date: Thu, 7 Mar 2013 15:59:27 -0500
Subject: [Python-ideas] Official MySQL module
In-Reply-To: 
References: 
Message-ID: 

On Thu, Mar 7, 2013 at 3:41 PM, Thomas Allen wrote:
> Hi, there! Do you plan to add an official module for connecting to MySQL
> databases? Existing third-party modules are badly documented or no longer
> maintained...
> That's kinda strange that such a nice language doesn't have it yet.

Where would such a module come from? The PSF can't wave a magic "official" flag and will software into existence. Someone needs to write it.

I suspect from your use of the term "third party" that you come from the world of proprietary software. In OSS, we're all mutual third parties.

There are several nice MySQL bindings out there. Just about everyone uses Python-MySQL, but I've recently given my heart to PyMySQL, since it's pure python and thus a lot easier to install. If I recall from the SQLAlchemy docs, there are a few others out there. So I suspect that your basic premise is incorrect: there's lots of good options out there, and in fact several tools to abstract the differences between them (SQLAlchemy being my choice). I don't think the community would be well-served by selecting one as the default implementation.

Dustin

From mertz at gnosis.cx  Thu Mar  7 22:16:26 2013
From: mertz at gnosis.cx (David Mertz)
Date: Thu, 7 Mar 2013 13:16:26 -0800
Subject: [Python-ideas] Official MySQL module
In-Reply-To: 
References: 
Message-ID: 

I disagree moderately with Dustin. Obviously, it is true that a magic wand doesn't produce a standard-library module. However, having support for MySQL/MariaDB (and PostgreSQL) in the standard library would be desirable. This would bring MySQL support to the same level as we have for SQLite3.

In particular, I would NOT WANT such standard library support to include any ORM layer to it; I feel like those should remain as third-party tools (and compete on their various merits). But the basic level of providing a *binding* feels like something desirable (and specifically, a binding that was as close to a drop-in substitute for 'sqlite3' as possible).

On Thu, Mar 7, 2013 at 12:59 PM, Dustin J. Mitchell wrote:

> On Thu, Mar 7, 2013 at 3:41 PM, Thomas Allen wrote:
> > Hi, there! Do you plan to add an official module for connecting to MySQL
> > databases? Existing third-party modules are badly documented or no longer
> > maintained... That's kinda strange that such a nice language doesn't have
> > it yet.
>
> Where would such a module come from? The PSF can't wave a magic
> "official" flag and will software into existence. Someone needs to
> write it.
>
> I suspect from your use of the term "third party" that you come from
> the world of proprietary software. In OSS, we're all mutual third
> parties.
>
> There are several nice MySQL bindings out there. Just about everyone
> uses Python-MySQL, but I've recently given my heart to PyMySQL, since
> it's pure python and thus a lot easier to install. If I recall from
> the SQLAlchemy docs, there are a few others out there. So I suspect
> that your basic premise is incorrect: there's lots of good options out
> there, and in fact several tools to abstract the differences between
> them (SQLAlchemy being my choice). I don't think the community would
> be well-served by selecting one as the default implementation.
>
> Dustin
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

-- 
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
URL: From guido at python.org Thu Mar 7 22:52:25 2013 From: guido at python.org (Guido van Rossum) Date: Thu, 7 Mar 2013 13:52:25 -0800 Subject: [Python-ideas] Official MySQL module In-Reply-To: References: Message-ID: On Thu, Mar 7, 2013 at 1:16 PM, David Mertz wrote: > I disagree moderately with Dustin. Obviously, it is true that a magic wand > doesn't produce a standard-library module. However, having support for > MySQL/MariaDB (and PostgreSQL) in the standard library would be desirable. > This would bring MySQL support to the same level as we have for SQLite3. > > In particular, I would NOT WANT such standard library support to include any > ORM layer to it; I feel like those should remain as third-party tools (and > compete on their various merits). But the basic level of providing a > *binding* feels like something desirable (and specifically, a binding that > was as close to a drop-in substitute for 'sqlite3' as possible). Well, the model should be PEP 249 (db-API 2.0), not sqlite3. -- --Guido van Rossum (python.org/~guido) From dustin at v.igoro.us Thu Mar 7 22:59:35 2013 From: dustin at v.igoro.us (Dustin J. Mitchell) Date: Thu, 7 Mar 2013 16:59:35 -0500 Subject: [Python-ideas] Official MySQL module In-Reply-To: References: Message-ID: On Thu, Mar 7, 2013 at 4:16 PM, David Mertz wrote: > I disagree moderately with Dustin. Obviously, it is true that a magic wand > doesn't produce a standard-library module. However, having support for > MySQL/MariaDB (and PostgreSQL) in the standard library would be desirable. > This would bring MySQL support to the same level as we have for SQLite3. Having a simple database engine in the stdlib is helpful -- it means that apps like Buildbot can use a database without requiring users to set up MySQL or Postgres. I think the Python community is actually better served by having an ecosystem of competing DBAPI implementations for other databases. If the stdlib blesses one, then there's very little point in the others continuing to exist. Would that be Python-MySQL (which is somewhat heavy and requires MySQL libraries, and judging from the sourceforge page the project is moribund and rudderless) or PyMySQL (which, much as I love it, is somewhat less performant, probably buggier, and still lacks some things you'd want for production use, like SO_KEEPALIVE)? > In particular, I would NOT WANT such standard library support to include any > ORM layer to it; I feel like those should remain as third-party tools (and > compete on their various merits). But the basic level of providing a > *binding* feels like something desirable (and specifically, a binding that > was as close to a drop-in substitute for 'sqlite3' as possible). I didn't say anything about an ORM! SQLAlchemy has a fantastic query-formulation layer ("core") that manages to work around idiosyncrasies with the various DBAPI implementations out there, while not trying to map any objects or relations. And as Guido says, any implementation -- in or out of the stdlib -- is based on PEP249. SQLAlchemy just wraps that with some syntax and compatibility hacks. Here's the list of supported dialects: http://docs.sqlalchemy.org/en/rel_0_8/dialects/ Dustin From greg at krypto.org Thu Mar 7 23:00:52 2013 From: greg at krypto.org (Gregory P. Smith) Date: Thu, 7 Mar 2013 14:00:52 -0800 Subject: [Python-ideas] Official MySQL module In-Reply-To: References: Message-ID: No database connector module should ever be part of the standard library unless that entire database is included as part of Python distributions. 
MySQL isn't part of Python so all mysql connector modules belong as third party things (perhaps as part of mysql itself if they wanted to get their act together). want sqlite? we bundle it. want something else? you have to install something else separately so you have to install its connector module separately as well. -gps On Thu, Mar 7, 2013 at 12:41 PM, Thomas Allen wrote: > Hi, there! Do you plan to add an official module for connecting to MySQL > databases? Existing third-party modules are bad-documented or no longer > maintained... That's kinda strange, that such a nice language doesn't have > it yet. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Thu Mar 7 23:29:39 2013 From: mertz at gnosis.cx (David Mertz) Date: Thu, 7 Mar 2013 14:29:39 -0800 Subject: [Python-ideas] Official MySQL module In-Reply-To: References: Message-ID: <9499D2B7-A3F0-4383-94AC-11C31DC47ED5@gnosis.cx> On Mar 7, 2013, at 2:00 PM, Gregory P. Smith wrote: > No database connector module should ever be part of the standard library unless that entire database is included as part of Python distributions. MySQL isn't part of Python so all mysql connector modules belong as third party things (perhaps as part of mysql itself if they wanted to get their act together). want sqlite? we bundle it. want something else? you have to install something else separately so you have to install its connector module separately as well. I think I'm convinced by Gregory here, and withdraw my previous opinion. It does feel strange to have a Python module in the standard "batteries-included" distribution simply do *nothing* if other software doesn't happen to be installed. Although, off the top of my head, 'webbrowser' is also exactly such a module (but one can argue that every end-user OS includes at least one such external tool, so it's a kind of "system service"). Obviously, as Guido points out, when I wrote 'sqlite3' I really meant 'db-API 2.0'. But I was just being concrete rather than abstract about it, since one is a module and the other a specification. I.e. it's conceivable that a program might change one line and work:

    # import sqlite3 as mydb
    import mysql as mydb

But there's no line in a Python program like:

    import db_api20 as mydb

(well, at least two lines really, there's no way the '.connect()' can be DB-independent). -- Dred Scott 1857; Santa Clara 1886; Plessy 1892; Korematsu 1944; Eldred 2003 From eric at trueblade.com Thu Mar 7 22:41:30 2013 From: eric at trueblade.com (Eric V. Smith) Date: Thu, 07 Mar 2013 16:41:30 -0500 Subject: [Python-ideas] Official MySQL module In-Reply-To: References: Message-ID: <5139098A.2090708@trueblade.com> On 3/7/2013 4:16 PM, David Mertz wrote: > I disagree moderately with Dustin. Obviously, it is true that a magic > wand doesn't produce a standard-library module. However, having support > for MySQL/MariaDB (and PostgreSQL) in the standard library would be > desirable. This would bring MySQL support to the same level as we have > for SQLite3. > > In particular, I would NOT WANT such standard library support to include > any ORM layer to it; I feel like those should remain as third-party > tools (and compete on their various merits). 
But the basic level of > providing a *binding* feels like something desirable (and specifically, > a binding that was as close to a drop-in substitute for 'sqlite3' as > possible). I agree with David on both points. - Surely a MySQL binding is a battery we should consider including, if an author were to offer it to us. I have no experience with them, so I can't offer any advice on which is best. - We don't want to include an ORM. It seems this space is still evolving rapidly. At least, every time I upgrade SQLAlchemy (which I love) it breaks some code. Eric. > > > On Thu, Mar 7, 2013 at 12:59 PM, Dustin J. Mitchell > wrote: > > On Thu, Mar 7, 2013 at 3:41 PM, Thomas Allen wrote: > > > Hi, there! Do you plan to add an official module for connecting to > MySQL > > databases? Existing third-party modules are bad-documented or no > longer > > maintained... That's kinda strange, that such a nice language > doesn't have > > it yet. > > Where would such a module come from? The PSF can't wave a magic > "official" flag and will software into existence. Someone needs to > write it. > > I suspect from your use of the term "third party", that you come from > the world of proprietary software. In OSS, we're all mutual third > parties. > > There are several nice MySQL bindings out there. Just about everyone > uses Python-MySQL, but I've recently given my heart to PyMySQL, since > it's pure python and thus a lot easier to install. If I recall from > the SQLAlchemy docs, there are a few others out there. So I suspect > that your basic premise is incorrect: there's lots of good options out > there, and in fact several tools to abstract the differences between > them (SQLAlchemy being my choice). I don't think the community would > be well-served by selecting one as the default implementation. > > Dustin > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Eric. From greg.ewing at canterbury.ac.nz Fri Mar 8 00:02:11 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 08 Mar 2013 12:02:11 +1300 Subject: [Python-ideas] class-only methods without using metaclasses In-Reply-To: References: Message-ID: <51391C73.4030906@canterbury.ac.nz> Nick Coghlan wrote: > It's too much additional complexity to resolve a largely theoretical > problem. In Python 2 you could get class-only methods like this:

    class Foo(object):

        class __metaclass__(type):

            def classmeth(cls):
                ...

I'm mildly disappointed that this can't be done any more in Python 3. Sometimes you need genuine metaclass methods, e.g. __xxx__ methods for a class rather than an instance.
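A runnable Python 3 sketch of the distinction under discussion -- a method defined on the metaclass is reachable from the class but not from its instances (the names Meta and Foo here are purely illustrative):

    class Meta(type):
        def classmeth(cls):
            return 'called on %r' % (cls,)

    class Foo(metaclass=Meta):
        pass

    print(Foo.classmeth())   # found via type(Foo), i.e. the metaclass

    try:
        Foo().classmeth()    # instance lookup never consults the metaclass
    except AttributeError as exc:
        print('AttributeError:', exc)

This is also why such methods behave as "class-only": ordinary attribute lookup on an instance only walks the instance's own MRO.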
-- Greg From greg.ewing at canterbury.ac.nz Fri Mar 8 00:16:34 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 08 Mar 2013 12:16:34 +1300 Subject: [Python-ideas] Updated PEP 428 (pathlib) In-Reply-To: <51387F07.1010804@mrabarnett.plus.com> References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <51378225.9040901@mrabarnett.plus.com> <5137A50D.70504@mrabarnett.plus.com> <5137C5F6.3070109@mrabarnett.plus.com> <5137D05B.6050806@mrabarnett.plus.com> <51387F07.1010804@mrabarnett.plus.com> Message-ID: <51391FD2.3090705@canterbury.ac.nz> MRAB wrote: > That module is more resistant to catastrophic backtracking and some of > its tricks could be used for the much simpler fnmatch to make a new > implementation of _that_ more resistant to the problem. It shouldn't be necessary to use tricks. A glob pattern describes a regular language, which can always be parsed using a DFA with no backtracking at all. -- Greg From ncoghlan at gmail.com Fri Mar 8 00:20:34 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 8 Mar 2013 09:20:34 +1000 Subject: [Python-ideas] Official MySQL module In-Reply-To: <5139098A.2090708@trueblade.com> References: <5139098A.2090708@trueblade.com> Message-ID: On Fri, Mar 8, 2013 at 7:41 AM, Eric V. Smith wrote: > On 3/7/2013 4:16 PM, David Mertz wrote: >> I disagree moderately with Dustin. Obviously, it is true that a magic >> wand doesn't produce a standard-library module. However, having support >> for MySQL/MariaDB (and PostgreSQL) in the standard library would be >> desirable. This would bring MySQL support to the same level as we have >> for SQLite3. >> >> In particular, I would NOT WANT such standard library support to include >> any ORM layer to it; I feel like those should remain as third-party >> tools (and compete on their various merits). But the basic level of >> providing a *binding* feels like something desirable (and specifically, >> a binding that was as close to a drop-in substitute for 'sqlite3' as >> possible). > > I agree with David on both points. > > - Surely a MySQL binding is a battery we should consider including, if > an author were to offer it to us. I have no experience with them, so I > can't offer any advice on which is best. We've learned the hard way that including database bindings in the standard library is a bad idea, as development on those bindings needs to be synchronised with the release cycle of the corresponding database server, not the Python release cycle. The bsddb bindings in Python 2 were a neverending source of trouble, which is why they were booted out to PyPI for Python 3. sqlite3 is different (and more appropriate for the standard library), as it's a wrapper around a file format rather than a binding to an active database server. The number one question to ask about candidates for standard library inclusion is "Does it make sense for this module to receive new features only once every 18-24 months, and only when you upgrade to a new version of Python?". Interfaces to specific external services (including databases) almost never pass that test. Cheers, Nick. 
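For concreteness, a minimal sqlite3 session illustrating the contrast Nick draws -- no server process and no external release cycle to track, just a file (or, as here, transient in-memory storage):

    import sqlite3

    conn = sqlite3.connect(':memory:')   # or a filename; no server involved
    conn.execute('CREATE TABLE t (x INTEGER)')
    conn.executemany('INSERT INTO t VALUES (?)', [(1,), (2,), (3,)])
    print(conn.execute('SELECT sum(x) FROM t').fetchone())   # -> (6,)
    conn.close()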
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Mar 8 00:24:35 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 8 Mar 2013 09:24:35 +1000 Subject: [Python-ideas] class-only methods without using metaclasses In-Reply-To: <51391C73.4030906@canterbury.ac.nz> References: <51391C73.4030906@canterbury.ac.nz> Message-ID: On Fri, Mar 8, 2013 at 9:02 AM, Greg Ewing wrote: > Nick Coghlan wrote: >> >> It's too much additional complexity to resolve a largely theoretical >> problem. > > > In Python 2 you could get class-only methods like this:
>
>     class Foo(object):
>
>         class __metaclass__(type):
>
>             def classmeth(cls):
>                 ...
>
> I'm mildly disappointed that this can't be done any more > in Python 3. Sometimes you need genuine metaclass methods, > e.g. __xxx__ methods for a class rather than an instance. You can still do that, you just have to define the metaclass in advance:

    class FooMeta(type):
        def classmeth(cls):
            ...

    class Foo(metaclass=FooMeta):
        ...

This is the price we pay for allowing metaclasses to customise the namespace used to execute the class body in __prepare__. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Fri Mar 8 00:16:59 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 08 Mar 2013 08:16:59 +0900 Subject: [Python-ideas] Length hinting and preallocation for container types In-Reply-To: <20130307112817.211fd496@pitrou.net> References: <513625E8.4060201@python.org> <876215jmg1.fsf@uwakimon.sk.tsukuba.ac.jp> <20130307112817.211fd496@pitrou.net> Message-ID: <87ip52iyvo.fsf@uwakimon.sk.tsukuba.ac.jp> Antoine Pitrou writes: > Le Wed, 06 Mar 2013 11:23:26 +0900, > "Stephen J. Turnbull" > a écrit : > > > > Not at all coincidentally. As a compiler writer (which he refers to > > in the slides several times) he is offended by poor performance, as > > measured in CPU/invocation. Good for him! I'm glad he's working on > > PyPy! But when compiler writers start talking language design for > > performance, we inevitably end up with C<0.5 wink/>. > > I think that's a strong argument indeed. Thank you, but it's actually a fallacy (argumentum ad hominem). I think it's true that Alex's complaints and Christian's proposals are colored by that tendency, but it deserves discussion in terms of the proposals themselves. (Thus, the 0.5 wink. I maybe should have deleted it.) > > But why do you think allocation is slow in the general case? Sure, it > > involves system calls which indeed do slow things down. > > It depends what one calls a system call. A memory allocation shouldn't > always incur a call to the kernel. Library calls can be quite fast. If the process itself is growing, eventually it does. I don't know what best practice is, do libraries try to arrange for bounded time cost of allocation by default? And are best practices universally implemented? What about user tuning of the allocator that happens to be a pessimization for Python? As a cross-platform application, Python needs to worry about that. And I suspect that Alex (and Christian!) would consider saving a few CPU cycles in an inner loop very important. My point is that I don't consider that appropriate for user-visible language features in Python. I realize that performance-oriented applications will chafe at the bureaucracy necessary to get optimizations like comprehensions into the language, though. I just think it's a necessary sacrifice (but it's not my sacrifice, I admit). 
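As a rough, machine-dependent illustration of the cost being argued about, one can compare growing a list by appending against filling a preallocated one (numbers will vary; this is a sketch, not a benchmark):

    import timeit

    grow = "l = []\nfor i in range(10000):\n    l.append(i)"
    prealloc = "l = [None] * 10000\nfor i in range(10000):\n    l[i] = i"

    # Taking min() over the repeats is the usual way to reduce timing noise.
    print('append   :', min(timeit.repeat(grow, number=100)))
    print('prealloc :', min(timeit.repeat(prealloc, number=100)))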
From guido at python.org Fri Mar 8 00:50:49 2013 From: guido at python.org (Guido van Rossum) Date: Thu, 7 Mar 2013 15:50:49 -0800 Subject: [Python-ideas] Official MySQL module In-Reply-To: References: <5139098A.2090708@trueblade.com> Message-ID: On Thu, Mar 7, 2013 at 3:20 PM, Nick Coghlan wrote: > On Fri, Mar 8, 2013 at 7:41 AM, Eric V. Smith wrote: >> On 3/7/2013 4:16 PM, David Mertz wrote: >>> I disagree moderately with Dustin. Obviously, it is true that a magic >>> wand doesn't produce a standard-library module. However, having support >>> for MySQL/MariaDB (and PostgreSQL) in the standard library would be >>> desirable. This would bring MySQL support to the same level as we have >>> for SQLite3. >>> >>> In particular, I would NOT WANT such standard library support to include >>> any ORM layer to it; I feel like those should remain as third-party >>> tools (and compete on their various merits). But the basic level of >>> providing a *binding* feels like something desirable (and specifically, >>> a binding that was as close to a drop-in substitute for 'sqlite3' as >>> possible). >> >> I agree with David on both points. >> >> - Surely a MySQL binding is a battery we should consider including, if >> an author were to offer it to us. I have no experience with them, so I >> can't offer any advice on which is best. > > We've learned the hard way that including database bindings in the > standard library is a bad idea, as development on those bindings needs > to be synchronised with the release cycle of the corresponding > database server, not the Python release cycle. The bsddb bindings in > Python 2 were a neverending source of trouble, which is why they were > booted out to PyPI for Python 3. Hm. It's true that bsddb was a neverending nightmare. But AFAIR that was due to the bsddb C bindings changing regularly. > sqlite3 is different (and more appropriate for the standard library), > as it's a wrapper around a file format rather than a binding to an > active database server. Hardly. SQLite3 is probably as complex as bsddb. It is just better maintaned, and its authors have an incredible dedication to backwards compatibility at the C API level. (Watch this talk: https://www.youtube.com/watch?v=jN_YdMdjVpU ) Honestly, I don't know what the status of the C bindings for MySQL is, but it might well be similar. MySQL is a pretty mature product. (And I don't agree with Greg P Smith's complete rejection of including anything in the stdlib that we don't distribute. There are certainly plenty of optional dependencies, from Tcl/Tk to GNU readline.) > The number one question to ask about candidates for standard library > inclusion is "Does it make sense for this module to receive new > features only once every 18-24 months, and only when you upgrade to a > new version of Python?". Interfaces to specific external services > (including databases) almost never pass that test. I wouldn't mind if someone looked into this in depth for MySQL and Postgres. It's been a while since we last looked at this. If the answer is different for MySQL than for Postgres that shouldn't stop us from including one but not the other. Agreed on the "no ORM in the stdlib" position. 
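To make the PEP 249 point concrete, here is the shape any such binding shares, written against the stdlib's sqlite3; a hypothetical MySQL module would differ mainly in the connect() arguments and its paramstyle:

    import sqlite3 as dbmodule   # hypothetically: import some_mysql_binding as dbmodule

    conn = dbmodule.connect(':memory:')   # connect() is the DB-specific part
    cur = conn.cursor()
    cur.execute('CREATE TABLE person (name TEXT)')
    cur.execute('INSERT INTO person VALUES (?)', ('Guido',))  # sqlite3 uses the qmark paramstyle
    conn.commit()
    cur.execute('SELECT name FROM person')
    print(cur.fetchall())   # -> [('Guido',)]
    conn.close()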
-- --Guido van Rossum (python.org/~guido) From ericsnowcurrently at gmail.com Fri Mar 8 02:22:34 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 7 Mar 2013 18:22:34 -0700 Subject: [Python-ideas] class-only methods without using metaclasses In-Reply-To: <51391C73.4030906@canterbury.ac.nz> References: <51391C73.4030906@canterbury.ac.nz> Message-ID: On Thu, Mar 7, 2013 at 4:02 PM, Greg Ewing wrote: > In Python 2 you could get class-only methods like this: > Keep in mind that there is a (perhaps subtle) difference between methods on a metaclass and the class-only methods for which I was advocating. When dealing with metaclasses you have to consider things like metaclass conflicts as well as metaclass inheritance. Also, like classmethods, a class-only method is looked up on the instance's MRO and not on the class's MRO (as happens with metaclass methods). The difference isn't that big a deal in practice, though. -eric From mal at egenix.com Fri Mar 8 09:06:40 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 08 Mar 2013 09:06:40 +0100 Subject: [Python-ideas] Official MySQL module In-Reply-To: References: <5139098A.2090708@trueblade.com> Message-ID: <51399C10.2090409@egenix.com> On 08.03.2013 00:20, Nick Coghlan wrote: > On Fri, Mar 8, 2013 at 7:41 AM, Eric V. Smith wrote: > The number one question to ask about candidates for standard library > inclusion is "Does it make sense for this module to receive new > features only once every 18-24 months, and only when you upgrade to a > new version of Python?". Interfaces to specific external services > (including databases) almost never pass that test. Agreed. The reason why we included sqlite support in the stdlib was to have an easy to use and readily available Python DB-API 2.0 compatible database in Python, so that people can learn how to use SQL databases, implement small projects with it and then upgrade to one of the many client-server databases out there, if they need to. It serves that function well, esp. since most systems come with SQLite pre-installed. On Windows, we even include SQLite together with the Python installation. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 07 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From solipsis at pitrou.net Fri Mar 8 11:07:31 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 8 Mar 2013 11:07:31 +0100 Subject: [Python-ideas] Length hinting and preallocation for container types References: <513625E8.4060201@python.org> <876215jmg1.fsf@uwakimon.sk.tsukuba.ac.jp> <20130307112817.211fd496@pitrou.net> <87ip52iyvo.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20130308110731.45ff8bac@pitrou.net> Le Fri, 08 Mar 2013 08:16:59 +0900, "Stephen J. Turnbull" a ?crit : > > > > But why do you think allocation is slow in the general case? > > > Sure, it involves system calls which indeed do slow things down. > > > > It depends what one calls a system call. 
A memory allocation > shouldn't always incur a call to the kernel. Library calls can be > quite fast. > > If the process itself is growing, eventually it does. I think most allocators would request big chunks of memory from the kernel, and then carve out the small blocks requested by the user from that. Therefore, my intuition is that a long-running process, if not leaky, should end up not stressing mmap / sbrk system calls too much. > And I suspect that Alex (and Christian!) would consider saving a few > CPU cycles in an inner loop very important. They probably would, but that is not a design point for Python. Regards Antoine. From solipsis at pitrou.net Fri Mar 8 11:13:50 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 8 Mar 2013 11:13:50 +0100 Subject: [Python-ideas] with ... except Message-ID: <20130308111350.51baa3f3@pitrou.net> Hello, A common pattern for me is to write a with statement for resource cleanup, but also handle specific errors after that. Right now, this is a bit cumbersome:

    try:
        with open("somefile", "rb") as f:
            ...
    except FileNotFoundError:
        # do something else, perhaps actually create the file

or:

    try:
        with transaction.commit_on_success():
            ...
    except ObjectDoesNotExist:
        # do something else, perhaps clean up some internal cache

How about adding syntax sugar for the above, in the form of a with ... except clause? It would nicely reduce spurious indentation, as with the try / except / finally which, long ago(!), helped reduce indentation and typing by removing the need to nest a try / except inside a try / finally. Regards Antoine. From masklinn at masklinn.net Fri Mar 8 11:24:02 2013 From: masklinn at masklinn.net (Masklinn) Date: Fri, 8 Mar 2013 11:24:02 +0100 Subject: [Python-ideas] with ... except In-Reply-To: <20130308111350.51baa3f3@pitrou.net> References: <20130308111350.51baa3f3@pitrou.net> Message-ID: On 2013-03-08, at 11:13 , Antoine Pitrou wrote: > Hello, > > A common pattern for me is to write a with statement for resource > cleanup, but also handle specific errors after that. Right now, this is > a bit cumbersome:
>
> try:
>     with open("somefile", "rb") as f:
>         ...
> except FileNotFoundError:
>     # do something else, perhaps actually create the file
>
> or:
>
> try:
>     with transaction.commit_on_success():
>         ...
> except ObjectDoesNotExist:
>     # do something else, perhaps clean up some internal cache
>
> How about adding syntax sugar for the above, in the form of a > with ... except clause? It would nicely reduce spurious > indentation, as with the try / except / finally which, long ago(!), > helped reduce indentation and typing by removing the need to nest a > try / except inside a try / finally. Isn't it essentially the same suggestion as Alan Johnson's last week? http://mail.python.org/pipermail/python-ideas/2013-March/019730.html From solipsis at pitrou.net Fri Mar 8 11:33:42 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 8 Mar 2013 11:33:42 +0100 Subject: [Python-ideas] with ... except References: <20130308111350.51baa3f3@pitrou.net> Message-ID: <20130308113342.5f54c381@pitrou.net> Le Fri, 8 Mar 2013 11:24:02 +0100, Masklinn a écrit : > On 2013-03-08, at 11:13 , Antoine Pitrou wrote: > > > Hello, > > > > A common pattern for me is to write a with statement for resource > > cleanup, but also handle specific errors after that. Right now, > > this is a bit cumbersome:
> >
> > try:
> >     with open("somefile", "rb") as f:
> >         ...
> > except FileNotFoundError:
> >     # do something else, perhaps actually create the file
> >
> > or:
> >
> > try:
> >     with transaction.commit_on_success():
> >         ...
> > except ObjectDoesNotExist:
> >     # do something else, perhaps clean up some internal cache
> >
> > How about adding syntax sugar for the above, in the form of a > > with ... except clause? It would nicely reduce spurious > > indentation, as with the try / except / finally which, long ago(!), > > helped reduce indentation and typing by removing the need to nest a > > try / except inside a try / finally. > > Isn't it essentially the same suggestion as Alan Johnson's last week? > http://mail.python.org/pipermail/python-ideas/2013-March/019730.html Hmm, I hadn't read that thread. "try with" looked sufficiently ugly that I wasn't interested :-) But anyway it seems that discussion was conflating a lot of things. I would only like a shortcut for a "try ... except" around a "with", without any other sophistication. Seems "with" is essentially a "try ... finally", there seems to be a syntactic precedent already. And, yes, I actually want to catch exceptions raised *inside* the "with" block, not just by the "with" statement itself. The database example above makes it clear: I want the "with" to issue a ROLLBACK on an exception inside the block, *and* I want to handle the exception in a specific way after the ROLLBACK. Regards Antoine. From mark.hackett at metoffice.gov.uk Fri Mar 8 11:41:50 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Fri, 8 Mar 2013 10:41:50 +0000 Subject: [Python-ideas] with ... except In-Reply-To: <20130308113342.5f54c381@pitrou.net> References: <20130308111350.51baa3f3@pitrou.net> <20130308113342.5f54c381@pitrou.net> Message-ID: <201303081041.50701.mark.hackett@metoffice.gov.uk> On Friday 08 Mar 2013, Antoine Pitrou wrote: > Le Fri, 8 Mar 2013 11:24:02 +0100, > > Masklinn a écrit : > > On 2013-03-08, at 11:13 , Antoine Pitrou wrote: > > > Hello, > > > > > > A common pattern for me is to write a with statement for resource > > > cleanup, but also handle specific errors after that. Right now, > > > this is a bit cumbersome:
> > >
> > > try:
> > >     with open("somefile", "rb") as f:
> > >         ...
> > > except FileNotFoundError:
> > >     # do something else, perhaps actually create the file
> > >
> > > or:
> > >
> > > try:
> > >     with transaction.commit_on_success():
> > >         ...
> > > except ObjectDoesNotExist:
> > >     # do something else, perhaps clean up some internal cache
> > >
> > > How about adding syntax sugar for the above, in the form of a > > > with ... except clause? It would nicely reduce spurious > > > indentation, as with the try / except / finally which, long ago(!), > > > helped reduce indentation and typing by removing the need to nest a > > > try / except inside a try / finally. > > > > Isn't it essentially the same suggestion as Alan Johnson's last week? > > http://mail.python.org/pipermail/python-ideas/2013-March/019730.html > > Hmm, I hadn't read that thread. "try with" looked sufficiently ugly > that I wasn't interested :-) > Ugh, someone is going to suggest we have "try ... without" now... And it's so anglocentric. How come the calls are to repeat *English grammar*? German grammar would probably be a lot clearer for a compiler/interpreter. Of course, that wouldn't be possible if we still had the 80-column limit... 
:-) From steve at pearwood.info Fri Mar 8 11:45:42 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 08 Mar 2013 21:45:42 +1100 Subject: [Python-ideas] class-only methods without using metaclasses In-Reply-To: References: Message-ID: <5139C156.80200@pearwood.info> On 07/03/13 19:21, Nick Coghlan wrote: > On Thu, Mar 7, 2013 at 5:10 PM, Eric Snow wrote: >> Thoughts? > > It's too much additional complexity to resolve a largely theoretical > problem. Since class methods can be shadowed in instances, the fact > they're accessible through the instances really doesn't hurt anything, > and the distinction between a class method and a class only method > would be too subtle to easily explain to anyone not already steeped in > the details of descriptors and metaclasses. Surely that only applies to the implementation, not the concept itself? The interface is trivially easy to explain to anyone even half-way familiar with Python's OO model. Class methods are accessible from either the class or the instance, and receive the class (not the instance) as the first argument. Class-only methods are only accessible from the class. I'm not sure what use class-only methods are or what problems they solve, apart from a general dislike of being able to call classmethods from an instance. I can't think of any case where I would want to actively prohibit calling a classmethod from an instance, so I don't know that this actually solves any problems. But it looks interesting and I think Eric should put it up as a recipe on ActiveState. -- Steven From masklinn at masklinn.net Fri Mar 8 11:54:25 2013 From: masklinn at masklinn.net (Masklinn) Date: Fri, 8 Mar 2013 11:54:25 +0100 Subject: [Python-ideas] with ... except In-Reply-To: <20130308113342.5f54c381@pitrou.net> References: <20130308111350.51baa3f3@pitrou.net> <20130308113342.5f54c381@pitrou.net> Message-ID: <13AFCE2D-F3D9-4DCF-B7F1-0B959930CDF2@masklinn.net> On 2013-03-08, at 11:33 , Antoine Pitrou wrote: > Le Fri, 8 Mar 2013 11:24:02 +0100, > Masklinn a écrit : >> On 2013-03-08, at 11:13 , Antoine Pitrou wrote: >>> Hello, >>> >>> A common pattern for me is to write a with statement for resource >>> cleanup, but also handle specific errors after that. Right now, >>> this is a bit cumbersome:
>>>
>>> try:
>>>     with open("somefile", "rb") as f:
>>>         ...
>>> except FileNotFoundError:
>>>     # do something else, perhaps actually create the file
>>>
>>> or:
>>>
>>> try:
>>>     with transaction.commit_on_success():
>>>         ...
>>> except ObjectDoesNotExist:
>>>     # do something else, perhaps clean up some internal cache
>>>
>>> How about adding syntax sugar for the above, in the form of a >>> with ... except clause? It would nicely reduce spurious >>> indentation, as with the try / except / finally which, long ago(!), >>> helped reduce indentation and typing by removing the need to nest a >>> try / except inside a try / finally. >> >> Isn't it essentially the same suggestion as Alan Johnson's last week? >> http://mail.python.org/pipermail/python-ideas/2013-March/019730.html > > Hmm, I hadn't read that thread. "try with" looked sufficiently ugly > that I wasn't interested :-) > > But anyway it seems that discussion was conflating a lot of things. As things usually end up on python-ideas. But the original proposal was pretty much the same as yours, with a slightly different syntax (his kept `try`) > I would only like a shortcut for a "try ... except" around a "with", > without any other sophistication.
> Seems "with" is essentially a "try ... finally", there seems to be a syntactic precedent already. > > And, yes, I actually want to catch exceptions raised *inside* the > "with" block, not just by the "with" statement itself. The database > example above makes it clear: I want the "with" to issue a ROLLBACK on > an exception inside the block, *and* I want to handle the exception in > a specific way after the ROLLBACK. > > Regards > > Antoine. From ncoghlan at gmail.com Fri Mar 8 12:15:32 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 8 Mar 2013 21:15:32 +1000 Subject: [Python-ideas] class-only methods without using metaclasses In-Reply-To: <5139C156.80200@pearwood.info> References: <5139C156.80200@pearwood.info> Message-ID: On Fri, Mar 8, 2013 at 8:45 PM, Steven D'Aprano wrote: > Surely that only applies to the implementation, not the concept itself? > The interface is trivially easy to explain to anyone even half-way familiar > with Python's OO model. > > Class methods are accessible from either the class or the instance, and > receive the class (not the instance) as the first argument. > > Class-only methods are only accessible from the class. I agree the technical distinction isn't subtle. > I'm not sure what use class-only methods are or what problems they solve, > apart from a general dislike of being able to call classmethods from an > instance. I can't think of any case where I would want to actively prohibit > calling a classmethod from an instance, so I don't know that this actually > solves any problems. It's the "When would I recommend using this over a normal classmethod?" that I consider subtle. By only providing one of the two options directly, it means people don't even need to ask the question, let alone figure out how to answer it. > But it looks interesting and I think Eric should > put it up as a recipe on ActiveState. Yes, it's certainly a reasonable thing to post as a recipe. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Mar 8 12:43:52 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 8 Mar 2013 21:43:52 +1000 Subject: [Python-ideas] with ... except In-Reply-To: <20130308111350.51baa3f3@pitrou.net> References: <20130308111350.51baa3f3@pitrou.net> Message-ID: On Fri, Mar 8, 2013 at 8:13 PM, Antoine Pitrou wrote: > > Hello, > > A common pattern for me is to write a with statement for resource > cleanup, but also handle specific errors after that. Right now, this is > a bit cumbersome:
>
> try:
>     with open("somefile", "rb") as f:
>         ...
> except FileNotFoundError:
>     # do something else, perhaps actually create the file

The main problem with this kind of construct is that it makes the scope of the exception handler too broad - it's covering the entire body of the with statement, when you really only want to cover the creation of the file object:

    try:
        f = open("somefile", "rb")
    except FileNotFoundError:
        # Do something else, perhaps including creating the file
    else:
        with f:
            # This is not covered by the except clause...

Generalising this to context managers with non-trivial __enter__ methods is actually one of the intended use cases for contextlib.ExitStack (see http://docs.python.org/dev/library/contextlib#catching-exceptions-from-enter-methods).

> or:
>
> try:
>     with transaction.commit_on_success():
>         ...
> except ObjectDoesNotExist:
>     # do something else, perhaps clean up some internal cache

This use case is a bit more reasonable in terms of actually wanting the except clause to cover the whole body of the with statement, but trying to lose the extra indentation level suffers from an ambiguity problem. A full try statement looks like:

    try:
        ...
    except:
        ...
    else:
        ...
    finally:
        ...

The defined semantics of a with statement already include three of those clauses (try, except, finally). Does the except clause still fire if the with statement suppresses the exception? With the nested form, the answer is clearly yes. With the flattened form, the answer is less obvious. Furthermore, if the with statement allows "except", does it also allow else and finally? If not, why not? It's these issues that make me feel this case is more like requests to merge for + if than it is the past merger of the two forms of try statement. > How about adding syntax sugar for the above, in the form of a with ... > except clause? It would nicely reduce spurious indentation, as with > the try / except / finally which, long ago(!), helped reduce > indentation and typing by removing the need to nest a try / except > inside a try / finally. The difference there was that the indentation truly was redundant - converting between the two forms literally meant dedenting the inner try/except/else and losing the extra "try:" line. For a long time, the AST didn't even have a merged try/except/finally construct (eventually they *were* merged so that source code could be roundtripped through the AST more reliably). The repeated "try:" was also substantially more irritating than "try:" following a "with:" header. It *is* annoying that composing with statements with explicit exception handling is somewhat clumsy, but I don't think this is the way to fix it. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From eric at trueblade.com Fri Mar 8 12:01:12 2013 From: eric at trueblade.com (Eric V. Smith) Date: Fri, 08 Mar 2013 06:01:12 -0500 Subject: [Python-ideas] Official MySQL module In-Reply-To: <51399C10.2090409@egenix.com> References: <5139098A.2090708@trueblade.com> <51399C10.2090409@egenix.com> Message-ID: <5139C4F8.7050301@trueblade.com> On 3/8/2013 3:06 AM, M.-A. Lemburg wrote: > On 08.03.2013 00:20, Nick Coghlan wrote: >> The number one question to ask about candidates for standard library >> inclusion is "Does it make sense for this module to receive new >> features only once every 18-24 months, and only when you upgrade to a >> new version of Python?". Interfaces to specific external services >> (including databases) almost never pass that test. > > Agreed. I agree with this, too. However, I think this is a transient situation, not a permanent one. It's entirely possible that there exists, or will exist, a MySQL binding that meets these criteria. I don't know. But it shouldn't preclude us from considering a binding that meets the criteria. As to Greg's point about not including a database binding that requires other software to run, I disagree. It's client/server: do we really need to include the server in order to supply the client? We include nntplib, with no server. We include webbrowser, but no web browser. imaplib, but no imap server. Etc. -- Eric. 
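Picking up Nick's ExitStack pointer from earlier in the thread, a minimal sketch of the narrow-scope pattern he describes (the helper name read_or_default is made up for illustration):

    import contextlib

    def read_or_default(path, default=b''):
        with contextlib.ExitStack() as stack:
            try:
                f = stack.enter_context(open(path, 'rb'))
            except FileNotFoundError:
                return default      # only open()/__enter__ is covered here
            return f.read()         # errors in the body propagate normally

    print(read_or_default('no-such-file'))   # -> b''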
From amauryfa at gmail.com Fri Mar 8 13:39:17 2013 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Fri, 8 Mar 2013 13:39:17 +0100 Subject: [Python-ideas] Official MySQL module In-Reply-To: <5139C4F8.7050301@trueblade.com> References: <5139098A.2090708@trueblade.com> <51399C10.2090409@egenix.com> <5139C4F8.7050301@trueblade.com> Message-ID: 2013/3/8 Eric V. Smith > On 3/8/2013 3:06 AM, M.-A. Lemburg wrote: > > On 08.03.2013 00:20, Nick Coghlan wrote: > >> The number one question to ask about candidates for standard library > >> inclusion is "Does it make sense for this module to receive new > >> features only once every 18-24 months, and only when you upgrade to a > >> new version of Python?". Interfaces to specific external services > >> (including databases) almost never pass that test. > > > > Agreed. > > I agree with this, too. However, I think this is a transient situation, > not a permanent one. It's entirely possible that there exists, or will > exist, a MySQL binding that meets this criteria. I don't know. But it > shouldn't preclude us considering a binding that meets the criteria. > > As to Greg's point about not including a database binding that requires > other software to run, I disagree. It's client/server: do we really need > to include the server in order to supply the client? We include nntplib, > with no server. We include webbrowser, but no web browser. imaplib, but > no imap server. Etc. > Doesn't a MySQL binding have to link with some client library? libmysql.so? How is it licensed? I found this page: http://www.mysql.com/about/legal/licensing/foss-exception/ which seems to prevent redistribution under non-free licenses. -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From dholth at gmail.com Fri Mar 8 14:18:07 2013 From: dholth at gmail.com (Daniel Holth) Date: Fri, 8 Mar 2013 08:18:07 -0500 Subject: [Python-ideas] Length hinting and preallocation for container types In-Reply-To: <20130308110731.45ff8bac@pitrou.net> References: <513625E8.4060201@python.org> <876215jmg1.fsf@uwakimon.sk.tsukuba.ac.jp> <20130307112817.211fd496@pitrou.net> <87ip52iyvo.fsf@uwakimon.sk.tsukuba.ac.jp> <20130308110731.45ff8bac@pitrou.net> Message-ID: On Fri, Mar 8, 2013 at 5:07 AM, Antoine Pitrou wrote: > Le Fri, 08 Mar 2013 08:16:59 +0900, > "Stephen J. Turnbull" > a ?crit : >> >> > > But why do you think allocation is slow in the general case? >> > > Sure, it involves system calls which indeed do slow things down. >> > >> > It depends what one calls a system call. A memory allocation >> > shouldn't always incur a call to the kernel. Library calls can be >> > quite fast. >> >> If the process itself is growing, eventually it does. > > I think most allocators would request big chunks of memory from the > kernel, and then carve out the small blocks requested by the user from > that. Therefore, my intuition is that a long-running process, if not > leaky, should end up not stressing mmap / sbrk system calls too much. > >> And I suspect that Alex (and Christian!) would consider saving a few >> CPU cycles in an inner loop very important. > > They probably would, but that is not a design point for Python. > > Regards > > Antoine. I am a fan of the proposal. Imagine you are programming for a memory-constrained system. By telling the list how big it needs to be you can save precious RAM. 
It's a pretty standard feature to be able to hint and trim the size of data structures, just like you can usually choose the buffer size for stream operations. From eliben at gmail.com Fri Mar 8 14:27:59 2013 From: eliben at gmail.com (Eli Bendersky) Date: Fri, 8 Mar 2013 05:27:59 -0800 Subject: [Python-ideas] Length hinting and preallocation for container types In-Reply-To: References: <513625E8.4060201@python.org> <876215jmg1.fsf@uwakimon.sk.tsukuba.ac.jp> <20130307112817.211fd496@pitrou.net> <87ip52iyvo.fsf@uwakimon.sk.tsukuba.ac.jp> <20130308110731.45ff8bac@pitrou.net> Message-ID: On Fri, Mar 8, 2013 at 5:18 AM, Daniel Holth wrote: > On Fri, Mar 8, 2013 at 5:07 AM, Antoine Pitrou > wrote: > > Le Fri, 08 Mar 2013 08:16:59 +0900, > > "Stephen J. Turnbull" > > a ?crit : > >> > >> > > But why do you think allocation is slow in the general case? > >> > > Sure, it involves system calls which indeed do slow things down. > >> > > >> > It depends what one calls a system call. A memory allocation > >> > shouldn't always incur a call to the kernel. Library calls can be > >> > quite fast. > >> > >> If the process itself is growing, eventually it does. > > > > I think most allocators would request big chunks of memory from the > > kernel, and then carve out the small blocks requested by the user from > > that. Therefore, my intuition is that a long-running process, if not > > leaky, should end up not stressing mmap / sbrk system calls too much. > > > >> And I suspect that Alex (and Christian!) would consider saving a few > >> CPU cycles in an inner loop very important. > > > > They probably would, but that is not a design point for Python. > > > > Regards > > > > Antoine. > > I am a fan of the proposal. Imagine you are programming for a > memory-constrained system. By telling the list how big it needs to be > you can save precious RAM. It's a pretty standard feature to be able > to hint and trim the size of data structures, just like you can > usually choose the buffer size for stream operations. > __________________________________ > If it's voting time, I'm -1. Having programmed a lot of memory-constrained systems (not in Python, though) - this is not how things usually work there. In a memory-constrained system, you don't "grow and shrink" your data structures. That's because growing often needs to reallocate the whole chunk and do a copy, and shrinking only helps memory fragmentation. In such systems, you usually know in advance or at least limit the size of data structures and pre-allocate, which is perfectly possible in Python today. Shrinking is rarely, if ever, useful. If it is, you can implement concrete data structures for your concrete needs. And Python has a lot of ways to save memory for large arrays of things (array, numpy, encoding in bytes, etc) if one really wants to. I just don't believe the proposal will help in a lot of realistic code, and it certainly goes against the way of Python. Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Fri Mar 8 13:54:52 2013 From: eric at trueblade.com (Eric V. Smith) Date: Fri, 08 Mar 2013 07:54:52 -0500 Subject: [Python-ideas] Official MySQL module In-Reply-To: References: <5139098A.2090708@trueblade.com> <51399C10.2090409@egenix.com> <5139C4F8.7050301@trueblade.com> Message-ID: <5139DF9C.5080308@trueblade.com> On 3/8/2013 7:39 AM, Amaury Forgeot d'Arc wrote: > > > 2013/3/8 Eric V. Smith > > > On 3/8/2013 3:06 AM, M.-A. 
Lemburg wrote: > > On 08.03.2013 00:20, Nick Coghlan wrote: > >> The number one question to ask about candidates for standard library > >> inclusion is "Does it make sense for this module to receive new > >> features only once every 18-24 months, and only when you upgrade to a > >> new version of Python?". Interfaces to specific external services > >> (including databases) almost never pass that test. > > > > Agreed. > > I agree with this, too. However, I think this is a transient situation, > not a permanent one. It's entirely possible that there exists, or will > exist, a MySQL binding that meets these criteria. I don't know. But it > shouldn't preclude us from considering a binding that meets the criteria. > > As to Greg's point about not including a database binding that requires > other software to run, I disagree. It's client/server: do we really need > to include the server in order to supply the client? We include nntplib, > with no server. We include webbrowser, but no web browser. imaplib, but > no imap server. Etc. > Doesn't a MySQL binding have to link with some client library? libmysql.so? > How is it licensed? > I found this > page: http://www.mysql.com/about/legal/licensing/foss-exception/ > which seems to prevent redistribution under non-free licenses. It's not true that a client library is required. PyMySQL is pure Python. We could also write a C connection module ourselves, if needed. Again, I'm not saying I know there's a library suitable for stdlib inclusion, or that the time is right for such inclusion. I'm just saying that it's possible, and I think it's desirable. -- Eric. From solipsis at pitrou.net Fri Mar 8 14:35:39 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 8 Mar 2013 14:35:39 +0100 Subject: [Python-ideas] Length hinting and preallocation for container types References: <513625E8.4060201@python.org> <876215jmg1.fsf@uwakimon.sk.tsukuba.ac.jp> <20130307112817.211fd496@pitrou.net> <87ip52iyvo.fsf@uwakimon.sk.tsukuba.ac.jp> <20130308110731.45ff8bac@pitrou.net> Message-ID: <20130308143539.6e79871e@pitrou.net> Le Fri, 8 Mar 2013 08:18:07 -0500, Daniel Holth a écrit : > > I am a fan of the proposal. Imagine you are programming for a > memory-constrained system. By telling the list how big it needs to be > you can save precious RAM. Is it an actual use case or are you just imagining it? :) I'm asking because, unless you are only allocating that list and all the objects contained in it already exist, limiting the list's size won't do much for the process' memory occupation. Regards Antoine. From amauryfa at gmail.com Fri Mar 8 14:39:10 2013 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Fri, 8 Mar 2013 14:39:10 +0100 Subject: [Python-ideas] Official MySQL module In-Reply-To: <5139DF9C.5080308@trueblade.com> References: <5139098A.2090708@trueblade.com> <51399C10.2090409@egenix.com> <5139C4F8.7050301@trueblade.com> <5139DF9C.5080308@trueblade.com> Message-ID: 2013/3/8 Eric V. Smith > > Doesn't a MySQL binding have to link with some client library? > libmysql.so? > > How is it licensed? > > I found this > > page: http://www.mysql.com/about/legal/licensing/foss-exception/ > > which seems to prevent redistribution under non-free licenses. > > It's not true that a client library is required. PyMySQL is pure Python. > We could also write a C connection module ourselves, if needed. > Ah, the MySQL TCP protocol is public and easy to implement. Good! OTOH it probably makes it more difficult to use semi-advanced features. Prepared statements, for example. 
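For reference, an untested sketch of what the pure-Python route looks like with PyMySQL (assumes a reachable local server; the connection details are hypothetical and the connect arguments follow PyMySQL's MySQLdb-compatible API):

    import pymysql   # third-party, pure Python: pip install PyMySQL

    conn = pymysql.connect(host='localhost', user='user',
                           passwd='secret', db='test')
    try:
        cur = conn.cursor()
        # PyMySQL interpolates parameters client-side rather than using
        # server-side prepared statements -- the kind of semi-advanced
        # feature gap being pointed at here.
        cur.execute('SELECT %s + %s', (1, 2))
        print(cur.fetchone())
    finally:
        conn.close()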
-- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Fri Mar 8 14:37:50 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 8 Mar 2013 14:37:50 +0100 Subject: [Python-ideas] with ... except References: <20130308111350.51baa3f3@pitrou.net> Message-ID: <20130308143750.70a649a0@pitrou.net> Le Fri, 8 Mar 2013 21:43:52 +1000, Nick Coghlan a écrit : > The defined semantics of a with statement already include three of > those clauses (try, except, finally). Does the except clause still > fire if the with statement suppresses the exception? No, it doesn't. > With the nested > form, the answer is clearly yes. With the flattened form, the answer > is less obvious. Furthermore, if the with statement allows "except", > does it also allow else and finally? If not, why not? It doesn't, simply because I don't need it :-) (but, yes, that would be a reasonable request too) > It *is* annoying that composing with statements with explicit > exception handling is somewhat clumsy, but I don't think this is the > way to fix it. Yep, I think we will eventually have to propose something to fix that, er, "wart" :-) Regards Antoine. From dholth at gmail.com Fri Mar 8 15:04:19 2013 From: dholth at gmail.com (Daniel Holth) Date: Fri, 8 Mar 2013 09:04:19 -0500 Subject: [Python-ideas] Length hinting and preallocation for container types In-Reply-To: <20130308143539.6e79871e@pitrou.net> References: <513625E8.4060201@python.org> <876215jmg1.fsf@uwakimon.sk.tsukuba.ac.jp> <20130307112817.211fd496@pitrou.net> <87ip52iyvo.fsf@uwakimon.sk.tsukuba.ac.jp> <20130308110731.45ff8bac@pitrou.net> <20130308143539.6e79871e@pitrou.net> Message-ID: On Fri, Mar 8, 2013 at 8:35 AM, Antoine Pitrou wrote: > Le Fri, 8 Mar 2013 08:18:07 -0500, > Daniel Holth a écrit : >> >> I am a fan of the proposal. Imagine you are programming for a >> memory-constrained system. By telling the list how big it needs to be >> you can save precious RAM. > > Is it an actual use case or are you just imagining it? :) > I'm asking because, unless you are only allocating that list and all > the objects contained in it already exist, limiting the list's size > won't do much for the process' memory occupation. It might help if it was a list of integers between -1 and 99 and 1-character strings. From jsbueno at python.org.br Fri Mar 8 15:07:33 2013 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Fri, 8 Mar 2013 11:07:33 -0300 Subject: [Python-ideas] Official MySQL module In-Reply-To: <5139C4F8.7050301@trueblade.com> References: <5139098A.2090708@trueblade.com> <51399C10.2090409@egenix.com> <5139C4F8.7050301@trueblade.com> Message-ID: On 8 March 2013 08:01, Eric V. Smith wrote: > We include webbrowser, but no web browser. imaplib, but > no imap server. Etc. Picking only this point - because it highlights what is bothering me: In my lectures, I usually justify Python not including (up to now) bindings for MySQL or PostgreSQL, in contrast to offering ways to interoperate with imap, http, and pop, because the former are "products" and the latter are "standards" - and it would not sit well for a language standard library to include "others' products" in it (and yes, in my lecture I talked about this in contrast with PHP). 
So, I do feel a bit uncomfortable with the idea of including bindings for 3rd party databases in the stdlib - but I think that is an emotional thing only, and can easily be rationalized away with "practicality beats purity" - and the mention somewhere else that in a Free Software environment, saying "3rd party" can be misleading. Regards, js -><- From eliben at gmail.com Fri Mar 8 15:40:32 2013 From: eliben at gmail.com (Eli Bendersky) Date: Fri, 8 Mar 2013 06:40:32 -0800 Subject: [Python-ideas] Length hinting and preallocation for container types In-Reply-To: References: <513625E8.4060201@python.org> <876215jmg1.fsf@uwakimon.sk.tsukuba.ac.jp> <20130307112817.211fd496@pitrou.net> <87ip52iyvo.fsf@uwakimon.sk.tsukuba.ac.jp> <20130308110731.45ff8bac@pitrou.net> <20130308143539.6e79871e@pitrou.net> Message-ID: On Fri, Mar 8, 2013 at 6:04 AM, Daniel Holth wrote: > On Fri, Mar 8, 2013 at 8:35 AM, Antoine Pitrou > wrote: > > Le Fri, 8 Mar 2013 08:18:07 -0500, > > Daniel Holth a écrit : > >> > >> I am a fan of the proposal. Imagine you are programming for a > >> memory-constrained system. By telling the list how big it needs to be > >> you can save precious RAM. > > > > Is it an actual use case or are you just imagining it? :) > > I'm asking because, unless you are only allocating that list and all > > the objects contained in it already exist, limiting the list's size > > won't do much for the process' memory occupation. > > It might help if it was a list of integers between -1 and 99 and > 1-character strings. That's not what you should use lists for if memory consumption matters. Use http://docs.python.org/dev/library/array.html, especially if your integers are in such a limited range. Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From eliben at gmail.com Fri Mar 8 15:43:47 2013 From: eliben at gmail.com (Eli Bendersky) Date: Fri, 8 Mar 2013 06:43:47 -0800 Subject: [Python-ideas] Length hinting and preallocation for container types In-Reply-To: References: <513625E8.4060201@python.org> <876215jmg1.fsf@uwakimon.sk.tsukuba.ac.jp> <20130307112817.211fd496@pitrou.net> <87ip52iyvo.fsf@uwakimon.sk.tsukuba.ac.jp> <20130308110731.45ff8bac@pitrou.net> <20130308143539.6e79871e@pitrou.net> Message-ID: On Fri, Mar 8, 2013 at 6:40 AM, Eli Bendersky wrote: > On Fri, Mar 8, 2013 at 6:04 AM, Daniel Holth wrote: >> On Fri, Mar 8, 2013 at 8:35 AM, Antoine Pitrou wrote: >> > Le Fri, 8 Mar 2013 08:18:07 -0500, >> > Daniel Holth a écrit : >> >> I am a fan of the proposal. Imagine you are programming for a >> >> memory-constrained system. By telling the list how big it needs to be >> >> you can save precious RAM. >> > >> > Is it an actual use case or are you just imagining it? :) >> > I'm asking because, unless you are only allocating that list and all >> > the objects contained in it already exist, limiting the list's size >> > won't do much for the process' memory occupation. >> >> It might help if it was a list of integers between -1 and 99 and >> 1-character strings. > > That's not what you should use lists for if memory consumption matters. > Use http://docs.python.org/dev/library/array.html, especially if your > integers are in such a limited range.

    >>> sys.getsizeof(list(range(100)))
    1024
    >>> sys.getsizeof(array('i', list(range(100))))
    480
    >>> sys.getsizeof(array('b', list(range(100))))
    180

This can help you *way* more than playing with growing and shrinking lists. Eli -------------- next part -------------- An HTML attachment was scrubbed... 
From stefan_ml at behnel.de Fri Mar 8 15:48:18 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 08 Mar 2013 15:48:18 +0100 Subject: [Python-ideas] Length hinting and preallocation for container types In-Reply-To: References: <513625E8.4060201@python.org> <876215jmg1.fsf@uwakimon.sk.tsukuba.ac.jp> <20130307112817.211fd496@pitrou.net> <87ip52iyvo.fsf@uwakimon.sk.tsukuba.ac.jp> <20130308110731.45ff8bac@pitrou.net> <20130308143539.6e79871e@pitrou.net> Message-ID: Eli Bendersky, 08.03.2013 15:40: > On Fri, Mar 8, 2013 at 6:04 AM, Daniel Holth wrote: >> On Fri, Mar 8, 2013 at 8:35 AM, Antoine Pitrou wrote: >>> Le Fri, 8 Mar 2013 08:18:07 -0500, >>> Daniel Holth a écrit : >>>> I am a fan of the proposal. Imagine you are programming for a >>>> memory-constrained system. By telling the list how big it needs to be >>>> you can save precious RAM. >>> >>> Is it an actual use case or are you just imagining it? :) >>> I'm asking because, unless you are only allocating that list and all >>> the objects contained in it already exist, limiting the list's size >>> won't do much for the process' memory occupation. >> >> It might help if it was a list of integers between -1 and 99 and >> 1-character strings. > > That's not what you should use lists for if memory consumption matters. Use > http://docs.python.org/dev/library/array.html, especially if your integers > are in such a limited range. Yep, and regarding the second part, a string is a very efficient way to store many 1-character strings. Stefan From oscar.j.benjamin at gmail.com Fri Mar 8 15:48:34 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Fri, 8 Mar 2013 14:48:34 +0000 Subject: [Python-ideas] with ... except In-Reply-To: <20130308111350.51baa3f3@pitrou.net> References: <20130308111350.51baa3f3@pitrou.net> Message-ID: On 8 March 2013 10:13, Antoine Pitrou wrote: > > A common pattern for me is to write a with statement for resource > cleanup, but also handle specific errors after that. Right now, this is > a bit cumbersome:
>
> try:
>     with open("somefile", "rb") as f:
>         ...
> except FileNotFoundError:
>     # do something else, perhaps actually create the file

In some cases it might be reasonable to make a context manager that handles errors from the original context manager e.g.:

    import contextlib

    @contextlib.contextmanager
    def handle(errorcls, func, *args, **kwargs):
        try:
            yield
        except errorcls:
            func(*args, **kwargs)

    with handle(FileNotFoundError, print, 'error'), open('somefile', 'rb') as f:
        print('No error')

> or:
>
> try:
>     with transaction.commit_on_success():
>         ...
> except ObjectDoesNotExist:
>     # do something else, perhaps clean up some internal cache

Another possibility is a context manager that handles both things, e.g.:

    from contextlib import contextmanager

    @contextmanager
    def commit_or_clean(errorcls):
        try:
            with transaction.commit_on_success():
                yield
        except errorcls:
            clean()

Oscar From dholth at gmail.com Fri Mar 8 15:55:59 2013 From: dholth at gmail.com (Daniel Holth) Date: Fri, 8 Mar 2013 09:55:59 -0500 Subject: [Python-ideas] Length hinting and preallocation for container types In-Reply-To: References: <513625E8.4060201@python.org> <876215jmg1.fsf@uwakimon.sk.tsukuba.ac.jp> <20130307112817.211fd496@pitrou.net> <87ip52iyvo.fsf@uwakimon.sk.tsukuba.ac.jp> <20130308110731.45ff8bac@pitrou.net> <20130308143539.6e79871e@pitrou.net> Message-ID: On Fri, Mar 8, 2013 at 9:48 AM, Stefan Behnel wrote: > Eli Bendersky, 08.03.2013 15:40: >> On Fri, Mar 8, 2013 at 6:04 AM, Daniel Holth wrote: >>> On Fri, Mar 8, 2013 at 8:35 AM, Antoine Pitrou wrote: >>>> Le Fri, 8 Mar 2013 08:18:07 -0500, >>>> Daniel Holth a écrit : >>>>> I am a fan of the proposal. Imagine you are programming for a >>>>> memory-constrained system. By telling the list how big it needs to be >>>>> you can save precious RAM. >>>> >>>> Is it an actual use case or are you just imagining it? :) >>>> I'm asking because, unless you are only allocating that list and all >>>> the objects contained in it already exist, limiting the list's size >>>> won't do much for the process' memory occupation. >>> >>> It might help if it was a list of integers between -1 and 99 and >>> 1-character strings. >> >> That's not what you should use lists for if memory consumption matters. Use >> http://docs.python.org/dev/library/array.html, especially if your integers >> are in such a limited range. > > Yep, and regarding the second part, a string is a very efficient way to > store many 1-character strings. > > Stefan I do know C From stefan_ml at behnel.de Fri Mar 8 16:02:01 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 08 Mar 2013 16:02:01 +0100 Subject: [Python-ideas] Length hinting and preallocation for container types In-Reply-To: References: <513625E8.4060201@python.org> <876215jmg1.fsf@uwakimon.sk.tsukuba.ac.jp> <20130307112817.211fd496@pitrou.net> <87ip52iyvo.fsf@uwakimon.sk.tsukuba.ac.jp> <20130308110731.45ff8bac@pitrou.net> <20130308143539.6e79871e@pitrou.net> Message-ID: Daniel Holth, 08.03.2013 15:55: > On Fri, Mar 8, 2013 at 9:48 AM, Stefan Behnel wrote: >> Eli Bendersky, 08.03.2013 15:40: >>> On Fri, Mar 8, 2013 at 6:04 AM, Daniel Holth wrote: >>>> On Fri, Mar 8, 2013 at 8:35 AM, Antoine Pitrou wrote: >>>>> Le Fri, 8 Mar 2013 08:18:07 -0500, >>>>> Daniel Holth a écrit : >>>>>> I am a fan of the proposal. Imagine you are programming for a >>>>>> memory-constrained system. By telling the list how big it needs to be >>>>>> you can save precious RAM. >>>>> >>>>> Is it an actual use case or are you just imagining it? :) >>>>> I'm asking because, unless you are only allocating that list and all >>>>> the objects contained in it already exist, limiting the list's size >>>>> won't do much for the process' memory occupation. >>>> >>>> It might help if it was a list of integers between -1 and 99 and >>>> 1-character strings. >>> >>> That's not what you should use lists for if memory consumption matters. Use >>> http://docs.python.org/dev/library/array.html, especially if your integers >>> are in such a limited range. >> >> Yep, and regarding the second part, a string is a very efficient way to >> store many 1-character strings.
> > I do know C So do I. This thread is about Python, though. At least, that's what I think it is. Stefan From solipsis at pitrou.net Fri Mar 8 16:15:28 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 8 Mar 2013 16:15:28 +0100 Subject: [Python-ideas] Length hinting and preallocation for container types References: <513625E8.4060201@python.org> <876215jmg1.fsf@uwakimon.sk.tsukuba.ac.jp> <20130307112817.211fd496@pitrou.net> <87ip52iyvo.fsf@uwakimon.sk.tsukuba.ac.jp> <20130308110731.45ff8bac@pitrou.net> <20130308143539.6e79871e@pitrou.net> Message-ID: <20130308161528.38e63873@pitrou.net> Le Fri, 08 Mar 2013 16:02:01 +0100, Stefan Behnel a ?crit : > Daniel Holth, 08.03.2013 15:55: > > On Fri, Mar 8, 2013 at 9:48 AM, Stefan Behnel wrote: > >> Eli Bendersky, 08.03.2013 15:40: > >>> On Fri, Mar 8, 2013 at 6:04 AM, Daniel Holth wrote: > >>>> On Fri, Mar 8, 2013 at 8:35 AM, Antoine Pitrou wrote: > >>>>> Le Fri, 8 Mar 2013 08:18:07 -0500, > >>>>> Daniel Holth a ?crit : > >>>>>> I am a fan of the proposal. Imagine you are programming for a > >>>>>> memory-constrained system. By telling the list how big it > >>>>>> needs to be you can save precious RAM. > >>>>> > >>>>> Is it an actual use case or are you just imagining it? :) > >>>>> I'm asking because, unless you are only allocating that list > >>>>> and all the objects contained it in it already exist, limiting > >>>>> the list's size won't do much for the process' memory > >>>>> occupation. > >>>> > >>>> It might help if it was a list of integers between -1 and 99 and > >>>> 1-character strings. > >>> > >>> That's not what you should use lists for if memory consumption > >>> matters. Use http://docs.python.org/dev/library/array.html, > >>> especially if your integers are in such a limited range. > >> > >> Yep, and regarding the second part, a string is a very efficient > >> way to store many 1-character strings. > > > > I do know C > > So do I. This thread is about Python, though. At least, that's what I > think it is. The way I read it, Daniel's message about small integers and 1-character strings was humorous. Obviously if you are memory-constrained you have better things to do than accumulating many one-character strings in a large Python list. Regards Antoine. From dholth at gmail.com Fri Mar 8 16:17:03 2013 From: dholth at gmail.com (Daniel Holth) Date: Fri, 8 Mar 2013 10:17:03 -0500 Subject: [Python-ideas] Length hinting and preallocation for container types In-Reply-To: References: <513625E8.4060201@python.org> <876215jmg1.fsf@uwakimon.sk.tsukuba.ac.jp> <20130307112817.211fd496@pitrou.net> <87ip52iyvo.fsf@uwakimon.sk.tsukuba.ac.jp> <20130308110731.45ff8bac@pitrou.net> <20130308143539.6e79871e@pitrou.net> Message-ID: On Fri, Mar 8, 2013 at 10:02 AM, Stefan Behnel wrote: > Daniel Holth, 08.03.2013 15:55: >> On Fri, Mar 8, 2013 at 9:48 AM, Stefan Behnel wrote: >>> Eli Bendersky, 08.03.2013 15:40: >>>> On Fri, Mar 8, 2013 at 6:04 AM, Daniel Holth wrote: >>>>> On Fri, Mar 8, 2013 at 8:35 AM, Antoine Pitrou wrote: >>>>>> Le Fri, 8 Mar 2013 08:18:07 -0500, >>>>>> Daniel Holth a ?crit : >>>>>>> I am a fan of the proposal. Imagine you are programming for a >>>>>>> memory-constrained system. By telling the list how big it needs to be >>>>>>> you can save precious RAM. >>>>>> >>>>>> Is it an actual use case or are you just imagining it? 
:) >>>>>> I'm asking because, unless you are only allocating that list and all >>>>>> the objects contained it in it already exist, limiting the list's size >>>>>> won't do much for the process' memory occupation. >>>>> >>>>> It might help if it was a list of integers between -1 and 99 and >>>>> 1-character strings. >>>> >>>> That's not what you should use lists for if memory consumption matters. Use >>>> http://docs.python.org/dev/library/array.html, especially if your integers >>>> are in such a limited range. >>> >>> Yep, and regarding the second part, a string is a very efficient way to >>> store many 1-character strings. >> >> I do know C > > So do I. This thread is about Python, though. At least, that's what I think > it is. IIUC the JIT is smart enough to give me a very efficient list of unboxed integers without having to change the type, increasing the Pythonicity of my program. From solipsis at pitrou.net Fri Mar 8 16:16:25 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 8 Mar 2013 16:16:25 +0100 Subject: [Python-ideas] with ... except References: <20130308111350.51baa3f3@pitrou.net> Message-ID: <20130308161625.1ba3902e@pitrou.net> Le Fri, 8 Mar 2013 14:48:34 +0000, Oscar Benjamin a ?crit : > > > or: > > > > try: > > with transaction.commit_on_success(): > > ... > > except ObjectDoesNotExist: > > # do something else, perhaps clean up some internal cache > > Another possibility is a context manager that handles both things, > e.g.: > > @contextmanager > def commit_or_clean(errorcls): > try: > with transaction.commit_on_success(): > yield > except errorcls: > clean() That's true, but only if the two things are strongly related, not in the general case. Regards Antoine. From stefan_ml at behnel.de Fri Mar 8 16:28:32 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 08 Mar 2013 16:28:32 +0100 Subject: [Python-ideas] Length hinting and preallocation for container types In-Reply-To: <513625E8.4060201@python.org> References: <513625E8.4060201@python.org> Message-ID: Christian Heimes, 05.03.2013 18:05: > today I came across this slides > https://speakerdeck.com/alex/why-python-ruby-and-javascript-are-slow by > Alex Gaynor. The slides aren't some random rants on Python. Alex makes > some valid points. I just read through them. I'm ok with the first part, but when it comes to the "why it's really slow" section, I get the impression that Alex has misunderstood something about Python (and maybe programming languages in general). There's no need to make "dynamic languages" C-ish, and why should they be? We'll always be using more than one language for what we do, and that's neither good nor bad, just normal. For the few cases (more than 5%? Anyone?) where you really need to do zero-copy-whatever stuff or other low-level-I-really-know-what-I-am-doing-kind-of-things, you can just write them in a language that fits your use case and that allows you to do exactly the kind of zero-copy or bits-here-and-there or immutable data structures operations you need. That may or may not be C. It may be Fortran, it may be Lua, it may be Haskell, it may be Lisp. It depends on what you know and what you need. CPython has a well established reputation as an extremely good and easy to use integration platform, and that's its main selling point. Let's keep using it like that. 
Stefan From stefan_ml at behnel.de Fri Mar 8 16:45:55 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 08 Mar 2013 16:45:55 +0100 Subject: [Python-ideas] Length hinting and preallocation for container types In-Reply-To: References: <513625E8.4060201@python.org> <876215jmg1.fsf@uwakimon.sk.tsukuba.ac.jp> <20130307112817.211fd496@pitrou.net> <87ip52iyvo.fsf@uwakimon.sk.tsukuba.ac.jp> <20130308110731.45ff8bac@pitrou.net> <20130308143539.6e79871e@pitrou.net> Message-ID: Daniel Holth, 08.03.2013 16:17: > On Fri, Mar 8, 2013 at 10:02 AM, Stefan Behnel wrote: >> Daniel Holth, 08.03.2013 15:55: >>> On Fri, Mar 8, 2013 at 9:48 AM, Stefan Behnel wrote: >>>> Eli Bendersky, 08.03.2013 15:40: >>>>> On Fri, Mar 8, 2013 at 6:04 AM, Daniel Holth wrote: >>>>>> On Fri, Mar 8, 2013 at 8:35 AM, Antoine Pitrou wrote: >>>>>>> Le Fri, 8 Mar 2013 08:18:07 -0500, >>>>>>> Daniel Holth a ?crit : >>>>>>>> I am a fan of the proposal. Imagine you are programming for a >>>>>>>> memory-constrained system. By telling the list how big it needs to be >>>>>>>> you can save precious RAM. >>>>>>> >>>>>>> Is it an actual use case or are you just imagining it? :) >>>>>>> I'm asking because, unless you are only allocating that list and all >>>>>>> the objects contained it in it already exist, limiting the list's size >>>>>>> won't do much for the process' memory occupation. >>>>>> >>>>>> It might help if it was a list of integers between -1 and 99 and >>>>>> 1-character strings. >>>>> >>>>> That's not what you should use lists for if memory consumption matters. Use >>>>> http://docs.python.org/dev/library/array.html, especially if your integers >>>>> are in such a limited range. >>>> >>>> Yep, and regarding the second part, a string is a very efficient way to >>>> store many 1-character strings. >>> >>> I do know C >> >> So do I. This thread is about Python, though. At least, that's what I think >> it is. > > IIUC the JIT is smart enough to give me a very efficient list of > unboxed integers without having to change the type, increasing the > Pythonicity of my program. It may or may not. It's a runtime optimiser, there's no guarantee that it will always perform "as expected". For example, it may decide to optimise your list for integer values up to 255, and when you add a value 256 for some reason, it may have to reallocate and copy the whole list. And when you remove the last 256 value from the list, there is no guarantee that it will shrink your list back to the optimal size, it may just keep wasting memory. Oh, and it may actually waste memory right from the start, by not optimising your list for values up to 255 but for values up to 2**31, although all you actually wanted to store was values between 1 and 99, right? It's always a good idea to put some thoughts into the choice of the right data structure for your use case. So, that being said, should we discuss extending this proposal to add a new API for Python lists that allows defining the maximum value of integer values that you want to store in them? That would allow for some serious optimisations. 
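(More seriously: the hinting half of this thread already has a public hook - PEP 424's __length_hint__ protocol, which CPython's list construction consults so it can size its buffer up front. A minimal sketch, assuming Python 3.4 for operator.length_hint; the Repeat class here is made up purely for illustration:)

    import operator

    class Repeat:
        # An iterable that knows its eventual length in advance.
        def __init__(self, value, n):
            self.value, self.n = value, n
        def __iter__(self):
            for _ in range(self.n):
                yield self.value
        def __length_hint__(self):
            # Consulted by list() / list.extend() as an optimization
            # hint only; no behaviour is guaranteed by the language.
            return self.n

    print(operator.length_hint(Repeat('x', 1000)))  # -> 1000
    data = list(Repeat('x', 1000))                  # may be pre-sized via the hint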
Stefan From dholth at gmail.com Fri Mar 8 16:50:15 2013 From: dholth at gmail.com (Daniel Holth) Date: Fri, 8 Mar 2013 10:50:15 -0500 Subject: [Python-ideas] Length hinting and preallocation for container types In-Reply-To: References: <513625E8.4060201@python.org> <876215jmg1.fsf@uwakimon.sk.tsukuba.ac.jp> <20130307112817.211fd496@pitrou.net> <87ip52iyvo.fsf@uwakimon.sk.tsukuba.ac.jp> <20130308110731.45ff8bac@pitrou.net> <20130308143539.6e79871e@pitrou.net> Message-ID: On Fri, Mar 8, 2013 at 10:45 AM, Stefan Behnel wrote: > Daniel Holth, 08.03.2013 16:17: >> On Fri, Mar 8, 2013 at 10:02 AM, Stefan Behnel wrote: >>> Daniel Holth, 08.03.2013 15:55: >>>> On Fri, Mar 8, 2013 at 9:48 AM, Stefan Behnel wrote: >>>>> Eli Bendersky, 08.03.2013 15:40: >>>>>> On Fri, Mar 8, 2013 at 6:04 AM, Daniel Holth wrote: >>>>>>> On Fri, Mar 8, 2013 at 8:35 AM, Antoine Pitrou wrote: >>>>>>>> Le Fri, 8 Mar 2013 08:18:07 -0500, >>>>>>>> Daniel Holth a ?crit : >>>>>>>>> I am a fan of the proposal. Imagine you are programming for a >>>>>>>>> memory-constrained system. By telling the list how big it needs to be >>>>>>>>> you can save precious RAM. >>>>>>>> >>>>>>>> Is it an actual use case or are you just imagining it? :) >>>>>>>> I'm asking because, unless you are only allocating that list and all >>>>>>>> the objects contained it in it already exist, limiting the list's size >>>>>>>> won't do much for the process' memory occupation. >>>>>>> >>>>>>> It might help if it was a list of integers between -1 and 99 and >>>>>>> 1-character strings. >>>>>> >>>>>> That's not what you should use lists for if memory consumption matters. Use >>>>>> http://docs.python.org/dev/library/array.html, especially if your integers >>>>>> are in such a limited range. >>>>> >>>>> Yep, and regarding the second part, a string is a very efficient way to >>>>> store many 1-character strings. >>>> >>>> I do know C >>> >>> So do I. This thread is about Python, though. At least, that's what I think >>> it is. >> >> IIUC the JIT is smart enough to give me a very efficient list of >> unboxed integers without having to change the type, increasing the >> Pythonicity of my program. > > It may or may not. It's a runtime optimiser, there's no guarantee that it > will always perform "as expected". For example, it may decide to optimise > your list for integer values up to 255, and when you add a value 256 for > some reason, it may have to reallocate and copy the whole list. And when > you remove the last 256 value from the list, there is no guarantee that it > will shrink your list back to the optimal size, it may just keep wasting > memory. Oh, and it may actually waste memory right from the start, by not > optimising your list for values up to 255 but for values up to 2**31, > although all you actually wanted to store was values between 1 and 99, right? > > It's always a good idea to put some thoughts into the choice of the right > data structure for your use case. > > So, that being said, should we discuss extending this proposal to add a new > API for Python lists that allows defining the maximum value of integer > values that you want to store in them? That would allow for some serious > optimisations. Definitely. list(hint=prime_numbers_only, length=42) From python at mrabarnett.plus.com Fri Mar 8 17:23:11 2013 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 08 Mar 2013 16:23:11 +0000 Subject: [Python-ideas] with ... 
except In-Reply-To: <201303081041.50701.mark.hackett@metoffice.gov.uk> References: <20130308111350.51baa3f3@pitrou.net> <20130308113342.5f54c381@pitrou.net> <201303081041.50701.mark.hackett@metoffice.gov.uk> Message-ID: <513A106F.3010007@mrabarnett.plus.com> On 08/03/2013 10:41, Mark Hackett wrote: > On Friday 08 Mar 2013, Antoine Pitrou wrote: >> Le Fri, 8 Mar 2013 11:24:02 +0100, >> >> Masklinn a écrit : >> > On 2013-03-08, at 11:13 , Antoine Pitrou wrote: >> > > Hello, >> > > >> > > A common pattern for me is to write a with statement for resource >> > > cleanup, but also handle specific errors after that. Right now, >> > > this is a bit cumbersome:
>> > >
>> > > try:
>> > >     with open("somefile", "rb") as f:
>> > >         ...
>> > > except FileNotFoundError:
>> > >     # do something else, perhaps actually create the file
>> > >
>> > > or:
>> > >
>> > > try:
>> > >     with transaction.commit_on_success():
>> > >         ...
>> > > except ObjectDoesNotExist:
>> > >     # do something else, perhaps clean up some internal cache
>> > >
>> > >
>> > > How about adding syntax sugar for the above, in the form of a >> > > with ... except clause? It would nicely reduce spurious >> > > indentation, as with the try / except / finally which, long ago(!), >> > > helped reduce indentation and typing by removing the need to nest a >> > > try / except inside a try / finally. >> > >> > Isn't it essentially the same suggestion as Alan Johnson's last week? >> > http://mail.python.org/pipermail/python-ideas/2013-March/019730.html >> >> Hmm, I hadn't read that thread. "try with" looked sufficiently ugly >> that I wasn't interested :-) >> > > Ugh, someone is going to suggest we have "try ... without" now... > > And it's so anglocentric. How come the calls are to repeat *English grammar*? > German grammar would probably be a lot clearer for a compiler/interpreter. > German grammar? Surely you mean Dutch grammar! :-) > Of course, that wouldn't be possible if we still had the 80-column limit... > :-) From ericsnowcurrently at gmail.com Fri Mar 8 17:26:12 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 8 Mar 2013 09:26:12 -0700 Subject: [Python-ideas] class-only methods without using metaclasses In-Reply-To: <5139C156.80200@pearwood.info> References: <5139C156.80200@pearwood.info> Message-ID: On Fri, Mar 8, 2013 at 3:45 AM, Steven D'Aprano wrote: > But it looks interesting and I think Eric should > put it up as a recipe on ActiveState. I already had. :) http://code.activestate.com/recipes/578486 -eric From steve at pearwood.info Fri Mar 8 17:37:13 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 09 Mar 2013 03:37:13 +1100 Subject: [Python-ideas] Length hinting and preallocation for container types In-Reply-To: References: <513625E8.4060201@python.org> <876215jmg1.fsf@uwakimon.sk.tsukuba.ac.jp> <20130307112817.211fd496@pitrou.net> <87ip52iyvo.fsf@uwakimon.sk.tsukuba.ac.jp> <20130308110731.45ff8bac@pitrou.net> Message-ID: <513A13B9.6030503@pearwood.info> On 09/03/13 00:27, Eli Bendersky wrote: > If it's voting time, I'm -1. Having programmed a lot of memory-constrained > systems (not in Python, though) - this is not how things usually work > there. In a memory-constrained system, you don't "grow and shrink" your > data structures. That's because growing often needs to reallocate the whole > chunk and do a copy, and shrinking only helps memory fragmentation.
In such > systems, you usually know in advance or at least limit the size of data > structures and pre-allocate, which is perfectly possible in Python today. Are you referring to using (say) [None]*n, for some size n? Is this a language guarantee that it won't over-allocate, or just an accident of implementation? -- Steven From ericsnowcurrently at gmail.com Fri Mar 8 17:45:51 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 8 Mar 2013 09:45:51 -0700 Subject: [Python-ideas] class-only methods without using metaclasses In-Reply-To: References: <5139C156.80200@pearwood.info> Message-ID: On Fri, Mar 8, 2013 at 4:15 AM, Nick Coghlan wrote: > It's the "When would I recommend using this over a normal > classmethod?" that I consider subtle. By only providing one of the two > options directly, it means people don't even need to ask the question, > let alone figure out how to answer it. And this is where I agree. I will say that the proper implementation isn't trivial and my recipe is only pretty close. If it were more frequently useful, I'd press for a classonlymethod() in the standard library (not builtins). If that changes I'll bring it up again, but for now I'm not convinced it is worth it. -eric From guido at python.org Fri Mar 8 18:45:58 2013 From: guido at python.org (Guido van Rossum) Date: Fri, 8 Mar 2013 09:45:58 -0800 Subject: [Python-ideas] Official MySQL module In-Reply-To: References: <5139098A.2090708@trueblade.com> <51399C10.2090409@egenix.com> <5139C4F8.7050301@trueblade.com> Message-ID: On Fri, Mar 8, 2013 at 6:07 AM, Joao S. O. Bueno wrote: > On 8 March 2013 08:01, Eric V. Smith wrote: >> We include webbrowser, but no web browser. imaplib, but >> no imap server. Etc. > > > Picking only this point - becuase it highlights what is bothering me: > In my lectures, I use to justify Python not including (up to now) > bindings for MySQL or PostgreSQL , in contrast to offering > ways to interoperate with imap, http, and pop, because the former > are "products" and the later are "standards" - and it would not be > well for a language standard library to include "other's products" in > it (and yes, on my lecture I talked about this in contrast with PHP). That sounds rather an idealized opinion, and not a very useful attitude. Some "standards" represent commercial interests. The "products" you mention are in fact well-respected open source projects. But anyway, Python's standard library (unlike GNU) does not have an ax to grind about what software should or should not be supported from an ethical perspective. It is all about what is useful to a large enough number of Python users, and what is feasible given the available volunteer power and the technical properties of the software in question. > So, I do feel a bit uncomfortable with the idea of including bindings for > 3rd party databases in the stdlib - but I think that is an emotional thing > only , and can easily be rationalized away with > "practicality beats purity" - and the mention somewhere else that > in Free Software environment, saying "3rd party" can be > misleading. I do not understand this last phrase. 
-- --Guido van Rossum (python.org/~guido) From tjreedy at udel.edu Fri Mar 8 21:06:35 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 08 Mar 2013 15:06:35 -0500 Subject: [Python-ideas] Official MySQL module In-Reply-To: References: <5139098A.2090708@trueblade.com> <51399C10.2090409@egenix.com> <5139C4F8.7050301@trueblade.com> Message-ID: On 3/8/2013 12:45 PM, Guido van Rossum wrote: > But anyway, Python's standard library (unlike GNU) does not have an ax > to grind about what software should or should not be supported from an > ethical perspective. It is all about what is useful to a large enough > number of Python users, and what is feasible given the available > volunteer power and the technical properties of the software in > question. Given that we have a growing list of open issues, now at 3800 (1600 behavior, 1200 enhancement*, 600 not selected (doc?), 400 other), I think perhaps we are already beyond what the current volunteers can handle. * I think the majority of these should either be closed or referred to this list, as they are essentially dead without getting support here. In any case, with Guido having said he is open to the possibility of a mysql client module, and others having expressed opinions on the abstract idea, pro and con, it seems to me that the next step is a concrete proposal. -- Terry Jan Reedy From merwok at netwok.org Fri Mar 8 20:48:41 2013 From: merwok at netwok.org (Éric Araujo) Date: Fri, 08 Mar 2013 14:48:41 -0500 Subject: [Python-ideas] Official MySQL module In-Reply-To: References: <5139098A.2090708@trueblade.com> <51399C10.2090409@egenix.com> <5139C4F8.7050301@trueblade.com> Message-ID: <513A4099.1040907@netwok.org> Le 08/03/2013 09:07, Joao S. O. Bueno a écrit : > the mention somewhere else that in Free Software > environment, saying "3rd party" can be misleading. In Python documentation and materials, we use that term to mean "not included in the standard library". The user is one party, the Python distribution is another one, and modules that are not in the stdlib are called "third-party modules" and available from sources such as PyPI. I don't think there's more than that to it. Regards From jsbueno at python.org.br Fri Mar 8 21:27:01 2013 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Fri, 8 Mar 2013 17:27:01 -0300 Subject: [Python-ideas] Official MySQL module In-Reply-To: References: <5139098A.2090708@trueblade.com> <51399C10.2090409@egenix.com> <5139C4F8.7050301@trueblade.com> Message-ID: On 8 March 2013 14:45, Guido van Rossum wrote: > On Fri, Mar 8, 2013 at 6:07 AM, Joao S. O. Bueno wrote: >> On 8 March 2013 08:01, Eric V. Smith wrote: >>> We include webbrowser, but no web browser. imaplib, but >>> no imap server. Etc. >> >> >> Picking only this point - because it highlights what is bothering me: >> In my lectures, I used to justify Python not including (up to now) >> bindings for MySQL or PostgreSQL, in contrast to offering >> ways to interoperate with imap, http, and pop, because the former >> are "products" and the latter are "standards" - and it would not sit >> well for a language standard library to include "other's products" in >> it (and yes, in my lectures I talked about this in contrast with PHP). > > That sounds rather an idealized opinion, and not a very useful > attitude. Some "standards" represent commercial interests. The > "products" you mention are in fact well-respected open source > projects.
> > But anyway, Python's standard library (unlike GNU) does not have an ax > to grind about what software should or should not be supported from an > ethical perspective. It is all about what is useful to a large enough > number of Python users, and what is feasible given the available > volunteer power and the technical properties of the software in > question. > >> So, I do feel a bit uncomfortable with the idea of including bindings for >> 3rd party databases in the stdlib - but I think that is an emotional thing >> only, and can easily be rationalized away with >> "practicality beats purity" - and the mention somewhere else that >> in the Free Software environment, saying "3rd party" can be >> misleading. > > I do not understand this last phrase. It does not matter much - I did not remember whether it was in this thread or on another list that someone wrote "in Open Source we don't usually refer to other projects as 3rd party" (and it happened it was on another list). Anyway, you just clarified exactly what I intended to mean above with """ The "products" you mention are in fact well-respected open source projects.""" All in all - the argument I cited in the grand-parent message was something I built in lectures to address PHP people querying about built-in MySQL support - and I _do_ agree with "practicality beats purity" - or as you expanded it: """ Python's standard library (unlike GNU) does not have an ax to grind about what software should or should not be supported from an ethical perspective.""" Thanks for the response anyway. Now... onto those PEPs :-) (maybe someone would like to tackle writing CFFI based drivers for PostgreSQL and MySQL? :-) ) js -><- > > -- > --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Sat Mar 9 00:10:29 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 09 Mar 2013 12:10:29 +1300 Subject: [Python-ideas] with ... except In-Reply-To: <201303081041.50701.mark.hackett@metoffice.gov.uk> References: <20130308111350.51baa3f3@pitrou.net> <20130308113342.5f54c381@pitrou.net> <201303081041.50701.mark.hackett@metoffice.gov.uk> Message-ID: <513A6FE5.7080507@canterbury.ac.nz> Mark Hackett wrote: > And it's so anglocentric. How come the calls are to repeat *English grammar*? > German grammar would probably be a lot clearer for a compiler/interpreter. How about Latin? http://www.csse.monash.edu.au/~damian/papers/HTML/Perligata.html -- Greg From greg.ewing at canterbury.ac.nz Sat Mar 9 00:15:57 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 09 Mar 2013 12:15:57 +1300 Subject: [Python-ideas] Official MySQL module In-Reply-To: <5139C4F8.7050301@trueblade.com> References: <5139098A.2090708@trueblade.com> <51399C10.2090409@egenix.com> <5139C4F8.7050301@trueblade.com> Message-ID: <513A712D.901@canterbury.ac.nz> Eric V. Smith wrote: > It's client/server: do we really need > to include the server in order to supply the client? We include nntplib, > with no server. We include webbrowser, but no web browser. imaplib, but > no imap server. Etc. Those client modules are self-contained, though. Database client modules usually rely on a C component that comes with the database and gets updated on the database's release schedule rather than Python's.
-- Greg From amauryfa at gmail.com Sat Mar 9 00:54:35 2013 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Sat, 9 Mar 2013 00:54:35 +0100 Subject: [Python-ideas] Official MySQL module In-Reply-To: <513A712D.901@canterbury.ac.nz> References: <5139098A.2090708@trueblade.com> <51399C10.2090409@egenix.com> <5139C4F8.7050301@trueblade.com> <513A712D.901@canterbury.ac.nz> Message-ID: 2013/3/9 Greg Ewing > Eric V. Smith wrote: >> It's client/server: do we really need >> to include the server in order to supply the client? We include nntplib, >> with no server. We include webbrowser, but no web browser. imaplib, but >> no imap server. Etc. >> > > Those client modules are self-contained, though. Database > client modules usually rely on a C component that comes > with the database and gets updated on the database's > release schedule rather than Python's. Not with PyMySQL which directly implements the MySQL protocol, on top of socket and ssl modules. Not as nice as an official C API, though. -- Amaury Forgeot d'Arc From abarnert at yahoo.com Sat Mar 9 01:08:29 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 8 Mar 2013 16:08:29 -0800 (PST) Subject: [Python-ideas] with ... except In-Reply-To: <513A6FE5.7080507@canterbury.ac.nz> References: <20130308111350.51baa3f3@pitrou.net> <20130308113342.5f54c381@pitrou.net> <201303081041.50701.mark.hackett@metoffice.gov.uk> <513A6FE5.7080507@canterbury.ac.nz> Message-ID: <1362787709.70548.YahooMailNeo@web184705.mail.ne1.yahoo.com> > From: Greg Ewing > Sent: Friday, March 8, 2013 3:10 PM > > Mark Hackett wrote: >> And it's so anglocentric. How come the calls are to repeat *English > grammar*? German grammar would probably be a lot clearer for a > compiler/interpreter. > > How about Latin? > > http://www.csse.monash.edu.au/~damian/papers/HTML/Perligata.html Come on, what's the point of using inflection to get rid of word order if you're not going to also use it to get rid of punctuation and grammatical function words? For example, if you've got separate instrumental and accusative cases, you don't need "with". And with a distinction between the imperative mood and something else, like jussive, or mood modifiers that let you create something like conditional-imperative, you don't need "try". Thus, the entire issue that started this thread would never come up. In a language without case stacking, there are limits to how far you can take this, but there are plenty of languages that have case stacking - or, better, full polysynthesis. With both pervasive argument incorporation and unbounded compound agglutination, an entire function body can be written as a single word. No more indentation rules to break copy/paste on blog comments, no more limitations to the one-line lambda, ... Latin is woefully insufficient. But Chukchi would work.
From abarnert at yahoo.com Thu Mar 7 04:15:29 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 6 Mar 2013 19:15:29 -0800 Subject: [Python-ideas] Updated PEP 428 (pathlib) In-Reply-To: <5137BFE1.6060103@mrabarnett.plus.com> References: <20130303005129.26eb0e00@pitrou.net> <20130303114112.403019c6@pitrou.net> <20130305101638.2f30d4b3@pitrou.net> <51360E02.7050101@stoneleaf.us> <81CDA154-B979-4F5E-BC07-19A9B9A8D821@yahoo.com> <1362584728.25097.140661200789829.3588A7CD@webmail.messagingengine.com> <1362597357.18592.140661200879961.661220A3@webmail.messagingengine.com> <1362604726.91308.YahooMailNeo@web184706.mail.ne1.yahoo.com> <5137BFE1.6060103@mrabarnett.plus.com> Message-ID: <89964390-DE1B-4606-8CF7-1CC0F83D6837@yahoo.com> On Mar 6, 2013, at 14:14, MRAB wrote: > On the other hand, p.stripext('.zip') feels OK because it suggests > that you're stripping off the '.zip' extension (compare with str.strip), > but would that mean that you'd expect p.stripext() also to strip off > the extension, whatever it was? Given the earlier responses on this thread, I think it's safe to say that plenty of people, like Paul Moore, would either expect it, or complain about its nonexistence. Anyway, I think getting side-tracked on random's suggestion (which, by the way, is almost entirely my fault) has detracted from the main point. Even if we _did_ have an extended basename that can strip both dirname and extension, with every option you could possibly desire, people would _still_ want a method that strips the extension only. And calling that method basename (or root) is ambiguous and misleading to people who expect that name to mean pulling off directories rather than extensions. My original point was that Unix created this confusion by overloading the meaning of basename decades ago. The question is whether that's sufficient justification to use the name in Python. I don't think it is (again, I'd prefer stem or splitext), but it was at least worth asking. From abarnert at yahoo.com Fri Mar 8 20:01:27 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 8 Mar 2013 11:01:27 -0800 Subject: [Python-ideas] with ... except In-Reply-To: <20130308161625.1ba3902e@pitrou.net> References: <20130308111350.51baa3f3@pitrou.net> <20130308161625.1ba3902e@pitrou.net> Message-ID: On Mar 8, 2013, at 7:16, Antoine Pitrou wrote: > Le Fri, 8 Mar 2013 14:48:34 +0000, > Oscar Benjamin > a écrit :
>>
>>> or:
>>>
>>> try:
>>>     with transaction.commit_on_success():
>>>         ...
>>> except ObjectDoesNotExist:
>>>     # do something else, perhaps clean up some internal cache
>>
>> Another possibility is a context manager that handles both things,
>> e.g.:
>>
>> @contextmanager
>> def commit_or_clean(errorcls):
>>     try:
>>         with transaction.commit_on_success():
>>             yield
>>     except errorcls:
>>         clean()
>
> That's true, but only if the two things are strongly related, not in
> the general case.

I think his point is that if you need any specific case often enough in your app, they will be related, and you can write a single context manager that wraps up your specific case so you can use it. In other words, messy with and try statements can be refactored into functions as needed, very easily. If that's not his point, I apologize--but it was definitely my point when I said the same thing last week in response to the similar try with suggestion. From abarnert at yahoo.com Fri Mar 8 20:05:03 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 8 Mar 2013 11:05:03 -0800 Subject: [Python-ideas] with ...
except In-Reply-To: <513A106F.3010007@mrabarnett.plus.com> References: <20130308111350.51baa3f3@pitrou.net> <20130308113342.5f54c381@pitrou.net> <201303081041.50701.mark.hackett@metoffice.gov.uk> <513A106F.3010007@mrabarnett.plus.com> Message-ID: >> And it's so anglocentric. How come the calls are to repeat *English grammar*? >> German grammar would probably be a lot clearer for a compiler/interpreter. > German grammar? Surely you mean Dutch grammar! :-) If we used Inuit grammar it would be possible to incorporate almost any complete statement into a single word. This would satisfy the whitespace haters. From abarnert at yahoo.com Fri Mar 8 20:43:28 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 8 Mar 2013 11:43:28 -0800 Subject: [Python-ideas] Official MySQL module In-Reply-To: References: <5139098A.2090708@trueblade.com> <51399C10.2090409@egenix.com> <5139C4F8.7050301@trueblade.com> Message-ID: On Mar 8, 2013, at 9:45, Guido van Rossum wrote: > On Fri, Mar 8, 2013 at 6:07 AM, Joao S. O. Bueno wrote: > >> So, I do feel a bit uncomfortable with the idea of including bindings for >> 3rd party databases in the stdlib - but I think that is an emotional thing >> only, and can easily be rationalized away with >> "practicality beats purity" - and the mention somewhere else that >> in the Free Software environment, saying "3rd party" can be >> misleading. > > I do not understand this last phrase. Open source software isn't built by corporations for their own corporate interest, but by people for their own personal needs. The same people could be working on MySQL and Python, and there's positive synergy with no conflict of interest, and no implied exclusion of anyone outside the implicit partnership-of-one-person. This means there's no such word as "third party" in open source. If this all sounds hand-wavy and hyperbolic and based on idealizations rather than reality, well, it's esr, what do you expect? But there is some truth there. Apple isn't going to support MySQL without first talking bizdev with Oracle and the competition. Python isn't a company, and doesn't have to think that way. So, "third party" doesn't have quite the same implications. From stephen at xemacs.org Sat Mar 9 08:56:04 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 09 Mar 2013 16:56:04 +0900 Subject: [Python-ideas] Official MySQL module In-Reply-To: References: <5139098A.2090708@trueblade.com> <51399C10.2090409@egenix.com> <5139C4F8.7050301@trueblade.com> Message-ID: <87mwudgg6j.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert writes: > This means there's no such word as "third party" in open source. The distinction remains useful. As pointed out earlier, there's the user (first party), the project (second party), and the distributor(s) (third party) of any modules that neither first nor second party controls. The point of the distinction is that the user therefore bears a risk that the second party's product will not work as the user desires because of changes or bugs in the third party project. This can be due to bugs the third party doesn't acknowledge, or because of new features that the user wants/needs that the project doesn't support. True, that risk is mitigated in open source because the first two parties have an additional option: one can *take* responsibility for a third party module (by cooperating with the third party, for example), or even take control (of a forked version). Nevertheless, the risk remains.
I don't have an opinion on whether that risk is prohibitive for MySQL (or PostgreSQL). But it is an important consideration, and the term "third party" is a useful abbreviation for it. BTW: > Open source software isn't built by corporations for their own > corporate interest, but by people for their own personal needs. Nobody builds *open source* software. People (including corporate "persons") build *software*, then maybe they release it as a proprietary product, or as open source, or both. The problem with corporations is that they have a fiduciary responsibility to produce income for their owners, which often overrides considerations of voluntary cooperation. From stefan_ml at behnel.de Sat Mar 9 10:11:11 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 09 Mar 2013 10:11:11 +0100 Subject: [Python-ideas] Official MySQL module In-Reply-To: References: Message-ID: Gregory P. Smith, 07.03.2013 23:00: > No database connector module should ever be part of the standard library > unless that entire database is included as part of Python distributions. > MySQL isn't part of Python so all mysql connector modules belong as third > party things (perhaps as part of mysql itself if they wanted to get their > act together). > > want sqlite? we bundle it. want something else? you have to install > something else separately so you have to install its connector module > separately as well. +1 I would also note that stdlib inclusion usually works the other way round: someone, usually the author, has to commit to a) contribute the code and b) maintain it in the stdlib, basically forever. While expressing the wish to have something in the stdlib is ok, it's not really useful unless it makes the "someone" above step up and sign that commitment. Stefan From solipsis at pitrou.net Sat Mar 9 11:44:07 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 9 Mar 2013 11:44:07 +0100 Subject: [Python-ideas] Official MySQL module References: <5139098A.2090708@trueblade.com> <51399C10.2090409@egenix.com> <5139C4F8.7050301@trueblade.com> <87mwudgg6j.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20130309114407.3f13bb23@pitrou.net> On Sat, 09 Mar 2013 16:56:04 +0900 "Stephen J. Turnbull" wrote: > > > Open source software isn't built by corporations for their own > > corporate interest, but by people for their own personal needs. > > Nobody builds *open source* software. Well, of course we do. If Python wasn't open source, I'm sure many of the current core developers wouldn't contribute to it. Being open source is one of the key reasons to contribute to it; it's not an afterthought. Regards Antoine. > People (including corporate > "persons") build *software*, then maybe they release it as a > proprietary product, or as open source, or both. > > The problem with corporations is that they have a fiduciary > responsibility to produce income for their owners, which often > overrides considerations of voluntary cooperation. From solipsis at pitrou.net Sat Mar 9 11:45:26 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 9 Mar 2013 11:45:26 +0100 Subject: [Python-ideas] with ... except References: <20130308111350.51baa3f3@pitrou.net> <20130308161625.1ba3902e@pitrou.net> Message-ID: <20130309114526.493b5000@pitrou.net> On Fri, 8 Mar 2013 11:01:27 -0800 Andrew Barnert wrote: > On Mar 8, 2013, at 7:16, Antoine Pitrou wrote: > > > Le Fri, 8 Mar 2013 14:48:34 +0000, > > Oscar Benjamin > > a ?crit : > >> > >>> or: > >>> > >>> try: > >>> with transaction.commit_on_success(): > >>> ... 
> >>> except ObjectDoesNotExist: > >>> # do something else, perhaps clean up some internal cache > >> > >> Another possibility is a context manager that handles both things, > >> e.g.: > >> > >> @contextmanager > >> def commit_or_clean(errorcls): > >> try: > >> with transaction.commit_on_success(): > >> yield > >> except errorcls: > >> clean() > > > > That's true, but only if the two things are strongly related, not in > > the general case. > > I think his point is that if you need any specific case often enough in your app, they will be related, and you can write a single context manager that wraps up your specific case so you can use it. In other words, messy with and try statements can be refactored into functions as needed, very easily. Of course, you are right in some way. But it's a bit of a shame to have to resort to a helper function because the syntax isn't powerful enough :-) Regards Antoine. From stefan_ml at behnel.de Sat Mar 9 11:53:45 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 09 Mar 2013 11:53:45 +0100 Subject: [Python-ideas] with ... except In-Reply-To: <20130309114526.493b5000@pitrou.net> References: <20130308111350.51baa3f3@pitrou.net> <20130308161625.1ba3902e@pitrou.net> <20130309114526.493b5000@pitrou.net> Message-ID: Antoine Pitrou, 09.03.2013 11:45: > On Fri, 8 Mar 2013 11:01:27 -0800 > Andrew Barnert wrote: >> On Mar 8, 2013, at 7:16, Antoine Pitrou wrote: >>> Le Fri, 8 Mar 2013 14:48:34 +0000, >>> Oscar Benjamin a ?crit : >>>>> or: >>>>> >>>>> try: >>>>> with transaction.commit_on_success(): >>>>> ... >>>>> except ObjectDoesNotExist: >>>>> # do something else, perhaps clean up some internal cache >>>> >>>> Another possibility is a context manager that handles both things, >>>> e.g.: >>>> >>>> @contextmanager >>>> def commit_or_clean(errorcls): >>>> try: >>>> with transaction.commit_on_success(): >>>> yield >>>> except errorcls: >>>> clean() >>> >>> That's true, but only if the two things are strongly related, not in >>> the general case. >> >> I think his point is that if you need any specific case often enough in your app, they will be related, and you can write a single context manager that wraps up your specific case so you can use it. In other words, messy with and try statements can be refactored into functions as needed, very easily. > > Of course, you are right in some way. But it's a bit of a shame to have > to resort to a helper function because the syntax isn't powerful > enough :-) OTOH, why add syntax for something that is easily done with a helper function? Stefan From ncoghlan at gmail.com Sat Mar 9 15:34:13 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 10 Mar 2013 00:34:13 +1000 Subject: [Python-ideas] with ... except In-Reply-To: References: <20130308111350.51baa3f3@pitrou.net> <20130308161625.1ba3902e@pitrou.net> <20130309114526.493b5000@pitrou.net> Message-ID: On Sat, Mar 9, 2013 at 8:53 PM, Stefan Behnel wrote: > Antoine Pitrou, 09.03.2013 11:45: > On Fri, 8 Mar 2013 11:01:27 -0800 >> Of course, you are right in some way. But it's a bit of a shame to have >> to resort to a helper function because the syntax isn't powerful >> enough :-) > > OTOH, why add syntax for something that is easily done with a helper function? It's not just with statements that don't play nice with supplementary exception handling - for loops actually have the same problem. 
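To make that concrete: a for loop is roughly sugar for the following (a simplified sketch - the real implementation differs in detail, and the loop's else clause is left out):

    it = iter(iterable)      # implicit __iter__ call - may raise
    while True:
        try:
            item = next(it)  # implicit __next__ call - may also raise
        except StopIteration:
            break
        ...                  # the loop body runs here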
In both cases, you can fairly easily put an exception handler around just the expression, or around the entire statement, but you can't quite so easily define custom exception handling for the protocol methods invoked implicitly in the statement header (__enter__ in with statements, __iter__ and __next__ in for loops). While loops technically suffer from it as well, but it's rather rare for a bool() invocation to risk triggering an exception. contextlib.ExitStack at least brings context managers close to on par with iterables - you can use either stack.enter_context(cm) and iter(iterable) to lift just the __enter__ or __iter__ call out into a separate try block. Wrapping an exception handler around next() pretty much requires reverting to a while loop, though. Ultimately, though, this may be an inevitable price we pay for the abstraction - you *do* lose flexibility when you design for the typical case, and so you do eventually have to say "sorry, to handle that more complex case you need to drop back down to the lower level syntax" (try/except/else/finally for with statements, while loops for for loops). An important part of minimising language complexity is actually recognising when that limit has been reached and saying, no, sorry, we want to keep the higher level API simple, so that use case won't be supported, since it can already be handled with the lower level API in those cases where it is needed. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Sat Mar 9 19:19:22 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 9 Mar 2013 19:19:22 +0100 Subject: [Python-ideas] Length hinting and preallocation for container types References: <513625E8.4060201@python.org> <876215jmg1.fsf@uwakimon.sk.tsukuba.ac.jp> <20130307112817.211fd496@pitrou.net> <87ip52iyvo.fsf@uwakimon.sk.tsukuba.ac.jp> <20130308110731.45ff8bac@pitrou.net> <513A13B9.6030503@pearwood.info> Message-ID: <20130309191922.26805bc4@pitrou.net> On Sat, 09 Mar 2013 03:37:13 +1100 Steven D'Aprano wrote: > On 09/03/13 00:27, Eli Bendersky wrote: > > > If it's voting time, I'm -1. Having programmed a lot of memory-constrained > > systems (not in Python, though) - this is not how things usually work > > there. In a memory-constrained system, you don't "grow and shrink" your > > data structures. That's because growing often needs to reallocate the whole > > chunk and do a copy, and shrinking only helps memory fragmentation. In such > > systems, you usually know in advance or at least limit the size of data > > structures and pre-allocate, which is perfectly possible in Python today. > > > Are you referring to using (say) [None]*n, for some size n? > > Is this a language guarantee that it won't over-allocate, or just an accident > of implementation? Probably an accident of implementation, but I can't think of any reason to over-allocate here, so probably all implementations allocate exactly. Regards Antoine. From tjreedy at udel.edu Sat Mar 9 21:38:28 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 09 Mar 2013 15:38:28 -0500 Subject: [Python-ideas] with ... 
except In-Reply-To: References: <20130308111350.51baa3f3@pitrou.net> <20130308161625.1ba3902e@pitrou.net> <20130309114526.493b5000@pitrou.net> Message-ID: On 3/9/2013 9:34 AM, Nick Coghlan wrote: > Ultimately, though, this may be an inevitable price we pay for the > abstraction - you *do* lose flexibility when you design for the > typical case, and so you do eventually have to say "sorry, to handle > that more complex case you need to drop back down to the lower level > syntax" (try/except/else/finally for with statements, while loops for > for loops). An important part of minimising language complexity is > actually recognising when that limit has been reached and saying, no, > sorry, we want to keep the higher level API simple, so that use case > won't be supported, since it can already be handled with the lower > level API in those cases where it is needed. Nicely put. -- Terry Jan Reedy From abarnert at yahoo.com Sun Mar 10 02:59:18 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 9 Mar 2013 17:59:18 -0800 (PST) Subject: [Python-ideas] with ... except In-Reply-To: References: <20130308111350.51baa3f3@pitrou.net> <20130308161625.1ba3902e@pitrou.net> <20130309114526.493b5000@pitrou.net> Message-ID: <1362880758.80582.YahooMailNeo@web184705.mail.ne1.yahoo.com> > From: Nick Coghlan > Sent: Saturday, March 9, 2013 6:34 AM I agree completely with your main point. But: > contextlib.ExitStack at least brings context managers close to on par > with iterables - you can use either stack.enter_context(cm) and > iter(iterable) to lift just the __enter__ or __iter__ call out into a > separate try block. Wrapping an exception handler around next() pretty > much requires reverting to a while loop, though. It's worth noting off the bat that you don't actually need this too often, because most iterators can't be resumed after exception anyway. But when you do, it's possible. Of course you need a next-wrapper, but that's no different from needing a with-wrapper or an iter-wrapper. For example, let's say you wanted this fictitious construct:

    fooed_it = ((try: x.foo() except: continue) for x in it)

You can just do this:

    def skip_exceptions(it):
        while True:
            try:
                yield next(it)
            except StopIteration:
                raise
            except:
                pass

    fooed_it = (x.foo() for x in skip_exceptions(it))

And needless to say, you can put in a realistic exception handler instead of just a skip-everything clause. I've actually got code similar to this. I've got a C-library enumerator-type function that can return both fatal and non-fatal errors. The first-level wrapper around this provides a next() that raises on non-fatal errors. Then I've got a wrapper around that which logs and continues for all exceptions but StopIteration and fatal errors. > Ultimately, though, this may be an inevitable price we pay for the > abstraction - you *do* lose flexibility when you design for the > typical case, and so you do eventually have to say "sorry, to handle > that more complex case you need to drop back down to the lower level > syntax" (try/except/else/finally for with statements, while loops for > for loops).
But the end result is the same: Making the abstractions more flexible makes them more complex, and there's a point at which the benefit (not requiring helpers in as many cases) loses to the cost. From ethan at stoneleaf.us Mon Mar 11 17:18:43 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 11 Mar 2013 09:18:43 -0700 Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library Message-ID: <513E03E3.6010809@stoneleaf.us> First, I offer my apologies to all who are still battle-weary from the last flurry of enum threads. However, while flufl.enum (PEP 435) is a good package for the use-case it handles, there are plenty of use-cases that it does not handle, or doesn't handle well, that it should not be the implementation in the stdlib. To quote one of my earlier emails: > I'm beginning to see why enums as a class has not yet been added to Python. > We don't want to complicate the language with too many choices, yet there is > no One Obvious Enum to fit the wide variety of use-cases: > > - named int enums (http status codes) > - named str enums (tkinter options) > - named bitmask enums (file-type options) > - named valueless enums (any random set of names) > - named valueless-yet-orderable enums (any not-so-random set of names ;) This new PEP proposes an enum module that handles all those use cases, and makes it possible to handle others as well. If you recognize your idea but don't see your name in acknowledgements, please let me know. Code is available at https://bitbucket.org/stoneleaf/aenum -- ~Ethan~ ============================================================================================== -------------- next part -------------- PEP: xxx Title: Adding an Enum type to the Python standard library Version: $Revision$ Last-Modified: $Date$ Author: Ethan Furman Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 2013-03-08 Python-Version: 3.4 Post-History: 2013-03-08 Abstract ======== This PEP proposes adding an enumeration type to the Python standard library. Portions are derived or excerpted from PEP-0435. An enumeration is a set of symbolic names bound to unique, constant integer values. Depending on the circumstances, the exact integer can be integral to the enums use (the name is simply an easy way to refer to the number) or the integer value may simply serve as a way to select which enumeration to use (e.g. to store or retrieve from a database). Motivation ========== *[Based partly on the Motivation stated in PEP 354 and 435]* The properties of an enumeration are useful for defining an immutable, related set of constant values. Classic examples are days of the week (Sunday through Saturday) and school assessment grades ('A' through 'D', and 'F'). Other examples include error status values and states within a defined process. It is possible to simply define a sequence of values of some other basic type, such as ``int`` or ``str``, to represent discrete arbitrary values. However, an enumeration ensures that such values are distinct from any others including, importantly, values within other enumerations, and that operations without meaning ("Wednesday times two") are not defined for these values. It also provides a convenient printable representation of enum values without requiring tedious repetition while defining them (i.e. no ``GREEN = 'green'``). 
Rationale
=========

Discussions about adding enumerations to Python occur regularly, yet none has been added; I suspect this is due to the wide range of use cases::

    - named int enums (http status codes)
    - named str enums (tkinter options)
    - named bitmask enums (file-type options)
    - named valueless enums (any random set of names)
    - named valueless-yet-orderable enums (any not so random set of names)

Clearly, to have one Enum fulfill all those roles would make for complicated, hard-to-maintain code.

Alternatives
============

flufl.enum has been proposed by Guido::

    pros: established, well-tested, handles its use-case very well
          (value-less, unordered)
    cons: only handles one of the use cases well, with some support for
          a second (named int enums)

Tim Delaney, Alex Stewart, and I have come up with largely similar, alternate enum implementations that better support more of the above use cases.

Proposal
========

This PEP proposes that the enumeration implementation Aenum be accepted as Python's stdlib enum. Aenum has one base Enum, three derived Enums -- BitMask, Sequence, and String -- and two options (ORDER and INDEX) to allow users to easily create their own extended types in the few cases where these do not meet their needs.

Module and type name
====================

I propose to add a module named ``enum`` to the standard library. The main type exposed by this module is ``Enum``, with subtypes ``Sequence``, ``String``, and ``BitMask``, the options ``INDEX`` and ``ORDER``, and one helper class, ``enum``.

``Enum`` -- a valueless, unordered type. Its related integer value is merely to allow for database storage and selection from the enumerated class. An ``Enum`` will not compare equal with its integer value, but can compare equal to other enums of which it is a subclass.

``Sequence`` -- a named ``int``. The enumerated name is merely to identify the value, and all normal integer operations are supported, but the resulting value will not be an enumeration. A ``Sequence`` will compare equal to its integer value; in fact, it *is* its integer value, and any mathematical operations will return an ``int``, not an ``enum``.

``String`` -- a named ``str``. The enumerated name may be a shortcut for a longer string, but by default the enumerated name is the string it represents. Like ``Enum``s, ``String``s do not compare equal to their integer values; but similar to ``Sequence``s, ``String``s act just like ``str``, and any operations will return a ``str``, not an ``enum``.

``BitMask`` -- an ``Enum`` that supports the bitwise operations, and whose integer values are matched accordingly (0, 1, 2, 4, 8, 16, etc.).

``INDEX`` -- adding this option adds the ``__index__`` magic method to the enum class; the associated integer is returned.

``ORDER`` -- adding this option adds the ``__lt__``, ``__le__``, ``__ge__``, and ``__gt__`` magic methods.

``enum`` -- supports auto-numbering enumerated names, as well as assigning other attributes/values to the ``enum`` instance.

Proposed semantics for the new enumeration type
===============================================

Creating an Enum
----------------

Enumerations are primarily created using the class syntax, which makes them easy to read and write. Every enumeration value must have a unique integer value, and the only restriction on names is that they must be valid Python identifiers.
To define an enumeration, derive from the ``Enum``, ``BitMask``, ``Sequence``, or ``String`` classes and add attributes either with assignment to their integer values, or with assignment of the ``enum()`` helper that supports auto-numbering::

    >>> from enum import Enum
    >>> class Colors(Enum):
    ...     black = 0
    ...     red = 1
    ...     green = 2
    ...     blue = 3

or::

    >>> from enum import Enum, enum
    >>> class Colors(Enum):
    ...     black = enum()    # auto-numbered at 0
    ...     red = enum()      # at 1
    ...     green = enum()    # at 2
    ...     blue = enum()     # at 3

    >>> from enum import BitMask, enum
    >>> class FieldOptions(BitMask):
    ...     none = enum()     # auto-numbered at 0
    ...     binary = enum()   # at 1
    ...     auto_inc = enum() # at 2
    ...     unique = enum()   # at 4
    ...     nullable = enum() # at 8

    >>> from enum import Sequence, enum
    >>> class HttpStatus(Sequence):
    ...     ok = enum(doc="request fulfilled", integer=200)
    ...     created = enum(doc="POST success", integer=201)
    ...     redirect = enum(doc="permanent redirect", integer=301)
    ...     forbidden = enum(doc="authorization will not help", integer=403)
    ...     not_implemented = enum(doc="service is not implemented", integer=501)

    >>> from enum import String, enum
    >>> class TkLocation(String):
    ...     N = enum(value='north')  # auto-numbered at 0
    ...     S = enum(value='south')  # 1
    ...     E = enum(value='east')   # 2
    ...     W = enum(value='west')   # 3

Equal and not-equal work for all enumerations::

    >>> Colors.red == Colors.red
    True
    >>> FieldOptions.binary != HttpStatus.created
    True
    >>> TkLocation.N == 'north'      # String enums are instances of str!
    True
    >>> HttpStatus.redirect == 301   # Sequence enums are instances of int!
    True

Less-than, less-than-or-equal, greater-than, and greater-than-or-equal only work for ordered enums::

    >>> HttpStatus.ok < HttpStatus.forbidden
    True
    >>> TkLocation.N > TkLocation.E  # String enums are str, so ordering
    False                            # is str-based

Unordered enums raise an exception::

    >>> Colors.red < Colors.blue
    Traceback (most recent call last):
        ...
    TypeError: unorderable types: Colors() < Colors()

But you can add order if you need it::

    >>> from enum import Enum, enum, ORDER
    >>> class Grades(Enum, EnumOptions=ORDER):
    ...     A = enum(doc="Excellent", integer=5)
    ...     B = enum(doc="Above Average", integer=4)
    ...     C = enum(doc="Average", integer=3)
    ...     D = enum(doc="Below Average", integer=2)
    ...     F = enum(doc="Insufficient", integer=1)
    >>> Grades.A > Grades.B
    True
    >>> Grades.D < Grades.C
    True
    >>> Grades.F > Grades.C
    False

Typically, enumerations will not compare equal to either their integer value or their string name::

    >>> Colors.red == 1
    False
    >>> FieldOptions.nullable == 8
    False
    >>> TkLocation.E == 3
    False
    >>> Colors.blue == 'blue'
    False
    >>> FieldOptions.unique == 'unique'
    False
    >>> HttpStatus.ok == 'ok'
    False
    >>> TkLocation.N == 'N'
    False

Unless the enum is a ``Sequence`` or a default ``String``::

    >>> HttpStatus.ok == 200
    True
    >>> class TkManager(String):
    ...     grid = enum()  # auto-numbered at 0, auto-valued at 'grid'
    ...     pack = enum()  # auto-numbered at 1, auto-valued at 'pack'
    >>> TkManager.pack == 'pack'
    True

Enumeration values have nice, human readable string representations::

    >>> print(Colors.red)
    Colors.red

...while their repr has more information::

    >>> print(repr(Colors.red))
    Colors("red", integer=1)

The enumeration instances are available through the class::

    >>> for color in Colors:
    ...     print(color)
    black
    red
    green
    blue

Enums also have a property that contains just their item name::

    >>> print(Colors.black.__name__)
    black
    >>> print(Colors.red.__name__)
    red
    >>> print(Colors.green.__name__)
    green
    >>> print(Colors.blue.__name__)
    blue

The str and repr of the enumeration class also provide useful information::

    >>> print(Colors)
    Colors(black=0, red=1, green=2, blue=3)
    >>> print(repr(Colors))
    Colors(black=0, red=1, green=2, blue=3)

You can extend previously defined Enums by subclassing::

    >>> class MoreColors(Colors):
    ...     cyan = 4
    ...     magenta = 5
    ...     yellow = 6

When extended in this way, the base enumeration's values are identical to the same named values in the derived class::

    >>> Colors.red == MoreColors.red
    True
    >>> Colors.blue == MoreColors.blue
    True

However, if you define an enumeration that is not a subclass, with similar item names and/or integer values, the values will not be identical::

    >>> class OtherColors(Enum):
    ...     red = 1
    ...     blue = 2
    ...     yellow = 3
    >>> Colors.red == OtherColors.red
    False
    >>> Colors.blue != OtherColors.blue
    True
    >>> MoreColors.yellow == OtherColors.yellow
    False

These enumerations are not equal, nor do they hash equally::

    >>> Colors.red == OtherColors.red
    False
    >>> len(set((Colors.red, OtherColors.red)))
    2

When you need the integer equivalent values, you can convert enumerations explicitly using the ``int()`` built-in. This is quite convenient for storing enums in a database, as well as for interoperability with C extensions that expect integers::

    >>> int(Colors.black)
    0
    >>> int(Colors.red)
    1
    >>> int(Colors.green)
    2
    >>> int(Colors.blue)
    3

You can also convert back to the enumeration value by calling the Enum subclass, passing in the integer value for the item you want::

    >>> Colors(0)
    Colors("black", integer=0)
    >>> Colors(3)
    Colors("blue", integer=3)
    >>> Colors(1) == Colors.red
    True

The Enum subclass also accepts the string name of the enumeration value::

    >>> Colors('green')
    Colors("green", integer=2)
    >>> Colors('blue') == Colors.blue
    True

You get exceptions, though, if you try to use invalid arguments::

    >>> Colors('magenta')
    Traceback (most recent call last):
        ...
    enum.InvalidEnum: magenta is not a valid Color
    >>> Colors(99)
    Traceback (most recent call last):
        ...
    enum.InvalidEnum: 99 is not a valid Color

The integer equivalent values serve another purpose. You may not define two enumeration values with the same integer value::

    >>> class Bad(Enum):
    ...     cartman = 1
    ...     stan = 2
    ...     kyle = 3
    ...     kenny = 3   # Oops!
    ...     butters = 4
    Traceback (most recent call last):
        ...
    TypeError: Multiple enum values: 3

You also may not duplicate values in derived enumerations::

    >>> class BadColors(Colors):
    ...     yellow = 4
    ...     chartreuse = 2  # Oops!
    Traceback (most recent call last):
        ...
    TypeError: Multiple enum values: 2

The Enum class supports iteration. Enumeration values are returned in the sorted order of their integer equivalent values::

    >>> [v.__name__ for v in MoreColors]
    ['black', 'red', 'green', 'blue', 'cyan', 'magenta', 'yellow']
    >>> [int(v) for v in MoreColors]
    [0, 1, 2, 3, 4, 5, 6]

Enumeration values are hashable, so they can be used in dictionaries and sets::

    >>> apples = {}
    >>> apples[Colors.red] = 'red delicious'
    >>> apples[Colors.green] = 'granny smith'
    >>> for color in sorted(apples, key=int):
    ...     print(color.__name__, '->', apples[color])
    red -> red delicious
    green -> granny smith

Pickling (not yet implemented)
------------------------------

Enumerations created with the class syntax can also be pickled and unpickled::

    >>> from enum.tests.fruit import Fruit
    >>> from pickle import dumps, loads
    >>> Fruit.tomato is loads(dumps(Fruit.tomato))
    True

Convenience API
---------------

You can also create enumerations using the class method ``create()``. The first argument to ``create()`` is the name of the enumeration; the second is a *source*, which can be either a string of space-separated names or an iterable of ``(name, value)`` pairs.

In the most basic usage, *source* is a sequence of strings which name the enumeration items. In this case, the values are automatically assigned starting from 0::

    >>> from enum import Enum
    >>> Enum.create('Animals', 'ant bee cat dog')

The items in *source* can also be 2-tuples, where the first item is the enumeration value name and the second is the integer value to assign to the value. If 2-tuples are used, all items must be 2-tuples::

    >>> from enum import Sequence
    >>> Sequence.create('Row', (('name', 0), ('mobile', 1), ('email', 2), ('url', 3)))
    Row(name=0, mobile=1, email=2, url=3)

Proposed variations
===================

Some variations were proposed during the discussions in the mailing list. Here are some of the more popular ones.

Not having to specify values for enums
--------------------------------------

Michael Foord proposed (and Tim Delaney provided a proof-of-concept implementation) to use metaclass magic that makes this possible::

    class Color(Enum):
        red, green, blue

The values actually get assigned only when first looked up.

Pros: cleaner syntax that requires less typing for a very common task (just listing enumeration names without caring about the values).

Cons: involves much magic in the implementation, which makes even the definition of such enums baffling when first seen. If a name is duplicated, the magic won't see it (as it already exists), so it won't get the next integer, which can lead to hard-to-find bugs.

Using special names or forms to auto-assign enum values
-------------------------------------------------------

A different approach to avoid specifying enum values is to use a special name or form to auto-assign them. For example::

    class Color(Enum):
        red = None    # auto-assigned to 0
        green = None  # auto-assigned to 1
        blue = None   # auto-assigned to 2

More flexibly::

    class Color(Enum):
        red = 7
        green = None  # auto-assigned to 8
        blue = 19
        purple = None # auto-assigned to 20

Some variations on this theme:

#. A special name ``auto`` imported from the enum package.

#. Georg Brandl proposed ellipsis (``...``) instead of ``None`` to achieve the same effect.

Pros: no need to manually enter values. Makes it easier to change the enum and extend it, especially for large enumerations.

Cons: actually longer to type in many simple cases. The argument of explicit vs. implicit applies here as well.

The variation that won
----------------------

While only saving on typing if you use cut and paste, the winning idea was to use a helper function, ``enum()``, that would allow the integer to not be specified. It also allows for other arbitrary attributes to be set on the enumerated values (most importantly, a doc string).
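For illustration, here is a sketch of how that might look in use; note that attribute access on the helper's extra keyword arguments (the ``.doc`` lookup below) is an assumption based on the description above, not behaviour verified against the aenum code::

    >>> from enum import Sequence, enum
    >>> class HttpStatus(Sequence):
    ...     ok = enum(doc="request fulfilled", integer=200)
    >>> int(HttpStatus.ok)
    200
    >>> HttpStatus.ok.doc   # assumed: extra keywords become attributes
    'request fulfilled'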
Use-cases in the standard library
=================================

The Python standard library has many places where the usage of enums would be beneficial to replace other idioms currently used to represent them. Such usages can be divided into two categories: user-code facing constants, and internal constants.

User-code facing constants like ``os.SEEK_*``, ``socket`` module constants, decimal rounding modes, HTML error codes, etc., would benefit by being converted to enums. Because they are now either ``int``s or ``str``s, the ``Sequence`` or ``String`` type enums would have to be used, as they are interchangeable with ``int``s and ``str``s and thus would not break backward-compatibility.

Internal constants are not seen by user code but are employed internally by stdlib modules. It appears that nothing should stand in the way of implementing such constants with the standard Enum. Some examples uncovered by a very partial skim through the stdlib: ``binhex``, ``imaplib``, ``http/client``, ``urllib/robotparser``, ``idlelib``, ``concurrent.futures``, ``turtledemo``.

In addition, looking at the code of the Twisted library, there are many use cases for replacing internal state constants with enums. The same can be said about a lot of networking code (especially implementation of protocols) and can be seen in test protocols written with the Tulip library as well.

Differences from PEP 354 and 435
================================

Unlike PEP 354, enumeration values are not defined as a sequence of strings, but as attributes of a class. This design was chosen because it was felt that class syntax is more readable.

Unlike PEP 354, enumeration values require an integer value. This difference recognizes that enumerations often represent real-world values, or must interoperate with external real-world systems. For example, to store an enumeration in a database, it is better to convert it to an integer on the way in and back to an enumeration on the way out. Providing an integer value also provides an explicit ordering.

Unlike PEP 354, this implementation does use a metaclass to define the enumeration's syntax, and allows for extended base-enumerations so that the common values in derived classes are comparable.

Like PEP 435, enumerations within a class are singletons; but unlike PEP 435, subclassed values, while comparing equal, are not the same object.

Acknowledgments
===============

This PEP describes the ``aenum`` module by Ethan Furman. ``aenum`` is based on examples and ideas from Michael Foord, Tim Delaney, Alex Stewart, Yuri Selivanov, and ``flufl.enum`` by Barry Warsaw. Ben Finney is the author of the earlier enumeration PEP 354 [1]_, and Barry Warsaw and Eli Bendersky are the authors of the competing PEP 435 [2]_.

References
==========

.. [1] http://www.python.org/dev/peps/pep-0354/
.. [2] http://www.python.org/dev/peps/pep-0435/

Copyright
=========

This document has been placed in the public domain.

Todo
====

* Mark PEP 354 "superseded by" this one, if accepted
* New package name within stdlib - enum? (top-level)
* Verify (and add support if need be) that other base-types will function as enums; e.g. ``class Constants(float, metaclass=EnumType)``

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

From p.f.moore at gmail.com  Mon Mar 11 17:55:28 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 11 Mar 2013 16:55:28 +0000
Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library
In-Reply-To: <513E03E3.6010809@stoneleaf.us>
References: <513E03E3.6010809@stoneleaf.us>
Message-ID:

On 11 March 2013 16:18, Ethan Furman wrote:
> First, I offer my apologies to all who are still battle-weary from the last
> flurry of enum threads.
[...]
> This new PEP proposes an enum module that handles all those use cases, and
> makes it possible to handle others as well.

One thing that is not discussed in the PEP (at least from my quick reading of it) is the issue of transitivity of equality (untested code, sorry, but I believe this is the intention of the PEP):

    >>> class E1(Enum):
    ...     example = 1
    >>> class E2(Enum):
    ...     example = 1
    >>> E1.example == 1
    True
    >>> E2.example == 1
    True
    >>> E1.example == E2.example
    False
    >>> E1.example == 1 == E2.example
    True
    >>> E1.example == E2.example == 1
    False

Personally, I find this behaviour unacceptable. The lack of a good way of dealing with this issue (I don't particularly like the "just use int()" or special value property approaches either) seems to be key to why people are failing to agree on an implementation...

At the very least, there should be a section in the PEP addressing the discussion over this.

The motivation section isn't particularly strong, either. There's nothing there that would persuade me to use enums. In general, the proposal seems geared mainly at people who *already* want to use enums, but for some reason aren't comfortable with using a 3rd party package. But maybe that just implies that I'm not part of the target audience for the feature, in which case fine (I don't have to use it, obviously...)

Paul.

From ethan at stoneleaf.us  Mon Mar 11 18:25:11 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 11 Mar 2013 10:25:11 -0700
Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library
In-Reply-To: References: <513E03E3.6010809@stoneleaf.us>
Message-ID: <513E1377.60506@stoneleaf.us>

On 03/11/2013 09:55 AM, Paul Moore wrote:
> On 11 March 2013 16:18, Ethan Furman wrote:
>> First, I offer my apologies to all who are still battle-weary from the last
>> flurry of enum threads.
> [...]
>> This new PEP proposes an enum module that handles all those use cases, and
>> makes it possible to handle others as well.
>
> One thing that is not discussed in the PEP (at least from my quick
> reading of it) is the issue of transitivity of equality

There are four types you can inherit from: Enum, BitMask, Sequence, and String. Enum, BitMask, and String will not compare equal with integers, and Enum, BitMask, and Sequence will not compare equal with strings; in fact, Enum and BitMask, not being based on an existing data type, will not compare equal with anything that is not in their own enumeration group or its superclass. Sequences will compare equal with ints, because they are ints; they will also compare equal against other Sequence enumerations, as they are also ints. Same deal with Strings and strs. Those two are basically NamedValues in a fancy enum package.

> Personally, I find this behaviour unacceptable.
The lack of a good way > of dealing with this issue (I don't particularly like the "just use > int()" or special value property approaches either) seems to be key to > why people are failing to agree on an implementation... Hopefully that's clearer now. > At the very least, there should be a section in the PEP addressing the > discussion over this. I'll get it added (probably next weekend, when I make any other necessary changes as well). > The motivation section isn't particularly strong, either. There's > nothing there that would persuade me to use enums. In general, the > proposal seems geared mainly at people who *already* want to use > enums, but for some reason aren't comfortable with using a 3rd party > package. But maybe that just implies that I'm not part of the target > audience for the feature, in which case fine (I don't have to use it, > obviously...) My main use case is an easy mapping from names to numbers, but that's not everyone's. -- ~Ethan~ From p.f.moore at gmail.com Mon Mar 11 19:31:29 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 11 Mar 2013 18:31:29 +0000 Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library In-Reply-To: <513E1377.60506@stoneleaf.us> References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> Message-ID: On 11 March 2013 17:25, Ethan Furman wrote: > >> Personally, I find this behaviour unacceptable. The lack of a good way >> of dealing with this issue (I don't particularly like the "just use >> int()" or special value property approaches either) seems to be key to >> why people are failing to agree on an implementation... > > > Hopefully that's clearer now. Yes, it is - thanks. I missed the significance of "Sequence" in the PEP (it's not a very obvious name for this functionality). Maybe there's a better name, or if not then at least making this aspect more prominent in the PEP would probably help. Paul From g.rodola at gmail.com Mon Mar 11 20:20:28 2013 From: g.rodola at gmail.com (=?ISO-8859-1?Q?Giampaolo_Rodol=E0?=) Date: Mon, 11 Mar 2013 20:20:28 +0100 Subject: [Python-ideas] Unify addresses passed to bind() and getaddrinfo() Message-ID: >From http://docs.python.org/3/library/socket.html#example ...when using bind(): HOST = "" # Symbolic name meaning all available interfaces ...when using getaddrinfo(): HOST = None # Symbolic name meaning all available interfaces I recently got bitten by this difference in that I passed "" (empy string) to getaddrinfo() and since my DNS server was temporarily down getaddrinfo() hung indefinitely (cset where I fixed this: https://code.google.com/p/pyftpdlib/source/detail?r=1194#). It took me quite some time to figure out what was wrong as to me "" has always been an alias for "all interfaces" and didn't know getaddrinfo() behaved differently than bind() in this regard. If from getaddrinfo() standpoint "" (empty string) has no particular meaning I propose to unify the two APIs and make getaddrinfo() assume "" is an alias for "all interfaces". Thoughts? 
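(For concreteness, a minimal sketch of the asymmetry being described; the exact behaviour of "" in getaddrinfo() can vary with the platform resolver, so treat this as illustrative rather than authoritative:)

    import socket

    # bind(): "" is the documented wildcard; no name resolution happens
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("", 8080))

    # getaddrinfo(): the wildcard is None (combined with AI_PASSIVE);
    # passing "" instead goes through name resolution, which is what
    # hung when the DNS server was down
    socket.getaddrinfo(None, 8080, socket.AF_INET, socket.SOCK_STREAM,
                       0, socket.AI_PASSIVE)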
--- Giampaolo https://code.google.com/p/pyftpdlib/ https://code.google.com/p/psutil/ https://code.google.com/p/pysendfile/ From greg.ewing at canterbury.ac.nz Tue Mar 12 01:07:30 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 12 Mar 2013 13:07:30 +1300 Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library In-Reply-To: References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> Message-ID: <513E71C2.9090006@canterbury.ac.nz> Paul Moore wrote: > I missed the significance of "Sequence" in the > PEP (it's not a very obvious name for this functionality). It seems like an extremely bad name to me, since the word "sequence" already has a quite different technical meaning in Python. Maybe "Ordinal"? -- Greg From stephen at xemacs.org Tue Mar 12 01:20:17 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 12 Mar 2013 09:20:17 +0900 Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library In-Reply-To: References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> Message-ID: <87sj41fozi.fsf@uwakimon.sk.tsukuba.ac.jp> Paul Moore writes: > Yes, it is - thanks. I missed the significance of "Sequence" in the > PEP Overall the PEP's presentation suffers severely from class-instance confusion. In the *enumeration* Color(red, green, blue), Color.red is not an "enumeration", it is an *enumerator* (or element or member). Ditto for all the concrete class names. I'm not sure this really matters (I'm the kind of pedant who is enraged by the use of "it's" as a possessive), but it bugs me. In the case of "Sequence", I would assume that the actual behavior is equivalent to range() with names. Based on past discussion of Ethan's use case, I suppose that that's not quite accurate. Rather, I expect that Sequence actually addresses the question of using an enumerator as an index to a Python sequence (list, etc.) From ethan at stoneleaf.us Tue Mar 12 01:14:44 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 11 Mar 2013 17:14:44 -0700 Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library In-Reply-To: <513E71C2.9090006@canterbury.ac.nz> References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> <513E71C2.9090006@canterbury.ac.nz> Message-ID: <513E7374.5000702@stoneleaf.us> On 03/11/2013 05:07 PM, Greg Ewing wrote: > Paul Moore wrote: >> I missed the significance of "Sequence" in the >> PEP (it's not a very obvious name for this functionality). > > It seems like an extremely bad name to me, since the word > "sequence" already has a quite different technical meaning > in Python. > > Maybe "Ordinal"? Naming has never been my strong point. How about `Integer` to match `String`? -- ~Ethan~ From ethan at stoneleaf.us Tue Mar 12 01:47:23 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 11 Mar 2013 17:47:23 -0700 Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library In-Reply-To: <87sj41fozi.fsf@uwakimon.sk.tsukuba.ac.jp> References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> <87sj41fozi.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <513E7B1B.6080702@stoneleaf.us> On 03/11/2013 05:20 PM, Stephen J. Turnbull wrote: > Paul Moore writes: > > > Yes, it is - thanks. I missed the significance of "Sequence" in the > > PEP > > Overall the PEP's presentation suffers severely from class-instance > confusion. 
In the *enumeration* Color(red, green, blue), Color.red is > not an "enumeration", it is an *enumerator* (or element or member). > Ditto for all the concrete class names. I'm not sure this really > matters (I'm the kind of pedant who is enraged by the use of "it's" as > a possessive), but it bugs me. I would not be sad if you chose to go through and fix those things -- it seems I lack the necessary vocabulary. Although I usually get `its` correct. ;) > In the case of "Sequence", I would assume that the actual behavior is > equivalent to range() with names. Based on past discussion of Ethan's > use case, I suppose that that's not quite accurate. Rather, I expect > that Sequence actually addresses the question of using an enumerator > as an index to a Python sequence (list, etc.) That's pretty much what I use it for, but it also solves Antoine's use case of, e.g., http status codes that otherwise act like normal ints. ~Ethan~ From eliben at gmail.com Tue Mar 12 13:23:58 2013 From: eliben at gmail.com (Eli Bendersky) Date: Tue, 12 Mar 2013 05:23:58 -0700 Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library In-Reply-To: <513E1377.60506@stoneleaf.us> References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> Message-ID: On Mon, Mar 11, 2013 at 10:25 AM, Ethan Furman wrote: > On 03/11/2013 09:55 AM, Paul Moore wrote: > >> On 11 March 2013 16:18, Ethan Furman wrote: >> >>> First, I offer my apologies to all who are still battle-weary from the >>> last >>> flurry of enum threads. >>> >> [...] >> >>> This new PEP proposes an enum module that handles all those use cases, >>> and >>> makes it possible to handle others as well. >>> >> >> One thing that is not discussed in the PEP (at least from my quick >> reading of it) is the issue of transitivity of equality >> > > There are four types you can inherit from: Enum, BitMask, Sequence, and > String. > Enum, BitMask, and String will not compare equal with integers, and Enum, > BitMask > and Sequence will not compare equal with strings; in fact, Enum and > Bitmask, not > being based on an existing data type, will not compare equal with anything > that is > not in their own enumeration group or its superclass. Sequences will > compare equal with ints, because they are ints; they will also compare > equal > against other Sequence enumerations, as they are also ints. Same deal with > Strings and strs. Those two are basically NamedValues in a fancy enum > package. > First of all, thanks for working on this. It's healthy to have several approaches to solve the same problem. That said, I'm very much against the alternative you propose. The reason boils down to basic Pythonic principles. I imagine myself a few years from now reading Python 3.4+ code and seeing usage of these Enum, BitMask, Sequence (Integer?) and String classes, all slightly different in subtle ways, and that imaginary self will no doubt reach for the reference documentation on every occasion. That's because the 4 are similar yet different, and because they have no parallel in the C/C++ world (at least most of them don't). On the other hand, with flufl.enum, *because* it's so simple, it's very easy to grasp pretty immediately since it has few well defined uses cases that are similar in spirit to C's enum. Yes, in some cases I won't be able to use flufl.enum, and I'll fall back to the current "solution" of not having an enum package at all. 
But in the cases where I use it, I'll at least know that my code is becoming more readable. To summarize, my personal preference in priority order is: 1. get flufl.enum into stdlib, or a similarly simple proposal 2. don't get any enum package into stdlib at this point 3. get this alternative 4-class approach into stdlib Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Tue Mar 12 15:36:22 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 12 Mar 2013 07:36:22 -0700 Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library In-Reply-To: References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> Message-ID: <513F3D66.1080000@stoneleaf.us> On 03/12/2013 05:23 AM, Eli Bendersky wrote: > > On Mon, Mar 11, 2013 at 10:25 AM, Ethan Furman > wrote: > > On 03/11/2013 09:55 AM, Paul Moore wrote: > > On 11 March 2013 16:18, Ethan Furman > wrote: > > First, I offer my apologies to all who are still battle-weary from the last > flurry of enum threads. > > [...] > > This new PEP proposes an enum module that handles all those use cases, and > makes it possible to handle others as well. > > > One thing that is not discussed in the PEP (at least from my quick > reading of it) is the issue of transitivity of equality > > > There are four types you can inherit from: Enum, BitMask, Sequence, and String. > Enum, BitMask, and String will not compare equal with integers, and Enum, BitMask > and Sequence will not compare equal with strings; in fact, Enum and Bitmask, not > being based on an existing data type, will not compare equal with anything that is > not in their own enumeration group or its superclass. Sequences will > compare equal with ints, because they are ints; they will also compare equal > against other Sequence enumerations, as they are also ints. Same deal with > Strings and strs. Those two are basically NamedValues in a fancy enum package. > > > First of all, thanks for working on this. It's healthy to have several approaches to solve the same problem. That said, > I'm very much against the alternative you propose. The reason boils down to basic Pythonic principles. I imagine myself > a few years from now reading Python 3.4+ code and seeing usage of these Enum, BitMask, Sequence (Integer?) and String > classes, all slightly different in subtle ways, and that imaginary self will no doubt reach for the reference > documentation on every occasion. That's because the 4 are similar yet different, and because they have no parallel in > the C/C++ world (at least most of them don't). > > On the other hand, with flufl.enum, *because* it's so simple, it's very easy to grasp pretty immediately since it has > few well defined uses cases that are similar in spirit to C's enum. Yes, in some cases I won't be able to use > flufl.enum, and I'll fall back to the current "solution" of not having an enum package at all. But in the cases where I > use it, I'll at least know that my code is becoming more readable. Would it be easier to accept if it was called "Enums and other Named Constants"? Currently, if I want to store a list of homogeneous values I can use: - list - tuple - bytes - bytearray - array.array Are these not "slightly different in subtle ways"? 
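(Stepping back to the comparison rules quoted above, a quick untested sketch of what they imply -- based on the PEP's stated semantics, not on output from the actual aenum code:)

    >>> from enum import Enum, Sequence, enum
    >>> class Color(Enum):
    ...     red = 1
    >>> class HttpStatus(Sequence):
    ...     ok = enum(integer=200)
    >>> Color.red == 1           # an Enum is not an int
    False
    >>> HttpStatus.ok == 200     # a Sequence *is* an int
    True
    >>> HttpStatus.ok + 1        # and int operations return a plain int
    201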
If simple is what we want, then do away with the "an enum is not an int" idea, toss out the "Color.red != OtherColor.red", and suddenly my four classes are down to two (int vs str), or flufl.enum suddenly handles many many more cases than it did before.

But, quite frankly, I see value in having different enumerations being different types, and in having NamedConstants... perhaps some renaming would make things clearer?

Currently:

    EnumType
    Enum(metaclass=EnumType)
    BitMask(metaclass=EnumType)
    Sequence(metaclass=EnumType)
    String(metaclass=EnumType)

Could be instead:

    NamedConstantType
    Enum(metaclass=NamedConstantType)
    NamedInt(int, metaclass=NamedConstantType)
    NamedStr(str, metaclass=NamedConstantType)

with available add-ons of BITMASK, INDEX, and ORDER.

I don't know about you, but I like that a lot better. :)

-- 
~Ethan~

From eliben at gmail.com  Tue Mar 12 16:36:56 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Tue, 12 Mar 2013 08:36:56 -0700
Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library
In-Reply-To: <513F3D66.1080000@stoneleaf.us>
References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> <513F3D66.1080000@stoneleaf.us>
Message-ID:

> Currently, if I want to store a list of homogeneous values I can use:
>
> - list
> - tuple
> - bytes
> - bytearray
> - array.array
>
> Are these not "slightly different in subtle ways"?
>
> If simple is what we want, then do away with the "an enum is not an int"
> idea, toss out the "Color.red != OtherColor.red", and suddenly my four
> classes are down to two (int vs str), or flufl.enum suddenly handles many
> many more cases than it did before.
>
> But, quite frankly, I see value in having different enumerations being
> different types, and in having NamedConstants... perhaps some renaming
> would make things clearer?
>
> Currently:
>
>     EnumType
>     Enum(metaclass=EnumType)
>     BitMask(metaclass=EnumType)
>     Sequence(metaclass=EnumType)
>     String(metaclass=EnumType)
>
> Could be instead:
>
>     NamedConstantType
>     Enum(metaclass=NamedConstantType)
>     NamedInt(int, metaclass=NamedConstantType)
>     NamedStr(str, metaclass=NamedConstantType)
>
> with available add-ons of BITMASK, INDEX, and ORDER.
>
> I don't know about you, but I like that a lot better. :)

It is actually better, because it emphasizes that NamedInt is just that, not a kind of Enum. There's just one enum. Moreover, I'm not sure why strings need to be named (they name themselves just fine).
And moreover+, Bitmask IMHO is completely unnecessary in Python. It's necessary everywhere we interface with C APIs and binary formats that use them. Even the stdlib is full of candidates--the flags in os, stat, etc. are all bitmasks. I could see arguing that these shouldn't be considered the same kind of thing as an enum, or are too complex to handle in what should be a simple enum library. But I can't see how you can say we don't need them. The situation is identical to ordered named int--same status quo, same downsides to the status quo, etc.--except for the implementation complexity. From eliben at gmail.com Tue Mar 12 18:28:43 2013 From: eliben at gmail.com (Eli Bendersky) Date: Tue, 12 Mar 2013 10:28:43 -0700 Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library In-Reply-To: <0CBC64AA-97FD-4C7E-825E-15EE86C9EA2C@yahoo.com> References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> <513F3D66.1080000@stoneleaf.us> <0CBC64AA-97FD-4C7E-825E-15EE86C9EA2C@yahoo.com> Message-ID: On Tue, Mar 12, 2013 at 9:59 AM, Andrew Barnert wrote: > On Mar 12, 2013, at 8:36, Eli Bendersky wrote: > > > It is actually better, because it emphasizes that NamedInt is just that, > not a kind of Enum. There's just one enum. Moreover, I'm not sure why > strings need to be named (they name themselves just fine). And moreover+, > Bitmask IMHO is completely unnecessary in Python. > > It's necessary everywhere we interface with C APIs and binary formats that > use them. Even the stdlib is full of candidates--the flags in os, stat, > etc. are all bitmasks. > I think that viewing the Python programmer community at large, very few actually interact with C APIs that have bitmasked flags. Moreover, a NamedInt can fit the bill without needing a specific bitmask flag. If you have "names" for your flag constituents you can just join them with '|' as in C. This is similar to what's currently being done in modules like os and stat, but provides conveniently printable names for the magic numbers. The benefits of a specific bitmasking class in the stdlib are imho very marginal. Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Tue Mar 12 21:07:05 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 12 Mar 2013 13:07:05 -0700 (PDT) Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library In-Reply-To: References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> <513F3D66.1080000@stoneleaf.us> <0CBC64AA-97FD-4C7E-825E-15EE86C9EA2C@yahoo.com> Message-ID: <1363118825.93348.YahooMailNeo@web184705.mail.ne1.yahoo.com> From: Eli Bendersky Sent: Tuesday, March 12, 2013 10:28 AM >On Tue, Mar 12, 2013 at 9:59 AM, Andrew Barnert wrote: > >On Mar 12, 2013, at 8:36, Eli Bendersky wrote: >> >>> It is actually better, because it emphasizes that NamedInt is just that, not a kind of Enum. There's just one enum. Moreover, I'm not sure why strings need to be named (they name themselves just fine). And moreover+, Bitmask IMHO is completely unnecessary in Python. >> >>It's necessary everywhere we interface with C APIs and binary formats that use them. Even the stdlib is full of candidates--the flags in os, stat, etc. are all bitmasks. > >I think that viewing the Python programmer community at large, very few actually interact with C APIs that have bitmasked flags. I don't see why this even needs to be established. 
We have cases like, e.g., mmap.mmap all over the stdlib that take bitmasked flags. Are you arguing that these functions are too uncommon to belong in the stdlib or need to be redesigned?

And, even if you don't touch any of those parts of the stdlib, there are many outside libraries that follow its example. From the "first steps" of the wx tutorial:

    window = wx.Frame(None, style=wx.MAXIMIZE_BOX | wx.RESIZE_BORDER
                      | wx.SYSTEM_MENU | wx.CAPTION | wx.CLOSE_BOX)

This isn't some design quirk of wx; this is how nearly every GUI framework except tkinter works. And image processing, audio processing, platform glue like win32api and PyObjC, and countless other application areas.

> Moreover, a NamedInt can fit the bill without needing a specific bitmask flag.

> If you have "names" for your flag constituents you can just join them with '|' as in C. This is similar to what's currently being done in modules like os and stat, but provides conveniently printable names for the magic numbers. The benefits of a specific bitmasking class in the stdlib are imho very marginal.

The benefits of a bitmasking enum class are exactly the same as the benefits of an ordered enum class. Compare:

    background, edge = color.RED, side.BOTTOM
    print(background, edge)
    print(background < edge)

The benefits of NamedInt are that this doesn't print "1 4" and then "True". Now:

    style = wx.styles.MAXIMIZE_BOX | wx.styles.RESIZE_BORDER
    print(style)
    print(style & wx.keycodes.TAB)

The benefits of a NamedInt are that this doesn't print "2098176" and then "True".

If we don't have these benefits, there is no reason to add NamedInt in the first place, because it's nothing more than an alternate way to construct integer constants.

Again, you could argue that making NamedInt work for the ordered case is trivial, making it work for the bitmask case is complicated, and therefore it's not worth doing even though it would be useful. But it clearly would be useful, for exactly the same reasons as in the ordered case.

From ethan at stoneleaf.us  Tue Mar 12 21:40:03 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 12 Mar 2013 13:40:03 -0700
Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library
In-Reply-To: <1363118825.93348.YahooMailNeo@web184705.mail.ne1.yahoo.com>
References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> <513F3D66.1080000@stoneleaf.us> <0CBC64AA-97FD-4C7E-825E-15EE86C9EA2C@yahoo.com> <1363118825.93348.YahooMailNeo@web184705.mail.ne1.yahoo.com>
Message-ID: <513F92A3.9060007@stoneleaf.us>

On 03/12/2013 01:07 PM, Andrew Barnert wrote:
> From: Eli Bendersky
> Sent: Tuesday, March 12, 2013 10:28 AM
>
>> On Tue, Mar 12, 2013 at 9:59 AM, Andrew Barnert wrote:
>>
>> On Mar 12, 2013, at 8:36, Eli Bendersky wrote:
>>>
>>>> It is actually better, because it emphasizes that NamedInt is just that, not a kind of Enum. There's just one enum. Moreover, I'm not sure why strings need to be named (they name themselves just fine). And moreover+, Bitmask IMHO is completely unnecessary in Python.
>>>
>>> It's necessary everywhere we interface with C APIs and binary formats that use them. Even the stdlib is full of candidates--the flags in os, stat, etc. are all bitmasks.
>>
>> I think that viewing the Python programmer community at large, very few actually interact with C APIs that have bitmasked flags.
>
> I don't see why this even needs to be established. We have cases like, e.g., mmap.mmap all over the stdlib that take bitmasked flags.
Are you arguing that these functions are too uncommon to belong in the stdlib or need to be redesigned? > > > > And, even if you don't touch any of those parts of the stdlib, there are many outside libraries that follow its example. From the "first steps" of the wx tutorial: > > window = wx.Frame(None, style=wx.MAXIMIZE_BOX | wx.RESIZE_BORDER > | wx.SYSTEM_MENU | wx.CAPTION | wx.CLOSE_BOX) With a BitMask this could be: window = wx.Frame(None, style=wx.Style('MAXIMIZE_BOX|RESIZE_BORDER|SYSTEM_MENU|CAPTION|CLOSE_BOX')) > This isn't some design quirk of wx; this is how nearly every GUI framework except tkinter works. And image processing, audio processing, platform glue like win32api and PyObjC, and countless other application areas. > >> Moreover, a NamedInt can fit the bill without needing a specific bitmask flag. > > >> If you have "names" for your flag constituents you can just join them with '|' as in C. This is similar to what's currently being done in modules like os and stat, but provides conveniently printable names for the magic numbers. The benefits of a specific bitmasking class in the stdlib are imho very marginal. > > > The benefits of a bitmasking enum class are exactly the same as the benefits of an ordered enum class. Compare: > > background, edge = color.RED, side.BOTTOM > print(background, edge) > print(background < edge) > > The benefits of NamedInt are that this doesn't print "1 4" and then "True". Hopefully you meant BitMask, as a NamedInt would result in True (unless either color or side were not NamedInts). While a BitMask is not directly an int, Barry showed how it turns into an int when passed to a C library. > Now: > > > style = wx.styles.MAXIMIZE_BOX | wx.styles.RESIZE_BORDER > print(style) > print(style & wx.keycodes.TAB) > > The benefits of a NamedInt are that this doesn't print "2098176" and then "True". > > If we don't have these benefits, there is no reason to add NamedInt in the first place, because it's nothing more than an alternate way to construct integer constants. How valuable is a name? And a doc string? Currently I'm working with OpenERP code, which uses constants in a lot of key places. I would love to know what 5 stands for, and 6. I knew 6 once, kind of, for about ten minutes while I worked with it, and now I've forgotten. -- ~Ethan~ From foogod at gmail.com Tue Mar 12 22:31:29 2013 From: foogod at gmail.com (Alex Stewart) Date: Tue, 12 Mar 2013 14:31:29 -0700 Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library In-Reply-To: <513F92A3.9060007@stoneleaf.us> References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> <513F3D66.1080000@stoneleaf.us> <0CBC64AA-97FD-4C7E-825E-15EE86C9EA2C@yahoo.com> <1363118825.93348.YahooMailNeo@web184705.mail.ne1.yahoo.com> <513F92A3.9060007@stoneleaf.us> Message-ID: Sigh.. I apologize for coming back in a bit late and still not really up to speed, but various real-world things ended up sidetracking me and over the past week or two and I haven't been able to spend as much time here as I'd like to.. 
I haven't yet had a chance to read the proposed PEP, but based on the discussions I'm seeing, I have to admit I'm a little disappointed that I thought we had already managed to come together with a pretty common list of properties that we all agreed on for enum-like things in Python, and I had started putting together a sample implementation based on those principles (which I never really heard anybody say anything substantially bad about, so I assumed it was sorta on the right track) and now it sounds like we're going off in significantly different directions again, and I'm not sure why. If this is in fact the case, it seems to me that crafting yet another PEP which doesn't mesh with half the community's expectations is not really useful and is probably premature, and we should probably go back to talking more about conceptually what we're actually trying to accomplish here first. FYI, I have actually been working on a newer version of my enum implementation which includes support for compound enums (i.e. "oring" enum values together), and I believe covers almost all of the properties that I've heard people express a desire for on this list, and avoids most of the things people seem to have major issues with, so I would be interested to know if people actually have problems with it. I'll see if I can get it cleaned up a bit and up on github in the next day or two so folks at least have an idea of how I've been looking at this stuff and can comment on what they think.. Regarding the issue of bitmask-enums, I do agree that they are common enough in various APIs that it is important that we be able to support them easily. *However,* I have yet to see how or why they are actually different than int-enums in any practical way. I don't see why we need to treat them as a different category at all and I see no value in doing so. They're all just* *int-enums. Problem solved. (I might have some thoughts on some of the rest of this stuff too, but I want to try to find a bit of time and read up on the new proposed-PEP and discussions before I respond further..) --Alex On Tue, Mar 12, 2013 at 1:40 PM, Ethan Furman wrote: > On 03/12/2013 01:07 PM, Andrew Barnert wrote: > >> From: Eli Bendersky >> Sent: Tuesday, March 12, 2013 10:28 AM >> >> >> On Tue, Mar 12, 2013 at 9:59 AM, Andrew Barnert >>> wrote: >>> >>> On Mar 12, 2013, at 8:36, Eli Bendersky wrote: >>> >>>> >>>> It is actually better, because it emphasizes that NamedInt is just >>>>> that, not a kind of Enum. There's just one enum. Moreover, I'm not sure why >>>>> strings need to be named (they name themselves just fine). And moreover+, >>>>> Bitmask IMHO is completely unnecessary in Python. >>>>> >>>> >>>> It's necessary everywhere we interface with C APIs and binary formats >>>> that use them. Even the stdlib is full of candidates--the flags in os, >>>> stat, etc. are all bitmasks. >>>> >>> >>> I think that viewing the Python programmer community at large, very few >>> actually interact with C APIs that have bitmasked flags. >>> >> >> I don't see why this even needs to be established. We have cases like, >> e.g., mmap.mmap all over the stdlib that take bitmasked flags. Are you >> arguing that these functions are too uncommon to belong in the stdlib or >> need to be redesigned? >> >> >> >> And, even if you don't touch any of those parts of the stdlib, there are >> many outside libraries that follow its example. 
From the "first steps" of >> the wx tutorial: >> >> window = wx.Frame(None, style=wx.MAXIMIZE_BOX | wx.RESIZE_BORDER >> | wx.SYSTEM_MENU | wx.CAPTION | wx.CLOSE_BOX) >> > > With a BitMask this could be: > > window = wx.Frame(None, style=wx.Style('MAXIMIZE_BOX|** > RESIZE_BORDER|SYSTEM_MENU|**CAPTION|CLOSE_BOX')) > > > > This isn't some design quirk of wx; this is how nearly every GUI >> framework except tkinter works. And image processing, audio processing, >> platform glue like win32api and PyObjC, and countless other application >> areas. >> >> Moreover, a NamedInt can fit the bill without needing a specific bitmask >>> flag. >>> >> >> >> If you have "names" for your flag constituents you can just join them >>> with '|' as in C. This is similar to what's currently being done in modules >>> like os and stat, but provides conveniently printable names for the magic >>> numbers. The benefits of a specific bitmasking class in the stdlib are imho >>> very marginal. >>> >> >> >> The benefits of a bitmasking enum class are exactly the same as the >> benefits of an ordered enum class. Compare: >> >> background, edge = color.RED, side.BOTTOM >> print(background, edge) >> print(background < edge) >> >> The benefits of NamedInt are that this doesn't print "1 4" and then >> "True". >> > > Hopefully you meant BitMask, as a NamedInt would result in True (unless > either color or side were not NamedInts). > > While a BitMask is not directly an int, Barry showed how it turns into an > int when passed to a C library. > > > > Now: >> >> >> style = wx.styles.MAXIMIZE_BOX | wx.styles.RESIZE_BORDER >> print(style) >> print(style & wx.keycodes.TAB) >> >> The benefits of a NamedInt are that this doesn't print "2098176" and then >> "True". >> >> If we don't have these benefits, there is no reason to add NamedInt in >> the first place, because it's nothing more than an alternate way to >> construct integer constants. >> > > How valuable is a name? And a doc string? Currently I'm working with > OpenERP code, which uses constants in a lot of key places. I would love to > know what 5 stands for, and 6. I knew 6 once, kind of, for about ten > minutes while I worked with it, and now I've forgotten. > > -- > ~Ethan~ > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Tue Mar 12 23:16:40 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 12 Mar 2013 15:16:40 -0700 Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library In-Reply-To: References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> <513F3D66.1080000@stoneleaf.us> <0CBC64AA-97FD-4C7E-825E-15EE86C9EA2C@yahoo.com> <1363118825.93348.YahooMailNeo@web184705.mail.ne1.yahoo.com> <513F92A3.9060007@stoneleaf.us> Message-ID: <513FA948.60604@stoneleaf.us> On 03/12/2013 02:31 PM, Alex Stewart wrote: > I haven't yet had a chance to read the proposed PEP, but based on the discussions I'm seeing, I have to admit I'm a > little disappointed that I thought we had already managed to come together with a pretty common list of properties that > we all agreed on for enum-like things in Python, Sadly, no. 
There seem to be two basic camps: those that think an enum should be valueless, and have nothing to do with an integer besides using it to select the appropriate enumerator (that just looks strange -- I hope you're right, Stephen!); and those for whom the integer is an integral part of the enumeration, whether for sorting, comparing, selecting an index, or whatever.

The critical aspect of using or not using an integer as the base type is: what happens when an enumerator from one class is compared to an enumerator from another class? If the base type is int and they both have the same value, they'll be equal -- so much for type safety; if the base type is object, they won't be equal, but then you lose your easy-to-use int aspect, your sorting, etc. Worse, if you have the base type be an int, but check for enumeration membership such that Color.red == 1 == Fruit.apple, but Color.red != Fruit.apple, you open a big can of worms because you just broke equality transitivity (or whatever it's called). We don't want that.

> FYI, I have actually been working on a newer version of my enum implementation which includes support for compound enums
> (i.e. "oring" enum values together), and I believe covers almost all of the properties that I've heard people express a
> desire for on this list, and avoids most of the things people seem to have major issues with, so I would be interested
> to know if people actually have problems with it. I'll see if I can get it cleaned up a bit and up on github in the
> next day or two so folks at least have an idea of how I've been looking at this stuff and can comment on what they think..

Looking forward to it.

> Regarding the issue of bitmask-enums, I do agree that they are common enough in various APIs that it is important that
> we be able to support them easily. *However,* I have yet to see how or why they are actually different than int-enums in
> any practical way. I don't see why we need to treat them as a different category at all and I see no value in doing so.
> They're all just *int-enums*. Problem solved.

What you gain with a dedicated BitMask is better reprs -- the same thing you gain with an enum, after all. Plus, with a standard enum you're using every number, but with a BitMask you are skipping most of them.

    --> class Color(Enum):
    ...     black = 0
    ...     red = 1
    ...     green = 2
    ...     blue = 3
    ...     pink = 4

    --> Color.red | Color.green   # what do we see?  Color.blue?  Color.red|green?
    --> Color.blue | Color.pink   # and what here?  7?  Color.blue|Color.pink?

-- 
~Ethan~

From foogod at gmail.com  Tue Mar 12 23:17:59 2013
From: foogod at gmail.com (Alex Stewart)
Date: Tue, 12 Mar 2013 15:17:59 -0700
Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library
In-Reply-To: References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> <513F3D66.1080000@stoneleaf.us> <0CBC64AA-97FD-4C7E-825E-15EE86C9EA2C@yahoo.com> <1363118825.93348.YahooMailNeo@web184705.mail.ne1.yahoo.com> <513F92A3.9060007@stoneleaf.us>
Message-ID:

On Tue, Mar 12, 2013 at 2:31 PM, Alex Stewart wrote:
> Regarding the issue of bitmask-enums, I do agree that they are common
> enough in various APIs that it is important that we be able to support them
> easily. *However,* I have yet to see how or why they are actually
> different than int-enums in any practical way. I don't see why we need to
> treat them as a different category at all and I see no value in doing so.
> They're all just *int-enums*. Problem solved.
>
It just occurred to me that, as I was skimming some of the previous discussion and also coming from the perspective of my own implementation (which has not yet been shared with the rest of the group), this response may have been overly terse and may not really have communicated what I meant very well. For clarification: I believe that what we're fundamentally talking about here is the question of "compounding" enum values into what are effectively multi-enum sets (they are not really the same thing anymore as a single enum value, because they do not have a one-to-one correspondence with any one enum value). With ints, this compounding operation is typically a "binary or" operation. This is what we call "bitmasks", but really they're just a particular way of compounding ints together.

In my opinion, the concept of "compound enum" is a larger abstract concept that could, and probably should, apply to any type of enum, not just ints. In that context, bitmasks are really nothing special, they are just the compound form of int-enums:

class Color (Enum):
    RED, GREEN, BLUE = __ * 3

class WxStyle (IntEnum):
    RESIZE_BORDER = 64
    MAXIMIZE_BOX = 512

>>> x = Color.RED | Color.BLUE
>>> repr(x)
'Color.RED | Color.BLUE'
>>> Color.RED in x
True
>>> Color.GREEN in x
False
>>> type(x)
<class 'CompoundEnum'>
>>> int(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: int() argument must be a string or a number, not 'CompoundEnum'

>>> x = WxStyle.MAXIMIZE_BOX | WxStyle.RESIZE_BORDER
>>> repr(x)
'WxStyle.MAXIMIZE_BOX | WxStyle.RESIZE_BORDER'
>>> WxStyle.MAXIMIZE_BOX in x
True
>>> type(x)
<class 'CompoundEnum'>
>>> int(x)
576

(This may become a bit clearer when you guys can all see the implementation I've been coding up)

--Alex

From foogod at gmail.com Tue Mar 12 23:46:24 2013
From: foogod at gmail.com (Alex Stewart)
Date: Tue, 12 Mar 2013 15:46:24 -0700
Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library
In-Reply-To: <513FA948.60604@stoneleaf.us>
References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> <513F3D66.1080000@stoneleaf.us> <0CBC64AA-97FD-4C7E-825E-15EE86C9EA2C@yahoo.com> <1363118825.93348.YahooMailNeo@web184705.mail.ne1.yahoo.com> <513F92A3.9060007@stoneleaf.us> <513FA948.60604@stoneleaf.us>
Message-ID:

On Tue, Mar 12, 2013 at 3:16 PM, Ethan Furman wrote:

> Sadly, no. There seem to be two basic camps: those that think an enum should be valueless, and have nothing to do with an integer besides using it to select the appropriate enumerator (that just looks strange -- I hope you're right, Stephen!); and those for whom the integer is an integral part of the enumeration, whether for sorting, comparing, selecting an index, or whatever.

To be honest, I really don't think these two camps are as irreconcilable as lots of people seem to be treating them. I don't think either view is necessarily wrong, and I firmly believe that we can find a common ground that works for both ways of approaching this issue. More importantly, however, I believe we *must* try to find some way to coexist or anything we come up with will be fundamentally inadequate for a substantial portion of the community, and in that case, I don't think it belongs in the stdlib. If we can't compromise a bit, I think we're all doomed to failure.

I do recognize that the issue of transitive equality is something we're going to need to work out.
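(Editorial aside: a minimal sketch of the transitivity hazard under discussion, using invented Color/Fruit members that compare equal to their int value but not to members of other enum types. Illustrative only.)

# Two int-based enum members that are each equal to 1, yet unequal to
# each other -- the "broken transitivity" case described above.

class Member(int):
    def __new__(cls, enum_name, value):
        self = super().__new__(cls, value)
        self._enum = enum_name
        return self
    def __eq__(self, other):
        if isinstance(other, Member) and self._enum != other._enum:
            return False            # cross-enum comparison: never equal
        return int(self) == int(other)
    def __hash__(self):
        return hash(int(self))

red = Member('Color', 1)
apple = Member('Fruit', 1)

print(red == 1)        # True
print(1 == apple)      # True
print(red == apple)    # False -- a == b and b == c, but a != c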
One thing I think folks need to keep in mind, though, is that that issue really has nothing to do with int-enums vs. valueless-enums at all. It will still be a point of contention even if we go with solely the "enums are ints" way of looking at things, so that really won't solve it. (The fundamental issue is: int-enums are (by definition) both ints, and named-objects. When not explicitly or implicitly being used as an int, should they still behave like ints first and named-objects second, or should they be named-objects first and ints second? Both ways have some problems, and we will ultimately need to choose which set of problems will be the least annoying in the long term, but that's a completely separate discussion from the int/valued/valueless question, IMHO.)

--Alex

From ethan at stoneleaf.us Wed Mar 13 00:11:43 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 12 Mar 2013 16:11:43 -0700
Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library
In-Reply-To:
References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> <513F3D66.1080000@stoneleaf.us> <0CBC64AA-97FD-4C7E-825E-15EE86C9EA2C@yahoo.com> <1363118825.93348.YahooMailNeo@web184705.mail.ne1.yahoo.com> <513F92A3.9060007@stoneleaf.us> <513FA948.60604@stoneleaf.us>
Message-ID: <513FB62F.5010703@stoneleaf.us>

On 03/12/2013 03:46 PM, Alex Stewart wrote:
> On Tue, Mar 12, 2013 at 3:16 PM, Ethan Furman wrote:
>
>> Sadly, no. There seem to be two basic camps: those that think an enum should be valueless, and have nothing to do with an integer besides using it to select the appropriate enumerator (that just looks strange -- I hope you're right, Stephen!); and those for whom the integer is an integral part of the enumeration, whether for sorting, comparing, selecting an index, or whatever.
>
> To be honest, I really don't think these two camps are as irreconcilable as lots of people seem to be treating them. I don't think either view is necessarily wrong, and I firmly believe that we can find a common ground that works for both ways of approaching this issue. More importantly, however, I believe we /must/ try to find some way to coexist or anything we come up with will be fundamentally inadequate for a substantial portion of the community, and in that case, I don't think it belongs in the stdlib. If we can't compromise a bit, I think we're all doomed to failure.
>
> I do recognize that the issue of transitive equality is something we're going to need to work out. One thing I think folks need to keep in mind, though, is that that issue really has nothing to do with int-enums vs. valueless-enums at all. It will still be a point of contention even if we go with solely the "enums are ints" way of looking at things, so that really won't solve it. (The fundamental issue is: int-enums are (by definition) both ints, and named-objects. When not explicitly or implicitly being used as an int, should they still behave like ints first and named-objects second, or should they be named-objects first and ints second? Both ways have some problems, and we will ultimately need to choose which set of problems will be the least annoying in the long term, but that's a completely separate discussion from the int/valued/valueless question, IMHO.)

It sounds to me like you might be saying it could be okay to break transitive equality in the case of enums.
I'll quote Terry:

On 02/26/2013 07:01 AM, Terry Reedy wrote:
> On 2/25/2013 6:53 PM, Greg Ewing wrote:
>> Barry Warsaw wrote:
>>> --> Colors = make('Colors', 'red green blue'.split())
>>> --> Animals = make('Animals', 'ant bee cat'.split())
>>> --> Colors.green == Animals.bee
>>
>> The currently suggested solution to that seems to be to make comparison non-transitive, so that Colors.green == 1 and Animals.bee == 1 but Colors.green != Animals.bee. And then hope that this does not create a quantum black hole that sucks us all into a logical singularity...
>
> But it will;-).
> To repeat myself, transitivity of equality is basic to thought, logic, and sets and we should not deliver Python with it broken. (The non-reflexivity of NAN is a different issue, but NANs are intentionally insane.)
>
> Decimal(0) == 0 == 0.0 != Decimal(0) != Fraction(0) == 0 was a problem we finally fixed by making integer-valued decimals compare equal to the same valued floats and fractions. In 3.3:
>
> --> from decimal import Decimal as D
> --> from fractions import Fraction as F
> --> 0 == 0.0 == D(0) == F(0)
> True
>
> http://bugs.python.org/issue4087
> http://bugs.python.org/issue4090
> explain the practical problems. We should NOT knowingly go down this road again. If color and animal are isolated from each other, they should each be isolated from everything, including int.

--
~Ethan~

From greg.ewing at canterbury.ac.nz Wed Mar 13 00:42:55 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 13 Mar 2013 12:42:55 +1300
Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library
In-Reply-To: <513FA948.60604@stoneleaf.us>
References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> <513F3D66.1080000@stoneleaf.us> <0CBC64AA-97FD-4C7E-825E-15EE86C9EA2C@yahoo.com> <1363118825.93348.YahooMailNeo@web184705.mail.ne1.yahoo.com> <513F92A3.9060007@stoneleaf.us> <513FA948.60604@stoneleaf.us>
Message-ID: <513FBD7F.1060202@canterbury.ac.nz>

Ethan Furman wrote:
> There seem to be two basic camps: those that think an enum should be valueless ... and those for whom the integer is an integral part of the enumeration,

I think most people agree that both of these use cases are important. The split seems to be more between those that want to address them using separate types, and those that prefer a one-type-fits-all approach.

--
Greg

From random832 at fastmail.us Wed Mar 13 02:03:48 2013
From: random832 at fastmail.us (Random832)
Date: Tue, 12 Mar 2013 21:03:48 -0400
Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library
In-Reply-To:
References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> <513F3D66.1080000@stoneleaf.us> <0CBC64AA-97FD-4C7E-825E-15EE86C9EA2C@yahoo.com> <1363118825.93348.YahooMailNeo@web184705.mail.ne1.yahoo.com> <513F92A3.9060007@stoneleaf.us>
Message-ID: <513FD074.9010701@fastmail.us>

On 03/12/2013 06:17 PM, Alex Stewart wrote:
> effectively multi-enum sets

Speaking of sets... it seems like sets of single enum values would be a good way to represent "ORed-together enums". But then if you want a function to accept one, it has to be able to not care if it's only a single one or not.

So, what if each value of a 'flags' type enum is also able to be treated as a single-member set of itself? We already have one object that iterates into a sequence of the same class: strings, so there's nothing wrong with that _conceptually_.
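(Editorial aside: a rough sketch of the single-member-set idea just described. The Flag class and PROT_* values below are invented for the illustration; this is not a proposed implementation.)

# A flag member that also behaves as a one-element set of itself, so
# consuming code can treat single flags and flag sets uniformly.

class Flag(int):
    def __new__(cls, name, value):
        self = super().__new__(cls, value)
        self._name = name
        return self
    def __iter__(self):
        yield self                  # a flag is a set containing itself
    def __len__(self):
        return 1
    def __contains__(self, other):
        return other == self
    def __repr__(self):
        return self._name

PROT_READ = Flag('PROT_READ', 1)
PROT_WRITE = Flag('PROT_WRITE', 2)

def describe(flags):
    # Works the same whether it gets one Flag or a set of Flags.
    return ', '.join(sorted(f._name for f in flags))

print(describe(PROT_READ))                  # PROT_READ
print(describe({PROT_READ, PROT_WRITE}))    # PROT_READ, PROT_WRITE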
The constructor could turn a sequence of the enum into a proper instance of it [if needed to get the int value] for passing into functions that require one, and a function that takes such a set of them could just not care about the difference between Color.red|Color.blue and {Color.red,Color.blue}.

From abarnert at yahoo.com Wed Mar 13 02:43:44 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 12 Mar 2013 18:43:44 -0700 (PDT)
Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library
In-Reply-To:
References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> <513F3D66.1080000@stoneleaf.us> <0CBC64AA-97FD-4C7E-825E-15EE86C9EA2C@yahoo.com> <1363118825.93348.YahooMailNeo@web184705.mail.ne1.yahoo.com> <513F92A3.9060007@stoneleaf.us>
Message-ID: <1363139024.2172.YahooMailNeo@web184701.mail.ne1.yahoo.com>

> From: Alex Stewart
> Sent: Tuesday, March 12, 2013 3:17 PM
> Regarding the issue of bitmask-enums, I do agree that they are common enough in various APIs that it is important that we be able to support them easily. However, I have yet to see how or why they are actually different than int-enums in any practical way. I don't see why we need to treat them as a different category at all and I see no value in doing so. They're all just int-enums. Problem solved.

I agree with this. If there's an easy way to specify enum values, there's no reason that specifying 1, 2, 4, 8 has to be any different from specifying 0, 1, 2, 3. And, if there's a general-purpose way of "compounding enums" that lets me get the OR-ed value for passing to C/stdlib/etc. APIs that want that, I have no need for a special-purpose way of OR-ing bitmask enums. So, given all that, there is no value in treating them as a separate category. And it looks like maybe you've got a good answer that gives all that below, so...

> For clarification: I believe that what we're fundamentally talking about here is the question of "compounding" enum values into what are effectively multi-enum sets (they are not really the same thing anymore as a single enum value, because they do not have a one-to-one correspondence with any one enum value). With ints, this compounding operation is typically a "binary or" operation. This is what we call "bitmasks", but really they're just a particular way of compounding ints together.

I suggested, a while back, that we could have some kind of EnumSet along with Enum, which sounds like what you're talking about here. So, if I did this:

background = Color.RED | Color.BLUE

What I get is not an Enum of type Color with a value RED | BLUE, but an EnumSet of type Color with a value {RED, BLUE}. Your CompoundEnum seems to be the same thing, except that it puts the "enum-ness" first instead of the "set-ness", which is probably an improvement.

There are a few tricky bits here -- most of which I didn't think through when I first suggested this, but it looks like you may have.

You need a 0 value. After all, "background & Color.GREEN" has to return _something_. What is it called, what are its str and repr, etc.?

You also need negated values, or there's no way to write "background &= ~Color.RED". So, "~Color.RED" has to be something. Presumably a CompoundEnum, with an int value of -2 (since RED is 1). But what are its str and repr? Does this mean internally CompoundEnum is a set of values and a set of negated values? If you want to make CompoundEnum work for string enums, what's the negation of a string value?
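(Editorial aside: one possible answer to the negation question, offered only as a sketch -- it is not a design anyone in this thread has proposed. The idea is to define ~ as the complement within the mask of all defined flags, so negation never leaves the enum's universe or goes negative. All names are invented for the illustration.)

# Define ~flag as "all defined flags except this one".

ALL_COLORS = {'RED': 1, 'GREEN': 2, 'BLUE': 4}
FULL_MASK = 0
for _v in ALL_COLORS.values():
    FULL_MASK |= _v                 # 7: the union of every defined flag

def invert(value):
    """Complement within the universe of defined flags."""
    return FULL_MASK & ~value

RED, GREEN, BLUE = 1, 2, 4
background = RED | BLUE             # 5
background &= invert(RED)           # drop RED without going negative
print(background == BLUE)           # True
print(invert(RED))                  # 6 == GREEN | BLUE, not -2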
Getting the list of interactions between Enum and CompoundEnum exactly right seems non-trivial. For example, "Color.RED in background" has to work even if background is just plain "Color.RED". (Otherwise, you'd have to write horrible stuff like "background == Color.RED or Color.RED in background".) You can't just say "both types support all numeric and set operations in the obvious way", because I don't think it's obvious that, e.g., __or__ treats either type as a set while __lt__ treats either type as a number. In fact, in some cases, I don't even know what the answer would be. Can __contains__ take a CompoundEnum on the left side? What's the __len__ of a CompoundEnum with negated values in it? And so on...

Finally, what happens if someone has an int Enum whose values aren't unique bits, and tries to create a CompoundEnum out of it? If RED=1, BLUE=2, GREEN=3, is RED|BLUE an error? If so, that means that common cases like RDWR=3, ALL_FLAGS=0xFFFFFFFF, LOCAL_FLAGS=0x8FFC, etc. are errors too. On the other hand, if that's allowed, does that mean READ|WRITE gives me RDWR, or a CompoundEnum that compares equal to RDWR, ...? For that matter, what about RDWR | ~READ? (And can I define ANTIRED=-2 so ~RED = ANTIRED?)

If you've solved these problems, then yes, CompoundEnum completely eliminates the need for BitMask. And it sounds like, even if you haven't, you're at least well on your way to doing so. I'm looking forward to seeing it.

From abarnert at yahoo.com Wed Mar 13 02:58:23 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 12 Mar 2013 18:58:23 -0700 (PDT)
Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library
In-Reply-To: <513FD074.9010701@fastmail.us>
References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> <513F3D66.1080000@stoneleaf.us> <0CBC64AA-97FD-4C7E-825E-15EE86C9EA2C@yahoo.com> <1363118825.93348.YahooMailNeo@web184705.mail.ne1.yahoo.com> <513F92A3.9060007@stoneleaf.us> <513FD074.9010701@fastmail.us>
Message-ID: <1363139903.96412.YahooMailNeo@web184704.mail.ne1.yahoo.com>

> From: Random832
> Sent: Tuesday, March 12, 2013 6:03 PM
> Subject: Re: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library
>
> On 03/12/2013 06:17 PM, Alex Stewart wrote:
>> effectively multi-enum sets
> Speaking of sets... it seems like sets of single enum values would be a good way to represent "ORed-together enums". But then if you want a function to accept one, it has to be able to not care if it's only a single one or not.
>
> So, what if each value of a 'flags' type enum is also able to be treated as a single-member set of itself? We already have one object that iterates into a sequence of the same class: strings, so there's nothing wrong with that _conceptually_.

That means that in order to make a window non-resizable, instead of this:

    window.style &= ~wx.RESIZE_BORDER

you do this:

    window.style -= wx.RESIZE_BORDER

Right? (IIRC, the actual code is probably something like "window.setStyle(window.getStyle() & ~wx.RESIZE_BORDER)", but I think we can ignore that; the issue is the same.)

That solves the problem of how to represent ~wx.RESIZE_BORDER, etc. But it creates a new problem. Now non-bitmask integral Enum constants don't have integral-type operators, but set-type operators. So "FRIDAY - WEDNESDAY" returns "FRIDAY" (because it's set difference), and "WEDNESDAY < FRIDAY" is false (because it's not a proper subset)...
So I think this means you _do_ need separate types for bitmasked and ordered int values -- or, alternatively, for set-able and non-set-able Enums, which I think Alex Stewart was able to eliminate.

> The constructor could turn a sequence of the enum into a proper instance of it [if needed to get the int value] for passing into functions that require one,

Well, you need to be able to get the int value of a set, too. What's the point of being able to do mmap.PROT_READ | mmap.PROT_WRITE if I (or the mmap module) can't turn that into a 3?

Also, what happens if you call the constructor on a sequence of more than one instance?

From ethan at stoneleaf.us Wed Mar 13 02:55:40 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 12 Mar 2013 18:55:40 -0700
Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library
In-Reply-To: <513FD074.9010701@fastmail.us>
References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> <513F3D66.1080000@stoneleaf.us> <0CBC64AA-97FD-4C7E-825E-15EE86C9EA2C@yahoo.com> <1363118825.93348.YahooMailNeo@web184705.mail.ne1.yahoo.com> <513F92A3.9060007@stoneleaf.us> <513FD074.9010701@fastmail.us>
Message-ID: <513FDC9C.3000806@stoneleaf.us>

On 03/12/2013 06:03 PM, Random832 wrote:
> On 03/12/2013 06:17 PM, Alex Stewart wrote:
>> effectively multi-enum sets
> Speaking of sets... it seems like sets of single enum values would be a good way to represent "ORed-together enums". But then if you want a function to accept one, it has to be able to not care if it's only a single one or not.
>
> So, what if each value of a 'flags' type enum is also able to be treated as a single-member set of itself?

In aenum.py you are able to iterate over a BitMask enum, both single ones and ORed together ones, even if it only has one value. The code looks like this:

8<----------------------------------------------------------------------
def __iter__(yo):
    enums = []
    for enum in yo._enums:
        if int(enum) & int(yo):
            enums.append(enum)
    return iter(enums)
8<----------------------------------------------------------------------

--
~Ethan~

From random832 at fastmail.us Wed Mar 13 03:04:53 2013
From: random832 at fastmail.us (Random832)
Date: Tue, 12 Mar 2013 22:04:53 -0400
Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library
In-Reply-To: <1363139024.2172.YahooMailNeo@web184701.mail.ne1.yahoo.com>
References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> <513F3D66.1080000@stoneleaf.us> <0CBC64AA-97FD-4C7E-825E-15EE86C9EA2C@yahoo.com> <1363118825.93348.YahooMailNeo@web184705.mail.ne1.yahoo.com> <513F92A3.9060007@stoneleaf.us> <1363139024.2172.YahooMailNeo@web184701.mail.ne1.yahoo.com>
Message-ID: <513FDEC5.9060206@fastmail.us>

On 03/12/2013 09:43 PM, Andrew Barnert wrote:
> You also need negated values, or there's no way to write "background &= ~Color.RED". So, "~Color.RED" has to be something.

No, you don't. I started saying the same thing (then got sidetracked into trying to define it as a general operation on sets, then abandoned the idea as silly) in my last, but.... why not just background -= Color.RED? Just like sets.

It's different from C, but that's not necessarily a bad thing.
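(Editorial aside: a sketch of the "just like sets" spelling argued for above, with an invented frozenset-based Flags type and an explicit int() hook for handing the combined value to a C-level API. This is illustrative only, not aenum.py or any proposed stdlib design.)

# Flags held in a frozenset subclass: removal is spelled with -=
# rather than &= ~x, and int() produces the OR-ed value.

class Flags(frozenset):
    def __int__(self):
        total = 0
        for flag in self:
            total |= flag           # members are plain ints here
        return total

RESIZE_BORDER = 64
MAXIMIZE_BOX = 512

style = Flags({RESIZE_BORDER, MAXIMIZE_BOX})
style -= {RESIZE_BORDER}            # set difference instead of &= ~x
print(RESIZE_BORDER in style)       # False
print(int(Flags(style)))            # 512 -- ready for the C API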
From random832 at fastmail.us Wed Mar 13 03:11:49 2013
From: random832 at fastmail.us (Random832)
Date: Tue, 12 Mar 2013 22:11:49 -0400
Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library
In-Reply-To: <1363139903.96412.YahooMailNeo@web184704.mail.ne1.yahoo.com>
References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> <513F3D66.1080000@stoneleaf.us> <0CBC64AA-97FD-4C7E-825E-15EE86C9EA2C@yahoo.com> <1363118825.93348.YahooMailNeo@web184705.mail.ne1.yahoo.com> <513F92A3.9060007@stoneleaf.us> <513FD074.9010701@fastmail.us> <1363139903.96412.YahooMailNeo@web184704.mail.ne1.yahoo.com>
Message-ID: <513FE065.2010000@fastmail.us>

On 03/12/2013 09:58 PM, Andrew Barnert wrote:
>
> Well, you need to be able to get the int value of a set, too. What's the point of being able to do mmap.PROT_READ | mmap.PROT_WRITE if I (or the mmap module) can't turn that into a 3?

No, you'd be able to turn _that_ into a 3. What I'm suggesting is being able to pass x = {t.PROT_READ, t.PROT_WRITE} into the mmap module, and then it can do int(t(x)) to get 3.

Particularly since I don't really _like_ the a|b syntax for this, for some indefinable reason, and would like for passing in actual set literals to be the preferred way of doing it.

From abarnert at yahoo.com Wed Mar 13 07:08:22 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 12 Mar 2013 23:08:22 -0700
Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library
In-Reply-To: <513FDEC5.9060206@fastmail.us>
References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> <513F3D66.1080000@stoneleaf.us> <0CBC64AA-97FD-4C7E-825E-15EE86C9EA2C@yahoo.com> <1363118825.93348.YahooMailNeo@web184705.mail.ne1.yahoo.com> <513F92A3.9060007@stoneleaf.us> <1363139024.2172.YahooMailNeo@web184701.mail.ne1.yahoo.com> <513FDEC5.9060206@fastmail.us>
Message-ID:

On Mar 12, 2013, at 19:04, Random832 wrote:

> On 03/12/2013 09:43 PM, Andrew Barnert wrote:
>> You also need negated values, or there's no way to write "background &= ~Color.RED". So, "~Color.RED" has to be something.
> No, you don't. I started saying the same thing (then got sidetracked into trying to define it as a general operation on sets, then abandoned the idea as silly) in my last, but.... why not just background -= Color.RED? Just like sets.

Which means, as I said in another message, that Day.FRIDAY - Day.WEDNESDAY is not 2, but Day.FRIDAY. Which I'm pretty sure most people would find surprising.

Unless you want bitmask and ordered int enums to be different types, I don't see a good way around this. A type that acts like a set and also like a number has conflicting intuitive meanings for most of its operators. The only reason you don't notice this for the obvious ones | and & is that those two set operations mean the same thing as the int operations (when thinking of an int as a set of bits); that's not true for any operators other than the bitwise ones.

> It's different from C, but that's not necessarily a bad thing.
From ncoghlan at gmail.com Wed Mar 13 08:12:11 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 13 Mar 2013 00:12:11 -0700
Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library
In-Reply-To: <513FBD7F.1060202@canterbury.ac.nz>
References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> <513F3D66.1080000@stoneleaf.us> <0CBC64AA-97FD-4C7E-825E-15EE86C9EA2C@yahoo.com> <1363118825.93348.YahooMailNeo@web184705.mail.ne1.yahoo.com> <513F92A3.9060007@stoneleaf.us> <513FA948.60604@stoneleaf.us> <513FBD7F.1060202@canterbury.ac.nz>
Message-ID:

On Tue, Mar 12, 2013 at 4:42 PM, Greg Ewing wrote:
> Ethan Furman wrote:
>
>> There seem to be two basic camps: those that think an enum should be valueless ... and those for whom the integer is an integral part of the enumeration,
>
> I think most people agree that both of these use cases are important. The split seems to be more between those that want to address them using separate types, and those that prefer a one-type-fits-all approach.

There's a third camp - those that think the stdlib should just provide a basic explicit "labelled value" building block, and let others worry about composing them into more complex APIs and custom data types with defined semantics.

Layered APIs are a good thing, and in this case, the obvious layering option is to separate out the "labelling" part, which is broadly useful for debugging purposes, from the "group of related values" part, which is used as a tool to help ensure program correctness by triggering errors at the point of the undefined operation rather than allowing nonsensical results based on an underlying data type like int or str.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From metaentropy at gmail.com Wed Mar 13 13:47:35 2013
From: metaentropy at gmail.com (André Gillibert)
Date: Wed, 13 Mar 2013 13:47:35 +0100
Subject: [Python-ideas] Python 2.7 for Windows 9x
Message-ID:

Hello.

Officially Python doesn't support Windows 9x and NT 3.51/4.0 anymore. I won't change this design decision. However, I built python 2.6.8 and 2.7.3 to run on these platforms. This may be useful to people with a very old computer.

Could you link this project on your python.org Web site?

This is available at:

Thanks.

--
Sincerely
André Gillibert

From random832 at fastmail.us Wed Mar 13 14:35:26 2013
From: random832 at fastmail.us (random832 at fastmail.us)
Date: Wed, 13 Mar 2013 09:35:26 -0400
Subject: [Python-ideas] PEP XXX - Competitor with PEP 435: Adding an enum type to the Python standard library
In-Reply-To:
References: <513E03E3.6010809@stoneleaf.us> <513E1377.60506@stoneleaf.us> <513F3D66.1080000@stoneleaf.us> <0CBC64AA-97FD-4C7E-825E-15EE86C9EA2C@yahoo.com> <1363118825.93348.YahooMailNeo@web184705.mail.ne1.yahoo.com> <513F92A3.9060007@stoneleaf.us> <1363139024.2172.YahooMailNeo@web184701.mail.ne1.yahoo.com> <513FDEC5.9060206@fastmail.us>
Message-ID: <1363181726.9899.140661203794349.43324E66@webmail.messagingengine.com>

On Wed, Mar 13, 2013, at 2:08, Andrew Barnert wrote:
> Which means, as I said in another message, that Day.FRIDAY - Day.WEDNESDAY is not 2, but Day.FRIDAY. Which I'm pretty sure most people would find surprising.
>
> Unless you want bitmask and ordered int enums to be different types,

That was part of my suggestion - I might not have been clear enough.

Anyway - why would you subtract weekdays instead of subtracting the actual timestamps they're from and getting the number of days from the resulting timedelta? What makes you think a weekday enum wouldn't be a flag enum (so you can say an appointment is every monday, wednesday, and friday)?

From foogod at gmail.com Wed Mar 13 18:29:20 2013
From: foogod at gmail.com (Alex Stewart)
Date: Wed, 13 Mar 2013 10:29:20 -0700
Subject: [Python-ideas] Unify addresses passed to bind() and getaddrinfo()
In-Reply-To:
References:
Message-ID:

On Mon, Mar 11, 2013 at 12:20 PM, Giampaolo Rodolà wrote:

> It took me quite some time to figure out what was wrong as to me "" has always been an alias for "all interfaces" and didn't know getaddrinfo() behaved differently than bind() in this regard. If from getaddrinfo() standpoint "" (empty string) has no particular meaning I propose to unify the two APIs and make getaddrinfo() assume "" is an alias for "all interfaces".

I agree this difference in behavior seems a little confusing, and is arguably odd considering that bind() uses getaddrinfo() internally..

At the very least, I do believe that if getaddrinfo() accepts None to indicate "all local interfaces", that bind() should also accept None for this purpose as well. I don't think this would be a difficult change and I can't see any real downside to allowing it. The only concern I see with going the other direction (making getaddrinfo() consider an empty string to be the same as None), as you suggest, is that this would technically not make it work the same as the underlying system call, which does treat "" and NULL as meaning different things.

It is true that in the typical Unix/Windows/MacOS case using the standard hosts/DNS resolvers, an empty hostname has no real meaning, but technically that is not necessarily true for all possible types of name services that an OS might provide, and it is possible that there's a name resolver out there for which specifying "" as the hostname actually does mean something important. Theoretically, this sort of change would make such a system not work as expected (possibly in non-obvious ways) when accessed from Python..

Personally, though, I do think it would make sense to change bind() to allow None as an argument, and make that the preferred way to specify "all interfaces" (still allowing an empty string as another way to say the same thing). It would make it more consistent with getaddrinfo, as well as also making it more Pythonic, IMHO..

--Alex

From guido at python.org Wed Mar 13 18:53:46 2013
From: guido at python.org (Guido van Rossum)
Date: Wed, 13 Mar 2013 10:53:46 -0700
Subject: [Python-ideas] Python 2.7 for Windows 9x
In-Reply-To:
References:
Message-ID:

This is very cool, although perhaps for a small-ish audience. I recommend that at least you add a description of your work to the python.org Wiki. Thanks!

On Wed, Mar 13, 2013 at 5:47 AM, André Gillibert wrote:
> Hello.
>
> Officially Python doesn't support Windows 9x and NT 3.51/4.0 anymore. I won't change this design decision. However, I built python 2.6.8 and 2.7.3 to run on these platforms. This may be useful to people with a very old computer.
>
> Could you link this project on your python.org Web site?
>
> This is available at:
>
> Thanks.
>
> --
> Sincerely
> André Gillibert
--
--Guido van Rossum (python.org/~guido)

From g.rodola at gmail.com Thu Mar 14 15:04:50 2013
From: g.rodola at gmail.com (Giampaolo Rodolà)
Date: Thu, 14 Mar 2013 15:04:50 +0100
Subject: [Python-ideas] Unify addresses passed to bind() and getaddrinfo()
In-Reply-To:
References:
Message-ID:

Agreed, but now that I look at how getsockaddrarg() is implemented in socketmodule.c I'm kind of scared of what could happen if we blindly change/force the original tuple passed by the user in case of exotic families such as AF_NETLINK, AF_BLUETOOTH and others (I have no idea what kind of address tuple (?) is supposed to be passed in those cases). Alternatively we might change it only for AF_INET sockets, but that seems kind of inconsistent and probably isn't worth the effort.

--- Giampaolo
https://code.google.com/p/pyftpdlib/
https://code.google.com/p/psutil/
https://code.google.com/p/pysendfile/

2013/3/13 Alex Stewart:
> [...]
From holger at merlinux.eu Fri Mar 15 20:43:55 2013
From: holger at merlinux.eu (holger krekel)
Date: Fri, 15 Mar 2013 19:43:55 +0000
Subject: [Python-ideas] new PEP submission: transitioning to hosting release files on PYPI
Message-ID: <20130315194355.GI9677@merlinux.eu>

Hi all,

FYI I just submitted the below PEP draft to the PEP editors to receive a proper version number. The topics and the draft saw extensive discussions (four versions) on catalog-sig at python.org. We'd like to keep the discussion there if possible.

best,
holger krekel

PEP: XXX
Title: Transitioning to release-file hosting on PyPI
Version: $Revision$
Last-Modified: $Date$
Author: Holger Krekel, Carl Meyer
Discussions-To: catalog-sig at python.org
Status: Draft (PRE-submit V4)
Type: Process
Content-Type: text/x-rst
Created: 10-Mar-2013
Post-History:

Abstract
========

This PEP proposes a backward-compatible two-phase transition process to speed up, simplify and robustify installing from the pypi.python.org (PyPI) package index. To ease the transition and minimize client-side friction, **no changes to distutils or existing installation tools are required in order to benefit from the first transition phase, which will result in faster, more reliable installs for most existing packages**.

The first transition phase implements an easy and explicit means for a package maintainer to control which release file links are served to present-day installation tools. The first phase also includes the implementation of analysis tools for present-day packages, to support communication with package maintainers and the automated setting of default modes for controlling release file links. The first phase also will default newly-registered projects on PyPI to only serve links to release files which were uploaded to PyPI.

The second transition phase concerns end-user installation tools, which shall default to only install release files that are hosted on PyPI and tell the user if external release files exist, offering a choice to automatically use those external files.

Rationale
=========

.. _history:

History and motivations for external hosting
--------------------------------------------

When PyPI went online, it offered release registration but had no facility to host release files itself. When hosting was added, no automated downloading tool existed yet. When Philip Eby implemented automated downloading (through setuptools), he made the choice to allow people to use download hosts of their choice. The finding of externally-hosted packages was implemented as follows:

#. The PyPI ``simple/`` index for a package contains all links found by scraping them from that package's long_description metadata for any release. Links in the "Download-URL" and "Home-page" metadata fields are given ``rel=download`` and ``rel=homepage`` attributes, respectively.

#. Any of these links whose target is a file whose name appears to be in the form of an installable source or binary distribution, with name in the form "packagename-version.ARCHIVEEXT", is considered a potential installation candidate by installation tools.

#. Similarly, any links suffixed with an "#egg=packagename-version" fragment are considered an installation candidate.

#. Additionally, the ``rel=homepage`` and ``rel=download`` links are crawled by installation tools and, if HTML, are themselves scraped for release-file links in the above formats.

Today, most packages released on PyPI host their release files on PyPI, but a small percentage (XXX need updated data) rely on external hosting.

There are many reasons [2]_ why people have chosen external hosting. To cite just a few:

- release processes and scripts have been developed already and upload to external sites
- it takes too long to upload large files from some places in the world
- export restrictions e.g. for crypto-related software
- company policies which require offering open source packages through own sites
- problems with integrating uploading to PyPI into one's release process (because of release policies)
- desiring download statistics different from those maintained by PyPI
- perceived bad reliability of PyPI
- not aware that PyPI offers file-hosting

Irrespective of the present-day validity of these reasons, there clearly is a history of why people chose to host files externally, and for some time it was even the only way you could do things. This PEP takes the position that there are at least some valid reasons for external hosting.

Problem
-------

**Today, python package installers (pip, easy_install, buildout, and others) often need to query many non-PyPI URLs even if there are no externally hosted files**. Apart from querying pypi.python.org's simple index pages, an installer also crawls all homepages and download pages ever specified with any release of a package. The need for installers to crawl external sites slows down installation and makes for a brittle and unreliable installation process. Those sites and packages also don't take part in the :pep:`381` mirroring infrastructure, further decreasing reliability and speed of automated installation processes around the world.

Most packages are hosted directly on pypi.python.org [1]_. Even for these packages, installers still crawl their homepage and download-url, if specified. Many package uploaders are not aware that specifying the "homepage" or "download-url" in their package metadata will needlessly slow down the installation process for all users.

Relying on third party sites also opens up more attack vectors for injecting malicious packages into sites using automated installs. A simple attack might just involve getting hold of an old now-unused homepage domain and placing malicious packages there. Moreover, performing a Man-in-The-Middle (MITM) attack between an installation site and any of the download sites can inject malicious packages on the installation site. As many homepages and download locations are using HTTP and not HTTPS, such attacks are not hard to launch. Such MITM attacks can easily happen even for packages which never intended to host files externally, as their homepages are contacted by installers anyway.

There is currently no way for package maintainers to avoid external-link crawling, other than removing all homepage/download url metadata for all historic releases. While a script [3]_ has been written to perform this action, it is not a good general solution because it removes useful metadata from PyPI releases.
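(Editorial aside: to make the scraping rules from the History section concrete, here is a rough sketch of the candidate-link test they describe. This is illustrative only -- it is not PyPI's, pip's, or setuptools' actual code, and the URLs and extension list are invented for the example.)

::

    # Rough sketch of the candidate-link rules: a file-like link named
    # "packagename-version.EXT", or any link carrying an
    # "#egg=packagename-version" fragment, counts as a candidate.

    from urllib.parse import urlparse

    ARCHIVE_EXTS = ('.tar.gz', '.tgz', '.zip', '.tar.bz2', '.egg', '.exe')

    def is_candidate(url, package):
        parsed = urlparse(url)
        if parsed.fragment.startswith('egg=' + package + '-'):
            return True                         # explicit #egg= fragment
        filename = parsed.path.rsplit('/', 1)[-1]
        if not filename.lower().startswith(package.lower() + '-'):
            return False
        return filename.endswith(ARCHIVE_EXTS)  # looks like pkg-version.EXT

    print(is_candidate('http://example.com/Foo-1.0.tar.gz', 'Foo'))   # True
    print(is_candidate('http://example.com/dl#egg=Foo-1.0', 'Foo'))   # True
    print(is_candidate('http://example.com/docs/index.html', 'Foo'))  # False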
Even if the sites referenced by "Homepage" and "Download-URL" links were not scraped for further links, there is no obvious way under the current system for a package owner to link to an installable file from a long_description metadata field (which is shown as package documentation on ``/pypi/PKG``) without installation tools automatically considering that file a candidate for installation. Conversely, there is no way to explicitly register multiple external release files without putting them in metadata fields.

Goals
-----

These are the goals to be achieved by implementation of this PEP:

* Package owners should be able to explicitly control which files are presented by PyPI to installer tools as installation candidates. Installation should not be slowed and made less reliable by extensive and unnecessary crawling of links that package owners did not explicitly nominate as installation files.

* It should remain possible for package owners to choose to host their release files on their own hosting, external to PyPI. It should be easy for a user to request the installation of such releases using automated installer tools.

* Automated installer tools should not install externally-hosted packages **by default**, but only when explicitly authorized to do so by the user. When tools refuse to install such a package by default, they should tell the user exactly which external link(s) they would need to follow, and what option(s) the user can provide to authorize the tool to follow those links. PyPI should provide all necessary metadata for installer tools to implement this easily and within a single request/reply interaction.

* Migration from the status quo to the above points should be gradual and minimize breakage. This includes tooling that makes it easy for package owners with an existing release process that uploads to non-PyPI hosting to also upload those release files to PyPI.

Solution / two transition phases
================================

The first transition phase introduces a "hosting-mode" field for each project on PyPI, allowing package owners explicit control of which release file links are served to present-day installation tools in the machine-readable ``simple/`` index. The first transition will, after successful hosting-mode manipulations by individual early-adopters, set a default hosting mode for existing packages, based on automated analysis. **Maintainers will be notified one month ahead of any such automated change**.

At completion of the first transition phase, **all present-day existing release and installation processes and tools are expected to continue working**. Any remaining errors or problems are expected to only relate to installation of individual packages and can be easily corrected by package maintainers or PyPI admins if maintainers are not reachable.

Also in the first phase, each link served in the ``simple/`` index will be explicitly marked as ``rel="internal"`` (hosted by the index itself) or ``rel="external"`` (linking to an external site that is not part of the index).

In the second transition phase, PyPI client installation tools shall be updated to default to only install ``rel="internal"`` packages unless a user specifies option(s) to permit installing from external links. Maintainers of packages which currently host release files on non-PyPI sites shall receive instructions and tools to ease "re-hosting" of their historic and future package release files. This re-hosting tool MUST be available before automated hosting-mode changes are announced to package maintainers.

Implementation
==============

Hosting modes
-------------

The foundation of the first transition phase is the introduction of three "modes" of PyPI hosting for a package, affecting which links are generated for the ``simple/`` index. These modes are implemented without requiring changes to installation tools via changes to the algorithm for generating the machine-readable ``simple/`` index. The modes are:

- ``pypi-scrape-crawl``: no change from the current situation of generating machine-readable links for installation tools, as outlined in the history_.

- ``pypi-scrape``: for a package in this mode, links to be added to the ``simple/`` index are still scraped from package metadata. However, the "Home-page" and "Download-url" links are given ``rel=ext-homepage`` and ``rel=ext-download`` attributes instead of ``rel=homepage`` and ``rel=download``. The effect of this (with no change in installation tools necessary) is that these links will not be followed and scraped for further candidate links by present-day installation tools: only installable files directly hosted from PyPI or linked directly from PyPI metadata will be considered for installation. Installation tools MAY evolve to offer an option to use the new rel-attribution to crawl external pages but MUST NOT default to it.

- ``pypi-explicit``: for a package in this mode, only links to release files uploaded to PyPI, and external links to release files explicitly nominated by the package owner (via a new interface exposed by PyPI) will be added to the ``simple/`` index.

Thus the hope is that eventually all projects on PyPI can be migrated to the ``pypi-explicit`` mode, while preserving the ability to install release files hosted externally via installer tools. Deprecation of hosting modes to eventually only allow the ``pypi-explicit`` mode is NOT REGULATED by this PEP but is expected to become feasible some time after successful implementation of the transition phases described in this PEP. It is expected that deprecation requires **a new process to deal with abandoned packages** because of unreachable maintainers for still popular packages.

First transition phase (PyPI)
-----------------------------

The proposed solution consists of multiple implementation and communication steps:

#. Implement in PyPI the three modes described above, with an interface for package owners to select the mode for each package and register explicit external file URLs.

#. For packages in all modes, label all links in the ``simple/`` index with ``rel="internal"`` or ``rel="external"``, to make it easier for client tools to distinguish the types of links in the second transition phase.

#. Default all newly-registered packages to ``pypi-explicit`` mode (package owners can still switch to the other modes as desired).

#. Determine (via an automated analysis tool) which packages have all installable files available on PyPI itself (group A), which have all installable files linked directly from PyPI metadata (group B), and which have installable versions available that are linked only from external homepage/download HTML pages (group C).

#. Send mail to maintainers of projects in group A that their project will be automatically configured to ``pypi-explicit`` mode in one month, and similarly to maintainers of projects in group B that their project will be automatically configured to ``pypi-scrape`` mode. Inform them that this change is not expected to affect installability of their project at all, but will result in faster and safer installs for their users. Encourage them to set this mode themselves sooner to benefit their users.

#. Send mail to maintainers of packages in group C that their package hosting mode is ``pypi-scrape-crawl``, list the URLs which currently are crawled, and suggest that they either re-host their packages directly on PyPI and switch to ``pypi-explicit``, or at least provide direct links to release files in PyPI metadata and switch to ``pypi-scrape``. Provide instructions and tools to help with these transitions.

Second transition phase (installer tools)
-----------------------------------------

For the second transition phase, maintainers of installation tools are asked to release two updates.

The first update shall provide clear warnings if externally-hosted release files (that is, files whose link is ``rel="external"``) are selected for download, state for which projects and URLs exactly this happens, and warn that in future versions externally-hosted downloads will be disabled by default.

The second update should change the default mode to allow only installation of ``rel="internal"`` package files, and allow installation of externally-hosted packages only when the user supplies an option (ideally an option specifying exactly which external domains are to be trusted as download sources). When download of an externally-hosted package is disallowed, the user should be notified, with instructions for how to make the install succeed and warnings about the implication (that a file will be downloaded from a site that is not part of the package index).

Open Questions / tasks
======================

- Should we introduce some form of PyPI API versioning in this PEP? (It might complicate matters and delay the implementation but is often seen as good practice.)

- Do another round of discussions with installation tool authors and see about incorporating their feedback. There is one known issue in particular from Philip J. Eby, who considers a host-based pattern matching algorithm preferable to interpreting "rel" attributes.

References
==========

.. [1] Donald Stufft, ratio of externally hosted versus pypi-hosted,
   http://mail.python.org/pipermail/catalog-sig/2013-March/005549.html
   (XXX need to update this data for all easy_install-supported formats)

.. [2] Marc-Andre Lemburg, reasons for external hosting,
   http://mail.python.org/pipermail/catalog-sig/2013-March/005626.html

.. [3] Holger Krekel, script to remove homepage/download metadata for all releases,
   http://mail.python.org/pipermail/catalog-sig/2013-February/005423.html

Acknowledgments
===============

Philip Eby for precise information and the basic ideas to implement the transition via server-side changes only.

Donald Stufft for pushing away from external hosting and offering to implement both a Pull Request for the necessary PyPI changes and the analysis tool to drive transition phase 1.

Marc-Andre Lemburg, Nick Coghlan and catalog-sig in general for thinking through issues regarding getting rid of "external hosting".

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

From sah at awesame.org Sun Mar 17 19:34:16 2013
From: sah at awesame.org (Steven Hazel)
Date: Sun, 17 Mar 2013 11:34:16 -0700
Subject: [Python-ideas] Thoughts on the future of concurrency in Python: PEP 3148
Message-ID:

Hi,

I'm one of the authors of monocle (https://github.com/saucelabs/monocle), an async Python programming framework with a blocking look-alike syntax based on generators. We've been using monocle at my startup Sauce Labs, and I used its predecessor on a small team at BitTorrent. I think it's met with some amazing success vs. similar projects using threads. In about five years of working with teams with some members who haven't always understood monocle very well, we've never written a deadlock, for example.

I've been keeping up with PEPs and skimming python-ideas discussions about concurrency-related features in Python, and feeling unsure about how best to jump into the conversation. Today it occurred to me that sooner is better and I should just get all my various opinions out there in case they're helpful.

First of all, I want to reiterate: while I have mainly criticism to offer, it is only intended to be helpful. I don't have a lot of time to devote to this myself, so I'm trying to do what I can to be useful to those who do.

I think I'll take this one PEP at a time, and start with the easiest one.

PEP 3148 - Futures

First, I like the idea of this PEP. I think it'll improve interoperability between async frameworks in Python.

I believe the name "future" is a poor choice. After many years of explaining this concept to programmers, I've found that its various abstract-conceptual names do a lot to confuse people. Twisted's "deferred" is probably the worst offender here, but both "future" and "promise" also make this very simple construct sound complicated, and it's a crime.

There are two ways I've found to explain this that seem to keep people from getting confused:

The first is, call it a "callback registry". Many programmers are familiar with the idea of passing callbacks into a function. You just explain that here, instead of taking the callbacks as parameters, we return a callback registry object and you can add your callbacks there. (If the operation is already done they'll just get called when you add them.)

The second is a variant on that: say that this function, instead of taking a callback as a parameter, returns its callback. That sounds useless, but fortunately the callback it returns is a callable object, and it also has an "add" method that lets you add functions that it will call when called back. So it's a callback that you can manipulate to do what you want. (This is what we do in monocle: https://github.com/saucelabs/monocle/blob/master/monocle/callback.py)

These approaches, by focusing on the simple and familiar idea of callbacks, do a lot to eliminate confusing ideas about communicating with the future of the process.

Finally, I'm not sure I get the point of the Executor class in PEP 3148. It seems to me that any real implementation of that class needs a call_later method for scheduling operations, and a way of connecting up to event-based IO APIs. I don't really understand what Executor would be good for.

Thanks for reading.

--
Steven Hazel
Sauce Labs Cofounder and VP Product
http://saucelabs.com/
work: sah at saucelabs.com
other: sah at awesame.org
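(Editorial aside: a minimal sketch of the "returned callback" / "callback registry" idea Steven describes above. It is in the spirit of monocle's Callback but is not its actual code; fetch_page and its behavior are invented for the illustration.)

# A callable that can be fired once, with an add() method for
# registering functions to run on completion.  Handlers added after
# the callback has fired are called immediately.

class Callback:
    _unfired = object()

    def __init__(self):
        self._result = self._unfired
        self._handlers = []

    def add(self, handler):
        if self._result is not self._unfired:
            handler(self._result)       # already done: call right away
        else:
            self._handlers.append(handler)

    def __call__(self, result=None):
        self._result = result
        for handler in self._handlers:
            handler(result)

def fetch_page():
    cb = Callback()
    # ... kick off some async I/O that will eventually do cb(data) ...
    return cb                           # the function "returns its callback"

cb = fetch_page()
cb.add(lambda data: print('got:', data))
cb('hello')                             # the I/O layer fires the callback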
From ncoghlan at gmail.com Sun Mar 17 21:42:56 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 17 Mar 2013 13:42:56 -0700
Subject: [Python-ideas] Thoughts on the future of concurrency in Python: PEP 3148
In-Reply-To:
References:
Message-ID:

On 17 Mar 2013 11:35, "Steven Hazel" wrote:
>
> Hi,
>
> I'm one of the authors of monocle (https://github.com/saucelabs/monocle), an async Python programming framework with a blocking look-alike syntax based on generators. [...]
>
> I think I'll take this one PEP at a time, and start with the easiest one.
>
> PEP 3148 - Futures

This PEP isn't new, it was released in Python 3.2, and is available for 2.x as a backport on PyPI.

Cheers,
Nick.

> [...]
From guido at python.org Sun Mar 17 22:00:32 2013
From: guido at python.org (Guido van Rossum)
Date: Sun, 17 Mar 2013 14:00:32 -0700
Subject: [Python-ideas] Thoughts on the future of concurrency in Python: PEP 3148
In-Reply-To:
References:
Message-ID:

Hi Steven!

--
--Guido van Rossum (python.org/~guido)
From guido at python.org Sun Mar 17 22:06:48 2013
From: guido at python.org (Guido van Rossum)
Date: Sun, 17 Mar 2013 14:06:48 -0700
Subject: [Python-ideas] Thoughts on the future of concurrency in Python: PEP 3148
In-Reply-To:
References:
Message-ID:

Hi Steve! (Whoops, pressed the wrong button.)

On Sun, Mar 17, 2013 at 11:34 AM, Steven Hazel wrote:
> Hi,
>
> I'm one of the authors of monocle (https://github.com/saucelabs/monocle), an
> async Python programming framework with a blocking look-alike syntax based
> on generators. We've been using monocle at my startup Sauce Labs, and I used
> its predecessor on a small team at BitTorrent. I think it's met with some
> amazing success vs. similar projects using threads. In about five years of
> working with teams with some members who haven't always understood monocle
> very well, we've never written a deadlock, for example.

I'm really sorry -- in my keynote I forgot to mention that Monocle was one of my inspirations for using yield to do async things in NDB, which I did for App Engine. Monocle has greatly guided my thoughts (more than Twisted's somewhat-equivalent inlineCallbacks).

> I've been keeping up with PEPs and skimming python-ideas discussions about
> concurrency-related features in Python, and feeling unsure about how best to
> jump into the conversation. Today it occurred to me that sooner is better
> and I should just get all my various opinions out there in case they're
> helpful.

Great!

> First of all, I want to reiterate: while I have mainly criticism to offer,
> it is only intended to be helpful. I don't have a lot of time to devote to
> this myself, so I'm trying to do what I can to be useful to those who do.
>
> I think I'll take this one PEP at a time, and start with the easiest one.
>
> PEP 3148 - Futures
>
> First, I like the idea of this PEP. I think it'll improve interoperability
> between async frameworks in Python.
>
> I believe the name "future" is a poor choice. After many years of explaining
> this concept to programmers, I've found that its various abstract-conceptual
> names do a lot to confuse people. Twisted's "deferred" is probably the worst
> offender here, but both "future" and "promise" also make this very simple
> construct sound complicated, and it's a crime.

As Nick pointed out, this water is already under the 3.2 bridge. Too late to change now.

> There are two ways I've found to explain this that seem to keep people from
> getting confused:
>
> The first is, call it a "callback registry". Many programmers are familiar
> with the idea of passing callbacks into a function. You just explain that
> here, instead of taking the callbacks as parameters, we return a callback
> registry object and you can add your callbacks there. (If the operation is
> already done they'll just get called when you add them.)
I actually want to de-emphasize the fact that Futures hold callbacks. I want people to think of them as magic "wait points" that combine with "yield from" to return a pseudo-synchronous result. In fact, I want to altogether de-emphasize that async I/O is done under the hood using callbacks. Only implementers of async frameworks should need to know that, for the most part.

> The second is a variant on that: say that this function, instead of taking a
> callback as a parameter, returns its callback. That sounds useless, but
> fortunately the callback it returns is a callable object, and it also has an
> "add" method that lets you add functions that it will call when called back.
> So it's a callback that you can manipulate to do what you want. (This is
> what we do in monocle:
> https://github.com/saucelabs/monocle/blob/master/monocle/callback.py)
>
> These approaches, by focusing on the simple and familiar idea of callbacks,
> do a lot to eliminate confusing ideas about communicating with the future of
> the process.

Hm. I find the idea of callbacks far from simple or familiar.

> Finally, I'm not sure I get the point of the Executor class in PEP 3148. It
> seems to me that any real implementation of that class needs a call_later
> method for scheduling operations, and a way of connecting up to event-based
> IO APIs. I don't really understand what Executor would be good for.

It's a threadpool.

> Thanks for reading.

Thanks for thinking with us. And thanks for Monocle! (And for paying to fly Raymond to EuroPython a few years ago so I could hear him talk about it. :-)

--
--Guido van Rossum (python.org/~guido)
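The "wait point" idea can be made concrete with a small trampoline. The sketch below is illustrative only, not any framework's real API; it uses plain yield (as monocle and NDB do) rather than "yield from", the URL is just an example, and error handling is omitted:

from concurrent.futures import ThreadPoolExecutor
import urllib.request

executor = ThreadPoolExecutor(max_workers=2)

def fetch(url):
    # Run a blocking fetch on a worker thread; returns a Future.
    return executor.submit(urllib.request.urlopen, url)

def task():
    # The yield is the "wait point": the code reads top-to-bottom,
    # but control returns to the driver until the Future is done.
    response = yield fetch("http://python.org")
    print(response.status)

def run(gen):
    # Minimal trampoline: resume the generator with each Future's
    # result as that Future completes.
    def step(value):
        try:
            future = gen.send(value)
        except StopIteration:
            return
        future.add_done_callback(lambda f: step(f.result()))
    step(None)

run(task())
executor.shutdown(wait=True)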
From rob.cliffe at btinternet.com Sun Mar 17 22:20:40 2013
From: rob.cliffe at btinternet.com (Rob Cliffe)
Date: Sun, 17 Mar 2013 21:20:40 +0000
Subject: [Python-ideas] Thoughts on the future of concurrency in Python: PEP 3148
In-Reply-To:
References:
Message-ID: <514633A8.8020908@btinternet.com>

On 17/03/2013 21:06, Guido van Rossum wrote:
> I actually want to de-emphasize the fact that Futures hold callbacks.
> I want people to think of them as magic "wait points" that combine with
> "yield from" to return a pseudo-synchronous result. In fact, I want to
> altogether de-emphasize that async I/O is done under the hood using
> callbacks. Only implementers of async frameworks should need to know
> that, for the most part.
Hm, as someone who's had to get to grips with Twisted, I find it frustrating and disorientating when I don't know how things work. I find that understanding what goes on under the hood is both helpful and reassuring. With great respect, Guido, something on the lines of [I exaggerate to make the point clear, no offence intended] "The average application programmer needn't trouble his pretty little head about this" feels a bit patronising.
Rob Cliffe

From rosuav at gmail.com Sun Mar 17 22:46:33 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 18 Mar 2013 08:46:33 +1100
Subject: [Python-ideas] Thoughts on the future of concurrency in Python: PEP 3148
In-Reply-To: <514633A8.8020908@btinternet.com>
References: <514633A8.8020908@btinternet.com>
Message-ID:

On Mon, Mar 18, 2013 at 8:20 AM, Rob Cliffe wrote:
> With great respect, Guido, something on the lines of [I exaggerate to make
> the point clear, no offence intended] "The average application programmer
> needn't trouble his pretty little head about this" feels a bit patronising.

But that's no different from all the other things that the average Python programmer need not worry about, yet which lower-level coders (eg the developers of Pythons themselves) do - for instance, list growth factors, platform-specific APIs (in most cases), and native integer and pointer sizes. Is it patronising to imply that Python coders need not trouble themselves about those? Certainly not, that's why high level languages exist! My understanding of Guido's statement is that callbacks are just part of the low-level implementation of futures.

ChrisA

From sah at awesame.org Sun Mar 17 23:14:29 2013
From: sah at awesame.org (Steven Hazel)
Date: Sun, 17 Mar 2013 15:14:29 -0700 (PDT)
Subject: [Python-ideas] Thoughts on the future of concurrency in Python: PEP 3148
In-Reply-To:
References:
Message-ID:

On Sunday, March 17, 2013 2:06:48 PM UTC-7, Guido van Rossum wrote:
>
> As Nick pointed out, this water is already under the 3.2 bridge. Too
> late to change now.

Oops, didn't realize it was already out. In that case, please take this as a suggestion that the docs might benefit from an explanation like "A Future is basically just a callback registry. Rather than taking callbacks as parameters, functions can return a Future, where callbacks can be registered with Future.add_done_callback."

> I actually want to de-emphasize the fact that Futures hold callbacks.
> I want people to think of them as magic "wait points" that combine with
> "yield from" to return a pseudo-synchronous result. In fact, I want to
> altogether de-emphasize that async I/O is done under the hood using
> callbacks. Only implementers of async frameworks should need to know
> that, for the most part.

It's true that when you're using a generator-based async framework, you can and should think of a Future as a "thing you can wait on" most of the time. My experience with monocle though is that it is helpful rather than harmful to reveal that they're about callbacks. In the early days of monocle, we were using Deferreds, which a lot of new monocle users didn't really understand, and people tended to get very confused about what kinds of things a Deferred could possibly be doing. Explaining things in terms of callbacks was helpful in getting people to understand monocle.

I don't think you can really abstract the idea of callbacks away from Futures without making them more mystifying. Callbacks are not an implementation detail of Futures, they're essential to what Futures do, so essential that I think CallbackRegistry is a pretty good alternative name, and in monocle we actually called the class Callback. I sometimes explain that a monocle coroutine is "yielding on a callback" -- meaning it's waiting now and it'll resume when called back. The name helps explain what's happening, even for a user of the framework.

> Hm. I find the idea of callbacks far from simple or familiar.

Many programmers have used callbacks in other contexts. But, even when they haven't, "you pass in a function that will get called when the operation is done" is an idea I've watched many people grasp immediately the first time they saw it. In contrast, Futures, which are almost the same idea, are often viewed as sophisticated black magic, and I've more than once heard them explained in terms of *time travel*.

> It's a threadpool.

Oh, I see. Well, for what it's worth, including this next to Futures confused me. It kind of implied to me when I read the PEP that Executors are somehow necessary to using Futures, when in fact they're just one of many contexts in which a Future might be a good API.
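The stdlib API from PEP 3148 already supports exactly the explanation Steven suggests. A minimal sketch (work and on_done are invented names for this example):

from concurrent.futures import ThreadPoolExecutor

def work():
    return 6 * 7

def on_done(future):
    # Runs when the operation finishes. If the future is already
    # done at registration time, it runs immediately instead.
    print("result:", future.result())

with ThreadPoolExecutor(max_workers=1) as pool:  # an Executor: a thread pool
    fut = pool.submit(work)         # returns a Future right away
    fut.add_done_callback(on_done)  # register a callback on the Future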
From dreamingforward at gmail.com Mon Mar 18 04:53:17 2013
From: dreamingforward at gmail.com (Mark Janssen)
Date: Sun, 17 Mar 2013 20:53:17 -0700
Subject: [Python-ideas] Message passing syntax for objects
Message-ID:

Hello,

I just posted an answer on quora.com about OOP (http://qr.ae/TM1Vb) and wanted to engage the python community on the subject.

Alan Kay's ideas of message-passing in Smalltalk are interesting, and like the questioner says, never took off. My answer was that Alan Kay's abstraction of "Everything is an object" fails because you can't have message-passing, an I/O task, working in the same space as your objects -- they are two very different functionalities and they have to be preserved **for the programmer**.

This functional separation made me think that Python could benefit from a syntactical, language-given separation between Classes and the messages between them, to encourage loosely-coupled, modular OOP. Something that OOP has always promised but never delivered.

I think we should co-opt C++'s poorly used >> and << I/O operators (for files) and re-purpose them for objects/classes. One could then have within interpreter space, the ability to pass in a message to an object.

>>> 42 >> MyObject  #sends 42 as a message into MyObject

The Object definition would then have special methods __in__ to receive data and a special way of outputting data that can be caught __str__(?).

I'm hoping the community can comment on the matter....

Thanks,

Mark
Tacoma, Washington

From graffatcolmingov at gmail.com Mon Mar 18 05:00:46 2013
From: graffatcolmingov at gmail.com (Ian Cordasco)
Date: Mon, 18 Mar 2013 00:00:46 -0400
Subject: [Python-ideas] Message passing syntax for objects
In-Reply-To:
References:
Message-ID:

On Sun, Mar 17, 2013 at 11:53 PM, Mark Janssen wrote:
> Hello,
>
> I just posted an answer on quora.com about OOP (http://qr.ae/TM1Vb)
> and wanted to engage the python community on the subject.
>
> Alan Kay's ideas of message-passing in Smalltalk are interesting, and
> like the questioner says, never took off. My answer was that Alan
> Kay's abstraction of "Everything is an object" fails because you can't
> have message-passing, an I/O task, working in the same space as your
> objects -- they are two very different functionalities and they have
> to be preserved **for the programmer**.
>
> This functional separation made me think that Python could benefit
> from a syntactical, language-given separation between Classes and the
> messages between them, to encourage loosely-coupled, modular OOP.
> Something that OOP has always promised but never delivered.
>
> I think we should co-opt C++'s poorly used >> and << I/O operators
> (for files) and re-purpose them for objects/classes. One could then
> have within interpreter space, the ability to pass in a message to an
> object.
>
>>>> 42 >> MyObject #sends 42 as a message into MyObject
>
> The Object definition would then have special methods __in__ to
> receive data and a special way of outputting data that can be caught
> __str__(?).
>
> I'm hoping the community can comment on the matter....
>
> Thanks,
>
> Mark
What then happens to the binary shift operators: >> and <<? They're defined by __rshift__ and __lshift__ (respectively) on an object already (help(int)). You could co-opt those operators with those methods on your object but that would probably confuse plenty of people.

From dreamingforward at gmail.com (Mark Janssen)
Subject: [Python-ideas] Message passing syntax for objects
In-Reply-To:
References:
Message-ID:

>> I think we should co-opt C++'s poorly used >> and << I/O operators
>> (for files) and re-purpose them for objects/classes. One could then
>> have within interpreter space, the ability to pass in a message to an
>> object.
>>
>>>>> 42 >> MyObject #sends 42 as a message into MyObject
>>
> What then happens to the binary shift operators: >> and <<? They're
> defined by __rshift__ and __lshift__ (respectively) on an object
> already (help(int)). You could co-opt those operators with those
> methods on your object but that would probably confuse plenty of
> people.

Ah right. But then, the shift operators pale in comparison to a uniform way of passing messages between objects. By building it into the language, it would *enforce* a modular object style, rather than the current, very specialized and very programmer-specific way there is now. In fact, most people never really think in that paradigm, yet if the language supported/proposed such a syntax, programmers would start to re-arrange the whole object hierarchy in a new, more modular and universal way.

mark

From dreamingforward at gmail.com Mon Mar 18 05:26:41 2013
From: dreamingforward at gmail.com (Mark Janssen)
Date: Sun, 17 Mar 2013 21:26:41 -0700
Subject: [Python-ideas] Message passing syntax for objects
In-Reply-To:
References:
Message-ID:

Continuing on this thread, there would be a new bunch of behaviors to be defined. Since "everything is an object", there can now be a standard way to define the *next* common abstraction of "every object interacts with other objects". And going with my suggestion of defining >> and << operators, I'm going to explore the concept further....

>>> 42 >> MyNumberType  #would add the integer to your integer type
>>> 42 >> MyCollectionType  #would add the object into your collection: *poof*: no more random syntaxes for putting things in collections.
>>> MyObject >>  # queries the object to output its state.
>>> "http://www.cnn.com" >> MyInternetObject  #outputs the HTML text from CNN's home page.

Each object has to figure out how it will receive things from outside of it. Things it can't handle (a string sent to an int) just have to be dropped to some other space, much like stderr does within the O.S. There are probably many other very interesting examples, but the key idea I'm working on (as noted in other messages), is a sort-of universal language for the internet, a WebOS to be applied to a universal data model.

Mark
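As Ian notes, the reflected-operator hooks make a prototype of this possible today. The sketch below is a hypothetical illustration: the proposed __in__ protocol does not exist, so an ordinary receive() method stands in for it:

class MessageReceiver:
    def __init__(self):
        self.received = []

    def receive(self, message):
        # Stand-in for the proposed __in__ hook.
        self.received.append(message)

    def __rrshift__(self, message):
        # Called for `42 >> obj`: int.__rshift__ can't handle a
        # MessageReceiver, so Python falls back to the right
        # operand's reflected method.
        self.receive(message)
        return self

obj = MessageReceiver()
42 >> obj            # sends 42 into obj
"hello" >> obj       # any object can be a message
print(obj.received)  # [42, 'hello']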
From dreamingforward at gmail.com Mon Mar 18 05:40:12 2013
From: dreamingforward at gmail.com (Mark Janssen)
Date: Sun, 17 Mar 2013 21:40:12 -0700
Subject: [Python-ideas] Message passing for objects
Message-ID:

Just the thing I'm talking about. So I'm not the only one trying out these ideas..... :)

(via the p2p-foundation mailing list)

---------- Forwarded message ----------
From: ProjectParadigm-ICT-Program
Date: Sat, Mar 16, 2013 at 3:12 PM
Subject: Re: A Distributed Economy -- A blog involving Linked Data
To: Brent Shambaugh , Michel Bauwens
Cc: Samuel Rose , "public-lod at w3.org" , Paul Cockshott

Dear Brent,

Thanks for bringing up this issue again, and let me rephrase. When considering resilience there are three networked systems to consider: humans, things and ideas/data/information. All three somehow now mesh and interact through the Internet and the Internet of Things. The conceptual model for the Semantic Web incorporates all three.

Now if we want to complete this model we must incorporate the (Global) Ecosystems of Planet Earth. Thus we are able to create economic models that incorporate security, resilience, reliability and sustainability. As we speak thousands and thousands of engineers and scientists are already tackling the fundamental task of coming up with new all-encompassing paradigms and the 7th IEEE International Conference on Digital Ecosystems and Technologies will deal with quite a few of these issues.

I will soon be setting up a blog which will deal with resilience, more specifically resilient grids and resilient grid technologies. The theme of resilience whether it relates to cyber space, natural ecosystems and resources management as part of sustainable economic development, human society or economic assets vis a vis natural disasters or conflict is proving to be one of the hottest new themes in funded research, and one main central theme in Horizon 2020, the new European Union 71 billion euro research fund.

Milton Ponson
GSM: +297 747 8280
PO Box 1154, Oranjestad
Aruba, Dutch Caribbean
Project Paradigm: A structured approach to bringing the tools for sustainable development to all stakeholders worldwide by creating ICT tools for NGOs worldwide and: providing online access to web sites and repositories of data and information for sustainable development

This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the system manager. This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail.

________________________________
From: Brent Shambaugh
To: Michel Bauwens
Cc: ProjectParadigm-ICT-Program ; Samuel Rose ; "public-lod at w3.org" ; Paul Cockshott
Sent: Thursday, March 14, 2013 7:12 PM
Subject: Re: A Distributed Economy -- A blog involving Linked Data

In an attempt to understand the conversation we had, I was sent in a flurry of confusion. I started checking out books, and one was Resilience by Andrew Zolli and Ann Marie Healy. I found a few quotes that seem exciting to me:

"Adhocracies thrive on data. And by the stroke of fantastic luck, we're currently witnessing the global birth of an adhocracy of data -- a global revolution that, for the first time, empowers organizations with the capacity to collect and correlate widely distributed real-time information about the way many critical systems are performing. This kind of open data will play a central role in resilience strategies for years to come." pg. 266, Resilience, Andrew Zolli and Ann Marie Healy

"And for organizations of all types there is a powerful lesson here: Resilience benefits accrue to organizations that prioritize the collection, presentation, and sharing of data." pg. 269, Resilience, Andrew Zolli and Ann Marie Healy

"A related theme in the resilience discussion is the importance of networks, which provide a universal, abstract reference system for describing how information, resources, and behaviours flow through many complex systems. Having a common means to describe biological, economic, and ecological systems, for example, allows researchers to make comparisons between the ways these very different kinds of entities approach similar problems, such as stopping a contagion - whether an actual virus, a financial panic, an unwanted behavior, or an environmental contaminant - when it begins to spread. Having a shared frame of reference allows us to consider how successful tactics in one domain might be applied to another - as we'll see in newly emerging fields like ecological finance." pg 19, Resilience, Andrew Zolli and Ann Marie Healy

"Rather the resilience frame suggests a different, complementary effort to mitigation: to redesign our institutions, embolden our communities, encourage innovation and experimentation, and support our people in ways that will help them be prepared and cope with surprises and disruptions, even as we work to fend them off."
pg 23, Resilience, Andrew Zolli and Ann Marie Healy It is interesting that Buckminster Fuller wrote about similar ideas over 30 years ago: "The inefficiency of automobiles' reciprocating engines - and their traffic-system-wasted fuel - and the energy inefficiency of today's buildings, are only two of hundreds of thousands of instances that can be cited of the design-avoidable energy wastage. But the technical raison d'etre for either the energy-effectiveness gains or losses is all completely invisible to human eyes. Thus, the significance of their omni-integratable potentialities is uncomprehended by either the world's leaders or the led. Neither the great political and financial power structures of the world, nor the specialization-blinded professionals, nor the population in general realize the sum-totally the omni-engineering-integratable, invisible revolution in metallurgical, chemical, and electronic arts now makes it possible to do so much more with ever fewer pounds and volumes of material, ergs of energy, and seconds of time per given technological function that it is now highly feasible to take care of everybody on Earth at a "higher standard of living than any have ever known.", pg. xxv, Critical Path, R. Buckminster Fuller "World Game will become increasingly effective in its prognoses and programming when the world-around, satellite-interrelayed computer system and its omni-Universe-operative (time-energy) accounting system are established. This system will identify the kilowatt-hour-expressed world inventory of foods, raw and recirculating resources, and all the world's unique mechanical and structural capabilities and their operating capacities as well as the respective kilowatt-hours of available energy-income-derived operating power with which to put their facilities to work. All the foregoing information will become available in respect to all the world-around technology's environment-controlling, life-sustaining, travel- and communication-accomidating structures and machines.", pg. 219, Critical Path, R. Buckminster Fuller I'm happy that Milton Ponson pointed out Resilience. I had never thought about resilience before. Looking into it was very gratifying. It gave me some confidence that I was perhaps doing some things right, but at the same time startled me by how much there is to learn to somehow survive the free fall. Doing a search for Linked Data and Resilience gave me a result from rkbexpolorer (http://www.rkbexplorer.com/explorer/#display=project-{http%3A//wiki.rkbexplorer.com/id/resist}) which is from the ReSIST (Resilience for Survivability) project in Europe (http://www.resist-noe.org/). They also have some links to some free course material at . I believe my blog evolved to explore a peer-to-peer economy. Michael Bauwens desribes such economies as distributed networks, "As political, economic, and social systems transform themselves into distributed networks, a new human dynamic is emerging: peer to peer (P2P). As P2P gives rise to the emergence of a third mode of production, a third mode of governence, and a third mode of property, it is poised to overhaul our political economy in unprecendented ways." (www.ctheory.net/articles.aspx?id=499) This suggests something broader. As a result of our conversation, I also looked at some people with socialist views such as Roberto Verzola, W. Paul Cockshott and Allin Cottrell, Raoul Viktor, and Heinz Dieterich. Roberto Verzola describes an economy of abundance, which may indeed be linked to P2P technologies. 
"An economy of abundance seeks to dismantle or reform these scarcity-generating institutions in such a way as to affirm our freedom to live life as art (self-expression to others), social equity (so that everything can live life as art), and sustainability (so that all life can thrive into the future). Among other things, this implies a much greater role for various forms of shared property, individual an community-level self-reliance, and participatory decision making." (http://www.shareable.net/blog/event-the-economics-of-abundance) He also argues that for innovation to proceed, everyone seeking knowledge should have access to it. "the most important means to ensure that innovation can proceed is to ensure that everyone seeking knowledge has access to it. ... Knowledge that helps empower people depends on openness, while knowledge that is used to coerce, to exert power over the disempowered, thrives on secrecy" p. 150, The Economics of Abunance: A Political Economy of Freedom, Equity and Sustainability, Roberto Verzola This seems to align well with my present feelings. I feel that engineering is so saturated with IP, that it is hard to feel like you're not going to be doing that. At the same time I want to develop my skills and thrive. How do become a Professional Engineer and not feel like you're going to be doing that? What if you don't like the lawyer saturated culture where people are suing other people over some idea you or someone else produced? I can sense that a lot of people, especially in the hacker and maker community, want to be able to support themselves and work on cool new things but don't want to deny other people to work on the same cool things. Why do ideas have to take on a life of their own and become part of something you might be employed by, but have no control beyond that? Sorry about the extreme language, but why do I imagine it as making deal with the devil? Betraying your friends so you can enjoy life and eat? In this, there is an underlying assumption that there are institutions that do not want to share or partner, or make it very difficult. If it is easier, I feel that could be better. Buckminster Fuller also wrote about such things: "2. Grandmother taught us the Golden Rule: "Love thy neighbor as thy self--do unto others as you would they should do unto you. 3. As we became older and more experienced, out uncles began to caution us to get over our sensitivity. "Life is hard," they explained. 'There is nowhere nearly enough life support for everybody on our planet, let alone enough for a comfortable life support. If you want to raise a family and have a comfortable life for them, you are going to have to deprive many others of the opportunity to survive and the sooner, the better. Your grandmother's Golden Rule is beautiful, but it doesn't work.'" p. 123. Critical Path, R. Buckminster Fuller Is it possible to have a win-win between people an business? Are there any financial barriers to entry and/or partnership? Sometimes I fear that I will never be paid enough to implement my ideas, or if I do then it will be too late to enter the market. Either I can't afford to do the work, or someone who developed and patented something that matches at least some of my idea decides not to involve me. Thus I would question spending the time developing my idea. I'm assuming I can develop my idea for my own personal use (possibly not?). However, I am more certain I may have trouble sharing and selling things developed from my ideas. 
I imagine that this favors those who already have money. People like Eric von Hippel and Michael Bauwens both speak about a lot of innovation goinf on outside the firm. For example Michel Bauwen's states: "The French-Italian school of 'cognitive capitalism' stresses the value creation today is no longer confined to the enterprise, but beholden to the mass intellectuality of knowledge workers, who through their lifelong learning/experiencing and systematic connectivity, constantly innovate within and without the enterprise. This is an important argument, since it would justify what we see as the only solution for the expansion of the P2P sphere into a society at large: the universal basic income. Only the independence of work and the salary structure can guarantee the peer producers can continue to create this sphere of highly productive use value." The Political Economy of Peer Production (www.ctheory.net/articles.aspx?id=499) Eric von Hippel also speaks about his book. (http://commons.wikimedia.org/wiki/File:20060123-Eric.von.Hippel-Democratizing.Innovation.ogg) However, giving people money for doing nothing makes me feel uncomfortable. Of course, as Heinz Dietrich suggests, if you already have money, things work just fine: "The first step, in fact, would be to establish a new cybernetic principle; you need something that coordinates billions of economic transactions everyday. And, so far, the market has been a relatively well-functioning system under two conditions: If the market is not monopolistic and you have the buying power for the merchandise you produce and for the services, then the market coordinates quite well."-- The Socialism of the 21st Century (http://eipcp.net/transversal/0805/dieterich/en) Paul Cockshott, and Allin Cottrell suggest a payment system determined democratically, "The payment system outlined in chapter 2 depends on the idea that the total labour content of each product or service can be calculated." p. 8, Towards a New Socialism If this is by the state, then I am moved to say I do not trust the government to do much right at all. Certainly, this is what I feel if I spend any length of time watching the news. But I would like to look into it. If this develops into something, the state should be involved at some level. I feel bad about this. I'll have to read more. However, I agree that with more democracy things would be better. "The principal bases for a post-Soviet socialism must be radical democracy and efficient planning. The democratic element, it is now clear, is not a luxury, or something that can be postponed until conditions are especially favourable. Without democracy, as we have argued above, the leaders of a socialist society will be driven to coercion in order to ensure the production of a surplus product, and if coercion slackens the system will tend to stagnate." p. 7, Towards a New Socialism I definately think there needs to be some way to accomplish things that makes it fair to people. In terms of me, I believe this underlies a larger problem than me being connected with the right job or being afraid of debt going back to school. It is the problem of connecting people with the right jobs, utilizing the skills they already have so they don't have to fear paying to learn what they already know, and raising awareness that the jobs are there. I dream about linked data being able to illuminate relationships between present skills and related skills to job seekers and employers. 
I also dream about linked data allowing people to market themselves with clarity as a basket of skills that represents who they really are rather than a basket of skills that was set by a well-meaning college, trade-school, or university. I honestly believe that people who do something they have a passion for, will be more effective employees or entrepreneurs. But how to pay for it? If you take out debt you need to find a way to pay it off. If you can't find something that reflects your values you may feel like you're enslaved to something else while trying to pay it off. It can be a pressing struggle as Paul Grignon's Money as Debt video on Youtube describes (https://www.youtube.com/watch?v=0K5_JE_gOys). Paul Grigon and others say that our present monetary system leads to infinite growth. "We need to become politically sensitive to the invisible architectures of power. In distributed systems, where there is no overt hierarchy, power is a function of design. One such system, perhaps the most important of all, is the monetary system, whose interest-bearing design requires the market to be linked to a system of infinite growth, and this link needs to be broken. A global reform of the monetary system, or the spread of new means of direct social production of money, are the necessary conditions for such a break." (http://p2pfoundation.net/About_The_Foundation) I'd imagine this would create no problems for people as long as there is the will and resources to grow infinitely. However, Paul Grigon points out an exception: those with the money to lend at interest will eventually wind up with all of the money, and due to forclosure the property too. My site explores distributed funding. (http://adistributedeconomy.blogspot.com/2012/03/distributed-funding.html). I am still not sure how exactly to accomplish it. I think it may involve something like Ripple (https://classic.ripplepay.com/) and PaySwarm (http://payswarm.com/). A friend of mine pointed out that it did not seem that Ripple allows to keep track of what you owe who, whereas PaySwarm appears to do so. I may need to develop something on my own (http://lists.w3.org/Archives/Public/public-webpayments/2013Feb/0034.html) that involes donations, and whatever models are needed. Embarrasingly, I'm still learning JavaScript. Thankfully, my friends are also encouraging me to focus on some small project. A few other thoughts: As I was reading, I noticed some mention of rival and non-rival goods. Rival goods could be seen as raw materials or products, and non-rival goods could exist in an infinite amount. In the maker world I see things such as CAD files as non-rival and raw materials and end products as rival. I question whether people would still pay for rival goods, and perhaps donate for non-rival goods if there was an open source economy? What if things such as PaySwarm made it easy to do so? The Rep Rap and Ardrino are open source hardware, and all products by Makerbot used to be (http://josefprusa.cz/open-hardware-meaning/, http://www.hoektronics.com/2012/09/21/makerbot-and-open-source-a-founder-perspective/). People can share their designs, but would people share their profits with those who contributed to their idea? It wouldn't have to be much, as small amounts still add up. Would this be bad? Even if people don't have to pay, things might still vary as does the amount you might get by selling a book? 
Concieveably if you have a lot of open source hardware, then you could have as much flexibility in the physical world as you do in the software world. In an extreme case, maybe you could have open source spaceships. They are after all lots of little parts, much like a GNU/Linux distro. If things could be freely copied and not exclusively owned as in the GPL, would you still have brand loyalty? While not going into the fine details, the Ultimaker and the Makerbot Thing-O-Matic look very similar. Why would I want one over the other? If whatever you chose was linked to previous innovations, and people let their donations flow to those authors, how much would it matter? Would the crowd maintain accountability so people would not collect money for doing nothing? The maker community seems to be supportive of things that they are free to contribute to. How far could this go, especially with support from arguments made by Don Tapscott and Anthony D. Williams in the book Wikinomics? For the legality, things like the JOBs act seem exciting. However, this seems to be for equity-based crowdfunding and not just donations (https://en.wikipedia.org/wiki/JOBS_Act). I'd imagine that it would be both even if some of the things were as described above. What if you were retired, and you had money, but nothing you contributed was being used? Could you grow your money to support yourself? The potential of digital technologies seems huge. I read about the Industrial Internet, as pointed out by Milton Poson. The GE report titled, "Industrial Internet. Pushing the Boundaries of Minds and Machines", by Peter C. Evans and Marco Annunziata has the following quote: "The combination of physics- based approaches, deep sector specific domain expertise, more automation of information flows, and predictive capabilities can join with the existing suite of ?big data? tools. The result is the Industrial Internet encompasses traditional approaches with newer hybrid approaches that can leverage the power of both historic and real-time data with industry specific advanced analytics." Of course, this makes me want to go down the path of Density Functional Theory and Molecular Dynamics. I had a brief exposure to these concepts in graduate school, and it reminds me of the layout algorithms in Gephi (at least MD, I know a little less about DFT). Yes! I just learned about DBSCAN (http://en.wikipedia.org/wiki/DBSCAN) in R (http://www.cl.cam.ac.uk/~dq209/others/Rdatamining.pdf) and ELKI (http://en.wikipedia.org/wiki/ELKI). And here you have it, the original DBSCAN paper (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.121.9220). In addition, they speak about Enterprise Management Software in terms of the Industrial Internet: "At the other end of the spectrum, enterprise management software and solutions have been widely adopted to drive organizational efficiencies at the firm level. The benefits of these efforts include better tracking and coordination of labor, supply chain, quality, compliance, and sales and distribution across broad geographies and product lines. However, these efforts have sometimes fallen short because while they can passively track asset operations at the product level, the ability to impact asset performance is limited. Optimizing the system to maximize asset and enterprise performance is what the Industrial Internet offers." This reminds me of a presentation given by Dr. Manoj Dharwadkar of Bentley Systems Inc. 
titled, "Using Sematic Web Technologies in Open Applications" (http://www.w3.org/2008/12/ogws-slides/Bentley.pdf). It also reminds me of the The Simantics Platform at the VTT Technical Research Centre of Finland (www.vtt.fi) and the mission of Dassault Systems (www.3ds.com). They all have the ISO15926 ontology in common. I wasn't sure if they were talking about linked data in the report: "To make information intelligent, new connections need to be developed so that Big Data ?knows? when and where it needs to go, and how to get there. If imaging data is better connected, the right doctor could automatically receive a patient?s rendered images so the information is finding the doctor instead of the doctor finding the information. " --- Opportunity for Liked Data? random paper with medical devices communicating with the semantic web: http://radiographics.rsna.org/content/30/7/2039.abstract http://www.mdpnp.org/uploads/8-3_Schluter_26Jan.pdf (devices commmunicating, like Industrial Internet) Further, they go into what is needed to to build the Industrial Internet: "The Industrial Internet will require an adequate backbone. Data centers, broadband spectrum, and fiber networks are all components of the ICT infrastructure that will need to be further developed to connect the various machines, systems, and networks across industries and geographies. This will require a combination of inter- and intra- state infrastructure order to support the significant growth in data flows involved with the Industrial Internet. " I heard that Oklahoma, and the U.S. in general, needs more fiber. Someone said that Denver, Dallas, Kansas City, Silicon Valley, Austin(?), all have good networks. How would the talent to build the Industrial Internet be gathered? Here are a few more quotes: "Other alternatives for sourcing cross-discipline talent might include developing the existing resources in the native domain through collaborative approaches. Instead of building or buying talent that has multiple skills, create environments that accelerate the ability of people with different skills to interact and innovate together. On a larger scale, approaches such as crowdsourcing might be able to close some of the capabilities gaps that are sure to occur." "Today, the people that manage big data systems or perform advanced analytics have developed unique talents through self-driven specialization, rather than through any programs that build a standard set of skills or principles. Co-development of curriculum, integration of academic staff into industry, and other approaches will be needed to ensure that the talent needs of the Industrial Internet do not outpace the educational system." There definately is a lag between the development of IT, and its adoption. In Chemical Engineering, I'm pretty sure people thought I was crazy when I started talking about the Semantic Web. People in network security, and even computer science were not familiar with it. If you're talking about Wikinomics (Openness, Peering, Sharing, Acting Global) thinking there might be some growth to do. I heard of people at universities and hackerspaces speak of themselves as universities, but their culture is very different. Maybe hackerspaces are on the extreme of being open, whereas universities are less so? Maybe this is the case with IP. Maybe less so, with papers (but who can access them?). See a presentation in 2008 by RIC Jackson, then Director of the FIATECH Consortium: (http://www.slccc.net/documents_pdf/Technology-Ric%20Jackson%200811.pdf). 
Adoption of new tech for the enterprise is slow: (http://pandodaily.com/2012/02/11/why-oracle-may-really-be-doomed-this-time/). There are some, such as the Mayor of Newark, NJ, who bothered to go to SXSW to speak about the adoption of more tech: (http://pandodaily.com/2013/03/10/cory-booker-calls-for-tech-empowered-open-democracy/). Here are three more quotes from the report: "Measures to ensure the security of restricted data, including intellectual property,proprietary information, and personally identifiable information (PII) are critical. " --- this reminds me of the Read Write Web Community Group "Currently there are several standards bodies, but they are fragmented. The promotion and adoption of common and consistent standards on data structure, encryption, transfer mechanisms, and the proper use of data will go a long way in advancing cyber security." I was made fun of by a CS graduate when I was excited about a possible new standard. "Academia: Further research on data security and privacy should be pursued, including research on enhancing IT security, metrology, inferencing concerns with non- sensitive data, and legal foundations for privacy in data aggregation." Perhaps more collaboration with the hacker community? Is it true that some programmers, and some in CS tend to build things and ignore security in the process? I wonder what is going on at hacker conferences like Blackhat and DEFCON. BTW, people at the Chaos Communication Congress are geniuses. Whew! That's enough. If you're interested in more, read the report. It's exciting. :) I asked myself this question: What is the future role of the University? The university may serve as a repository for books, a place to do research, and a meeting place. Lectures? I'm not sure. How do the things that Michael Hammer & Lisa W. Hershman talk about fit in? They wrote a book titled, "Faster, Cheaper, Better: The 9 Levers for Transforming How Work Gets Done". I believe they were talking about Business Process Improvement (https://en.wikipedia.org/wiki/Business_process_improvement). Would Process Owners, as mentioned by them, serve a major role in the Industrial Internet? (http://it.toolbox.com/wiki/index.php/Process_Owner) ---------------- Resources that I am considering reading: The Net Delusion: The Dark Side of Internet Freedom, Evengy Morozov, 2011 To Save Everything, Click Here: The Folly of Technological Solutionism, Evengy Morozov, 2013 The Wealth of Networks, Yochai Benkler, Yale University Press, 2006 Science and the Crisis in Society, Frank H. George, Wiley, 1970 Future Perfect: The Case for Progress in a Networked Age, Steven Johnson, 2012 Here Comes Everybody: The Power of Organization Without Organizations, Clay Shirky, 2008 Nasa's Advanced Automation for Space Missions, http://www.islandone.org/MMSG/aasm/ (Robots, Expert Systems, Etc..), The Technical Stuff, 1980 The Future: Six Drivers of Global Change, Al Gore, 2013, http://www.amazon.com/The-Future-Drivers-Global-Change/dp/0812992946, Reviewed by Tim Berners-Lee, may relate well to the previous link An Inquiry to Into the Nature and Causes of the Wealth of Nations, Adam Smith, LL.D. 
F.R.S., MDCCCXLIII (1843) (according to Wikipedia, it was first published in 1776) Books by Chris Anderson and Lawrence Lessig Consent of the Networked: The Worldwide Struggle for Internet Freedom, Rebecca MacKinnon, 2012 Business Process Improvement: the breakthrough strategy for total quality, productivity, and competitiveness, H J Harrington , 1991 Faster, better, cheaper : low-cost innovation in the U.S. space program, Howard E McCurdy, 2001 On Wed, Jan 9, 2013 at 11:42 PM, Michel Bauwens wrote: it seems to me that these shifts have already started, before 2013, including in these fields, but are also much more long-term transformations ... in the case of deep-pocketed and politically powerful vested interests, only moderate bottom-up advances can be expected in the very short term ... both telecom and banking are still heavily centralized, they enabled people-based p2p dynamics but control the infrastructure, the data, the design and many other aspects of their only partly distributed systems ... I'm sure the same is true of GE .. no corporation will allow a fully p2p distributed system without some form of centralized control Michel On Thu, Jan 10, 2013 at 2:54 AM, ProjectParadigm-ICT-Program wrote: 2013 will see the advent of new paradigms for infrastructures that up until now where centralized, i.e. electric power generation and distribution, intermodal transportation and logistics, food and agro-industrial production and distribution, industrial production and distribution, consumer products manufacturing and distribution, pharmaceuticals production and distribution, energy extraction and distribution (including coal, gas, shale oil/gas and biofuels). The data and telecom infrastructure and parallel the banking and financial sectors are the only ones espousing decentralized distributed P2P (and B2B) processes. Resilience is a property that can only be achieved by copying the structure of the internet and some of its inherent characteristics. By defining strategic infrastructures as decentralized networks of distributed P2P (B2B) processes embedded in an intelligent grid it becomes possible to define resilience in a way similar to the resilience of the Internet. And a resilient grid lends itself perfectly to embedding in a semantic web overlay grid. The Industrial Internet as defined by GE and outlined in a recent white paper comes pretty close to it but not quite yet. See http://www.gereports.com/meeting-of-minds-and-machines/. Milton Ponson GSM: +297 747 8280 PO Box 1154, Oranjestad Aruba, Dutch Caribbean Project Paradigm: A structured approach to bringing the tools for sustainable development to all stakeholders worldwide by creating ICT tools for NGOs worldwide and: providing online access to web sites and repositories of data and information for sustainable development This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the system manager. This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. 
________________________________ From: Brent Shambaugh To: ProjectParadigm-ICT-Program Cc: Michel Bauwens ; Samuel Rose ; "public-lod at w3.org" ; Paul Cockshott Sent: Wednesday, January 9, 2013 2:10 AM Subject: Re: A Distributed Economy -- A blog involving Linked Data Oh well, I'll share my story on a W3C forum no less. Model, true. Would my experiences even translate? I think you'd have to see this from my own personal perspective. Even though I grew up in an American home there was a lot of discourse in my family. There wasn't a lot of room for personal expression, and my family was very religious so I was afraid of offending God if I went against the dominating figure and/or ideology in the family. I was also very shy growing up, and I did not have much money, even though I came from an upper middle class family. I felt out of place most of the time, and sometimes I had ideas that people did not seem to understand or be interested in. I liked computers, and wanted to learn more about them. I was always asking people doing computer stuff how to program, even though I had a lot of trouble doing it myself. I think it was because I struggled with algebra (and other maths), but more so algebra. I also was a bit lost in some documentation, and may have not been fully aware of other resources that may have helped. I was afraid of tinkering, but I built webpages and was proud of them and I also built structures in the woods (but that is a bit off topic). My family paid for my college. I'm thankful for that, but it also leaves me with a feeling of responsibility to them. I'll admit to not being in sync with things in my undergraduate years. It looks very good if you have an internship. But at the time I made a few mistakes perhaps. I was a bit afraid to try because the companies I qualified for either were not doing something that interested me and/or something that I felt reflected my beliefs, values and possibly something else that is hard to describe. In short, perhaps passion. Over time I realized that it would probably be wiser to accept things as is if I ever hoped to be employed. Making the sale was difficult though. I think perhaps people think I'm lazy, or uniformed, because I did not work (except for academic things) in college. Or was it emotion? Ideas out of place? I was also affected by many of the same family things growing up. I have an interest in physics, electronics, economics, systems, etc. I think that if I ever hope to use my education, and share what I have learned, I need to do something amazing. I could go back to school, take on a lot of debt, and just hope that I get enough good grades to impress enough people (and not have them think I'd get bored when trying to get a job). Or I could learn things on my own, participate in projects, and hope that people receive me with open arms. Since 2007 when I discovered Polywell nuclear fusion I've gained new perspective on the world. I never actually built a fusion reactor, but I did try to learn what was behind them. This motivated me to read lots of books, and my desire to do other things to explore my uniqueness as an individual led to even more books. GNU/Linux facilitated my graduate work, and I can relate to it's philosophy through my many frustrations. Open source is great, because I don't have to worry so much about my skills wasting away. Being at the university also helps. I also don't have to manufacture things or do anything special to have excitement about it. 
But you know, how much can you actually get from someone who hasn't experienced that much real employment? Because of that, people automatically see me in a certain way. And my views may not be realistic, for lack of experience. But whatever it is, it seems I have found a lot of energy, and my friends seem to notice. I think about what I am learning more too. But would this model help people in the real world? I feel that had it existed it could have helped me growing up, but that is my own personal experience. In addition to studying, a lot of my peers spent their time drinking beer, socializing, and playing and/or watching sports. And most seemed to have more money. Now most seem to have even more money, and spend time on Facebook talking about things they have bought or families that they are raising. Their educational level is hard to discern. Not many seem to be posting things about hacking, making or things that might suggest deep insight. But not everyone fits that. I guess what matters is whether it will work or not, and whether it truly will benefit others. For that, both an experiment and a conversation will help. Thank you Samuel for referring me to Michel. Milton, I am not certain what it will do yet. I am not certain what resilience truly means. I'm definitely bothered by the wastefulness brought upon by obsolescence of products. It would be much better, I think, if we knew how they worked so we could reuse them (I mean the parts) in other things. We've had this problem at the hackerspace. We have lots of stuff around that, if we had the blueprint, would be much more useful. If we knew how this blueprint connected to other things, I personally think that would be even better. On a separate issue: in graduate school there were people there that seemed really lost. I mean they were doing their work, but didn't seem to have a joy about it. There also was not a lot of organization, and it was hard to find things. Outside of school, there are people that I know could go to graduate school but didn't. It was frustrating to me that I could not seem to sell them on thinking more deeply about things, or when they said I was really smart (but did not have the confidence or belief that they could do it themselves). Still others just weren't there. I've seen those who weren't there at the hackerspace. I question why, and think the world would be a better place if this could be tapped into. " Roberto Verzola is to my mind the political economist who has done most in studying this, see http://p2pfoundation.net/Category:Commons_Economics ; Wolfgang Hoeschele is planning an ambitious database based on a Needs, Organisational REsources, (I forgot what the A stands for) I'm sure that the proposed modelling effort will contribute to this field; if you are ideologically open, you may also want to talk with people like Paul Cockshott and the people of the Center for Transition Science at UNAM in Mexico City, who are very good at econometric modelling and interested in a cybernetic planning revival, " I still have to think more about this. I was reading over it a bit today. I might have seen something about this today. Someone was talking about how technologies were allowing us (or could?) to become more mobile, and that people really didn't have to be co-located. I don't remember what technologies they were referring to.
"Peer to peer processes in addition should be defined as geography independent, historically nomads, hunter gatherers and technomads in the modern age all show this to be true." I hope to write soon. On Tue, Jan 8, 2013 at 6:57 PM, Brent Shambaugh wrote: I'm feeling that this is shaped by my own personal experience? I'm willing, but should I risk putting it out there? -- P2P Foundation: http://p2pfoundation.net - http://blog.p2pfoundation.net Updates: http://twitter.com/mbauwens; http://www.facebook.com/mbauwens #82 on the (En)Rich list: http://enrichlist.org/the-complete-list/ -- P2P Foundation: http://p2pfoundation.net - http://blog.p2pfoundation.net Updates: http://twitter.com/mbauwens; http://www.facebook.com/mbauwens #82 on the (En)Rich list: http://enrichlist.org/the-complete-list/ _______________________________________________ P2P Foundation - Mailing list http://www.p2pfoundation.net https://lists.ourproject.org/cgi-bin/mailman/listinfo/p2p-foundation From dreamingforward at gmail.com Mon Mar 18 05:46:26 2013 From: dreamingforward at gmail.com (Mark Janssen) Date: Sun, 17 Mar 2013 21:46:26 -0700 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: References: Message-ID: On Sun, Mar 17, 2013 at 9:26 PM, Mark Janssen wrote: > Continuing on this thread, there would be a new bunch of behaviors to > be defined. Since "everything is an object", there can now be a > standard way to define the *next* common abstraction of "every object > interacts with other objects". And going with my suggestion of > defining >> and << operators, I'm going to explore the concept > further.... > Each object has to figure out how it will receive things from outside > of it. Things it can't handle (a string sent to an int) just have to > be dropped to some other space, much like stderr does within the O.S. I guess here's the idea I'm getting at. As a programming language paradigm, OOP has to evolve -- it still has too much dependency on number-crunching and the mathematical operators still dominate. But a better abstraction to wrap the OOP paradigm around is *message-passing* rather than *arithmetic*. And having in/out operators on objects is just *way cool*. mark From tjreedy at udel.edu Mon Mar 18 07:26:19 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 18 Mar 2013 02:26:19 -0400 Subject: [Python-ideas] Message passing for objects In-Reply-To: References: Message-ID: On 3/18/2013 12:40 AM, Mark Janssen wrote: Mark, posting 900 off-topic lines of a digest from another list is not polite. Please stick to Python and reasonable length posts. -- Terry Jan Reedy From steve at pearwood.info Mon Mar 18 07:46:32 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 18 Mar 2013 17:46:32 +1100 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: References: Message-ID: <5146B848.3040509@pearwood.info> On 18/03/13 14:53, Mark Janssen wrote: > Hello, > > I just posted an answers on quora.com about OOP (http://qr.ae/TM1Vb) > and wanted to engage the python community on the subject. > > Alan Kay's idea of message-passing in Smalltalk are interesting, and > like the questioner says, never took off. My answer was that Alan > Kay's abstraction of "Everything is an object" fails because you can't > have message-passing, an I/O task, working in the same space as your > objects -- they are two very different functionalities and they have > to be preserved **for the programmer**. 
> > This functional separation made me think that Python could benefit > from a syntactical, language-given separation between Classes and the > messages between them, to encourage loosely-coupled, modular OOP. > Something that OOP has always promised but never delivered. I am very interested in this as a concept, although I must admit I'm not entirely sure what you mean by it. I've read your comment on the link above, and subsequent emails in this thread, and I'm afraid I don't understand what you mean here. I feel you are assuming that your readers are already experts on message-passing languages (Smalltalk?). I know what *I* mean by message passing, but that's not necessarily what you mean by it. For example, you state in another email: [quote] By building it into the language, it would *enforce* a modular object style, rather than the current, very specialized and very programmer specific way there is now. In fact, most people never really think in that paradigm, yet if the language supported/proposed such a syntax, programmers would start to re-arrange the whole object hierarchy in a new, more modular and universal way. [end quote] I don't understand this. In what way would message passing enforce a modular object style? In what way does Python not already have a modular object style? -- Steven

From rosuav at gmail.com Mon Mar 18 07:58:50 2013 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 18 Mar 2013 17:58:50 +1100 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: References: Message-ID: On Mon, Mar 18, 2013 at 3:46 PM, Mark Janssen wrote: > I guess here's the idea I'm getting at. As a programming language > paradigm, OOP has to evolve -- it still has too much dependency on > number-crunching and the mathematical operators still dominate. > > But a better abstraction to wrap the OOP paradigm around is > *message-passing* rather than *arithmetic*. And having in/out > operators on objects is just *way cool*. "Way cool", unfortunately, isn't enough for writing code. It has to be useful, too. The Zen of Python reminds us that practicality beats purity; turning everything into message passing may be awesome in purity, but where does it stand on practicality? Are there real-world problems that are awkward to solve in present-day Python that are made massively cleaner/easier with this proposal? I suppose what I'm asking for is a 1-2 sentence blurb to sell the idea. What's the key advantage for daily work? ChrisA

From ubershmekel at gmail.com Mon Mar 18 08:23:11 2013 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Mon, 18 Mar 2013 09:23:11 +0200 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: References: Message-ID: On Mon, Mar 18, 2013 at 8:58 AM, Chris Angelico wrote: > I suppose what I'm asking for is a 1-2 sentence blurb to sell the > idea. What's the key advantage for daily work? > http://en.wikipedia.org/wiki/Message_passing http://en.wikipedia.org/wiki/Erlang_(programming_language) In Erlang, message passing between processes is the main/only thing you do. When every program is built as a set of processes passing messages, you get free parallelization. You can also arrange the message queuing to allow hot-replacing processes, i.e. update your code while it's running. It's a neat concept, but it's not in python's history or culture to "enforce" programming paradigms. And of course the bit shifting operators aren't going anywhere. Yuval Greenfield
From greg.ewing at canterbury.ac.nz Mon Mar 18 06:34:37 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 18 Mar 2013 18:34:37 +1300 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: References: Message-ID: <5146A76D.1020205@canterbury.ac.nz> Ian Cordasco wrote: > On Sun, Mar 17, 2013 at 11:53 PM, Mark Janssen > wrote: > >>Hello, >> >>I just posted an answer on quora.com about OOP (http://qr.ae/TM1Vb) >>and wanted to engage the python community on the subject. My answer to that question would be that it *did* catch on, it's just that we changed the terminology. Instead of message passing, we talk about calling methods. IMO the term "message passing" in Smalltalk is misleading, because it suggests there is something asynchronous going on, but there isn't. It's just a subroutine call. -- Greg

From masklinn at masklinn.net Mon Mar 18 08:47:51 2013 From: masklinn at masklinn.net (Masklinn) Date: Mon, 18 Mar 2013 08:47:51 +0100 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: <5146A76D.1020205@canterbury.ac.nz> References: <5146A76D.1020205@canterbury.ac.nz> Message-ID: On 2013-03-18, at 06:34 , Greg Ewing wrote: > Ian Cordasco wrote: >> On Sun, Mar 17, 2013 at 11:53 PM, Mark Janssen >> wrote: >>> Hello, >>> >>> I just posted an answer on quora.com about OOP (http://qr.ae/TM1Vb) >>> and wanted to engage the python community on the subject. > > My answer to that question would be that it *did* > catch on, it's just that we changed the terminology. > Instead of message passing, we talk about calling > methods. Ruby uses a smalltalk-ish terminology: a message is what is sent to another object, and a method is a hook used by an object to respond to a message. Hence `Object#send`[0] and the lack of externally-accessible attributes (unless explicitly exposed via a message). I believe objective-c is much the same, for similar reasons (and thus the low-level routine for message dispatching is objc_msgSend()[1]) [0] http://ruby-doc.org/core-2.0/Object.html#method-i-send [1] https://developer.apple.com/library/mac/#documentation/Cocoa/Reference/ObjCRuntimeRef/Reference/reference.html#//apple_ref/c/func/objc_msgSend

From shane at umbrellacode.com Mon Mar 18 09:51:38 2013 From: shane at umbrellacode.com (Shane Green) Date: Mon, 18 Mar 2013 01:51:38 -0700 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: References: Message-ID: <15B9E29B-BD7A-4437-9115-EFA863AE3D87@umbrellacode.com> So, by introducing this collaboration mechanism with a syntax that defines it as sending and receiving things that are *not* arbitrary objects, the language would naturally reinforce a more thoroughly decoupled architecture? Sent from my iPad On Mar 17, 2013, at 8:53 PM, Mark Janssen wrote: > Hello, > > I just posted an answer on quora.com about OOP (http://qr.ae/TM1Vb) > and wanted to engage the python community on the subject. > > Alan Kay's idea of message-passing in Smalltalk is interesting, and, > like the questioner says, never took off. My answer was that Alan > Kay's abstraction of "Everything is an object" fails because you can't > have message-passing, an I/O task, working in the same space as your > objects -- they are two very different functionalities and they have > to be preserved **for the programmer**. > > This functional separation made me think that Python could benefit > from a syntactical, language-given separation between Classes and the > messages between them, to encourage loosely-coupled, modular OOP.
> Something that OOP has always promised but never delivered. > > I think we should co-opt C++'s poorly used >> and << I/O operators > (for files) and re-purpose them for objects/classes. One could then > have, within interpreter space, the ability to pass a message in to an > object. > >>>> 42 >> MyObject #sends 42 as a message into MyObject > > The Object definition would then have special methods: __in__ to > receive data, and a special way of outputting data that can be caught, > __str__(?). > > I'm hoping the community can comment on the matter.... > > Thanks, > > Mark > Tacoma, Washington > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas
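The `42 >> MyObject` form quoted above can already be emulated in today's Python, because the right-shift operator has a reflected hook. A minimal sketch, with the Inbox class and its attribute names invented purely for illustration and not part of the proposal:

    class Inbox:
        def __init__(self):
            self.received = []
        def __rrshift__(self, message):
            # "42 >> box" first tries int.__rshift__(42, box), which
            # returns NotImplemented, so Python falls back to
            # box.__rrshift__(42) -- i.e., this method.
            self.received.append(message)
            return self   # returning self allows chained sends

    box = Inbox()
    42 >> box
    "hello" >> box
    print(box.received)   # [42, 'hello']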
From dreamingforward at gmail.com Mon Mar 18 18:04:36 2013 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 18 Mar 2013 10:04:36 -0700 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: References: <5146B848.3040509@pearwood.info> Message-ID: On Sun, Mar 17, 2013 at 11:46 PM, Steven D'Aprano wrote: > I am very interested in this as a concept, although I must admit I'm not > entirely sure what you mean by it. I've read your comment on the link above, > and subsequent emails in this thread, and I'm afraid I don't understand what > you mean here. I feel you are assuming that your readers are already experts > on message-passing languages (Smalltalk?). I know what *I* mean by message > passing, but that's not necessarily what you mean by it. I'm sorry, I haven't been very clear. I'm not even an expert on message-passing languages, but I see that it's a profound concept that hasn't been adequately integrated into the OOP model. In any case, I will try to do better. And I apologize to everyone on the list for the prior mail spam. A part of me is a bit giddy with the idea. By message passing, I mean all the ways we communicate to objects in the OOP environment. Usually we "communicate" with them through method invocation. But this is the wrong way, I argue, to look at the problem. With function or method syntax, you're telling the computer to "execute something", but that is not the right concept for OOP. You want the objects to interact with each other, and in a high-level language the syntax should assist with that. > By building it into the language, it would *enforce* a modular object > style, rather than the current, very specialized and very programmer > specific way there is now. In fact, most people never really think in > that paradigm, yet if the language supported/proposed such a syntax, > programmers would start to re-arrange the whole object hierarchy in a > new, more modular and universal way. > [end quote] > > I don't understand this. In what way would message passing enforce a modular > object style? In what way does Python not already have a modular object > style? Hopefully my paragraph clarifies that a bit. But the key conceptual shift is that by enforcing a syntax that moves away from invoking methods and toward message passing between objects, you're automatically enforcing a more modular approach. Mark

From solipsis at pitrou.net Mon Mar 18 18:13:40 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 18 Mar 2013 18:13:40 +0100 Subject: [Python-ideas] Message passing syntax for objects References: <5146B848.3040509@pearwood.info> Message-ID: <20130318181340.683bc161@pitrou.net> Le Mon, 18 Mar 2013 10:04:36 -0700, Mark Janssen a écrit : > By message passing, I mean all the ways we communicate to objects in > the OOP environment. Usually we "communicate" with them through > method invocation. But this is the wrong way, I argue, to look at the > problem. I think you have failed to articulate clearly what the problem is. > Hopefully my paragraph clarifies that a bit. But the key conceptual > shift is that by enforcing a syntax that moves away from invoking > methods and toward message passing between objects, you're > automatically enforcing a more modular approach. That sounds like wishful thinking to me. Regards Antoine.
From dreamingforward at gmail.com Mon Mar 18 18:18:53 2013 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 18 Mar 2013 10:18:53 -0700 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: References: <5146A76D.1020205@canterbury.ac.nz> Message-ID: > Ian Cordasco wrote: >> On Sun, Mar 17, 2013 at 11:53 PM, Mark Janssen >> wrote: >>> Hello, >>> >>> I just posted an answer on quora.com about OOP (http://qr.ae/TM1Vb) >>> and wanted to engage the python community on the subject. > > My answer to that question would be that it *did* > catch on, it's just that we changed the terminology. > Instead of message passing, we talk about calling > methods. Yes, but this is where it breaks the OOP abstraction by 90 degrees. By using function calls, you're telling the machine to do something. But when you want to pass something to an object there should be a natural way to do this for every object. By using methods you pollute the concept space with all sorts of semi-random (i.e. personal) names, like append, add, enqueue, etc. This proposal would not only make a consistent syntax across all objects, but train the programmer to *think* modularly, in the sense of having a community of re-usable objects. I.e. "What should I do if another object passes me something?" No one thinks this now, because the programmer expects new developers to learn *their* interface! Mark

From dreamingforward at gmail.com Mon Mar 18 18:36:50 2013 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 18 Mar 2013 10:36:50 -0700 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: References: Message-ID: >> I guess here's the idea I'm getting at. As a programming language >> paradigm, OOP has to evolve -- it still has too much dependency on >> number-crunching and the mathematical operators still dominate. >> >> But a better abstraction to wrap the OOP paradigm around is >> *message-passing* rather than *arithmetic*. And having in/out >> operators on objects is just *way cool*. > > "Way cool", unfortunately, isn't enough for writing code. It has to be > useful, too. The Zen of Python reminds us that practicality beats > purity; turning everything into message passing may be awesome in > purity, but where does it stand on practicality? Are there real-world > problems that are awkward to solve in present-day Python that are made > massively cleaner/easier with this proposal? Perhaps I'm using the wrong language. Instead of borrowing from Alan Kay, it should be called "data passing". It's a very common and important part of the object-oriented paradigm. The language could support a *consistent* way to do this for all objects. > I suppose what I'm asking for is a 1-2 sentence blurb to sell the > idea. What's the key advantage for daily work? No more having to learn each programmer's interface for interacting with the object. The >> and << syntax now becomes the de facto way for object interaction across all objects. Class design will revolve around this fact. Mark
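What a single "de facto" entry point would look like in practice can be sketched in current Python. The Account class and its receive method are hypothetical names, and the sketch deliberately shows where the dispatching ends up living -- a point the next message takes issue with:

    class Account:
        def __init__(self):
            self.balance = 0
            self.log = []
        def receive(self, message):
            # One uniform entry point; the per-type dispatch that named
            # methods would have expressed now lives inside the object.
            if isinstance(message, int):
                self.balance += message
            elif isinstance(message, str):
                self.log.append(message)
            else:
                raise TypeError("cannot receive %r" % (message,))
        __rrshift__ = receive   # enables: 100 >> acct

    acct = Account()
    100 >> acct
    "opened" >> acct
    print(acct.balance, acct.log)   # 100 ['opened']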
From rob.cliffe at btinternet.com Mon Mar 18 18:44:00 2013 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Mon, 18 Mar 2013 17:44:00 +0000 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: References: <5146A76D.1020205@canterbury.ac.nz> Message-ID: <51475260.7020707@btinternet.com> On 18/03/2013 17:18, Mark Janssen wrote: >> Ian Cordasco wrote: >>> On Sun, Mar 17, 2013 at 11:53 PM, Mark Janssen >>> wrote: >>> >>>> Hello, >>>> >>>> I just posted an answer on quora.com about OOP (http://qr.ae/TM1Vb) >>>> and wanted to engage the python community on the subject. >> >> My answer to that question would be that it *did* >> catch on, it's just that we changed the terminology. >> Instead of message passing, we talk about calling >> methods. > Yes, but this is where it breaks the OOP abstraction by 90 degrees. > By using function calls, you're telling the machine to do something. > But when you want to pass something to an object there should be a > natural way to do this for every object. By using methods you pollute > the concept space with all sorts of semi-random (i.e. personal) names, > like append, add, enqueue, etc. > > This proposal would not only make a consistent syntax across all > objects, but train the programmer to *think* modularly, in the sense of > having a community of re-usable objects. I.e. "What should I do if > another object passes me something?"
No one thinks this now, because > the programmer expects new developers to learn *their* interface! > > Mark > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > I'm struggling to understand what you mean (and I don't seem to be the only one). As far as I can tell, you would replace an object with N methods by an object with one huge method (ProcessReceivedMessage, say) which deals with N cases. (Not an improvement, as your code has become much less modular.) And instead of semi-arbitrary method names, you have semi-arbitrary messages (names, numbers or whatever). Just as much of an interface to learn. Am I missing something? Can you give a semi-concrete example using pseudo-code, where your way clearly does something which can't be done as well (or at all) using method calls? Rob Cliffe

From dreamingforward at gmail.com Mon Mar 18 18:54:12 2013 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 18 Mar 2013 10:54:12 -0700 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: <51475260.7020707@btinternet.com> References: <5146A76D.1020205@canterbury.ac.nz> <51475260.7020707@btinternet.com> Message-ID: >>> My answer to that question would be that it *did* >>> catch on, it's just that we changed the terminology. >>> Instead of message passing, we talk about calling >>> methods. >> >> Yes, but this is where it breaks the OOP abstraction by 90 degrees. >> By using function calls, you're telling the machine to do something. >> But when you want to pass something to an object there should be a >> natural way to do this for every object. By using methods you pollute >> the concept space with all sorts of semi-random (i.e. personal) names, >> like append, add, enqueue, etc. >> >> This proposal would not only make a consistent syntax across all >> objects, but train the programmer to *think* modularly, in the sense of >> having a community of re-usable objects. I.e. "What should I do if >> another object passes me something?" No one thinks this now, because >> the programmer expects new developers to learn *their* interface! > I'm struggling to understand what you mean (and I don't seem to be the only > one). > As far as I can tell, you would replace an object with N methods by an > object with one huge method (ProcessReceivedMessage, say) which deals with N > cases. > (Not an improvement, as your code has become much less modular.) No. If the language provides a data-passing syntax (like << and >>), then programmers would design their object hierarchy with this idea in mind. If you have N methods (N > 1) for passing data to an object, I'm suggesting you haven't designed your object classes properly. > And instead of semi-arbitrary method names, you have semi-arbitrary messages > (names, numbers or whatever). > Just as much of an interface to learn. > Am I missing something? Yes, the data-passing syntax (<< and >>) would be the same for all objects, because the language itself would be providing the primary interface between objects. > Can you give a semi-concrete example using pseudo-code, where your way > clearly does something which can't be done as well (or at all) using method > calls? I will work on it. What I'd really like is to refactor all the Python standard library code to see how much simpler it would be. Mark

From ethan at stoneleaf.us Mon Mar 18 18:47:33 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 18 Mar 2013 10:47:33 -0700 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: References: Message-ID: <51475335.8030008@stoneleaf.us> On 03/18/2013 10:36 AM, Mark Janssen wrote: > No more having to learn each programmer's interface for > interacting with the object. The >> and << syntax now becomes the de > facto way for object interaction across all objects. Class design > will revolve around this fact. We need a concrete example. Let's do `int`s. Currently, a small subset of the things we can do with ints includes multiplying, dividing, adding, subtracting, and (formatted) displaying -- how would these operations look with a message passing interface? -- ~Ethan~
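The reply that follows argues that addition alone could carry arithmetic in such a scheme. As a concrete rendering of that claim -- the Accumulator class is invented here, and this is a sketch of the idea, not a proposal -- even the simplest case is clumsier than the operators it replaces:

    class Accumulator:
        def __init__(self, value=0):
            self.value = value
        def __rrshift__(self, n):
            # Only one operation: whatever number arrives is added.
            # Subtraction must be spelled by sending a negative number.
            self.value += n
            return self

    acc = Accumulator()
    5 >> acc
    -3 >> acc
    print(acc.value)   # 2, i.e. 5 - 3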
From dreamingforward at gmail.com Mon Mar 18 19:22:29 2013 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 18 Mar 2013 11:22:29 -0700 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: <51475335.8030008@stoneleaf.us> References: <51475335.8030008@stoneleaf.us> Message-ID: On Mon, Mar 18, 2013 at 10:47 AM, Ethan Furman wrote: > On 03/18/2013 10:36 AM, Mark Janssen wrote: >> No more having to learn each programmer's interface for >> interacting with the object. The >> and << syntax now becomes the de >> facto way for object interaction across all objects. Class design >> will revolve around this fact. > > We need a concrete example. Let's do `int`s. Currently, a small subset of > the things we can do with ints includes multiplying, dividing, adding, > subtracting, and (formatted) displaying -- how would these operations look > with a message passing interface? Good. You've chosen perhaps the worst example to demonstrate the point, but let's see if I can turn it into an opportunity (haha). I suggest that the mathematical domain is completely separate from what most people do with computers and with programming. The term "computers" is a misnomer, and it obscures the power that the computer as an *abstraction* provides. Keep in mind that there aren't really 1's and 0's moving around in the machine. In the machine, there are only *states* (which could be *any* differentiable "thing", like "happy face" / "sad face"). We've just come to symbolize them with numbers because of their history of calculation. Let's examine arithmetic for a moment. Ultimately, arithmetic can be done with simple + and - operators (multiplication simply by repeating addition, etc.). If you allow negative numbers you really only need the ONE operator. Now, I think numbers are the only domain where the more standard (current) interface should be preserved, but yet, as I've shown, the message- or data-passing interface can accommodate arithmetic if purity is desired; it's just that for real math, it won't be as efficient. But that's okay, because really, what we're generally doing (I mean the computers) these days is manipulating *symbols* or "objects". Most of the time that is what we're doing -- not arithmetic, except in some very simple ways which I will try to elucidate. So, *if* that is the case, then the language should develop a *construct* in which to standardize this fact. So, if our domain is "objects" more than number (which is a pretty concrete thing), then in what ways do (or should) numbers and data/symbols/objects interact within the programming domain or environment? As you know from C, most interaction with number *as a language construct* was fairly limited: incrementing a loop, for example. In fact, C invented the unit increment as a syntactical element. I think this is the more valid way of examining an interface into number as a linguistic construction. But as I told another poster, I will examine the standard library and come up with some good examples of how a standardized data-passing interface will benefit the programmer. Mark

From guido at python.org Mon Mar 18 19:38:59 2013 From: guido at python.org (Guido van Rossum) Date: Mon, 18 Mar 2013 11:38:59 -0700 Subject: [Python-ideas] Thoughts on the future of concurrency in Python: PEP 3148 In-Reply-To: <514633A8.8020908@btinternet.com> References: <514633A8.8020908@btinternet.com> Message-ID: That is not at all what I said. "With great respect" is apparently a passive-aggressive way to start an unfair criticism. On Sun, Mar 17, 2013 at 2:20 PM, Rob Cliffe wrote: > > On 17/03/2013 21:06, Guido van Rossum wrote: >> >> I actually want to de-emphasize the fact that Futures hold callbacks. I >> want people to think of them as magic "wait points" that combine with "yield >> from" to return a pseudo-synchronous result. In fact, I want to altogether >> de-emphasize that async I/O is done under the hood using callbacks. Only >> implementers of async frameworks should need to know that, for the most >> part. > > Hm, as someone who's had to get to grips with Twisted, I find it frustrating > and disorientating when I don't know how things work. I find that > understanding what goes on under the hood is both helpful and reassuring. > With great respect, Guido, something on the lines of [I exaggerate to make > the point clear, no offence intended] "The average application programmer > needn't trouble his pretty little head about this" feels a bit patronising.
> Rob Cliffe > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- --Guido van Rossum (python.org/~guido)

From guido at python.org Mon Mar 18 19:50:09 2013 From: guido at python.org (Guido van Rossum) Date: Mon, 18 Mar 2013 11:50:09 -0700 Subject: [Python-ideas] Thoughts on the future of concurrency in Python: PEP 3148 In-Reply-To: References: Message-ID: On Sun, Mar 17, 2013 at 3:14 PM, Steven Hazel wrote: > On Sunday, March 17, 2013 2:06:48 PM UTC-7, Guido van Rossum wrote: >> As Nick pointed out, this water is already under the 3.2 bridge. Too >> late to change now. > > Oops, didn't realize it was already out. In that case, please take this as a > suggestion that the docs might benefit from an explanation like "A Future is > basically just a callback registry. Rather than taking callbacks as > parameters, functions can return a Future, where callbacks can be registered > with Future.add_done_callback." But that's not how I want users to think about Futures at all (certainly not in the async I/O world). You should almost never have to think about the callbacks registered with the Future! (Unless you are implementing an adapter for a callback-based framework -- but that's presumably a fairly uncommon activity.) >> I actually want to de-emphasize the fact that Futures hold callbacks. >> I want people to think of them as magic "wait points" that combine with >> "yield from" to return a pseudo-synchronous result. In fact, I want to >> altogether de-emphasize that async I/O is done under the hood using >> callbacks. Only implementers of async frameworks should need to know >> that, for the most part. > > It's true that when you're using a generator-based async framework, you can > and should think of a Future as a "thing you can wait on" most of the time. > My experience with monocle though is that it is helpful rather than harmful > to reveal that they're about callbacks. In the early days of monocle, we > were using Deferreds, which a lot of new monocle users didn't really > understand, and people tended to get very confused about what kinds of > things a Deferred could possibly be doing. Explaining things in terms of > callbacks was helpful in getting people to understand monocle. I'm guessing that there are two quite distinct audiences for the docs: those new to async programming, and those who have used a callback-based framework before. For the latter, it certainly makes sense to clarify how Futures work, and I do not intend to hide it -- I just don't think this is the primary audience for the docs. If you've used callbacks before, you can find out how Futures work in the reference docs. > I don't think you can really abstract the idea of callbacks away from > Futures without making them more mystifying. Callbacks are not an > implementation detail of Futures, they're essential to what Futures do, so > essential that I think CallbackRegistry is a pretty good alternative name, and > in monocle we actually called the class Callback. I sometimes explain that a > monocle coroutine is "yielding on a callback" -- meaning it's waiting now and > it'll resume when called back. The name helps explain what's happening, even > for a user of the framework. I respectfully disagree. I am currently writing and/or reviewing reams of code that *uses* Futures to implement complex protocols (currently we're working on an HTTP client and server). This code uses yield-from all over the place, and not once is there a need to use a callback. >> Hm. I find the idea of callbacks far from simple or familiar. > > Many programmers have used callbacks in other contexts. But, even when they > haven't, "you pass in a function that will get called when the operation is > done" is an idea I've watched many people grasp immediately the first time > they saw it. Easy to grasp, but it leads to lots of trouble later on. I'd say that the concept of "GO TO" is equally graspable -- but that doesn't make it a good concept to start with. > In contrast, Futures, which are almost the same idea, are often > viewed as sophisticated black magic, and I've more than once heard them > explained in terms of *time travel*. Sigh. I will make sure that won't happen in any of the docs that *I* write. The concept is more closely related to the financial concept with the same name. I suppose I could call them Promises instead, but Future seems to have caught on already (PEP 3148, and it's also what Java uses). Also note that explicit Futures are a lot less magical than implicit Futures. Finally, I have not had any issues explaining the use of Futures in NDB, where they are being used by (many) thousands of users with only an average grasp of Python concepts. >> It's a threadpool. > > Oh, I see. Well, for what it's worth, including this next to Futures > confused me. It kind of implied to me, when I read the PEP, that Executors are > somehow necessary to using Futures, when in fact they're just one of many > contexts in which a Future might be a good API. Take it up with the authors of PEP 3148, or submit a doc bug for the stdlib docs explaining them: http://docs.python.org/3/library/concurrent.futures.html?highlight=executor#concurrent.futures.Executor -- --Guido van Rossum (python.org/~guido)
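The contrast being drawn here can be illustrated with nothing but the stdlib's PEP 3148 API; fake_fetch and the two wrapper functions below are invented for the example. In the first style the caller's logic continues inside a callback; in the second, the Future is simply a point to wait on (and under PEP 3156 the same wait would be spelled `result = yield from future` inside a coroutine):

    from concurrent.futures import ThreadPoolExecutor

    def fake_fetch(url):
        # Stand-in for real I/O.
        return "<html>%s</html>" % url

    def fetch_with_callback(executor, url, on_done):
        # Callback style: control flow continues inside on_done, later.
        future = executor.submit(fake_fetch, url)
        future.add_done_callback(lambda f: on_done(f.result()))

    def fetch_as_wait_point(executor, url):
        # Future-as-wait-point style: submit, then simply wait for the result.
        future = executor.submit(fake_fetch, url)
        return future.result()

    with ThreadPoolExecutor(max_workers=1) as ex:
        fetch_with_callback(ex, "http://example.com", print)
        print(fetch_as_wait_point(ex, "http://example.com"))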
From guido at python.org Mon Mar 18 20:01:52 2013 From: guido at python.org (Guido van Rossum) Date: Mon, 18 Mar 2013 12:01:52 -0700 Subject: [Python-ideas] Separate group for discussing PEP 3156 and Tulip Message-ID: I've created a Google Group for PEP 3156 and Tulip. If you're interested in this topic at all, please join the group. The group is here: https://groups.google.com/forum/?fromgroups#!forum/python-tulip Email is python-tulip at googlegroups.com; only members can post (open Google Groups attract too much spam). -- --Guido van Rossum (python.org/~guido)

From g.brandl at gmx.net Mon Mar 18 20:06:33 2013 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 18 Mar 2013 20:06:33 +0100 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: References: Message-ID: Am 18.03.2013 05:26, schrieb Mark Janssen: > Continuing on this thread, there would be a new bunch of behaviors to > be defined. Since "everything is an object", there can now be a > standard way to define the *next* common abstraction of "every object > interacts with other objects". The problem is that for most objects there isn't *the* interaction. Sure, you could split up complicated objects into small pieces with a smaller functionality, but at some point you have to stop. Let's see how this concept fares with simple types such as integers or collections... > And going with my suggestion of > defining >> and << operators, I'm going to explore the concept > further.... > >>>>> 42 >> MyNumberType #would add the integer to your integer type That's just random. Why not multiply? Why not exponentiate?
>>>>> 42 >> MyCollectionType #would add the object into your collection: > *poof*: no more random syntaxitis for putting things in collections. So you've replaced one method of a collections API by your magical operator, for all collections. What about the other methods that are just as important, such as deleting items, indexing, and querying? The "syntaxitis" would stay just the same, except if you introduce more operators, which means new syntax again. Also, how would this work for dictionaries or deques? >>>>> MyObject >> # queries the object to output its state. What is "its state"? A readable representation? A serialized representation? A memory dump? >>>>> "http://www.cnn.com" >> MyInternetObject #outputs the HTML text from CNN's home page. > > Each object has to figure out how it will receive things from outside > of it. Things it can't handle (a string sent to an int) just have to > be dropped to some other space, much like stderr does within the O.S. > > There are probably many other very interesting examples, but the key > idea I'm working on (as noted in other messages), is a sort-of > universal language for the internet, a WebOS to be applied to a > universal data model. It seems that you are reinventing pipes (such as UNIX shell pipes). I agree that as a model for handling data the pipe paradigm is elegant, but only as long as you deal with simple data of a single kind (such as strings in the UNIX world). But make the data complex enough, and it's an instance of "all problems look like pipes if all you have is a vertical bar". Georg

From jsbueno at python.org.br Mon Mar 18 20:15:59 2013 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Mon, 18 Mar 2013 16:15:59 -0300 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: References: <51475335.8030008@stoneleaf.us> Message-ID: On 18 March 2013 15:22, Mark Janssen wrote: > But as I told another poster, I will examine the standard library and > come up with some good examples of how a standardized data-passing > interface will benefit the programmer. Before you go to that length, maybe you could tell us: for example, if we have an image object in a 3rd party library and we want to draw a rectangle, currently I write something like: >>> myimage.draw_rect((x1, y1, width, height), RED) Now, if I want to save this image to a file: >>> myimage.save("myfile.png") -- So, how would these two simple examples be rewritten in such a magic message-only World? js -><-
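For what it's worth, one hypothetical rendering of those two calls under the proposed convention -- the proposal itself never specifies one -- shows the ambiguity Georg raises: with a single receiving channel, the object can only guess intent from the type of whatever arrives. The Image class below is an invented stand-in, not a real library:

    class Image:
        def __rrshift__(self, message):
            if isinstance(message, tuple):
                # Guess: a tuple of numbers means "draw a rectangle"?
                print("drawing rect at", message)
            elif isinstance(message, str):
                # Guess: a string means "save to this path"? Or load? Or a caption?
                print("saving to", message)
            return self

    myimage = Image()
    (10, 20, 30, 40) >> myimage
    "myfile.png" >> myimage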
From Steve.Dower at microsoft.com Mon Mar 18 20:19:30 2013 From: Steve.Dower at microsoft.com (Steve Dower) Date: Mon, 18 Mar 2013 19:19:30 +0000 Subject: [Python-ideas] Separate group for discussing PEP 3156 and Tulip In-Reply-To: References: Message-ID: > From Guido van Rossum > I've created a Google Group for PEP 3156 and Tulip. If you're interested in > this topic at all, please join the group. > > The group is here: > https://groups.google.com/forum/?fromgroups#!forum/python-tulip > > Email is python-tulip at googlegroups.com; only members can post (open > Google Groups attract too much spam). Is there a subscribe handler on that email address, or do we need a Google account? > -- > --Guido van Rossum (python.org/~guido)

From senthil at uthcode.com Mon Mar 18 20:28:53 2013 From: senthil at uthcode.com (Senthil Kumaran) Date: Mon, 18 Mar 2013 12:28:53 -0700 Subject: [Python-ideas] Separate group for discussing PEP 3156 and Tulip In-Reply-To: References: Message-ID: On Mon, Mar 18, 2013 at 12:19 PM, Steve Dower wrote: > Is there a subscribe handler on that email address or do we need a Google account? Try either of these: http://groups.google.com/group/python-tulip/boxsubscribe?email= Or sending an email to: python-tulip+subscribe at googlegroups.com. -- Senthil

From masklinn at masklinn.net Mon Mar 18 20:52:06 2013 From: masklinn at masklinn.net (Masklinn) Date: Mon, 18 Mar 2013 20:52:06 +0100 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: References: Message-ID: <3BD6B793-EA02-48F3-815E-ED47193F1BD8@masklinn.net> On 2013-03-18, at 20:06 , Georg Brandl wrote: > >>>>> "http://www.cnn.com" >> MyInternetObject #outputs the HTML text from CNN's home page. >> >> Each object has to figure out how it will receive things from outside >> of it. Things it can't handle (a string sent to an int) just have to >> be dropped to some other space, much like stderr does within the O.S. >> >> There are probably many other very interesting examples, but the key >> idea I'm working on (as noted in other messages), is a sort-of >> universal language for the internet, a WebOS to be applied to a >> universal data model. > > It seems that you are reinventing pipes (such as UNIX shell pipes). > I agree that as a model for handling data the pipe paradigm is elegant, > but only as long as you deal with simple data of a single kind (such as > strings in the UNIX world). But make the data complex enough, and it's > an instance of "all problems look like pipes if all you have is a vertical > bar". Or you start tagging your messages (since the messages are apparently structured, not byte-streams as they are with unix pipes), as is done in Erlang, the end result mostly being to remove a (useful) bit of syntactic sugar. Unless the messages become asynchronous, in which case I guess you get actors?

From abarnert at yahoo.com Mon Mar 18 21:19:19 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 18 Mar 2013 13:19:19 -0700 (PDT) Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: References: <51475335.8030008@stoneleaf.us> Message-ID: <1363637959.27844.YahooMailNeo@web184701.mail.ne1.yahoo.com> From: Mark Janssen Sent: Monday, March 18, 2013 11:22 AM
anything besides the default??The obvious solution is that you pass some kind of wrapper message: ? ? Extender(lst2) >> lst1 ? ? Inserter(3, lst2) >> lst1 But now, you're simulating separate methods by using type-switching. That's about as unpythonic as possible. And if you think about it, how can you write?Inserter(3, lst2) in this everything-is-a-message syntax? It's going to get really ugly without a lot of sugar. Unless you curry the messages: ? ? lst2 >> 3 >> Inserter >> lst1 Here, you're not sending 3 to lst1, but to "Inserter >> lst1", a lst1-inserter object, and then you're sending lst2 to a lst1-inserter-at-position-3 object.?And now you've written Reverse Haskell. So, why doesn't Smalltalk?which, like Python, is not a type-based language?not have this problem? I think you've fundamentally misunderstood Kay's design.?Smalltalk messages have a structure: a name, and a sequence of parameters. (I'm going to use a fake Smalltalk-like syntax below, to better show off the advantages of Smalltalk and avoid getting diverted into side tracks.) For example: ? ? lst1 extend:lst2 ? ? lst1 append:lst2 ? ? lst1 insert:lst2 atPosition:3 Notice that this is almost completely isomorphic to the familiar dot syntax, just with different punctuation?space instead of dot, colon instead of comma, and the method name and parameter names intermixed. But only _almost_. Dot syntax makes it easy to make either the parameter names or the position optional in calls, while Smalltalk-style syntax means you always need both. In other words, you can't Second, the natural set of first-class values is different.? In Smalltalk, "selectors" are values, while in Python, you can only accomplish the same thing through ugly string-based method lookups: ? ? lst1 perform:(insert:atPosition:) withArgument:lst2 and:3 ? ? lst1.getattr('insert')(lst2, 3) In Smalltalk, "messages" (and/or "invocations") are also values, while in Python, you have to simulate them with lambda/partial/etc.: ? ? message = (insert:lst2 atPosition:3) ? ? message = lambda lst: lst.insert(lst2, 3) In Python, bound methods and unbound methods are values, while in Smalltalk, you have to simulate them with something like lambda (which I won't show here): ? ? inserter1 = lst1.insert ? ? inserter1(lst2, 3) ? ? inserter2 = list.insert ? ? inserter2(lst1, lst2, 3) So, there are some minor advantages and disadvantages to each style. But ultimately, there's no real difference between them. From dreamingforward at gmail.com Mon Mar 18 21:24:49 2013 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 18 Mar 2013 13:24:49 -0700 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: References: Message-ID: On Mon, Mar 18, 2013 at 12:06 PM, Georg Brandl wrote: > Am 18.03.2013 05:26, schrieb Mark Janssen: >> Continuing on this thread, there would be a new bunch of behaviors to >> be defined. Since "everything is an object", there can now be a >> standard way to define the *next* common abstraction of "every object >> interacts with other objects". > > The problem is that for most objects there isn't *the* interaction. Sure, > you could split up complicated objects into small pieces with a smaller > functionality, but at some point you have to stop. Yes. But that is the point, if you look at the quora post -- to invert the object model and create mashups of simple modular data types and working *upwards*. > Let's see how this > concept fares with simple types such as integers or collections... 
> >>>>> 42 >> MyNumberType #would add the integer to your integer type > > That's just random. Why not multiply? Why not exponentiate? Well, as I noted in another post, that while these can be broken down into their simpler component (addition and negative numbers), numbers should probably be treated separately. >>>>> 42 >> MyCollectionType #would add the object into your collection: >> *poof*: no more random syntaxiis for putting things in collections.\ > > So you've replaced one method of a collections API by your magical operator, > for all collections. Yes -- for all collections. That's a pretty big gain right? > What about the other methods that are just as important, > such as deleting items, indexing, and querying? The "syntaxitis" would stay > just the same, except if you introduce more operators, which means new syntax > again. > > Also, how would this work for dictionaries or deques? Well, now you get into the Work: a unified data model. Deques, trees, lists, etc were all preliminary evolutionary explorations on this giant computer science journey of knowledge (and data types) which will have to be, can be, pruned and dropped. >>>>> MyObject >> # queries the object to output its state. > > What is "its state"? A readable representation? A serialized representation? > A memory dump? That's still for us to decide. We're mastering the OOP paradigm here: What is the ideal object and what is "in common" across all objects? We are Zen, we want to master the notion of object. What is the simplest object model possible without sacrificing critical functionality... >>>>> "http://www.cnn.com" >> MyInternetObject #outputs the HTML text from CNN's home page. >> >> Each object has to figure out how it will receive things from outside >> of it. Things it can't handle (a string sent to an int) just have to >> be dropped to some other space, much like stderr does within the O.S. >> >> There are probably many other very interesting examples, but the key >> idea I'm working on (as noted in other messages), is a sort-of >> universal language for the internet, a WebOS to be applied to a >> universal data model. > > It seems that you are reinventing pipes (such as UNIX shell pipes). That is a very interesting comparison. That is something like what I'm trying to do. In tandem with the Internet, I do see a kind of synthesis of Web + O.S. integration -- ultimately, creating a "data ecosystem". mark From dreamingforward at gmail.com Mon Mar 18 21:42:46 2013 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 18 Mar 2013 13:42:46 -0700 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: <1363637959.27844.YahooMailNeo@web184701.mail.ne1.yahoo.com> References: <51475335.8030008@stoneleaf.us> <1363637959.27844.YahooMailNeo@web184701.mail.ne1.yahoo.com> Message-ID: > OK, let's use your own example, which you keep reiterating: > >>>> 42 >> MyCollectionType #would add the object into your collection: *poof*: no more random syntaxiis for putting things in collections.\ > > This solves the problem that list.append, set.add, etc. all look completely different. Great. > > But what if I want to extend instead of appending? If you say "well, another sequence extends instead of appending", then how do I create lists with lists in them? For that matter, how do I insert at a given position, or replace a slice, or? anything besides the default? 
The obvious solution is that you pass some kind of wrapper message: > > Extender(lst2) >> lst1 > Inserter(3, lst2) >> lst1 > > But now, you're simulating separate methods by using type-switching. That's about as unpythonic as possible. And if you think about it, how can you write Inserter(3, lst2) in this everything-is-a-message syntax? It's going to get really ugly without a lot of sugar. Unless you curry the messages: [...snipped several pages...] All that is very good analysis. However, these data types you talk about, I'm going to argue, are explorations from the journey of computer *science* which are suboptimal. But to say suboptimal I have to suggest the context in which I'm optimizing. That context is creating the idea of a data universe and ecosystem where the ideals of OOP and re-usability come out of dreamland and into reality. And the only way to do that is to start at the *bottom* and work upwards. That is to define the fundamental unit, to re-evaluate the fundamental Object. But to figure that out you need also the fundamental communications pathway -- how those fundamental objects will interact. The questions for this data universe can be boiled down to only a few: what event necessitates the object/node creation? What is the relationship *between* objects? At least that's the start of what I'm calling a unified data model. Mark

From abarnert at yahoo.com Mon Mar 18 22:26:07 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 18 Mar 2013 14:26:07 -0700 (PDT) Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: References: <51475335.8030008@stoneleaf.us> <1363637959.27844.YahooMailNeo@web184701.mail.ne1.yahoo.com> Message-ID: <1363641967.7432.YahooMailNeo@web184703.mail.ne1.yahoo.com> From: Mark Janssen Sent: Monday, March 18, 2013 1:42 PM > >> OK, let's use your own example, which you keep reiterating: >> >>>>> 42 >> MyCollectionType #would add the object into your > collection: *poof*: no more random syntaxitis for putting things in > collections. >> >> This solves the problem that list.append, set.add, etc. all look completely > different. Great. >> >> But what if I want to extend instead of appending? > All that is very good analysis. However, these data types you talk > about, I'm going to argue, are explorations from the journey of computer > *science* which are suboptimal. I chose these data types because they were your own example. > But to say suboptimal I have to > suggest the context in which I'm optimizing. That context is creating > the idea of a data universe and ecosystem where the ideals of OOP and > re-usability come out of dreamland and into reality. And the only way > to do that is to start at the *bottom* and work upwards. That is to > define the fundamental unit, to re-evaluate the fundamental Object. > But to figure that out you need also the fundamental communications > pathway -- how those fundamental objects will interact. > > The questions for this data universe can be boiled down to only a few: > what event necessitates the object/node creation? What is the > relationship *between* objects? OK, so "computer science" data types like lists, and integers (as things you can perform both multiplication and addition on), are not fundamental. What is fundamental? Anyone who knows any mathematical logic can answer this. For example, give me the empty set and a handful of fundamental operations, and I can give you integers (starting with Peano arithmetic) and lists (starting with ordered pairs).
Or, give me relations and a handful of fundamental operations. Or...

The problem here is that "handful of fundamental operations". You want just _one_ fundamental binary operation. Well, even that's possible. In fact, you can use the X combinator as both your basic value and your one basic operation.

Even if you _do_ reduce everything to the X combinator, that still doesn't explain how "42 >> mycollection" fails to be ambiguous. There are multiple ways you can model integers and collections that are all perfectly sensible, one of which will make "42 >> mylist" append 42 to the end of the list, one of which will make "42 >> mylist" append to the start, and one of which will make "42 >> mylist" insert the next argument to come in at position 42. Which one of these is "right"? And, if you don't like that type (which, again, was your suggestion), pick any other type that anyone might ever want to deal with.

And this goes back to the same fundamental misconception that I pointed out in the last email, that you skipped over. Smalltalk and Simula, and their descendants, and relational databases, and semantic web initiatives, and so on all have some concept of structure to their messages. This is what allows the same object to do more than one thing. That's why they can model fields and sequences, cars and phones, web services and documents, etc. That's why they're useful.

Finally, if you want to design a whole new language from the ground up, whether on top of the X combinator or ER models or RDF+RIF or whatever, with a completely different model from Python and a completely different syntax... why would you post about it on python-ideas?

From dreamingforward at gmail.com  Mon Mar 18 22:37:30 2013
From: dreamingforward at gmail.com (Mark Janssen)
Date: Mon, 18 Mar 2013 14:37:30 -0700
Subject: [Python-ideas] Message passing syntax for objects
In-Reply-To: <1363641967.7432.YahooMailNeo@web184703.mail.ne1.yahoo.com>
References: <51475335.8030008@stoneleaf.us>
	<1363637959.27844.YahooMailNeo@web184701.mail.ne1.yahoo.com>
	<1363641967.7432.YahooMailNeo@web184703.mail.ne1.yahoo.com>
Message-ID: 

>> All that is very good analysis.  However, these data types you talk
>> about, I'm going to argue, are explorations from the journey of
>> computer *science* which are suboptimal.
>
> I chose these data types because they were your own example.
>
>> The questions for this data universe can be boiled down to only a few:
>> what event necessitates the object/node creation?  What is the
>> relationship *between* objects?
>
> OK, so "computer science" data types like lists, and integers (as things you can perform both multiplication and addition on), are not fundamental. What is fundamental?
>
> Anyone who knows any mathematical logic can answer this. For example, give me the empty set and a handful of fundamental operations, and I can give you integers (starting with Peano arithmetic) and lists (starting with ordered pairs). Or, give me relations and a handful of fundamental operations. Or...

Yes, but then you're working in the abstraction space I call the Apeiron (after the Greeks), but this is to create a different space, with a different basis.  You're working with lines.  I'm working with data.  This is important if you're going to make such comparisons.

> And this goes back to the same fundamental misconception that I pointed out in the last email, that you skipped over.
> Smalltalk and Simula, and their descendants, and relational databases, and semantic web initiatives, and so on all have some concept of structure to their messages. This is what allows the same object to do more than one thing.

Well, I'm going to suggest that those initiatives failed because their fundamental premise is flawed.

> Finally, if you want to design a whole new language from the ground up,

Whoa whoa -- I'm not trying to design a "whole new language".  I'm trying to continue the evolution of programming language elegance.  And to me Python is the right direction.

Mark

From miki.tebeka at gmail.com  Tue Mar 19 00:02:33 2013
From: miki.tebeka at gmail.com (Miki Tebeka)
Date: Mon, 18 Mar 2013 16:02:33 -0700 (PDT)
Subject: [Python-ideas] argparse - add support for environment variables
In-Reply-To: <71869e7d-605c-4599-8010-0c195e86e982@googlegroups.com>
References: <71869e7d-605c-4599-8010-0c195e86e982@googlegroups.com>
Message-ID: 

OK, so what's next in the process? Got some +1 and some -1, how do we proceed? (or not).

On Tuesday, February 19, 2013 8:03:16 AM UTC-8, Miki Tebeka wrote:
>
> Greetings,
>
> The usual way of resolving configuration is command line -> environment -> default.
> Currently argparse supports only command line -> default.  I'd like to
> suggest an optional "env" keyword to add_argument that will also resolve
> from environment.  (And also an optional env dictionary to the ArgumentParser
> __init__ method [or to parse_args], which will default to os.environ.)
>
> Example:
> [spam.py]
>
> parser = ArgumentParser()
> parser.add_argument('--spam', env='SPAM', default=7)
> args = parser.parse_args()
> print(args.spam)
>
> ./spam.py -> 7
> ./spam.py --spam=12 -> 12
> SPAM=9 ./spam.py -> 9
> SPAM=9 ./spam.py --spam=12 -> 12
>
> What do you think?
> --
> Miki

From dreamingforward at gmail.com  Tue Mar 19 00:41:41 2013
From: dreamingforward at gmail.com (Mark Janssen)
Date: Mon, 18 Mar 2013 16:41:41 -0700
Subject: [Python-ideas] Message passing syntax for objects
In-Reply-To: <1363643500.25746.YahooMailNeo@web184701.mail.ne1.yahoo.com>
References: <5146B848.3040509@pearwood.info>
	<1363643500.25746.YahooMailNeo@web184701.mail.ne1.yahoo.com>
Message-ID: 

On Mon, Mar 18, 2013 at 2:51 PM, Andrew Barnert wrote:
> Have you even looked at a message-passing language?
>
> A Smalltalk "message" is a selector and a sequence of arguments. That's what you send around. Newer dynamic-typed message-passing OO and actor languages are basically the same as Smalltalk.

Yes, but you have to understand that Alan Kay came with strange ideas of some future computer-human symbiosis.  So his language design and other similar attempts (like PHP) are rather skewed from that premise.

And also, despite name-dropping, I'm not trying to create anything like that idea of message-passing.  I'm talking about something very simple, a basic and universal way for objects to communicate.

>> With function or method syntax, you're telling the computer to
>> "execute something", but that is not the right concept for OOP.  You
>> want the objects to interact with each other and in a high-level
>> language, the syntax should assist with that.
>
> And you have to tell the objects _how_ to interact with each other.

This is a different paradigm than what I'm talking about.  In the OOP of my world, Objects already embody the intelligence of how they are going to interact with the outside world, because I put them there.
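To sketch what I mean (toy code only -- the class and its choices are invented for illustration), the receiving object alone embodies the rule for whatever arrives:

    class Account:
        """The object decides what incoming things mean to it."""
        def __init__(self):
            self.balance = 0
        def __rrshift__(self, message):
            # numbers are deposits; a recognized string is a query;
            # anything else is simply dropped, like writes to stderr
            if isinstance(message, (int, float)):
                self.balance += message
            elif message == "balance?":
                print(self.balance)
            return self

    acct = Account()
    100 >> acct
    50 >> acct
    "balance?" >> acct    # prints 150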
> Even with reasonably intelligent animals, you don't just tell two animals to interact, except in the rare case where you don't care whether they become friends or dinner.

Your model of computer programming is very alien to me.  So I don't think it will be productive to try to convince you of what I'm suggesting, but feel free to continue...

Mark

From steve at pearwood.info  Tue Mar 19 01:39:05 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 19 Mar 2013 11:39:05 +1100
Subject: [Python-ideas] Message passing syntax for objects
In-Reply-To: 
References: 
Message-ID: <5147B3A9.2070201@pearwood.info>

On 19/03/13 07:24, Mark Janssen wrote:
>>>>> 42 >> MyCollectionType  #would add the object into your collection:
>>>>> *poof*: no more random syntaxitis for putting things in collections.
>>
>> So you've replaced one method of a collections API by your magical operator,
>> for all collections.
>
> Yes -- for all collections.  That's a pretty big gain right?

No, not at all. If it's a gain at all, it's a tiny, microscopic gain. But it's not a gain. We lose a lot:

- we can no longer distinguish between *adding* something to an unordered collection, and *appending* to an ordered collection;

- we can no longer distinguish between (for example) *appending* to the end of a list, *extending* a list with a sequence, and *inserting* somewhere inside a list.

- we cannot even distinguish between "put this thing in your collection" and "search your collection for this thing", since we're limited to a single >> "send message" operator.

But please don't let me discourage you. I am actually very interested in a message passing idiom, I just don't think that it is as big a fundamental paradigm shift as you appear to believe.

Way back in the late 1980s, I spent a lot of time working with a language that used a message passing paradigm, Apple's Hypertalk (part of Hypercard, which was strongly influenced by Alan Kay and Smalltalk). Hypertalk is long gone now, but you can get a feel for it with something like OpenXION:

http://www.openxion.org

I encourage you to see how message passing paradigms operate in existing other languages, such as OpenXION, and to come up with examples of syntax for simple tasks. E.g. what syntax would you use for something like this?

* Get a file name from the user.
* Open a file with that name.
* Read the text of that file.
* Convert the text to lowercase.
* Write it back out to the same file.

At the moment, I couldn't even begin to imagine how I might write that code using your syntax, let alone how it would be an improvement.

-- 
Steven

From dreamingforward at gmail.com  Tue Mar 19 02:38:49 2013
From: dreamingforward at gmail.com (Mark Janssen)
Date: Mon, 18 Mar 2013 18:38:49 -0700
Subject: [Python-ideas] Message passing syntax for objects
In-Reply-To: <5147B3A9.2070201@pearwood.info>
References: <5147B3A9.2070201@pearwood.info>
Message-ID: 

On Mon, Mar 18, 2013 at 5:39 PM, Steven D'Aprano wrote:
>>> So you've replaced one method of a collections API by your magical
>>> operator, for all collections.
>>
>> Yes -- for all collections.  That's a pretty big gain right?
>
> No, not at all. If it's a gain at all, it's a tiny, microscopic gain. But
> it's not a gain. We lose a lot:
>
> - we can no longer distinguish between *adding* something to an unordered
> collection, and *appending* to an ordered collection;

Your ClassName or __doc__ is supposed to make that clear, because your API doesn't.
This is the problem I'm referring to when I talk about "hyper-personal API's" -- you have to learn the programmer's personal language.  Even 15 years of python append and extend are still ambiguous and confusing.  You've adapted to this.

> - we can no longer distinguish between (for example) *appending* to the end
> of a list, *extending* a list with a sequence, and *inserting* somewhere
> inside a list.

Well, these are old data paradigm operations which will go away in my view.  The very thinking in terms of "lists within lists" is very personal and no one else will be able to use whatever you're building.

> - we cannot even distinguish between "put this thing in your collection" and
> "search your collection for this thing", since we're limited to a single >>
> "send message" operator.

Ummm, perhaps you missed something in Python: "search your collection for this thing" is done with "in"; i.e., "item in myObject".  That already is the TOWTDI way to do it.  Obviously, the first item is already handled with the messaging operator.

> E.g. what syntax would you use for something like this? [...]

Keeping in mind, this idea would require a major refactoring of the standard library.  Instead of these very ornate (byzantine?) complex classes, we have many smaller, more universal/general classes, building up to the complexity desired (like Unix pipes did for shell scripting, files, and the O.S.).  Those "universals" haven't been conceived yet for Python, so I don't know yet what it will look like.

> * Open a file with that name.

Why does everyone seem to pick the most corner-type cases?  As this is about a "universal data model", interacting with the existing operating system becomes an issue and has to be specially handled.  So you have to ask how deep you want me to go in this architectural model I'm envisioning, because as it is a universal model, ultimately the O.S. changes radically (which is why I was suggesting that the issue of Python async behavior be postponed).

But now that you have me thinking on it, I see the file system as being composed of namespaces organized in a tree.  The Python interpreter would access them directly.

Mark

From breamoreboy at yahoo.co.uk  Tue Mar 19 03:09:28 2013
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Tue, 19 Mar 2013 02:09:28 +0000
Subject: [Python-ideas] Message passing syntax for objects
In-Reply-To: 
References: <5147B3A9.2070201@pearwood.info>
	
Message-ID: 

On 19/03/2013 01:38, Mark Janssen wrote:

+1 best trolling so far this millennium.

-- 
Cheers.

Mark Lawrence

From haoyi.sg at gmail.com  Tue Mar 19 03:14:32 2013
From: haoyi.sg at gmail.com (Haoyi Li)
Date: Mon, 18 Mar 2013 22:14:32 -0400
Subject: [Python-ideas] Message passing syntax for objects
In-Reply-To: 
References: <5147B3A9.2070201@pearwood.info>
	
Message-ID: 

I felt I just had to chip in here, knowing something about how actors (which is basically what you're advocating) work in Scala. Some points:

- Message sends have to give you something that method calls don't

Whether that's the ability to ignore not-understood messages (like in Obj-C), or making the message send-receive behavior asynchronous (like in Scala), or ensuring messages are handled serially per-object, it has to be something. Message sends which don't provide anything over method calls are really just method calls with a funky syntax.
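For the first of those, a rough Python imitation of Obj-C's forgiving dispatch (a toy sketch, my names):

    class Tolerant:
        """Swallows messages it doesn't understand instead of raising."""
        def __getattr__(self, name):
            # __getattr__ only fires when normal lookup fails, so unknown
            # "messages" become harmless no-ops instead of AttributeErrors.
            return lambda *args, **kwargs: None

    t = Tolerant()
    t.frobnicate(42)    # not understood: silently ignored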
As such, in Scala actors are used when you need these benefits: stateful, mutable objects which are accessed from multiple threads but have non-thread-safe innards, or objects whose allowed behavior changes over time and the "ignore not-understood messages" thing comes in handy. Worker/connection pools, state-machines, that sort of thing. Nobody sends messages to integers or lists! - Actors (or message-based objects) work in tandem with case classes (data-based objects/structs) and don't replace them. I think this is a significant point: small things (lists, tuples, primitives) are kept as structs and the data inside them is manipulated directly, and big things (web servers, background-workers, http clients) are then done with state-hiding and encapsulation and all that. Having a simple thing (like a list or a tuple) with encapsulation and sending messages to it is as silly as having a massive structure with all its internal structures and data exposed to the outside world. In particular, your dislike for "lists within lists" seems incompatible with your desire for "more universal/general classes, building up to the complexity desired". Isn't that almost the perfect example of simple, general classes used to build up complex structures? That's all for now -Haoyi On Mon, Mar 18, 2013 at 9:38 PM, Mark Janssen wrote: > On Mon, Mar 18, 2013 at 5:39 PM, Steven D'Aprano > wrote: > >>> >So you've replaced one method of a collections API by your magical > >>> > operator, > >>> >for all collections. > >> > >> Yes -- for all collections. That's a pretty big gain right? > > > > No, not at all. If it's a gain at all, it's a tiny, microscopic gain. But > > it's not a gain. We lose a lot: > > > > - we can no longer distinguish between *adding* something to an unordered > > collection, and *appending* to an ordered collection; > > Your ClassName or __doc__ is supposed to make that clear, because your > API doesn't. This is the problem I'm referring to when I talk about > "hyper-personal API's" -- you have to learn the programmer's personal > language. Even 15 years of python append and extend are still > ambiguous and confusing. You've adapted to this. > > > - we can no longer distinguish between (for example) *appending* to the > end > > of a list, *extending* a list with a sequence, and *inserting* somewhere > > inside a list. > > Well, these are old data paradigm operations which will go away in my > view. The very thinking in terms of "lists within lists" is very > personal and no one else will be able to use whatever you're building. > > > - we cannot even distinguish between "put this thing in your collection" > and > > "search your collection for this thing", since we're limited to a single > >> > > "send message" operator. > > Ummm, perhaps you missed something in Python: "search your collection > for this thing" is done with "in"; i.e. , "item in myObject". That > already is TOWTDI way to do it. Obviously, the first item is already > handled with the messaging operator. > > > E.g. what syntax would you use for something like this? [...] > > Keeping in mind, this idea would require a major refactoring of the > standard library. Instead of these very ornate (byzantine?) complex > classes, we have many smaller, more universal/general classes, > building up to the complexity desired (like Unix pipes did for shell > scripting, files, and the O.S.). Those "universals" haven't been > conceived yet for Python, so I don't know yet what it will look like. > > > * Open a file with that name. 
> > Why does everyone seem to pick the most corner-type cases?  As this is
> > about a "universal data model", interacting with the existing operating
> > system becomes an issue and has to be specially handled.  So you have to
> > ask how deep you want me to go in this architectural model I'm
> > envisioning, because as it is a universal model, ultimately the O.S.
> > changes radically (which is why I was suggesting that the issue of
> > Python async behavior be postponed).
> >
> > But now that you have me thinking on it, I see the file system as
> > being composed of namespaces organized in a tree.  The Python
> > interpreter would access them directly.
> >
> > Mark

From wuwei23 at gmail.com  Tue Mar 19 03:25:39 2013
From: wuwei23 at gmail.com (alex23)
Date: Mon, 18 Mar 2013 19:25:39 -0700 (PDT)
Subject: [Python-ideas] Message passing syntax for objects
In-Reply-To: 
References: 
Message-ID: <490f0227-53f4-407c-9698-da1767679536@qo9g2000pbb.googlegroups.com>

On Mar 19, 6:24 am, Mark Janssen wrote:
> Well, now you get into the Work:  a unified data model.  Deques,
> trees, lists, etc. were all preliminary evolutionary explorations on
> this giant computer science journey of knowledge (and data types)
> which will have to be, can be, pruned and dropped.

These data types weren't created because of any ad hoc "exploration" of best practice. They all have different performance & memory characteristics. What type you use for a particular problem is directly related to the requirements.

How do you plan on having a unified data model that provides optimal performance / memory usage for every problem space?

Something more concrete than "evolving the language" wankery - like actual, non-trivial examples - would really help with your argument.

From abarnert at yahoo.com  Tue Mar 19 04:12:41 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 18 Mar 2013 20:12:41 -0700 (PDT)
Subject: [Python-ideas] Message passing syntax for objects
In-Reply-To: 
References: <5147B3A9.2070201@pearwood.info>
	
Message-ID: <1363662761.23049.YahooMailNeo@web184703.mail.ne1.yahoo.com>

> From: Mark Janssen 
> Sent: Monday, March 18, 2013 6:38 PM
> On Mon, Mar 18, 2013 at 5:39 PM, Steven D'Aprano wrote:
>> - we can no longer distinguish between *adding* something to an unordered
>> collection, and *appending* to an ordered collection;
>
> Your ClassName or __doc__ is supposed to make that clear, because your
> API doesn't.  This is the problem I'm referring to when I talk about
> "hyper-personal API's" -- you have to learn the programmer's
> personal language.

What's hyper-personal here? Mutable sequences support effectively the same set of operations in every language. Each language may have different names for these operations, but within a language they're consistent. No matter whose Python code you're looking at, a list has the same append and extend methods -- and, for that matter, so does any other class that satisfies the MutableSequence ABC, even any other type that "duck types" as a mutable sequence.

>> - we can no longer distinguish between (for example) *appending* to the end
>> of a list, *extending* a list with a sequence, and *inserting* somewhere
>> inside a list.
>
> Well, these are old data paradigm operations which will go away in my
> view.
> The very thinking in terms of "lists within lists" is very personal
> and no one else will be able to use whatever you're building.

The idea of non-flat collections is certainly not "personal". Resources contain, or link to, resources, hierarchically or otherwise. Directories contain directories. Research papers reference other research papers. Web pages link to other web pages. Documents have subdocuments. And so on. Your new paradigm has to account for that, or it's useless.

And meanwhile, the idea that "no one else will be able to use whatever you're building" if you have lists within lists is disproven millions of times per day. I just wrote and ran a script that gets a list of all movies on all channels owned by a certain YouTube user. I had absolutely no trouble using their lists within lists (actually, it's three levels deeper than that), despite the fact that I know nothing about their code, and they've never even heard of me. We just both know JSON, and that's all it takes.

>> - we cannot even distinguish between "put this thing in your collection" and
>> "search your collection for this thing", since we're limited to a single >>
>> "send message" operator.
>
> Ummm, perhaps you missed something in Python: "search your collection
> for this thing" is done with "in"; i.e., "item in myObject".  That
> already is the TOWTDI way to do it.  Obviously, the first item is already
> handled with the messaging operator.

You've clearly missed something in Python. "item in collection" is, basically, syntactic sugar for "collection.__contains__(item)". And "collection[3]" is "collection.__getitem__(3)". And so on. It's all method calling.

More importantly, you've missed something fundamental about message passing. The whole point of the paradigm is that all operations are messages. That means you can do things like make all messages sequenced and asynchronous (as in Erlang) and get guaranteed-safe concurrency. Or make all message handling dynamically modifiable at runtime (as in Smalltalk) and have hot-swappable code. As soon as you allow end-runs around messaging to access objects directly, none of that works.

And, needless to say, this clearly requires the list type to handle multiple different kinds of methods/messages. You can dismiss the need for extend, insert, etc., but without __contains__, __getitem__, __iter__, etc., a list is completely useless -- it's a write-only structure.

At this point, it's starting to feel like that NewsRadio episode where Bill interviews a "business visionary" who has something important to say about the future of computers, then admits that he's never actually used a computer, but thinks they sound neat.

>> * Open a file with that name.
>
> Why does everyone seem to pick the most corner-type cases?

Maybe if you gave a single good example, people wouldn't keep coming up with examples you don't like. But really, almost everyone has picked the two examples you yourself gave: appending to a list, and adding a number to a number. You've complained that those are terrible examples, and haven't given any others, so what can people do but guess what you might possibly have in mind? Meanwhile, I gave you a dozen different examples from a wide range of application areas, and you ignored them completely and went on about list appending.

> But now that you have me thinking on it, I see the file system as
> being composed of namespaces organized in a tree.  The Python
> interpreter would access them directly.
Well, because a filesystem contains links and mountpoints and other things, it isn't quite just a tree, but let's accept that.

So, I've got an object representing a directory (a subtree), say, "home", representing "/Users/abarnert". Presumably, "foo >> home" gives me the path "/Users/abarnert/foo". I can send it data to write to that file, or send data from it to read from that file. So far, so good. But how do I select a subdirectory under home? Or get a list of all files and subdirectories under home? Or move a file, or delete it, or change permissions on it, or get its size?

Ultimately, there are many things I might want to do with a directory or a file, and therefore it needs multiple methods. Just like a list. Or a number. Or a document, an account, a client connection, a window, a modeled species, etc. Being able to do multiple things is fundamental to the usefulness of objects. Something that can only do one thing isn't an object, it's a function. In fact, it's a function with a single parameter.

From abarnert at yahoo.com  Tue Mar 19 04:31:31 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 18 Mar 2013 20:31:31 -0700 (PDT)
Subject: [Python-ideas] Message passing syntax for objects
In-Reply-To: 
References: <5146B848.3040509@pearwood.info>
	<1363643500.25746.YahooMailNeo@web184701.mail.ne1.yahoo.com>
Message-ID: <1363663891.8351.YahooMailNeo@web184701.mail.ne1.yahoo.com>

From: Mark Janssen 
Sent: Monday, March 18, 2013 4:41 PM

> On Mon, Mar 18, 2013 at 2:51 PM, Andrew Barnert wrote:
>> Have you even looked at a message-passing language?
>>
>> A Smalltalk "message" is a selector and a sequence of arguments.
>> That's what you send around. Newer dynamic-typed message-passing OO and
>> actor languages are basically the same as Smalltalk.
>
> Yes, but you have to understand that Alan Kay came with strange ideas
> of some future computer-human symbiosis.  So his language design and
> other similar attempts (like PHP) are rather skewed from that premise.

The idea that message passing is fundamentally different from method calling also turned out to be one of those strange ideas, since it only took a couple years to prove that they are theoretically completely isomorphic -- and, for that matter, they're both isomorphic to closures.

> And also, despite name-dropping, I'm not trying to create anything
> like that idea of message-passing.  I'm talking about something very
> simple, a basic and universal way for objects to communicate.

Message passing is a simple, basic, and universal way for objects to communicate. Everything from dot-syntax method calls to JSON RPC protocols can be modeled as passing messages. But what you're talking about isn't message passing. The idea that messages have names, and reference objects as arguments, is fundamental, and by leaving that out, you're talking about something different. In effect, your "objects" are just single-parameter functions, and your "messages" are the call operator.

>>> With function or method syntax, you're telling the computer to
>>> "execute something", but that is not the right concept for OOP.  You
>>> want the objects to interact with each other and in a high-level
>>> language, the syntax should assist with that.
>>
>> And you have to tell the objects _how_ to interact with each other.
>
> This is a different paradigm than what I'm talking about.  In the OOP
> of my world, Objects already embody the intelligence of how they are
> going to interact with the outside world, because I put them there.
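Before going further, here's a toy sketch (mine, not yours) of where a single-message object actually ends up:

    class SinkList:
        """Reachable only through the one 'message' operator."""
        def __init__(self):
            self._items = []
        def __rrshift__(self, item):
            self._items.append(item)
            return self

    s = SinkList()
    1 >> s
    2 >> s
    # With no __getitem__, __iter__, or __contains__, nothing outside the
    # object can ever read those items back: a write-only structure.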
The paradigm you're talking about is useless. You have lists that know how to append, but don't know how to get/search/iterate. Almost every useful object needs the intelligence to interact with the world in two or more ways.

>> Even with reasonably intelligent animals, you don't just tell two
>> animals to interact, except in the rare case where you don't care whether
>> they become friends or dinner.
>
> Your model of computer programming is very alien to me.  So I don't
> think it will be productive to try to convince you of what I'm
> suggesting, but feel free to continue...

My model of (object-oriented) computer programming is that programming objects model objects which have a variety of behaviors, each of which is triggered by sending a different message. This is pretty much the central definition that everyone who programs or theorizes about programming uses. If you read any textbook, wiki page, journal article, or tutorial, they're all talking about that, or something directly isomorphic to it.

If that's alien to you, then object-oriented programming is alien to you.

From tjreedy at udel.edu  Tue Mar 19 05:03:09 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 19 Mar 2013 00:03:09 -0400
Subject: [Python-ideas] argparse - add support for environment variables
In-Reply-To: 
References: <71869e7d-605c-4599-8010-0c195e86e982@googlegroups.com>
	
Message-ID: 

On 3/18/2013 7:02 PM, Miki Tebeka wrote:
> OK, so what's next in the process? Got some +1 and some -1, how do we
> proceed? (or not).

Seems not.

> On Tuesday, February 19, 2013 8:03:16 AM UTC-8, Miki Tebeka wrote:
>> Greetings,
>>
>> The usual way of resolving configuration is command line ->
>> environment -> default.
>> Currently argparse supports only command line -> default, I'd like
>> to suggest an optional "env" keyword to add_argument that will also
>> resolve from environment. (And also optional env dictionary to the
>> ArgumentParser __init__ method [or to parse_args], which will
>> default to os.environ).
>>
>> Example:
>> [spam.py]

from os import environ as env

>> parser = ArgumentParser()
>> parser.add_argument('--spam', env='SPAM', default=7)

parser.add_argument('--spam', default=env.get('SPAM', 7))

This is about the same number of chars to type and I believe it is available now in all versions. I like it better because it puts the environment variable name and the default right together.

-- 
Terry Jan Reedy

From rosuav at gmail.com  Tue Mar 19 07:28:45 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 19 Mar 2013 17:28:45 +1100
Subject: [Python-ideas] Message passing syntax for objects
In-Reply-To: 
References: <5147B3A9.2070201@pearwood.info>
	
Message-ID: 

On Tue, Mar 19, 2013 at 12:38 PM, Mark Janssen wrote:
> On Mon, Mar 18, 2013 at 5:39 PM, Steven D'Aprano wrote:
>>>> So you've replaced one method of a collections API by your magical
>>>> operator, for all collections.
>>>
>>> Yes -- for all collections.  That's a pretty big gain right?
>>
>> No, not at all. If it's a gain at all, it's a tiny, microscopic gain. But
>> it's not a gain. We lose a lot:
>>
>> - we can no longer distinguish between *adding* something to an unordered
>> collection, and *appending* to an ordered collection;
>
> Your ClassName or __doc__ is supposed to make that clear, because your
> API doesn't.  This is the problem I'm referring to when I talk about
> "hyper-personal API's" -- you have to learn the programmer's personal
> language.  Even 15 years of python append and extend are still
> ambiguous and confusing.  You've adapted to this.
There are interfaces where a generic "do something with X and Y" concept makes sense, but I don't think program code is one of them. In a GUI, you can drag one icon (#1) onto another icon (#2), which might accomplish any of the following: * Move file/folder #1 into folder #2 * Copy file/directory #1 onto remote volume #2 * Print document #1 on printer #2 * Invoke program #2, passing it the name of #1 * Destroy object #1 using the settings of shredder #2 * Send email #1 to recipient #2 These are all actions that I *have done*, and all using the exact same invocation sequence. It makes sense in context. With code, though, there are just way too many common operations. Since we're working at a lower level of objects and data structures, we're going to need to do a lot more. (Of course, drag-and-drop has a number of other UI actions possible, such as dragging a file into the body of a folder - that provides destination AND position, so you could use that to put something into a particular place in an ordered collection. Plus you can hold modifier keys while dragging, or in other ways add information. But it's still far FAR less than code can do.) How do you identify, to the code, which message you want to pass? The easiest way is with a keyword... a method name. Which brings us right back to where heaps of high level languages are: method calling. ChrisA From ncoghlan at gmail.com Tue Mar 19 15:49:36 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 19 Mar 2013 07:49:36 -0700 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: References: <5147B3A9.2070201@pearwood.info> Message-ID: On Mon, Mar 18, 2013 at 11:28 PM, Chris Angelico wrote: > There are interfaces where a generic "do something with X and Y" > concept makes sense, but I don't think program code is one of them. Having finally parsed out what I think the OP is asking for, I have to disagree. In fact, Guido disagrees as well: he thinks what the OP wants is so important that he built it into Python from day one. The notation Python uses to "send a message" to an object is actually "obj(message)". This process of sending a message is referred to as "calling". When you call someone, the message you send is referred to as "the arguments to the call". An object accepts messages by defining a method with the special name "__call__". When you define this method, you declare the "parameters" you expect to receive as part of any calls. The process of mapping the sent arguments to the declared parameters is referred to as "argument binding". You can perform the argument binding step without actually making a call by using the inspect.Signature.bind API in Python 3.3+ (or the backport of that API to earlier Python versions: https://pypi.python.org/pypi/funcsigs/) However, it's also useful to aggregate objects that you can send messages to into larger collections, along with shared data for those objects to work with. Python provides a "class" mechanism for this aggregation step, and in this case the objects which receive messages and operate on the shared data are referred to as "methods". To prevent an infinite regress, Python has a particular kind of object which it inherently knows how to send a message to, rather than relying on a "__call__" method. This intrinsic message receiver is referred to as a "function". Regards, Nick. 
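P.S. That binding step is directly visible from Python 3.3+ (a quick sketch; the function is made up):

    import inspect

    def receiver(sender, *, priority=0):
        """Declares the parameters this callable will accept."""

    sig = inspect.signature(receiver)
    bound = sig.bind("alice", priority=2)   # binding without calling
    print(bound.arguments)                  # parameter names mapped to arguments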
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ethan at stoneleaf.us Tue Mar 19 16:33:34 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 19 Mar 2013 08:33:34 -0700 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: References: <5147B3A9.2070201@pearwood.info> Message-ID: <5148854E.90704@stoneleaf.us> On 03/19/2013 07:49 AM, Nick Coghlan wrote: > On Mon, Mar 18, 2013 at 11:28 PM, Chris Angelico wrote: >> There are interfaces where a generic "do something with X and Y" >> concept makes sense, but I don't think program code is one of them. > > Having finally parsed out what I think the OP is asking for, I have to > disagree. In fact, Guido disagrees as well: he thinks what the OP > wants is so important that he built it into Python from day one. > > The notation Python uses to "send a message" to an object is actually > "obj(message)". I don't disagree with you, Nick, but I don't think that's what the OP is looking for, either. Even using call syntax, it seems to me the OP would still only be sending one type of message, with no arguments, no differentiation, no choices. To use the OP's own example: some_collection(42) would add 42 to the collection... but we have no say in where, or how. In fact, using call() notation we have less than the OP's proposal as his proposal has a one-way in and a one-way out, but an argument-less* call can only provide one of those two options. -- ~Ethan~ *By argument-less I mean we can only give one thing to call. From graffatcolmingov at gmail.com Tue Mar 19 16:58:22 2013 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Tue, 19 Mar 2013 11:58:22 -0400 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: <5148854E.90704@stoneleaf.us> References: <5147B3A9.2070201@pearwood.info> <5148854E.90704@stoneleaf.us> Message-ID: On Tue, Mar 19, 2013 at 11:33 AM, Ethan Furman wrote: > I don't disagree with you, Nick, but I don't think that's what the OP is > looking for, either. Even using call syntax, it seems to me the OP would > still > only be sending one type of message, with no arguments, no differentiation, > no choices. I understand OP wants to be able to send anything, but he seems to be disregarding how << and >> are currently implemented on objects (via __lshift__ and __rshift__ respectively). Each of those can be made in a custom fashion by OP (or anyone else) but the question then becomes, how do you add extra parameters (assuming you want them), i.e., how do you do: obj << 4, *extra_args I think (because I don't have a way of testing it right now) that you'd get an error for trying to expand extra_args in the creation of a tuple, regardless of the signature of __lshift__. Even if you didn't do that, I think python would interpret that as the creation of a tuple, i.e., the result of obj << 4 with the rest of the arguments. > To use the OP's own example: > > some_collection(42) > > would add 42 to the collection... but we have no say in where, or how. In > fact, > using call() notation we have less than the OP's proposal as his proposal > has a > one-way in and a one-way out, but an argument-less* call can only provide > one of > those two options. > > -- > ~Ethan~ > > *By argument-less I mean we can only give one thing to call. And if we do this with the current implementation of python you can still only give one thing to call (assuming I'm correct with my above example). 
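To illustrate the precedence point (a quick sketch; the class is made up):

    class Obj:
        def __lshift__(self, other):
            print("__lshift__ got:", other)
            return self

    obj = Obj()
    result = obj << 4, "extra"    # parsed as ((obj << 4), "extra")
    print(type(result))           # <class 'tuple'> -- the comma builds a tuple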
We would have to fundamentally change python and how it interprets those special cases (again basing the statement on my above assumptions/intuition). I'm sure in another language I would find this feature to be interesting or even appreciate it, but I don't think it is either very pythonic or useful in python. I'm also wondering if this is just a very well done troll since OP has yet to back any of his own examples which he thinks others are choosing to make his life difficult. From jeff at jeffreyjenkins.ca Tue Mar 19 20:41:34 2013 From: jeff at jeffreyjenkins.ca (Jeff Jenkins) Date: Tue, 19 Mar 2013 15:41:34 -0400 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: References: <5147B3A9.2070201@pearwood.info> <5148854E.90704@stoneleaf.us> Message-ID: Does __getattribute__ and __call__ basically accomplish this? You pass a "message name" to an object which it then uses to route to a handler which takes a tuple "message body" to process the message. It just happens that the syntax is obj.message_name(message, body, parts) instead of using << or >> On Tue, Mar 19, 2013 at 11:58 AM, Ian Cordasco wrote: > On Tue, Mar 19, 2013 at 11:33 AM, Ethan Furman wrote: > > I don't disagree with you, Nick, but I don't think that's what the OP is > > looking for, either. Even using call syntax, it seems to me the OP would > > still > > only be sending one type of message, with no arguments, no > differentiation, > > no choices. > > I understand OP wants to be able to send anything, but he seems to be > disregarding how << and >> are currently implemented on objects (via > __lshift__ and __rshift__ respectively). Each of those can be made in > a custom fashion by OP (or anyone else) but the question then becomes, > how do you add extra parameters (assuming you want them), i.e., how do > you do: > > obj << 4, *extra_args > > I think (because I don't have a way of testing it right now) that > you'd get an error for trying to expand extra_args in the creation of > a tuple, regardless of the signature of __lshift__. Even if you didn't > do that, I think python would interpret that as the creation of a > tuple, i.e., the result of obj << 4 with the rest of the arguments. > > > To use the OP's own example: > > > > some_collection(42) > > > > would add 42 to the collection... but we have no say in where, or how. > In > > fact, > > using call() notation we have less than the OP's proposal as his proposal > > has a > > one-way in and a one-way out, but an argument-less* call can only provide > > one of > > those two options. > > > > -- > > ~Ethan~ > > > > *By argument-less I mean we can only give one thing to call. > > And if we do this with the current implementation of python you can > still only give one thing to call (assuming I'm correct with my above > example). We would have to fundamentally change python and how it > interprets those special cases (again basing the statement on my above > assumptions/intuition). I'm sure in another language I would find this > feature to be interesting or even appreciate it, but I don't think it > is either very pythonic or useful in python. I'm also wondering if > this is just a very well done troll since OP has yet to back any of > his own examples which he thinks others are choosing to make his life > difficult. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... 
From dreamingforward at gmail.com  Wed Mar 20 05:20:57 2013
From: dreamingforward at gmail.com (Mark Janssen)
Date: Tue, 19 Mar 2013 21:20:57 -0700
Subject: [Python-ideas] Message passing syntax for objects
In-Reply-To: 
References: <5147B3A9.2070201@pearwood.info>
	
Message-ID: 

On Tue, Mar 19, 2013 at 7:49 AM, Nick Coghlan wrote:
> On Mon, Mar 18, 2013 at 11:28 PM, Chris Angelico wrote:
>> There are interfaces where a generic "do something with X and Y"
>> concept makes sense, but I don't think program code is one of them.
>
> Having finally parsed out what I think the OP is asking for, I have to
> disagree. In fact, Guido disagrees as well: he thinks what the OP
> wants is so important that he built it into Python from day one.
>
> The notation Python uses to "send a message" to an object is actually
> "obj(message)".

Cheeky comments aside, the problem with this is that it conflates two fundamental, very different desires: execute and pass-this-message. While, yes, in my mind I can think of obj.method(data) as passing data to my object, *I don't want to do it in my mind*. I want to do it in the interpreter. That's a big difference, because *other programmers can't read my mind* and THEY won't know if I'm passing data or executing a function.

Mark

From dreamingforward at gmail.com  Wed Mar 20 05:24:48 2013
From: dreamingforward at gmail.com (Mark Janssen)
Date: Tue, 19 Mar 2013 21:24:48 -0700
Subject: [Python-ideas] Message passing syntax for objects
In-Reply-To: 
References: <5146B848.3040509@pearwood.info>
	<1363643500.25746.YahooMailNeo@web184701.mail.ne1.yahoo.com>
	<1363663891.8351.YahooMailNeo@web184701.mail.ne1.yahoo.com>
Message-ID: 

On Tue, Mar 19, 2013 at 1:09 PM, Terry Reedy wrote:
> On 3/18/2013 11:31 PM, Andrew Barnert wrote:
>
>> The idea that message passing is fundamentally different from method
>> calling also turned out to be one of those strange ideas, since it
>> only took a couple years to prove that they are theoretically
>> completely isomorphic -- and,
>
> Since the isomorphism is so obvious, I somehow missed that Kay actually
> thought that they were different. I suppose one could have different (but
> isomorphic) mental image models.

Yes, that's the point I'm making, and it's significant because other programmers can't see others' mental models.

mark

From dreamingforward at gmail.com  Wed Mar 20 05:36:16 2013
From: dreamingforward at gmail.com (Mark Janssen)
Date: Tue, 19 Mar 2013 21:36:16 -0700
Subject: [Python-ideas] Message passing syntax for objects
In-Reply-To: 
References: <5147B3A9.2070201@pearwood.info>
	
Message-ID: 

On Mon, Mar 18, 2013 at 7:14 PM, Haoyi Li wrote:
> I felt I just had to chip in here, knowing something about how actors (which
> is basically what you're advocating) work in Scala. Some points:

Thank you.  Your input is valued.

> - Message sends have to give you something that method calls don't

Right.

> Whether that's the ability to ignore not-understood messages (like in
> Obj-C),

That's one.

> or making the message send-receive behavior asynchronous (like in
> Scala),

That's two.

> or ensuring messages are handled serially per-object,

This happens in either paradigm.

> I think this is a significant point: small things (lists, tuples,
> primitives) are kept as structs and the data inside them is manipulated
> directly,

Yes, and here is where something significant I think will happen.  Complicated data structures just simply don't get re-used.
Python allows lists within lists within lists, but any program that uses that outside of n x n matrices won't ever get re-used ever.  Because there is no unified data model.

> and big things (web servers, background-workers, http clients) are
> then done with state-hiding and encapsulation and all that.

Yeah, that part is fine.

> Having a simple thing (like a list or a tuple) with encapsulation and
> sending messages to it is as silly [...]

Ah, but you see I'm envisioning a data ecosystem (to borrow a phrase) for the Internet.  A peer-2-peer model for sharing data.  So sending messages isn't so silly.

> In particular, your dislike for "lists within lists" seems incompatible with
> your desire for "more universal/general classes, building up to the
> complexity desired". Isn't that almost the perfect example of simple,
> general classes used to build up complex structures?

I think I see the source of confusion:  I used the word "object" when that is the term used in Python for lists, etc. -- things used to store data -- but I see them as separate.  I make a distinction between classes, which not only may be stateful but are able to *do* things, and data types, which don't "do" things but *are* things.  It's a subtle distinction, rather like linguists distinguish between verbs and objects even though both are *words*.

Mark

From ncoghlan at gmail.com  Wed Mar 20 13:47:27 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 20 Mar 2013 05:47:27 -0700
Subject: [Python-ideas] Message passing syntax for objects
In-Reply-To: 
References: <5147B3A9.2070201@pearwood.info>
	
Message-ID: 

On Tue, Mar 19, 2013 at 9:36 PM, Mark Janssen wrote:
> I think I see the source of confusion:  I used the word "object" when
> that is the term used in Python for lists, etc. -- things used to store
> data -- but I see them as separate.  I make a distinction between
> classes, which not only may be stateful but are able to *do* things,
> and data types, which don't "do" things but *are* things.  It's a
> subtle distinction, rather like linguists distinguish between verbs
> and objects even though both are *words*.

Python, however, makes no such distinction - everything is either an object, or a reference to an object. Hence, the puzzled incomprehension in response to your proposal. In Python, even numbers can do things, like tell you how many bits an integer needs for its binary representation:

>>> 3000 .bit_length()
12

Message passing is generally seen in the Python community as a higher level state isolation technique, something you use to manage increasing complexity, not something you use all the time. It's similar to the way we allow people to use Python for imperative or functional code, even though it's all object-oriented under the hood.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From stephen at xemacs.org  Thu Mar 21 14:37:42 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 21 Mar 2013 22:37:42 +0900
Subject: [Python-ideas] Message passing syntax for objects
In-Reply-To: 
References: <5147B3A9.2070201@pearwood.info>
	
Message-ID: <87wqt0dgbd.fsf@uwakimon.sk.tsukuba.ac.jp>

Mark Janssen writes:
> On Mon, Mar 18, 2013 at 7:14 PM, Haoyi Li wrote:
>
>> I think this is a significant point: small things (lists, tuples,
>> primitives) are kept as structs and the data inside them is
>> manipulated directly,
>
> Yes, and here is where something significant I think will happen.
> Complicated data structures just simply don't get re-used.
> Python allows lists within lists within lists, but any program that
> uses that outside of n x n matrices won't ever get re-used ever.

I don't see the distinction you're aiming at.  PyPI, for example, is full of extremely complicated data structures that get reused.  Some of them get reused fairly frequently, though not as often as simple lists and tuples.

> Because there is no unified data model.

That I can agree with, as an absolute.

>> Having a simple thing (like a list or a tuple) with encapsulation
>> and sending messages to it is as silly [...]
>
> Ah, but you see I'm envisioning a data ecosystem (to borrow a
> phrase) for the Internet.  A peer-2-peer model for sharing data.
> So sending messages isn't so silly.

Sure.  Smalltalk proved that.  But it's also not a universal improvement over the algebraic model of objects and operators, or function and method calling, and so on.  It can already be emulated in Python (including the "ignoring messages you don't understand" aspect, AFAIK) by using properties to turn what look like attribute references into method calls (or with a rather uglier syntax, dict references).

I can see the advantage to you for your very specialized (at present, anyway) research project of a fairly radical change to Python syntax, but I don't see an advantage to the vast majority of Python users...

From haoyi.sg at gmail.com  Thu Mar 21 19:08:31 2013
From: haoyi.sg at gmail.com (Haoyi Li)
Date: Thu, 21 Mar 2013 14:08:31 -0400
Subject: [Python-ideas] Message passing syntax for objects
In-Reply-To: <87wqt0dgbd.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <5147B3A9.2070201@pearwood.info>
	<87wqt0dgbd.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: 

There are basically two discussions happening here: ADTs (small objects with public contents) vs Encapsulation, and method calls vs message sends. A few responses:

>> or ensuring messages are handled serially per-object,
>
> This happens in either paradigm.

I mean that messages sent *from multiple threads* are handled serially. If your entire program is single threaded (as many python programs are) or has a GIL which prevents multiple concurrent method calls, then I guess it isn't a big deal. But in a language that does have pre-emptive multithreading, this basically gives you a non-blocking, guaranteed (because there's no other way to interact with the object other than sending messages) "lock" around each object. Each object basically gets:

- its own single-threaded event loop to run its methods
- without the performance hit of creating lots and lots of threads
- without the overhead of running multiple processes and IPC between the one-thread-per-process event loops (twisted, node.js, etc.) when you want to utilize multiple cores
- with minimal overhead over standard method calls (both syntactic and performance)
- for zero effort on the part of the programmer

These are definitely not things you get in either paradigm, and are probably the main reasons people use actors in Scala: they don't use them because they want a funky function call syntax, or because they don't like static checking for unknown messages! It really does give you a lot of nice things in return for the funky "function call" (message send) syntax.

> Yes, and here is where something significant I think will happen.
> Complicated data structures just simply don't get re-used.  Python
> allows lists within lists within lists, but any program that uses that
> outside of n x n matrices won't ever get re-used ever.  Because there
> is no unified data model.

I disagree completely.
Coming from Java, where every list-within-list has its own special class to represent it, my experience is that it's a terrible idea:

- Want to convert from one list-within-list to another list-within-list? Use a list comprehension and be done with it. Want to convert a special InsnList to a special ParameterList? Much more annoying.

- Want to do normal-list things on your list-within-list? Just use map(), filter(), reduce(), or a list comprehension. Want to do normal list things to your special InsnList object which isn't a normal list? Also much more annoying.

- Want to get something out of a list-within-list? Just use square brackets: list_in_list[i][j]. Want to get something out of the special InsnList object? You'll need to look up the magic method to call (or message to send) to do it.

It's not like encapsulating the whole thing makes the data structure any less complicated: it just makes it more annoying to do things with, because now I have to learn *your* way of doing things rather than the standard python-list way of doing things. In reality, nobody is ever going to re-use either your special InsnList object or my list-of-lists-of-instructions. However, when they're working with my list-of-lists, at least they can re-use *other* existing list-handling functions to manipulate it, and not have to dig through my docs to see what methods (or messages) my InsnList exposes.

In the end, message sends and method calls are basically isomorphic, so much so that in Scala you can transparently convert method calls to message sends under the hood if you prefer that syntax! Unless there's some significant advantage of doing it, it seems to me that the improvement would basically be forcing people to run a regex on their code to convert obj.method(a, b) to obj << (msg, a, b), and then life goes on exactly as it did before.

On Thu, Mar 21, 2013 at 9:37 AM, Stephen J. Turnbull wrote:
[...]
From dreamingforward at gmail.com  Thu Mar 21 23:22:46 2013
From: dreamingforward at gmail.com (Mark Janssen)
Date: Thu, 21 Mar 2013 15:22:46 -0700
Subject: [Python-ideas] Message passing syntax for objects
In-Reply-To: 
References: <5147B3A9.2070201@pearwood.info>
	<87wqt0dgbd.fsf@uwakimon.sk.tsukuba.ac.jp>
	
Message-ID: 

On Thu, Mar 21, 2013 at 11:08 AM, Haoyi Li wrote:
> There are basically two discussions happening here: ADTs (small objects
> with public contents) vs Encapsulation, and method calls vs message sends.
> A few responses:

Thank you.  You've boiled the discussion down to the two main elements.  Abstract data types (which I was calling "prototypes") vs. Encapsulation is the important distinction.  It's not an easy distinction to see from the words alone.  The former, I'll say, is built from the ground (the bits) upwards, while the latter is built from the top (the application layer), downwards.  But I wouldn't say that, for ADTs, public contents are the important part.  In fact, I'd argue the opposite.

>>> or ensuring messages are handled serially per-object,
>>
>> This happens in either paradigm.
>
> I mean that messages sent *from multiple threads* are handled serially.

Well, this is where it gets bizarre to me.  Because, despite threading, the CPU still has to dispatch all processes and handle the data passing, ultimately.

> If your entire program is single threaded (as many python programs are) or
> has a GIL which prevents multiple concurrent method calls, then I guess it
> isn't a big deal. But in a language that does have pre-emptive
> multithreading, this basically gives you a non-blocking, guaranteed
> (because there's no other way to interact with the object other than
> sending messages) "lock" around each object. Each object basically gets:
>
> - its own single-threaded event loop to run its methods
> - without the performance hit of creating lots and lots of threads
> - without the overhead of running multiple processes and IPC between the
> one-thread-per-process event loops (twisted, node.js, etc.) when you want
> to utilize multiple cores
> - with minimal overhead over standard method calls (both syntactic and
> performance)
> - for zero effort on the part of the programmer

Well, I argue that in the world of programming language evolution, all of these, except for the last, are premature optimizations.

>> Yes, and here is where something significant I think will happen.
>> Complicated data structures just simply don't get re-used.  Python
>> allows lists within lists within lists, but any program that uses that
>> outside of n x n matrices won't ever get re-used ever.  Because there
>> is no unified data model.
>
> I disagree completely. Coming from Java, where every list-within-list has
> its own special class to represent it, my experience is that it's a
> terrible idea:
>
> - Want to convert from one list-within-list to another list-within-list?
> Use a list comprehension and be done with it. Want to convert a special
> InsnList to a special ParameterList? Much more annoying.

How can you use a list comprehension unless you know the length or depth of each sub-list?
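For example (a sketch), a comprehension bakes the depth in, so it only works when you already know the shape:

    nested = [[1, 2], [3, [4, 5]]]
    flat = [x for sub in nested for x in sub]   # assumes depth == 2 exactly
    print(flat)   # [1, 2, 3, [4, 5]] -- the deeper nesting just leaks through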
> It's not like encapsulating the whole thing makes the data struture any > less complicated: it just makes it more annoying to do things with, because > now I have to learn *your* way of doing things rather than the standard > python-list way of doing things. > Where that where the ClassName and the Class.__doc__ should get you half the way there. There necessary component would be a universal/abstract data type. I propose a FractalGraph which can handle and scale any level of complexity. As long as we have a universal data type, the class hierarchy that emerges will be simple and general to handle, rather than dealing with personal taxonomies. > In the end, message sends and method calls are basically isomorphic, so > much so that in Scala you can transparently convert method calls to > message sends under the hood if > you prefer that syntax! Unless there's some significant advantage of doing > it, it seems to me that the improvement would basically be forcing people > to run a regex on their code to convert obj.method(a, b) to obj << (msg, a, > b), and then life goes on exactly as it did before. > No. While you're technically correct as to the issue of isomorphism. In practice, the usage you propose does not help whatsoever. "(mgs, a, b)" is too complex. Firstly, in the message-passing scheme I'm proposing, you never pass in more than one piece of data, unless it's a very structured, perhaps (a single XML-like tree?). mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From haoyi.sg at gmail.com Fri Mar 22 00:49:44 2013 From: haoyi.sg at gmail.com (Haoyi Li) Date: Thu, 21 Mar 2013 19:49:44 -0400 Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: References: <5147B3A9.2070201@pearwood.info> <87wqt0dgbd.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: > Well this is where it gets bizarre to me. Because, despite threading, the CPU still has to dispatch all processes and handle the data passing, ultimately. Yeah. It works automagically, though, really nicely. It means I don't need to mess around with a huge spaghetti of locks and mutexes and semaphores when i want to use multiple cores. Although the CPU has to handle all the processes and data passing, the programmer (almost) doesn't have to. That's a pretty big gain. > Well, I argue that in the world of programming language evolution, all of these, except for the last, are premature optimizations. I dunno, lots of people like event loops, not just the scala actors people: Go's goroutines, python's greenlets, ruby fibers, etc. are all doing this. They provide nice correctness (no pre-emption causing race conditions) and performance (OS threads are heavy!) characteristics. Looking at the "message passing" systems I know about: - Erlang processes - Go goroutines - Scala actors - Obj-C objects 3 out of 4 are basically using it for concurrency, for isolating mutable state and providing the multi-thread-serializable-calling behavior you called "bizarre". This is just to back up my claim that the things Scala-actors (and the other message passing systems above) give you w.r.t. concurrency is actually *really nice*, and even if you don't see the benefits, they're there and people love them. > How can you use a list comprehension unless you know the length or depth of each sub-list? The most common cases are non-recursive data structures, so I already know their depth, the less common case is recursive data structures, and I recurse on them; people do it all the time. 
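A minimal sketch of that kind of recursion over an arbitrarily nested list (deep_map is a hypothetical helper written for this example, not something from the thread):

    def deep_map(f, xs):
        # Apply f to every leaf of an arbitrarily nested list,
        # preserving the nesting structure.
        return [deep_map(f, x) if isinstance(x, list) else f(x)
                for x in xs]

    deep_map(str, [1, [2, [3, 4]], 5])
    # -> ['1', ['2', ['3', '4']], '5']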
I mean you could encapsulate the recursion in a method which calls itself on its children, or you could ask for its children in your function and recurse on them yourself. Neither is really better or worse, just different approaches with different tradeoffs. > Where that where the ClassName and the Class.__doc__ should get you half the way there. There necessary component would be a universal/abstract data type. I propose a FractalGraph which can handle and scale any level of complexity. As long as we have a universal data type, the class hierarchy that emerges will be simple and general to handle, rather than dealing with personal taxonomies. Isn't the idea of an "object" with "methods" and "fields" basically that universal data type? How is it yours will scale to any level of complexity and be simple and general to handle but the "object" universal data type doesn't? > No. While you're technically correct as to the issue of isomorphism. In practice, the usage you propose does not help whatsoever. "(mgs, a, b)" is too complex. Firstly, in the message-passing scheme I'm proposing, you never pass in more than one piece of data, unless it's a very structured, perhaps (a single XML-like tree?). Is the proposal basically to use shorter methods with fewer parameters? I mean, a list of arguments (which could be objects and contain other objects inside) sounds exactly like an XML-like tree to me. Here's an interesting question: Is there any existing language/system out there, that's built in the manner you describe (message passing etc.), which you can point at and say "see, they do it this way and it works much better"? That would help immensely in understanding what benefits you're envisioning, since I (and others?) apparently don't see them On Thu, Mar 21, 2013 at 6:22 PM, Mark Janssen wrote: > On Thu, Mar 21, 2013 at 11:08 AM, Haoyi Li wrote: > >> There are basically two discussions happening here: ADTs (small objects >> with public contents) vs Encapsulation, and method calls vs message sends. >> A few responses: >> >> Thank you. You've boiled the discussion down to the two main elements. > Abstract data type's (which I was calling "prototypes") vs. Encapsulation > is the important distinction. It's not an easy distinction to see from the > words alone. The former, I'll say, is built from the ground (the bits) > upwards, while the latter, while the latter from the top (the > application-layer), downwards. But, I wouldn't say that for ADTs that > public contents were the important part. In fact, I'd argue the opposite. > > >> >> or ensuring messages are handled serially per-object, >> >> >This happens in either paradigm. >> >> I mean that messages sent *from multiple threads *are handled serially. >> > > Well this is where it gets bizarre to me. Because, despite threading, the > CPU still has to dispatch all processes and handle the data passing, > ultimately. > > >> If your entire program is single threaded (as many python programs are) >> or has a GIL which prevents multiple concurrent method calls, then i guess >> it isn't a big deal. But in a language that does have pre-emptive >> multithreading, this basically gives you a non-blocking, guaranteed >> (because there's no other way to interact with the object other than >> sending messages) "lock" around each object. 
Each object basically gets: >> >> - its own single-threaded event loop to run its methods >> - without the performance hit of creating lots and lots of threads >> - without the overhead of running multiple processes and IPC betweeen the >> one-thread-per-process event loops (twisted, node.js, etc.) when you want >> to utilize multiple cores >> - with minimal overhead over standard method calls (both syntactic and >> performance) >> - for zero effort on the part of the programmer >> > > Well, I argue that in the world of programming language evolution, all of > these, except for the last, are premature optimizations. > >> >> >Yes, and here is where something significant I think will happen. >> >Complicated data structures just simply don't get re-used. Python >> >allows lists within lists within lists, but any program that uses that >> >outside of n x n matrices won't ever get re-used ever. Because there >> >is no unified data model. >> >> I disagree completely. Coming from Java, where every list-within-list has >> its own special class to represent it, my experience is that is a terrible >> idea: >> >> > - Want to convert from one list-within-list to another list-within-list? >> Use a list comprehension and be done with it. Want to convert a special >> InsnList to a special ParameterList? Much more annoying. >> > > How can you use a list comprehension unless you know the length or depth > of each sub-list? > > >> It's not like encapsulating the whole thing makes the data struture any >> less complicated: it just makes it more annoying to do things with, because >> now I have to learn *your* way of doing things rather than the standard >> python-list way of doing things. >> > > Where that where the ClassName and the Class.__doc__ should get you half > the way there. There necessary component would be a universal/abstract > data type. I propose a FractalGraph which can handle and scale any level > of complexity. As long as we have a universal data type, the class > hierarchy that emerges will be simple and general to handle, rather than > dealing with personal taxonomies. > > >> In the end, message sends and method calls are basically isomorphic, so >> much so that in Scala you can transparently convert method calls to >> message sends under the hood if >> you prefer that syntax! Unless there's some significant advantage of doing >> it, it seems to me that the improvement would basically be forcing people >> to run a regex on their code to convert obj.method(a, b) to obj << (msg, a, >> b), and then life goes on exactly as it did before. >> > > No. While you're technically correct as to the issue of isomorphism. In > practice, the usage you propose does not help whatsoever. "(mgs, a, b)" is > too complex. Firstly, in the message-passing scheme I'm proposing, you > never pass in more than one piece of data, unless it's a very structured, > perhaps (a single XML-like tree?). > > mark > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri Mar 22 10:08:19 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 22 Mar 2013 02:08:19 -0700 (PDT) Subject: [Python-ideas] Message passing syntax for objects In-Reply-To: References: <5147B3A9.2070201@pearwood.info> <87wqt0dgbd.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1363943299.43688.YahooMailNeo@web184701.mail.ne1.yahoo.com> From: Haoyi Li Sent: Thursday, March 21, 2013 11:08 AM >I disagree completely. 
Coming from Java, where every list-within-list has its own special class to represent it, my experience is that it is a terrible idea:

This is a side issue, but that's not the problem with Java lists. After all, Haskell also has a separate type for every list-within-list, yet you can still use comprehensions, map/filter/reduce, etc., and they even work over abstractions rather than just lists (I promise, I won't use the m-word here). Even C++ has std::transform and friends. The terrible idea is having a rigid and non-extensible type system. Haskell (and, to a much lesser extent, C++) avoids that through explicit parameterization; Python (and, to a lesser extent, Smalltalk/ObjC/etc.) avoids it through implicit duck typing; Java forces the programmer to deal with it by writing horrible manual boilerplate.

>In the end, message sends and method calls are basically isomorphic, so much so that in Scala you can transparently convert method calls to message sends under the hood if you prefer that syntax! Unless there's some significant advantage of doing it, it seems to me that the improvement would basically be forcing people to run a regex on their code to convert obj.method(a, b) to obj << (msg, a, b), and then life goes on exactly as it did before.

Here is the main point. Everyone is reading their own ideas into the proposal, because nobody can believe it's as ridiculous as it sounds. The whole key to the proposal is that you _don't_ have messages with parameters like (msg, a, b); a message is just an object. You can do obj1 >> obj2, and that's all you ever need. obj2 knows what to do with obj1, so you don't need to tell it.

Since it's patently obvious that you _do_ need objects with more than one behavior, people assume he _must_ be talking about messages and objects in the sense of Smalltalk/Erlang/Scala/Go/etc.: a message as something that describes an action to take, with arguments. The fact that he's specifically referenced Alan Kay and Smalltalk and papers about this kind of message-sending makes that assumption even harder to avoid.

But the assumption is wrong. You can't convert obj.method(a, b) to his syntax, because if you need to, obj isn't a good example of an object. He's specifically said, multiple times, that if you think there are multiple things obj2 might want to do with obj1, that means you're not imagining the right kinds of objects. Lists, numbers, files, filesystems, servers, documents, hyperlinked webs of documents, GUI windows, database tables, events, animals, cars, ecosystems, user accounts - all of these things are bad examples of objects. And he hasn't given any good examples, no matter how many times he's been asked; the only examples he's given are lists, numbers, and filesystems, all of which he claims are terrible. While his objects are "universal", there are no such things that exist.

Except, of course, for single-argument, void-return functions. They work exactly like what he's describing as "objects". That observation has already been made three times, and he hasn't even responded. He just waits for someone else to come along, read something into his proposal that isn't there, and talk to them until they figure out he's a quack.
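For what it's worth, that isomorphism can be made concrete in a few lines, abusing Python operator overloading to fake the proposed obj1 >> obj2 send syntax (Doubler is purely illustrative):

    class Doubler:
        # A "receiver" that does exactly one thing with whatever it is
        # sent; msg >> receiver dispatches to __rrshift__.
        def __rrshift__(self, msg):
            print(msg * 2)

    def doubler(msg):
        print(msg * 2)

    21 >> Doubler()   # prints 42
    doubler(21)       # prints 42 -- observably the same thing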
From Ronny.Pfannschmidt at gmx.de Fri Mar 22 23:31:40 2013
From: Ronny.Pfannschmidt at gmx.de (Ronny Pfannschmidt)
Date: Fri, 22 Mar 2013 23:31:40 +0100
Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes
Message-ID: <514CDBCC.4080407@gmx.de>

Hi,

while reviewing urllib.parse i noticed a pretty ugly pattern

many functions had an attached global and in their own code they would compile a regex on first use and assign it to that global

its clear that compiling a regex is expensive, so having them be compiled later, at first use, would be of some benefit

but instead of all that repetitive code there should be an alternative to re.compile that waits with compilation for the first use

-- Ronny

From greg at krypto.org Fri Mar 22 23:42:49 2013
From: greg at krypto.org (Gregory P. Smith)
Date: Fri, 22 Mar 2013 15:42:49 -0700
Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes
In-Reply-To: <514CDBCC.4080407@gmx.de>
References: <514CDBCC.4080407@gmx.de>
Message-ID:

On Fri, Mar 22, 2013 at 3:31 PM, Ronny Pfannschmidt < Ronny.Pfannschmidt at gmx.de> wrote:

> Hi,
>
> while reviewing urllib.parse i noticed a pretty ugly pattern
>
> many functions had an attached global and in their own code they would
> compile a regex on first use and assign it to that global
>
> its clear that compiling a regex is expensive, so having them be compiled
> later at first use would be of some benefit
>

It isn't expensive to do, it is expensive to do repeatedly for no reason. Thus the use of compiled regexes. Code like this would be better off refactored to reference a precompiled global rather than conditionally check if it needs compiling every time it is called.

-gps

> but instead of all that repetitive code there should be an alternative to
> re.compile that waits with compilation for the first use
>
> -- Ronny
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com Sat Mar 23 03:00:12 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 22 Mar 2013 19:00:12 -0700
Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes
In-Reply-To: References: <514CDBCC.4080407@gmx.de>
Message-ID:

On Fri, Mar 22, 2013 at 3:42 PM, Gregory P. Smith wrote:
>
> On Fri, Mar 22, 2013 at 3:31 PM, Ronny Pfannschmidt
> wrote:
>>
>> Hi,
>>
>> while reviewing urllib.parse i noticed a pretty ugly pattern
>>
>> many functions had an attached global and in their own code they would
>> compile a regex on first use and assign it to that global
>>
>> its clear that compiling a regex is expensive, so having them be compiled
>> later at first use would be of some benefit
>
>
> It isn't expensive to do, it is expensive to do repeatedly for no reason.
> Thus the use of compiled regexes. Code like this would be better off
> refactored to reference a precompiled global rather than conditionally check
> if it needs compiling every time it is called.

Alternatively, if there are a lot of different regexes, it may be better to rely on the implicit cache inside the re module.

Cheers,
Nick.
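For concreteness, a rough sketch of what the requested helper might look like (compile_lazy and _LazyPattern are hypothetical names invented here; nothing like this exists in the re module):

    import re

    class _LazyPattern:
        # Defers re.compile until the first attribute access, then
        # delegates everything to the compiled pattern object.
        def __init__(self, pattern, flags=0):
            self._args = (pattern, flags)
            self._compiled = None

        def __getattr__(self, name):
            if self._compiled is None:
                self._compiled = re.compile(*self._args)
            return getattr(self._compiled, name)

    def compile_lazy(pattern, flags=0):
        return _LazyPattern(pattern, flags)

    URL_RE = compile_lazy(r'[a-zA-Z][a-zA-Z0-9+.-]*:')  # no compile cost at import
    URL_RE.match('http://example.com')                  # compiled here, reused afterwards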
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ezio.melotti at gmail.com Sat Mar 23 13:04:34 2013 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Sat, 23 Mar 2013 14:04:34 +0200 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: References: <514CDBCC.4080407@gmx.de> Message-ID: Hi, On Sat, Mar 23, 2013 at 12:42 AM, Gregory P. Smith wrote: > > On Fri, Mar 22, 2013 at 3:31 PM, Ronny Pfannschmidt > wrote: >> >> Hi, >> >> while reviewing urllib.parse i noticed a pretty ugly pattern >> >> many functions had an attached global and in their own code they would >> compile an regex on first use and assign it to that global >> >> its clear that compiling a regex is expensive, so having them be compiled >> later at first use would be of some benefit > > > It isn't expensive to do, Sometimes it is, see e.g. http://bugs.python.org/issue11454. Best Regards, Ezio Melotti > it is expensive to do repeatedly for no reason. > Thus the use of compiled regexes. Code like this would be better off > refactored to reference a precompiled global rather than conditionally check > if it needs compiling every time it is called. > > -gps > >> >> >> but instead of all that reptetive code there should be an alternative to >> re.compile that waits with compilation for the first use >> >> -- Ronny From solipsis at pitrou.net Sat Mar 23 13:18:28 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 23 Mar 2013 13:18:28 +0100 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes References: <514CDBCC.4080407@gmx.de> Message-ID: <20130323131828.2053bee2@pitrou.net> On Fri, 22 Mar 2013 15:42:49 -0700 "Gregory P. Smith" wrote: > On Fri, Mar 22, 2013 at 3:31 PM, Ronny Pfannschmidt < > Ronny.Pfannschmidt at gmx.de> wrote: > > > Hi, > > > > while reviewing urllib.parse i noticed a pretty ugly pattern > > > > many functions had an attached global and in their own code they would > > compile an regex on first use and assign it to that global > > > > its clear that compiling a regex is expensive, so having them be compiled > > later at first use would be of some benefit > > > > It isn't expensive to do, it is expensive to do repeatedly for no reason. > Thus the use of compiled regexes. Code like this would be better off > refactored to reference a precompiled global rather than conditionally > check if it needs compiling every time it is called. Precompiled regexes were a major contributor in Python startup time: http://bugs.python.org/issue13150 http://hg.python.org/cpython/rev/df950158dc33/ In the stdlib we should be rather careful about this. Third-party libraries can ignore such concerns if they aren't meant to use in interactive applications. Regards Antoine. From mal at egenix.com Sat Mar 23 13:38:19 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 23 Mar 2013 13:38:19 +0100 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: <20130323131828.2053bee2@pitrou.net> References: <514CDBCC.4080407@gmx.de> <20130323131828.2053bee2@pitrou.net> Message-ID: <514DA23B.9010301@egenix.com> On 23.03.2013 13:18, Antoine Pitrou wrote: > On Fri, 22 Mar 2013 15:42:49 -0700 > "Gregory P. 
Smith" wrote: >> On Fri, Mar 22, 2013 at 3:31 PM, Ronny Pfannschmidt < >> Ronny.Pfannschmidt at gmx.de> wrote: >> >>> Hi, >>> >>> while reviewing urllib.parse i noticed a pretty ugly pattern >>> >>> many functions had an attached global and in their own code they would >>> compile an regex on first use and assign it to that global >>> >>> its clear that compiling a regex is expensive, so having them be compiled >>> later at first use would be of some benefit >>> >> >> It isn't expensive to do, it is expensive to do repeatedly for no reason. >> Thus the use of compiled regexes. Code like this would be better off >> refactored to reference a precompiled global rather than conditionally >> check if it needs compiling every time it is called. > > Precompiled regexes were a major contributor in Python startup time: > http://bugs.python.org/issue13150 > http://hg.python.org/cpython/rev/df950158dc33/ > > In the stdlib we should be rather careful about this. Third-party > libraries can ignore such concerns if they aren't meant to use in > interactive applications. Wouldn't it make sense to add a way to pickle or marshal compiled REs ? The precompiled REs could then be loaded directly from the pickle, avoiding the compiling overhead on startup. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 23 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-03-13: Released eGenix pyOpenSSL 0.13 ... http://egenix.com/go39 ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From Ronny.Pfannschmidt at gmx.de Sat Mar 23 13:43:19 2013 From: Ronny.Pfannschmidt at gmx.de (Ronny Pfannschmidt) Date: Sat, 23 Mar 2013 13:43:19 +0100 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: <514DA23B.9010301@egenix.com> References: <514CDBCC.4080407@gmx.de> <20130323131828.2053bee2@pitrou.net> <514DA23B.9010301@egenix.com> Message-ID: <514DA367.5030605@gmx.de> > Wouldn't it make sense to add a way to pickle or marshal compiled REs ? > > The precompiled REs could then be loaded directly from the > pickle, avoiding the compiling overhead on startup. > as far as i can tell that would need regex as part of the syntax to make sense fort use in modules i dont think such a change would be accepted and i dont even what to deal with the potential bikeshedding for such an integration From mal at egenix.com Sat Mar 23 13:52:41 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 23 Mar 2013 13:52:41 +0100 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: <514DA367.5030605@gmx.de> References: <514CDBCC.4080407@gmx.de> <20130323131828.2053bee2@pitrou.net> <514DA23B.9010301@egenix.com> <514DA367.5030605@gmx.de> Message-ID: <514DA599.5090309@egenix.com> On 23.03.2013 13:43, Ronny Pfannschmidt wrote: > >> Wouldn't it make sense to add a way to pickle or marshal compiled REs ? >> >> The precompiled REs could then be loaded directly from the >> pickle, avoiding the compiling overhead on startup. 
>>
>
> as far as i can tell that would need regex as part of the syntax to make sense for use in modules
> i dont think such a change would be accepted and i dont even want to deal with the potential
> bikeshedding for such an integration

I wasn't thinking of making it part of the Python byte-code.

It would suffice to add pickle/marshal support for the compiled RE code. This could then be loaded from a string embedded in the module code on startup.

E.g.

    # rx = re.compile('.*')
    rx = pickle.loads('asdfsadfasdf')

It would also be possible to seed the re module cache with such pickle.loads, perhaps compiled at Python build time. This would avoid having to change code in the stdlib to load pickles.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 23 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2013-03-13: Released eGenix pyOpenSSL 0.13 ... http://egenix.com/go39

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From rosuav at gmail.com Sat Mar 23 14:15:30 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Sun, 24 Mar 2013 00:15:30 +1100
Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes
In-Reply-To: <514DA599.5090309@egenix.com>
References: <514CDBCC.4080407@gmx.de> <20130323131828.2053bee2@pitrou.net> <514DA23B.9010301@egenix.com> <514DA367.5030605@gmx.de> <514DA599.5090309@egenix.com>
Message-ID:

On Sat, Mar 23, 2013 at 11:52 PM, M.-A. Lemburg wrote:
> It would suffice to add pickle/marshal support for the
> compiled RE code. This could then be loaded from a string
> embedded in the module code on startup.
>
> E.g.
> # rx = re.compile('.*')
> rx = pickle.loads('asdfsadfasdf')

What would that do to versioning? Currently, as I understand it, the compiled RE is a complete implementation detail; at any time, the re module can change how it stores it. Pickles (again, as I understand it - I may be wrong) should be readable on other versions of Python (forward-compatibly, at least), on other architectures, etc, etc; would this be a problem?

Alternatively, at the expense of some storage space, there could be some kind of fallback. If the tag doesn't perfectly match the creating Python's tag, it ignores the dumped version and just compiles it as normal.

Hmm. Here's a mad thought - a bit of latticed casementing, if you like. Could the compiled regexes be stored in the .pyc file? That already has version tagging done. All it'd take is some sort of extension mechanism that says "hey, here's some additional data that the pyc might want to make use of". Or would that overly complicate matters?

ChrisA

From masklinn at masklinn.net Sat Mar 23 14:26:30 2013
From: masklinn at masklinn.net (Masklinn)
Date: Sat, 23 Mar 2013 14:26:30 +0100
Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes
In-Reply-To: References: <514CDBCC.4080407@gmx.de>
Message-ID: <30264E4D-C98E-45DE-B1F3-9BA5B7E9C6EC@masklinn.net>

On 2013-03-23, at 03:00 , Nick Coghlan wrote:
> On Fri, Mar 22, 2013 at 3:42 PM, Gregory P.
Smith wrote: >> >> On Fri, Mar 22, 2013 at 3:31 PM, Ronny Pfannschmidt >> wrote: >>> >>> Hi, >>> >>> while reviewing urllib.parse i noticed a pretty ugly pattern >>> >>> many functions had an attached global and in their own code they would >>> compile an regex on first use and assign it to that global >>> >>> its clear that compiling a regex is expensive, so having them be compiled >>> later at first use would be of some benefit >> >> >> It isn't expensive to do, it is expensive to do repeatedly for no reason. >> Thus the use of compiled regexes. Code like this would be better off >> refactored to reference a precompiled global rather than conditionally check >> if it needs compiling every time it is called. > > Alternatively, if there are a lot of different regexes, it may be > better to rely on the implicit cache inside the re module. Wouldn't it be better if there are *few* different regexes? Since the module itself caches 512 expressions (100 in Python 2) and does not use an LRU or other "smart" cache (it just clears the whole cache dict once the limit is breached as far as I can see), *and* any explicit call to re.compile will *still* use the internal cache (meaning even going through re.compile will count against the _MAXCACHE limit), all regex uses throughout the application (including standard library &al) will count against the built-in cache and increase the chance of the regex we want cached to be thrown out no? From phd at phdru.name Sat Mar 23 14:36:09 2013 From: phd at phdru.name (Oleg Broytman) Date: Sat, 23 Mar 2013 17:36:09 +0400 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: <514DA23B.9010301@egenix.com> References: <514CDBCC.4080407@gmx.de> <20130323131828.2053bee2@pitrou.net> <514DA23B.9010301@egenix.com> Message-ID: <20130323133609.GA30682@iskra.aviel.ru> On Sat, Mar 23, 2013 at 01:38:19PM +0100, "M.-A. Lemburg" wrote: > Wouldn't it make sense to add a way to pickle or marshal compiled REs ? > > The precompiled REs could then be loaded directly from the > pickle, avoiding the compiling overhead on startup. But with an overhead of opening files and unpickling. My wild guess [not backed by numbers] is that it would be as slow. I suspect the only way to speedup things would be to precompile regexps at compile (generating .pyc) time and saving compiled regexps in the byte code file. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From solipsis at pitrou.net Sat Mar 23 14:34:33 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 23 Mar 2013 14:34:33 +0100 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes References: <514CDBCC.4080407@gmx.de> <30264E4D-C98E-45DE-B1F3-9BA5B7E9C6EC@masklinn.net> Message-ID: <20130323143433.1ef5fd71@pitrou.net> On Sat, 23 Mar 2013 14:26:30 +0100 Masklinn wrote: > > Wouldn't it be better if there are *few* different regexes? Since the > module itself caches 512 expressions (100 in Python 2) and does not use > an LRU or other "smart" cache (it just clears the whole cache dict once > the limit is breached as far as I can see), *and* any explicit call to > re.compile will *still* use the internal cache (meaning even going > through re.compile will count against the _MAXCACHE limit), all regex > uses throughout the application (including standard library &al) will > count against the built-in cache and increase the chance of the regex > we want cached to be thrown out no? 
Well, it mostly sounds like the re cache should be made a bit smarter. Regards Antoine. From ezio.melotti at gmail.com Sat Mar 23 14:50:56 2013 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Sat, 23 Mar 2013 15:50:56 +0200 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: <20130323143433.1ef5fd71@pitrou.net> References: <514CDBCC.4080407@gmx.de> <30264E4D-C98E-45DE-B1F3-9BA5B7E9C6EC@masklinn.net> <20130323143433.1ef5fd71@pitrou.net> Message-ID: On Sat, Mar 23, 2013 at 3:34 PM, Antoine Pitrou wrote: > On Sat, 23 Mar 2013 14:26:30 +0100 > Masklinn wrote: >> >> Wouldn't it be better if there are *few* different regexes? Since the >> module itself caches 512 expressions (100 in Python 2) and does not use >> an LRU or other "smart" cache (it just clears the whole cache dict once >> the limit is breached as far as I can see), *and* any explicit call to >> re.compile will *still* use the internal cache (meaning even going >> through re.compile will count against the _MAXCACHE limit), all regex >> uses throughout the application (including standard library &al) will >> count against the built-in cache and increase the chance of the regex >> we want cached to be thrown out no? > > Well, it mostly sounds like the re cache should be made a bit smarter. > See http://bugs.python.org/issue17441. Best Regards, Ezio Melotti > Regards > > Antoine. > From ezio.melotti at gmail.com Sat Mar 23 14:53:17 2013 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Sat, 23 Mar 2013 15:53:17 +0200 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: <514DA599.5090309@egenix.com> References: <514CDBCC.4080407@gmx.de> <20130323131828.2053bee2@pitrou.net> <514DA23B.9010301@egenix.com> <514DA367.5030605@gmx.de> <514DA599.5090309@egenix.com> Message-ID: On Sat, Mar 23, 2013 at 2:52 PM, M.-A. Lemburg wrote: > On 23.03.2013 13:43, Ronny Pfannschmidt wrote: >> >>> Wouldn't it make sense to add a way to pickle or marshal compiled REs ? >>> >>> The precompiled REs could then be loaded directly from the >>> pickle, avoiding the compiling overhead on startup. >>> >> >> as far as i can tell that would need regex as part of the syntax to make sense fort use in modules >> i dont think such a change would be accepted and i dont even what to deal with the potential >> bikeshedding for such an integration > > I wasn't thinking of making it part of the Python byte-code. > > It would suffice to add pickle/marshal support for the > compiled RE code. This could then be loaded from a string > embedded in the module code on startup. > > E.g. > # rx = re.compile('.*') > rx = pickle.loads('asdfsadfasdf') > According to http://bugs.python.org/issue11454#msg170697, this would be twice as slow. Best Regards, Ezio Melotti > It would also be possible to seed the re module cache with > such pickle.loads, perhaps compiled at Python build time. > This would avoid having to change code in the stdlib to > load pickles. > > -- > Marc-Andre Lemburg > eGenix.com From jsbueno at python.org.br Sat Mar 23 15:09:30 2013 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Sat, 23 Mar 2013 11:09:30 -0300 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: References: <514CDBCC.4080407@gmx.de> <20130323131828.2053bee2@pitrou.net> <514DA23B.9010301@egenix.com> <514DA367.5030605@gmx.de> <514DA599.5090309@egenix.com> Message-ID: On 23 March 2013 10:15, Chris Angelico wrote: > On Sat, Mar 23, 2013 at 11:52 PM, M.-A. 
Lemburg wrote:
>> It would suffice to add pickle/marshal support for the
>> compiled RE code. This could then be loaded from a string
>> embedded in the module code on startup.
>>
>> E.g.
>> # rx = re.compile('.*')
>> rx = pickle.loads('asdfsadfasdf')
>
> What would that do to versioning? Currently, as I understand it, the
> compiled RE is a complete implementation detail; at any time, the re
> module can change how it stores it. Pickles (again, as I understand it
> - I may be wrong) should be readable on other versions of Python
> (forward-compatibly, at least), on other architectures, etc, etc;
> would this be a problem?
>
> Alternatively, at the expense of some storage space, there could be
> some kind of fallback. If the tag doesn't perfectly match the creating
> Python's tag, it ignores the dumped version and just compiles it as
> normal.

Please note that compiled regexes can already be pickled straightforwardly.

Unfortunately, to avoid the version issues you mention - from looking over the pickled string - it looks like it just calls "re.compile" with the original regex on unpickle, so there would be no gain from the implementation as is.

(I should stop being that lazy, and check what unpickling a regexp actually does - ah, Ezio found it while I was at it)

> Hmm. Here's a mad thought - a bit of latticed casementing, if you
> like. Could the compiled regexes be stored in the .pyc file? That
> already has version tagging done. All it'd take is some sort of
> extension mechanism that says "hey, here's some additional data that
> the pyc might want to make use of". Or would that overly complicate
> matters?

I can't see how this could be achieved but for adding a special syntax that would compile regexps at parsing time. Then, we might as well use Perl instead :-)

But maybe some custom serializing could go straight into the sre_code that would properly serialize its objects as python-bytecode, and then some helper functions to load them from a custom-made pyc file. These pre-generated pycs would be built at Python build time.

> ChrisA
> _______________________________________________

From jsbueno at python.org.br Sat Mar 23 15:14:54 2013
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Sat, 23 Mar 2013 11:14:54 -0300
Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes
In-Reply-To: References: <514CDBCC.4080407@gmx.de> <20130323131828.2053bee2@pitrou.net> <514DA23B.9010301@egenix.com> <514DA367.5030605@gmx.de> <514DA599.5090309@egenix.com>
Message-ID:

On 23 March 2013 11:09, Joao S. O. Bueno wrote:
> On 23 March 2013 10:15, Chris Angelico wrote:
>> On Sat, Mar 23, 2013 at 11:52 PM, M.-A. Lemburg wrote:
>>> It would suffice to add pickle/marshal support for the
>>> compiled RE code. This could then be loaded from a string
>>> embedded in the module code on startup.
>>>
>>> E.g.
>>> # rx = re.compile('.*')
>>> rx = pickle.loads('asdfsadfasdf')
>>
>> What would that do to versioning? Currently, as I understand it, the
>> compiled RE is a complete implementation detail; at any time, the re
>> module can change how it stores it. Pickles (again, as I understand it
>> - I may be wrong) should be readable on other versions of Python
>> (forward-compatibly, at least), on other architectures, etc, etc;
>> would this be a problem?
>>
>> Alternatively, at the expense of some storage space, there could be
>> some kind of fallback. If the tag doesn't perfectly match the creating
>> Python's tag, it ignores the dumped version and just compiles it as
>> normal.
>
> Please note that compiled regexes can already be pickled
> straightforwardly.
>
> Unfortunately, to avoid the version issues you mention - from looking over
> the pickled string - it looks like it just calls "re.compile" with the original
> regex on unpickle, so there would be no gain from the implementation as is.
>
> (I should stop being that lazy, and check what unpickling a
> regexp actually does -
> ah, Ezio found it while I was at it)

There it is, straight in re.py:

    import copyreg

    def _pickle(p):
        return _compile, (p.pattern, p.flags)

    copyreg.pickle(_pattern_type, _pickle, _compile)

So, pickling regexps as they are now is definitely no speed-up.

From rosuav at gmail.com Sat Mar 23 15:25:15 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Sun, 24 Mar 2013 01:25:15 +1100
Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes
In-Reply-To: References: <514CDBCC.4080407@gmx.de> <20130323131828.2053bee2@pitrou.net> <514DA23B.9010301@egenix.com> <514DA367.5030605@gmx.de> <514DA599.5090309@egenix.com>
Message-ID:

On Sun, Mar 24, 2013 at 1:09 AM, Joao S. O. Bueno wrote:
>> Hmm. Here's a mad thought - a bit of latticed casementing, if you
>> like. Could the compiled regexes be stored in the .pyc file? That
>> already has version tagging done. All it'd take is some sort of
>> extension mechanism that says "hey, here's some additional data that
>> the pyc might want to make use of". Or would that overly complicate
>> matters?
>
> I can't see how this could be achieved but for adding a special
> syntax that would compile regexps at parsing time. Then, we might as well use
> Perl instead :-)

Yeah, that's the most obvious form - some kind of regex literal syntax. I was thinking, though, that there might be some sort of extension to the pyc format that lets any module add precompiled data to it; the trouble would then be figuring out how to recognize what ought to get dumped into the pyc. It'd effectively need to be something that gets added to the code like:

    foo = re.compile('fo+')
    bar = re.compile('ba+r')
    re.precompile(foo, bar)

That could then pre-populate some kind of cache that gets loaded with the pyc, and then when re.compile() gets a particular string, it looks it up in the cache and finds the precompiled version.

Of course, this would quite possibly be more effort than it's worth. Complicating the pyc format in this way needs a lot of justification.

ChrisA

From masklinn at masklinn.net Sat Mar 23 15:35:18 2013
From: masklinn at masklinn.net (Masklinn)
Date: Sat, 23 Mar 2013 15:35:18 +0100
Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes
In-Reply-To: <20130323143433.1ef5fd71@pitrou.net>
References: <514CDBCC.4080407@gmx.de> <30264E4D-C98E-45DE-B1F3-9BA5B7E9C6EC@masklinn.net> <20130323143433.1ef5fd71@pitrou.net>
Message-ID: <565593C4-00EB-4D2E-933E-F5EDE4D8CCD7@masklinn.net>

On 2013-03-23, at 14:34 , Antoine Pitrou wrote:
> On Sat, 23 Mar 2013 14:26:30 +0100
> Masklinn wrote:
>>
>> Wouldn't it be better if there are *few* different regexes?
Since the >> module itself caches 512 expressions (100 in Python 2) and does not use >> an LRU or other "smart" cache (it just clears the whole cache dict once >> the limit is breached as far as I can see), *and* any explicit call to >> re.compile will *still* use the internal cache (meaning even going >> through re.compile will count against the _MAXCACHE limit), all regex >> uses throughout the application (including standard library &al) will >> count against the built-in cache and increase the chance of the regex >> we want cached to be thrown out no? > > Well, it mostly sounds like the re cache should be made a bit smarter. It should, but even with that I think it makes sense to explicitly cache regexps in the application, the re cache feels like an optimization more than semantics. Either that, or the re module should provide an instantiable cache object for lazy compilation and caching of regexps e.g. re.local_cache(maxsize=None) which would return an lru-caching proxy to re. Thus the caching of a module's regexps would be under the control of the module using them if desired (and important), and urllib.parse could fix its existing "ugly" pattern by using import re re = re.local_cache() and removing the conditional compile calls (or even the compile call and using re-level functions) Optionally, the cache could take e.g. an *args of regexp to precompile at module load/cache creation. From solipsis at pitrou.net Sat Mar 23 15:46:02 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 23 Mar 2013 15:46:02 +0100 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes References: <514CDBCC.4080407@gmx.de> <30264E4D-C98E-45DE-B1F3-9BA5B7E9C6EC@masklinn.net> <20130323143433.1ef5fd71@pitrou.net> <565593C4-00EB-4D2E-933E-F5EDE4D8CCD7@masklinn.net> Message-ID: <20130323154602.17eb66dd@pitrou.net> On Sat, 23 Mar 2013 15:35:18 +0100 Masklinn wrote: > > On 2013-03-23, at 14:34 , Antoine Pitrou wrote: > > > On Sat, 23 Mar 2013 14:26:30 +0100 > > Masklinn wrote: > >> > >> Wouldn't it be better if there are *few* different regexes? Since the > >> module itself caches 512 expressions (100 in Python 2) and does not use > >> an LRU or other "smart" cache (it just clears the whole cache dict once > >> the limit is breached as far as I can see), *and* any explicit call to > >> re.compile will *still* use the internal cache (meaning even going > >> through re.compile will count against the _MAXCACHE limit), all regex > >> uses throughout the application (including standard library &al) will > >> count against the built-in cache and increase the chance of the regex > >> we want cached to be thrown out no? > > > > Well, it mostly sounds like the re cache should be made a bit smarter. > > It should, but even with that I think it makes sense to explicitly cache > regexps in the application, the re cache feels like an optimization more > than semantics. Well, of course it is. A cache *is* an optimization. > Either that, or the re module should provide an instantiable cache object > for lazy compilation and caching of regexps e.g. > re.local_cache(maxsize=None) which would return an lru-caching proxy to > re. Thus the caching of a module's regexps would be under the control of > the module using them if desired (and important) IMO that's the wrong way to think about it. The whole point of a cache is that the higher levels don't have to think about it. 
Your CPU has L1, L2 and sometimes L3 caches so that you don't have to allocate your critical data structures in separate "faster" memory areas. That said, if you really want to manage your own cache, it should already be easy to do so using functools.lru_cache() (or any implementation of your choice). The re module doesn't have to provide a dedicated caching primitive. But, really, the point of a cache is to optimize performance *without* you tinkering with it. Regards Antoine. From masklinn at masklinn.net Sat Mar 23 16:30:39 2013 From: masklinn at masklinn.net (Masklinn) Date: Sat, 23 Mar 2013 16:30:39 +0100 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: <20130323154602.17eb66dd@pitrou.net> References: <514CDBCC.4080407@gmx.de> <30264E4D-C98E-45DE-B1F3-9BA5B7E9C6EC@masklinn.net> <20130323143433.1ef5fd71@pitrou.net> <565593C4-00EB-4D2E-933E-F5EDE4D8CCD7@masklinn.net> <20130323154602.17eb66dd@pitrou.net> Message-ID: <25020592-032D-485A-9929-BC2D63445662@masklinn.net> On 2013-03-23, at 15:46 , Antoine Pitrou wrote: > On Sat, 23 Mar 2013 15:35:18 +0100 > Masklinn wrote: >> >> On 2013-03-23, at 14:34 , Antoine Pitrou wrote: >> >>> On Sat, 23 Mar 2013 14:26:30 +0100 >>> Masklinn wrote: >>>> >>>> Wouldn't it be better if there are *few* different regexes? Since the >>>> module itself caches 512 expressions (100 in Python 2) and does not use >>>> an LRU or other "smart" cache (it just clears the whole cache dict once >>>> the limit is breached as far as I can see), *and* any explicit call to >>>> re.compile will *still* use the internal cache (meaning even going >>>> through re.compile will count against the _MAXCACHE limit), all regex >>>> uses throughout the application (including standard library &al) will >>>> count against the built-in cache and increase the chance of the regex >>>> we want cached to be thrown out no? >>> >>> Well, it mostly sounds like the re cache should be made a bit smarter. >> >> It should, but even with that I think it makes sense to explicitly cache >> regexps in the application, the re cache feels like an optimization more >> than semantics. > > Well, of course it is. A cache *is* an optimization. > >> Either that, or the re module should provide an instantiable cache object >> for lazy compilation and caching of regexps e.g. >> re.local_cache(maxsize=None) which would return an lru-caching proxy to >> re. Thus the caching of a module's regexps would be under the control of >> the module using them if desired (and important) > > IMO that's the wrong way to think about it. The whole point of a cache > is that the higher levels don't have to think about it. Your CPU has > L1, L2 and sometimes L3 caches so that you don't have to allocate your > critical data structures in separate "faster" memory areas. > > That said, if you really want to manage your own cache, it should > already be easy to do so using functools.lru_cache() (or any > implementation of your choice). The re module doesn't have to provide a > dedicated caching primitive. > > But, really, the point of a cache is to optimize performance *without* > you tinkering with it. Right, but in this case while I called it a cache the semantics really is a lazy singleton: only create the regex object when it's needed, but keep it around once it's been created. The issue with a "proper cache" is that it performs via heuristics and may or may not correctly improve performances as the heuristics will never match all possible programs with the ideal behavior. 
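For reference, the module-local memo Antoine alludes to is only a couple of lines with functools.lru_cache (a sketch; the function names are illustrative):

    import re
    from functools import lru_cache

    # Module-local, unbounded memo around re.compile: patterns compiled
    # here can no longer be evicted by unrelated regex use elsewhere.
    _compile = lru_cache(maxsize=None)(re.compile)

    def has_scheme(url):
        return _compile(r'[a-zA-Z][a-zA-Z0-9+.-]*:').match(url) is not None

With maxsize=None this behaves as the lazy singleton just described rather than as an evicting cache.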
If it's known that we want to keep compiled regexps used by a module memoized (which is what the current urllib.parse code does/assumes), cache semantics don't really work: depending on the rest of the application, the cached module regexps may get evicted, unless the cache has an unlimited size.

From solipsis at pitrou.net Sat Mar 23 16:33:49 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 23 Mar 2013 16:33:49 +0100
Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes
References: <514CDBCC.4080407@gmx.de> <30264E4D-C98E-45DE-B1F3-9BA5B7E9C6EC@masklinn.net> <20130323143433.1ef5fd71@pitrou.net> <565593C4-00EB-4D2E-933E-F5EDE4D8CCD7@masklinn.net> <20130323154602.17eb66dd@pitrou.net> <25020592-032D-485A-9929-BC2D63445662@masklinn.net>
Message-ID: <20130323163349.5c45532a@pitrou.net>

On Sat, 23 Mar 2013 16:30:39 +0100
Masklinn wrote:
>
> Right, but in this case while I called it a cache the semantics really
> is a lazy singleton: only create the regex object when it's needed, but
> keep it around once it's been created.

Perhaps we need a functools.lazy_compute() function:

    pat = functools.lazy_compute(re.compile, r"my very long regex")

Regards

Antoine.

From ncoghlan at gmail.com Sat Mar 23 17:08:38 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 23 Mar 2013 09:08:38 -0700
Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes
In-Reply-To: <20130323163349.5c45532a@pitrou.net>
References: <514CDBCC.4080407@gmx.de> <30264E4D-C98E-45DE-B1F3-9BA5B7E9C6EC@masklinn.net> <20130323143433.1ef5fd71@pitrou.net> <565593C4-00EB-4D2E-933E-F5EDE4D8CCD7@masklinn.net> <20130323154602.17eb66dd@pitrou.net> <25020592-032D-485A-9929-BC2D63445662@masklinn.net> <20130323163349.5c45532a@pitrou.net>
Message-ID:

On Sat, Mar 23, 2013 at 8:33 AM, Antoine Pitrou wrote:
> On Sat, 23 Mar 2013 16:30:39 +0100
> Masklinn wrote:
>>
>> Right, but in this case while I called it a cache the semantics really
>> is a lazy singleton: only create the regex object when it's needed, but
>> keep it around once it's been created.
>
> Perhaps we need a functools.lazy_compute() function:
>
>     pat = functools.lazy_compute(re.compile, r"my very long regex")

As in something like:

    import re
    from functools import wraps

    def compute_once(f, *args, **kwds):
        value = not_called = object()
        @wraps(f)
        def compute_on_demand():
            nonlocal value
            if value is not_called:
                value = f(*args, **kwds)
            return value
        return compute_on_demand

    _pattern = compute_once(re.compile, r"my very long regex")

    def use_pattern(data):
        return _pattern().search(data)

Runtime overhead is then just the identity check for the initial sentinel value. The difference with both functools.partial and functools.lru_cache is that it wouldn't support customisation at call time - you have to fully define the operation up front; the only thing you're deferring is the actual call. That call will only happen once, with all subsequent calls returning the same value as the initial call. This is what allows the call time overhead to be stripped back to almost nothing.

Seems like a reasonable justification for a dedicated tool to me.

Cheers,
Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From masklinn at masklinn.net Sat Mar 23 19:00:03 2013 From: masklinn at masklinn.net (Masklinn) Date: Sat, 23 Mar 2013 19:00:03 +0100 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: <20130323163349.5c45532a@pitrou.net> References: <514CDBCC.4080407@gmx.de> <30264E4D-C98E-45DE-B1F3-9BA5B7E9C6EC@masklinn.net> <20130323143433.1ef5fd71@pitrou.net> <565593C4-00EB-4D2E-933E-F5EDE4D8CCD7@masklinn.net> <20130323154602.17eb66dd@pitrou.net> <25020592-032D-485A-9929-BC2D63445662@masklinn.net> <20130323163349.5c45532a@pitrou.net> Message-ID: <0A25DB87-D589-4FA9-ABD7-4378558BF895@masklinn.net> On 2013-03-23, at 16:33 , Antoine Pitrou wrote: > On Sat, 23 Mar 2013 16:30:39 +0100 > Masklinn wrote: >> >> Right, but in this case while I called it a cache the semantics really >> is a lazy singleton: only create the regex object when it's needed, but >> keep it around once it's been created. > > Perhaps we need a functools.lazy_compute() function: > > pat = functools.lazy_compute(re.compile, r"my very long regex") Yes, I'm not even sure the argument is needed: functools.partial (or a lambda) can be used for that and lazy_compute/compute_once would take a function of arity 0 as parameter. From stefan_ml at behnel.de Sat Mar 23 20:02:41 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 23 Mar 2013 20:02:41 +0100 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: <30264E4D-C98E-45DE-B1F3-9BA5B7E9C6EC@masklinn.net> References: <514CDBCC.4080407@gmx.de> <30264E4D-C98E-45DE-B1F3-9BA5B7E9C6EC@masklinn.net> Message-ID: Masklinn, 23.03.2013 14:26: > On 2013-03-23, at 03:00 , Nick Coghlan wrote: >> On Fri, Mar 22, 2013 at 3:42 PM, Gregory P. Smith wrote: >>> On Fri, Mar 22, 2013 at 3:31 PM, Ronny Pfannschmidt wrote: >>>> while reviewing urllib.parse i noticed a pretty ugly pattern >>>> many functions had an attached global and in their own code they would >>>> compile an regex on first use and assign it to that global >>>> >>>> its clear that compiling a regex is expensive, so having them be compiled >>>> later at first use would be of some benefit >>> >>> It isn't expensive to do, it is expensive to do repeatedly for no reason. >>> Thus the use of compiled regexes. Code like this would be better off >>> refactored to reference a precompiled global rather than conditionally check >>> if it needs compiling every time it is called. >> >> Alternatively, if there are a lot of different regexes, it may be >> better to rely on the implicit cache inside the re module. > > Wouldn't it be better if there are *few* different regexes? Since the > module itself caches 512 expressions (100 in Python 2) and does not use > an LRU or other "smart" cache (it just clears the whole cache dict once > the limit is breached as far as I can see), *and* any explicit call to > re.compile will *still* use the internal cache (meaning even going > through re.compile will count against the _MAXCACHE limit), all regex > uses throughout the application (including standard library &al) will > count against the built-in cache and increase the chance of the regex > we want cached to be thrown out no? Remember that any precompiled regex that got thrown out of the cache will be rebuilt as soon as it's being used. 
So the problem only ever arises when you really have more than _MAXCACHE different regexes that are all being used within the same loop, and even then, they'd have to be used in (mostly) the same order to render the cache completely useless. That's a very rare case, IMHO.

In all other cases, whenever the number of different regexes that are being used within a loop is lower than _MAXCACHE, the cache will immediately bring a substantial net win. And if a regex is not being used in a loop, then it's really unlikely that its compilation time will dominate the runtime of your application (assuming that your application is doing more than just compiling regexes...).

Stefan

From mal at egenix.com Sat Mar 23 20:41:49 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Sat, 23 Mar 2013 20:41:49 +0100
Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes
In-Reply-To: References: <514CDBCC.4080407@gmx.de> <20130323131828.2053bee2@pitrou.net> <514DA23B.9010301@egenix.com> <514DA367.5030605@gmx.de> <514DA599.5090309@egenix.com>
Message-ID: <514E057D.3000803@egenix.com>

On 23.03.2013 14:53, Ezio Melotti wrote:
> On Sat, Mar 23, 2013 at 2:52 PM, M.-A. Lemburg wrote:
>> On 23.03.2013 13:43, Ronny Pfannschmidt wrote:
>>>
>>>> Wouldn't it make sense to add a way to pickle or marshal compiled REs ?
>>>>
>>>> The precompiled REs could then be loaded directly from the
>>>> pickle, avoiding the compiling overhead on startup.
>>>>
>>>
>>> as far as i can tell that would need regex as part of the syntax to make sense for use in modules
>>> i dont think such a change would be accepted and i dont even want to deal with the potential
>>> bikeshedding for such an integration
>>
>> I wasn't thinking of making it part of the Python byte-code.
>>
>> It would suffice to add pickle/marshal support for the
>> compiled RE code. This could then be loaded from a string
>> embedded in the module code on startup.
>>
>> E.g.
>> # rx = re.compile('.*')
>> rx = pickle.loads('asdfsadfasdf')
>>
>
> According to http://bugs.python.org/issue11454#msg170697, this would
> be twice as slow.

RE objects can already be pickled and that's also what was measured in that message. It doesn't actually pickle the RE "byte" code, though. Instead it just pickles the pattern and the flags and does a complete recompile when unpickling the RE object.

I was talking about actually pickling the RE "byte" code that the re module generates to avoid the overhead of having to recompile the pattern. I'm pretty sure this would be faster :-)

>> It would also be possible to seed the re module cache with
>> such pickle.loads, perhaps compiled at Python build time.
>> This would avoid having to change code in the stdlib to
>> load pickles.

On 23.03.2013 14:36, Oleg Broytman wrote:
> On Sat, Mar 23, 2013 at 01:38:19PM +0100, "M.-A. Lemburg" wrote:
>> Wouldn't it make sense to add a way to pickle or marshal compiled REs ?
>>
>> The precompiled REs could then be loaded directly from the
>> pickle, avoiding the compiling overhead on startup.
>
> But with an overhead of opening files and unpickling. My wild guess
> [not backed by numbers] is that it would be as slow.
>
> I suspect the only way to speedup things would be to precompile
> regexps at compile (generating .pyc) time and saving compiled regexps in
> the byte code file.

The patterns used in the stdlib could be precompiled at Python build time and then stored away in a separate module, say _re_cache_preloader.py. This module would then be used to seed the re module cache.
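A sketch of how such a generated module might seed the cache (everything here is hypothetical: _re_cache_preloader.py and seed() don't exist, re._cache is an internal detail of CPython 3.3's re module, and pickle stands in for whatever serialized form of the compiled pattern would actually be used, since, as noted above, today's pickles simply recompile):

    # _re_cache_preloader.py -- hypothetical, generated at Python build time
    import pickle
    import re

    # (pattern, flags, blob) triples harvested from the stdlib; the blob
    # is built inline here only so the sketch runs.
    _PRELOADED = [
        (r'[a-zA-Z0-9+.-]+', 0, pickle.dumps(re.compile(r'[a-zA-Z0-9+.-]+'))),
    ]

    def seed():
        for pattern, flags, blob in _PRELOADED:
            # Key layout matches re._cache in CPython 3.3 -- an
            # implementation detail, not a public API.
            re._cache[type(pattern), pattern, flags] = pickle.loads(blob)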
Since the cache works by looking at the patterns, no recompile would happen when calling re.compile() on these patterns. The approach is similar to the way the sysconfig module information is cached in a separate module to reduce startup time (something we also did in eGenix PyRun to improve startup time - and because we had to do it anyway, since there are no Makefile available to parse when running pyrun). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 23 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-03-13: Released eGenix pyOpenSSL 0.13 ... http://egenix.com/go39 ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From solipsis at pitrou.net Sat Mar 23 20:41:59 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 23 Mar 2013 20:41:59 +0100 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes References: <514CDBCC.4080407@gmx.de> <20130323131828.2053bee2@pitrou.net> <514DA23B.9010301@egenix.com> <514DA367.5030605@gmx.de> <514DA599.5090309@egenix.com> <514E057D.3000803@egenix.com> Message-ID: <20130323204159.39ae7c5f@pitrou.net> On Sat, 23 Mar 2013 20:41:49 +0100 "M.-A. Lemburg" wrote: > On 23.03.2013 14:53, Ezio Melotti wrote: > > On Sat, Mar 23, 2013 at 2:52 PM, M.-A. Lemburg wrote: > >> On 23.03.2013 13:43, Ronny Pfannschmidt wrote: > >>> > >>>> Wouldn't it make sense to add a way to pickle or marshal compiled REs ? > >>>> > >>>> The precompiled REs could then be loaded directly from the > >>>> pickle, avoiding the compiling overhead on startup. > >>>> > >>> > >>> as far as i can tell that would need regex as part of the syntax to make sense fort use in modules > >>> i dont think such a change would be accepted and i dont even what to deal with the potential > >>> bikeshedding for such an integration > >> > >> I wasn't thinking of making it part of the Python byte-code. > >> > >> It would suffice to add pickle/marshal support for the > >> compiled RE code. This could then be loaded from a string > >> embedded in the module code on startup. > >> > >> E.g. > >> # rx = re.compile('.*') > >> rx = pickle.loads('asdfsadfasdf') > >> > > > > According to http://bugs.python.org/issue11454#msg170697, this would > > be twice as slow. > > RE objects can already be pickled and that's also what was > measured in that message. It doesn't actually pickle > the RE "byte" code, though. Instead it just pickles the > pattern and the flags and does a complete recompile when > unpickling the RE object. > > I was talking about actually pickling the RE "byte" code > that the re module generates to avoid the overhead of > having to recompile the pattern. The problem is that for pickles to be durable, you would then need some kind of compatibility guarantee for the re bytecode. Otherwise you might add the bytecode version number to the pickle, and then ignore the bytecode when loading the pickle and the current version number is different; but that would mean people would lose the benefit of caching without being warned, which would make performance more fickle. 
Regards Antoine. From mal at egenix.com Sat Mar 23 22:19:02 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 23 Mar 2013 22:19:02 +0100 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: <20130323204159.39ae7c5f@pitrou.net> References: <514CDBCC.4080407@gmx.de> <20130323131828.2053bee2@pitrou.net> <514DA23B.9010301@egenix.com> <514DA367.5030605@gmx.de> <514DA599.5090309@egenix.com> <514E057D.3000803@egenix.com> <20130323204159.39ae7c5f@pitrou.net> Message-ID: <514E1C46.7030006@egenix.com> On 23.03.2013 20:41, Antoine Pitrou wrote: > On Sat, 23 Mar 2013 20:41:49 +0100 > "M.-A. Lemburg" wrote: >> On 23.03.2013 14:53, Ezio Melotti wrote: >>> On Sat, Mar 23, 2013 at 2:52 PM, M.-A. Lemburg wrote: >>>> On 23.03.2013 13:43, Ronny Pfannschmidt wrote: >>>>> >>>>>> Wouldn't it make sense to add a way to pickle or marshal compiled REs ? >>>>>> >>>>>> The precompiled REs could then be loaded directly from the >>>>>> pickle, avoiding the compiling overhead on startup. >>>>>> >>>>> >>>>> as far as i can tell that would need regex as part of the syntax to make sense fort use in modules >>>>> i dont think such a change would be accepted and i dont even what to deal with the potential >>>>> bikeshedding for such an integration >>>> >>>> I wasn't thinking of making it part of the Python byte-code. >>>> >>>> It would suffice to add pickle/marshal support for the >>>> compiled RE code. This could then be loaded from a string >>>> embedded in the module code on startup. >>>> >>>> E.g. >>>> # rx = re.compile('.*') >>>> rx = pickle.loads('asdfsadfasdf') >>>> >>> >>> According to http://bugs.python.org/issue11454#msg170697, this would >>> be twice as slow. >> >> RE objects can already be pickled and that's also what was >> measured in that message. It doesn't actually pickle >> the RE "byte" code, though. Instead it just pickles the >> pattern and the flags and does a complete recompile when >> unpickling the RE object. >> >> I was talking about actually pickling the RE "byte" code >> that the re module generates to avoid the overhead of >> having to recompile the pattern. > > The problem is that for pickles to be durable, you would then need some > kind of compatibility guarantee for the re bytecode. > > Otherwise you might add the bytecode version number to the pickle, and > then ignore the bytecode when loading the pickle and the current > version number is different; but that would mean people would lose the > benefit of caching without being warned, which would make performance > more fickle. Hmm, I'm not following you. The patterns would get compiled once at Python build time when installing the stdlib. The bytecode version wouldn't change for those compiled patterns - unless, of course, you upgrade to a new Python version, but then you'd rebuild the bytecode versions of the REs :-) To make them generally useful, I agree, you would have to add a RE compiler version to the bytecode pickle, but AFAICS this should not affect the usefulness for the stdlib RE cache. The whole idea is really very similar to the Python VM bytecode caching Python is using to speedup imports of modules. Perhaps we could have a GSoC student give it a try and see whether it makes results in noticable startup time speedups ?! -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 23 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... 
http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-03-13: Released eGenix pyOpenSSL 0.13 ... http://egenix.com/go39 ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From solipsis at pitrou.net Sat Mar 23 22:20:42 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 23 Mar 2013 22:20:42 +0100 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes References: <514CDBCC.4080407@gmx.de> <20130323131828.2053bee2@pitrou.net> <514DA23B.9010301@egenix.com> <514DA367.5030605@gmx.de> <514DA599.5090309@egenix.com> <514E057D.3000803@egenix.com> <20130323204159.39ae7c5f@pitrou.net> <514E1C46.7030006@egenix.com> Message-ID: <20130323222042.56b09593@pitrou.net> On Sat, 23 Mar 2013 22:19:02 +0100 "M.-A. Lemburg" wrote: > > Hmm, I'm not following you. The patterns would get compiled once > at Python build time when installing the stdlib. The bytecode > version wouldn't change for those compiled patterns - unless, of > course, you upgrade to a new Python version, but then you'd > rebuild the bytecode versions of the REs :-) > > To make them generally useful, I agree, you would have to add a > RE compiler version to the bytecode pickle, but AFAICS this > should not affect the usefulness for the stdlib RE cache. Ah, you're talking only about the stdlib. Well, sure, that would work, but we have to remember to regenerate those pickles by hand each time the re bytecode is updated (which doesn't happen often, admittedly). That's a bit of a maintenance burden. > The whole idea is really very similar to the Python VM bytecode > caching Python is using to speedup imports of modules. Except that the VM bytecode caching works automatically and transparently :-) > Perhaps we could have a GSoC student give it a try and see > whether it makes results in noticable startup time speedups ?! That's a rather smallish topic for a GSoC project, IMHO. Regards Antoine. From mal at egenix.com Sat Mar 23 22:31:31 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 23 Mar 2013 22:31:31 +0100 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: <20130323222042.56b09593@pitrou.net> References: <514CDBCC.4080407@gmx.de> <20130323131828.2053bee2@pitrou.net> <514DA23B.9010301@egenix.com> <514DA367.5030605@gmx.de> <514DA599.5090309@egenix.com> <514E057D.3000803@egenix.com> <20130323204159.39ae7c5f@pitrou.net> <514E1C46.7030006@egenix.com> <20130323222042.56b09593@pitrou.net> Message-ID: <514E1F33.2060406@egenix.com> On 23.03.2013 22:20, Antoine Pitrou wrote: > On Sat, 23 Mar 2013 22:19:02 +0100 > "M.-A. Lemburg" wrote: >> >> Hmm, I'm not following you. The patterns would get compiled once >> at Python build time when installing the stdlib. The bytecode >> version wouldn't change for those compiled patterns - unless, of >> course, you upgrade to a new Python version, but then you'd >> rebuild the bytecode versions of the REs :-) >> >> To make them generally useful, I agree, you would have to add a >> RE compiler version to the bytecode pickle, but AFAICS this >> should not affect the usefulness for the stdlib RE cache. > > Ah, you're talking only about the stdlib. 
> Well, sure, that would work, but we have to remember to regenerate > those pickles by hand each time the re bytecode is updated (which > doesn't happen often, admittedly). That's a bit of a maintenance burden. No, that would happen at build time automatically. setup.py would create the module with the pickled RE bytecodes by scanning the stdlib modules for RE patterns, the re module would use this to seed its cache. That's the high-level idea. I'm sure there are a few pitfalls along the way :-) >> The whole idea is really very similar to the Python VM bytecode >> caching Python is using to speedup imports of modules. > > Except that the VM bytecode caching works automatically and > transparently :-) Should be the same for the REs in the stdlib. The user wouldn't notice (except for the speedup hopefully). Code in the stdlib compiling the REs wouldn't need to be touched either, since the cache in the re module would simply reuse the compiled versions. >> Perhaps we could have a GSoC student give it a try and see >> whether it makes results in noticable startup time speedups ?! > > That's a rather smallish topic for a GSoC project, IMHO. Well, you could extend it by adding some RE optimization tasks on top of it :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 23 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-03-13: Released eGenix pyOpenSSL 0.13 ... http://egenix.com/go39 ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From bruce at leapyear.org Sun Mar 24 00:03:19 2013 From: bruce at leapyear.org (Bruce Leban) Date: Sat, 23 Mar 2013 16:03:19 -0700 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: <514CDBCC.4080407@gmx.de> References: <514CDBCC.4080407@gmx.de> Message-ID: To summarize: - compiling regexes is slow so applications frequently compute it once and save it - compiling all the regexes at startup slows down startup for regexes that may never be used - a common pattern is to compute once at time of use and it would be nice to optimize this pattern - the regex library has a cache feature which means that frequently it will be optimized automatically - however, there's no guarantee that the regex you care about won't fall out of the cache. I think this addresses all the issues better than compute_lazy: re.compile(r'...', keep=True) When keep=True is specified, the regex library keeps the cached value for the lifetime of the process. The regex is computed only once on first use and you don't need to create a place to store it. Furthermore, if you use the same regex in more than one place, once with keep=True, the other uses will automatically be optimized. --- Bruce Latest blog post: Alice's Puzzle Page http://www.vroospeak.com Learn how hackers think: http://j.mp/gruyere-security -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg at krypto.org Sun Mar 24 00:14:56 2013 From: greg at krypto.org (Gregory P. 
Smith) Date: Sat, 23 Mar 2013 16:14:56 -0700 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: References: <514CDBCC.4080407@gmx.de> Message-ID: keep=True defeats the purpose of a caching strategy. An re.compile call within some code somewhere is typically not in a position to know if it is going to be called a lot. I think the code, as things are now, with dynamic construction at runtime based on a simple test is the best of both worlds to avoid the more complicated cost of calling re.compile and going through its cache logic. If the caching is ever is improved in the future to be faster, the code can arguably be simplified to use re.search or re.match directly and rely solely on the caching. ie: don't change anything. On Sat, Mar 23, 2013 at 4:03 PM, Bruce Leban wrote: > To summarize: > > - compiling regexes is slow so applications frequently compute it once and > save it > - compiling all the regexes at startup slows down startup for regexes that > may never be used > - a common pattern is to compute once at time of use and it would be nice > to optimize this pattern > - the regex library has a cache feature which means that frequently it > will be optimized automatically > - however, there's no guarantee that the regex you care about won't fall > out of the cache. > > I think this addresses all the issues better than compute_lazy: > > re.compile(r'...', keep=True) > > When keep=True is specified, the regex library keeps the cached value for > the lifetime of the process. The regex is computed only once on first use > and you don't need to create a place to store it. Furthermore, if you use > the same regex in more than one place, once with keep=True, the other uses > will automatically be optimized. > > --- Bruce > Latest blog post: Alice's Puzzle Page http://www.vroospeak.com > Learn how hackers think: http://j.mp/gruyere-security > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bruce at leapyear.org Sun Mar 24 00:34:26 2013 From: bruce at leapyear.org (Bruce Leban) Date: Sat, 23 Mar 2013 16:34:26 -0700 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: References: <514CDBCC.4080407@gmx.de> Message-ID: On Sat, Mar 23, 2013 at 4:14 PM, Gregory P. Smith wrote: > keep=True defeats the purpose of a caching strategy. An re.compile call > within some code somewhere is typically not in a position to know if it is > going to be called a lot. > > I think the code, as things are now, with dynamic construction at runtime > based on a simple test is the best of both worlds to avoid the more > complicated cost of calling re.compile and going through its cache logic. > If the caching is ever is improved in the future to be faster, the code > can arguably be simplified to use re.search or re.match directly and rely > solely on the caching. > > ie: don't change anything. > > Truth is people are currently doing caching themselves, by compiling and then keeping the compiled regex. Saying they're not in a position to know whether or not to do that isn't going to change that. Is it worthwhile having the regex library facilitate this manual caching? 
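For what it's worth, the proposed keep=True semantics can be sketched in user space today (compile_keep and _kept are made-up names; this is an illustration of the idea, not a proposed implementation):

    import re

    _kept = {}   # lives for the whole process, unlike re's bounded cache

    def compile_keep(pattern, flags=0):
        """Compile on first use, then keep for the life of the process."""
        try:
            return _kept[pattern, flags]
        except KeyError:
            _kept[pattern, flags] = compiled = re.compile(pattern, flags)
            return compiled

    pat = compile_keep(r"\w+")    # compiled here, on first use
    pat = compile_keep(r"\w+")    # a cheap dict hit from now on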
--- Bruce Latest blog post: Alice's Puzzle Page http://www.vroospeak.com Learn how hackers think: http://j.mp/gruyere-security -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg at krypto.org Sun Mar 24 00:48:47 2013 From: greg at krypto.org (Gregory P. Smith) Date: Sat, 23 Mar 2013 16:48:47 -0700 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: References: <514CDBCC.4080407@gmx.de> Message-ID: On Sat, Mar 23, 2013 at 4:34 PM, Bruce Leban wrote: > > On Sat, Mar 23, 2013 at 4:14 PM, Gregory P. Smith wrote: > >> keep=True defeats the purpose of a caching strategy. An re.compile call >> within some code somewhere is typically not in a position to know if it is >> going to be called a lot. >> >> I think the code, as things are now, with dynamic construction at runtime >> based on a simple test is the best of both worlds to avoid the more >> complicated cost of calling re.compile and going through its cache logic. >> If the caching is ever is improved in the future to be faster, the code >> can arguably be simplified to use re.search or re.match directly and rely >> solely on the caching. >> >> ie: don't change anything. >> >> > Truth is people are currently doing caching themselves, by compiling and > then keeping the compiled regex. Saying they're not in a position to know > whether or not to do that isn't going to change that. Is it worthwhile > having the regex library facilitate this manual caching? > In the absense of profiling numbers showing otherwise, i'd rather see all forms of manual caching like the conditional checks or a keep=True go away as it's dirty and encourages premature "optimization". -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sun Mar 24 00:48:42 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 24 Mar 2013 00:48:42 +0100 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes References: <514CDBCC.4080407@gmx.de> Message-ID: <20130324004842.24cd98e7@pitrou.net> On Sat, 23 Mar 2013 16:48:47 -0700 "Gregory P. Smith" wrote: > On Sat, Mar 23, 2013 at 4:34 PM, Bruce Leban wrote: > > > > > On Sat, Mar 23, 2013 at 4:14 PM, Gregory P. Smith wrote: > > > >> keep=True defeats the purpose of a caching strategy. An re.compile call > >> within some code somewhere is typically not in a position to know if it is > >> going to be called a lot. > >> > >> I think the code, as things are now, with dynamic construction at runtime > >> based on a simple test is the best of both worlds to avoid the more > >> complicated cost of calling re.compile and going through its cache logic. > >> If the caching is ever is improved in the future to be faster, the code > >> can arguably be simplified to use re.search or re.match directly and rely > >> solely on the caching. > >> > >> ie: don't change anything. > >> > >> > > Truth is people are currently doing caching themselves, by compiling and > > then keeping the compiled regex. Saying they're not in a position to know > > whether or not to do that isn't going to change that. Is it worthwhile > > having the regex library facilitate this manual caching? > > > > In the absense of profiling numbers showing otherwise, i'd rather see all > forms of manual caching like the conditional checks or a keep=True go away > as it's dirty and encourages premature "optimization". Agreed. Regards Antoine. 
From eliben at gmail.com Sun Mar 24 04:39:28 2013 From: eliben at gmail.com (Eli Bendersky) Date: Sat, 23 Mar 2013 20:39:28 -0700 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: References: <514CDBCC.4080407@gmx.de> Message-ID: On Sat, Mar 23, 2013 at 4:03 PM, Bruce Leban wrote: > To summarize: > > - compiling regexes is slow so applications frequently compute it once and > save it > - compiling all the regexes at startup slows down startup for regexes that > may never be used > - a common pattern is to compute once at time of use and it would be nice > to optimize this pattern > - the regex library has a cache feature which means that frequently it > will be optimized automatically > - however, there's no guarantee that the regex you care about won't fall > out of the cache. > > I think this addresses all the issues better than compute_lazy: > > re.compile(r'...', keep=True) > > When keep=True is specified, the regex library keeps the cached value for > the lifetime of the process. The regex is computed only once on first use > and you don't need to create a place to store it. Furthermore, if you use > the same regex in more than one place, once with keep=True, the other uses > will automatically be optimized. > Nice summary. The real problem is, I think, that many developers are not aware of the default caching done by the re module. I have a hunch that if this was better known, fewer manual optimization attempts would spring up. How about examining what the size of that re cache is, and how much memory it typically occupies. Perhaps this cache can be changed to fit more regexes? Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan_ml at behnel.de Sun Mar 24 08:38:32 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 24 Mar 2013 08:38:32 +0100 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: References: <514CDBCC.4080407@gmx.de> Message-ID: Gregory P. Smith, 24.03.2013 00:48: > On Sat, Mar 23, 2013 at 4:34 PM, Bruce Leban wrote: >> On Sat, Mar 23, 2013 at 4:14 PM, Gregory P. Smith wrote: >>> keep=True defeats the purpose of a caching strategy. An re.compile call >>> within some code somewhere is typically not in a position to know if it is >>> going to be called a lot. >>> >>> I think the code, as things are now, with dynamic construction at runtime >>> based on a simple test is the best of both worlds to avoid the more >>> complicated cost of calling re.compile and going through its cache logic. >>> If the caching is ever is improved in the future to be faster, the code >>> can arguably be simplified to use re.search or re.match directly and rely >>> solely on the caching. >>> >>> ie: don't change anything. >> >> Truth is people are currently doing caching themselves, by compiling and >> then keeping the compiled regex. Saying they're not in a position to know >> whether or not to do that isn't going to change that. Is it worthwhile >> having the regex library facilitate this manual caching? > > In the absense of profiling numbers showing otherwise, i'd rather see all > forms of manual caching like the conditional checks or a keep=True go away > as it's dirty and encourages premature "optimization". +1 If I had been "more aware" of the re internal cache during the last years, I would have avoided at least a couple of re.compile() calls in my code, I guess. 
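Concretely, with the cache in place both of the following styles compile the pattern only once in the common case -- the difference is just when that happens (assuming the pattern is not evicted from the cache in between):

    import re

    WORD = re.compile(r"\w+")     # eager: paid at import time, even if unused

    def first_word(text):
        return WORD.search(text)

    def first_word_lazy(text):
        # compiled on the first call, then served from re's internal cache
        return re.search(r"\w+", text)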
Maybe this is something that the documentation of re.compile() can help with, by telling people explicitly that this apparently cool feature of pre-compiling actually has a drawback (startup time + a bit of memory usage) and that they won't notice a runtime difference in most cases anyway.

Stefan

From stefan_ml at behnel.de Sun Mar 24 08:48:26 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 24 Mar 2013 08:48:26 +0100
Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes
In-Reply-To:
References: <514CDBCC.4080407@gmx.de>
Message-ID:

Eli Bendersky, 24.03.2013 04:39:
> How about examining what the size of that re cache is, and how much memory
> it typically occupies. Perhaps this cache can be changed to fit more
> regexes?

The problem is that there is no "default workload" to measure against. If a stupid dispatch engine automatically generates tons of regexes to compare, say, URLs against, you can make the cache as large as you want and it won't help. I doubt that that's a serious use case, though - combining all of them into a single regex (and then actually pre-compiling that statically instead of relying on the cache) would be way smarter.

Apart from extreme cases like the above, a cache size of 512 sounds *plenty* large, more like a "we don't know, so make it large" kind of choice. Maybe measuring could actually help in making it *smaller*, but I guess that's simply not worth it. If the space is not used, it won't grow that large anyway.

Stefan

From Ronny.Pfannschmidt at gmx.de Mon Mar 25 20:30:52 2013
From: Ronny.Pfannschmidt at gmx.de (Ronny Pfannschmidt)
Date: Mon, 25 Mar 2013 20:30:52 +0100
Subject: [Python-ideas] rebooting the lazy regex discussion as something strictly more narrow for lazy expensively computed globals
Message-ID: <5150A5EC.9040200@gmx.de>

Hi,

since the previous discussions raised lots of bikeshedding in maintenance-pain directions about pickle/marshal in source code, I'd like to reboot the discussion in a more narrow scope.

The pattern is that the stdlib has lazily computed private globals in various modules. I would like to propose an alternative way of writing those. Instead of code such as:

    _hostprog = None

    def splithost(url):
        """splithost('//host[:port]/path') --> 'host[:port]', '/path'."""
        global _hostprog
        if _hostprog is None:
            import re
            _hostprog = re.compile('^//([^/?]*)(.*)$')
        ...
I would prefer to see code like:

    from functools import lazy_global

    @lazy_global
    def _hostprog():
        return re.compile('^//([^/?]*)(.*)$')

As far as I can tell, the implementation for simple cases of expensive things will not need smart proxying, just some __getattr__ hook. An untested example implementation could thus look like the following (note that it would end up creating the object twice in thread races, so a lock may be necessary):

    class LazyGlobal(object):
        def __init__(self, func):
            self.__func = func

        def __getattr__(self, name):
            try:
                # look the cached object up in the instance dict directly;
                # a plain "self.__computed" attribute access here would
                # re-trigger __getattr__ recursively
                obj = self.__dict__['computed']
            except KeyError:
                obj = self.__dict__['computed'] = self.__func()
                # replace myself in the module scope to get rid of the
                # indirection; done in the except block because if someone
                # imported us directly we need to stay functional here
                self.__func.__globals__[self.__func.__name__] = obj
            return getattr(obj, name)

From steve at pearwood.info Tue Mar 26 00:55:01 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 26 Mar 2013 10:55:01 +1100
Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes
In-Reply-To: <514CDBCC.4080407@gmx.de>
References: <514CDBCC.4080407@gmx.de>
Message-ID: <5150E3D5.3070008@pearwood.info>

On 23/03/13 09:31, Ronny Pfannschmidt wrote:
> Hi,
>
> while reviewing urllib.parse I noticed a pretty ugly pattern
>
> many functions had an attached global and in their own code they would compile a regex on first use and assign it to that global

Since there are only eight of them, and none of them are particularly big or complicated regexes, I believe that they should be unconditionally compiled at startup. E.g. instead of this code:

    # from urllib.parse
    _typeprog = None
    def splittype(url):
        """splittype('type:opaquestring') --> 'type', 'opaquestring'."""
        global _typeprog
        if _typeprog is None:
            import re
            _typeprog = re.compile('^([^/:]+):')

        match = _typeprog.match(url)
        [...]

something like this is much simpler and more obvious:

    _typeprog = re.compile('^([^/:]+):')

    def splittype(url):
        """splittype('type:opaquestring') --> 'type', 'opaquestring'."""
        match = _typeprog.match(url)
        [...]

or we can get rid of the global altogether:

    def splittype(url, _typeprog=re.compile('^([^/:]+):')):
        """splittype('type:opaquestring') --> 'type', 'opaquestring'."""
        match = _typeprog.match(url)
        [...]

> it's clear that compiling a regex is expensive, so having them be compiled later at first use would be of some benefit

Sounds like premature optimization to me. I've extracted out the regex patterns, and timed how long it takes to compile them. On my computer, it's a one-off cost of about 3 milliseconds. Here's the code I used to time it:

    # === start ===
    import re
    import timeit

    patterns = [
        ('^([^/:]+):', ),
        ('^//([^/?]*)(.*)$', ),
        ('^(.*)@(.*)$', ),
        ('^([^:]*):(.*)$', re.S),
        ('^(.*):([0-9]+)$', ),
        ('^(.*):(.*)$', ),
        ('^(.*)\?([^?]*)$', ),
        ('^(.*)#([^#]*)$', ),
        ('^([^=]*)=(.*)$', ),
    ]

    setup = "from __main__ import re, patterns"
    t1 = timeit.Timer("""
    re.purge()
    for pat in patterns:
        re.compile(*pat)
    """, setup)
    t2 = timeit.Timer("""
    re.purge()
    for pat in patterns:
        pass
    """, setup)

    print("compiling urllib.parse patterns:")
    a = min(t1.repeat(number=1000, repeat=7))
    b = min(t2.repeat(number=1000, repeat=7))
    print(a-b, "ms")
    # === end ===

Admittedly this would (probably) increase the time taken to import urllib.parse by about 50% (from 5-6ms to 8-9ms), but I don't see this as significant. It's not even a real cost -- you still have to compile the patterns at some point, the only question is whether they are done up front or as needed.
-- Steven From g.rodola at gmail.com Tue Mar 26 20:37:16 2013 From: g.rodola at gmail.com (=?ISO-8859-1?Q?Giampaolo_Rodol=E0?=) Date: Tue, 26 Mar 2013 20:37:16 +0100 Subject: [Python-ideas] Asking for feedback about issue 17552 (socket.sendfile()) Message-ID: http://bugs.python.org/issue17552 I hope it's alright to bother python-ideas asking for feedback but considering the design decisions involved I thought it would have been legitimate. Please feel free to comment on the issue rather than here. Thanks in advance, - Giampaolo https://code.google.com/p/pyftpdlib/ https://code.google.com/p/psutil/ https://code.google.com/p/pysendfile/ From brian at python.org Tue Mar 26 20:49:20 2013 From: brian at python.org (Brian Curtin) Date: Tue, 26 Mar 2013 14:49:20 -0500 Subject: [Python-ideas] Google Summer of Code - Organization Deadline Approaching - March 29 Message-ID: Just an FYI that there are under 3 days to apply to Google Summer of Code for mentoring organizations: http://www.google-melange.com/gsoc/homepage/google/gsoc2013. The student application deadline is later on in May. If you run a project that is interested in applying under the Python umbrella organization, contact Terri Oda at terri at zone12.com From ndbecker2 at gmail.com Wed Mar 27 16:47:45 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 27 Mar 2013 11:47:45 -0400 Subject: [Python-ideas] argparse - add support for environment variables (a proposal) References: <71869e7d-605c-4599-8010-0c195e86e982@googlegroups.com> Message-ID: Here's an idea. We have 4 main sources of config: 1. app defaults 2. config file 3. env var 4. command line Instead of adding anything to the code for each of these parsers, suppose that each of them accepted a dictionary of options in a common form, so that they could be composed easily. For example, suppose we call config file parser 1st, and it returns a dict of what items are set in the config file (not the defaults). Then argparse is called (without using and defaults), passing it that dictionary, which it can add to or overide. Finally, options not in the dict get defaults applied. Having not dug into technical details, I'm imagining that either arparse already can accept a dict or options, or could be easily modified to. Or, we simply call argparse normally, then take it's dict and we merge the dicts outside of arparse. From miki.tebeka at gmail.com Wed Mar 27 18:15:35 2013 From: miki.tebeka at gmail.com (Miki Tebeka) Date: Wed, 27 Mar 2013 10:15:35 -0700 (PDT) Subject: [Python-ideas] argparse - add support for environment variables (a proposal) In-Reply-To: References: <71869e7d-605c-4599-8010-0c195e86e982@googlegroups.com> Message-ID: IMO one env which is a ChainMap ( http://docs.python.org/dev/library/collections#collections.ChainMap) will be enough. This will give you more flexibility and you'll be able to chain more "environments" if needed. On Wednesday, March 27, 2013 8:47:45 AM UTC-7, Neal Becker wrote: > > Here's an idea. We have 4 main sources of config: > > 1. app defaults > 2. config file > 3. env var > 4. command line > > Instead of adding anything to the code for each of these parsers, suppose > that > each of them accepted a dictionary of options in a common form, so that > they > could be composed easily. > > For example, suppose we call config file parser 1st, and it returns a dict > of > what items are > set in the config file (not the defaults). Then argparse is called > (without > using and defaults), passing it that dictionary, which it can add to or > overide. 
> Finally, options not in the dict get defaults applied. > > Having not dug into technical details, I'm imagining that either arparse > already > can accept a dict or options, or could be easily modified to. Or, we > simply call > argparse normally, then take it's dict and we merge the dicts outside of > arparse. > > _______________________________________________ > Python-ideas mailing list > Python... at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Thu Mar 28 00:06:36 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 27 Mar 2013 19:06:36 -0400 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: References: <514CDBCC.4080407@gmx.de> Message-ID: On 3/24/2013 3:38 AM, Stefan Behnel wrote: > Gregory P. Smith, 24.03.2013 00:48: >> In the absense of profiling numbers showing otherwise, i'd rather see all >> forms of manual caching like the conditional checks or a keep=True go away >> as it's dirty and encourages premature "optimization". > > +1 > > If I had been "more aware" of the re internal cache during the last years, > I would have avoided at least a couple of re.compile() calls in my code, I > guess. > > Maybe this is something that the documentation of re.compile() can help > with, by telling people explicitly that this apparently cool feature of > pre-compiling actually has a drawback in it (startup time + a bit of memory > usage) and that they won't notice a runtime difference in most cases anyway. With a decent re cache size, .compile seems more like an attractive nuisance that something useful. -- Terry Jan Reedy From steve at pearwood.info Thu Mar 28 02:25:17 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 28 Mar 2013 12:25:17 +1100 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: References: <514CDBCC.4080407@gmx.de> Message-ID: <51539BFD.60901@pearwood.info> On 28/03/13 10:06, Terry Reedy wrote: > On 3/24/2013 3:38 AM, Stefan Behnel wrote: >> Gregory P. Smith, 24.03.2013 00:48: >>> In the absense of profiling numbers showing otherwise, i'd rather see all >>> forms of manual caching like the conditional checks or a keep=True go away >>> as it's dirty and encourages premature "optimization". >> >> +1 >> >> If I had been "more aware" of the re internal cache during the last years, >> I would have avoided at least a couple of re.compile() calls in my code, I >> guess. >> >> Maybe this is something that the documentation of re.compile() can help >> with, by telling people explicitly that this apparently cool feature of >> pre-compiling actually has a drawback in it (startup time + a bit of memory >> usage) and that they won't notice a runtime difference in most cases anyway. > > With a decent re cache size, .compile seems more like an attractive nuisance that something useful. On the contrary, I think that it is the cache which is an (unattractive) nuisance. Like any cache, performance is only indirectly under your control. You cannot know for sure whether re.match(some_pattern, text) will be a cheap cache hit or an expensive re-compilation. All you can do is keep increasing the size of the cache until the chance of a cache miss is "low enough", whatever that means for you, and hope. I cannot think of any object in the Python standard library where the recommended API is to repeatedly convert from strings each time you need the object. 
We do this: x = Decimal(some_string) y = x**3 z = x.exp() not this: y = Decimal(some_string)**3 z = Decimal(some_string).exp() hoping that the string will be in a cache and the conversion will be fast. So why do we do this? result = re.match(some_string, text) other_result = re.match(some_string, other_text) -- Steven From rosuav at gmail.com Thu Mar 28 02:38:42 2013 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 28 Mar 2013 12:38:42 +1100 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: <51539BFD.60901@pearwood.info> References: <514CDBCC.4080407@gmx.de> <51539BFD.60901@pearwood.info> Message-ID: On Thu, Mar 28, 2013 at 12:25 PM, Steven D'Aprano wrote: > We do this: > > x = Decimal(some_string) > y = x**3 > z = x.exp() > > not this: > > y = Decimal(some_string)**3 > z = Decimal(some_string).exp() > > hoping that the string will be in a cache and the conversion will be fast. > So why do we do this? > > result = re.match(some_string, text) > other_result = re.match(some_string, other_text) Would it be better if, instead of: pat = re.compile(some_string) it were spelled: pat = re.RegExp(some_string) ? It'd match Decimal and so on, while still being exactly the same thing ultimately - you turn the textual regex into an object. ChrisA From steve at pearwood.info Thu Mar 28 02:56:20 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 28 Mar 2013 12:56:20 +1100 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: References: <514CDBCC.4080407@gmx.de> <51539BFD.60901@pearwood.info> Message-ID: <5153A344.5050706@pearwood.info> On 28/03/13 12:38, Chris Angelico wrote: > On Thu, Mar 28, 2013 at 12:25 PM, Steven D'Aprano wrote: >> We do this: >> >> x = Decimal(some_string) >> y = x**3 >> z = x.exp() >> >> not this: >> >> y = Decimal(some_string)**3 >> z = Decimal(some_string).exp() >> >> hoping that the string will be in a cache and the conversion will be fast. >> So why do we do this? >> >> result = re.match(some_string, text) >> other_result = re.match(some_string, other_text) > > Would it be better if, instead of: > > pat = re.compile(some_string) > > it were spelled: > > pat = re.RegExp(some_string) > > ? It'd match Decimal and so on, while still being exactly the same > thing ultimately - you turn the textual regex into an object. No, you seem to have missed my point. I don't care whether we have a builder like re.compile() that turns a string into a regular expression object, or we use the RegExp type constructor directly. What I care about is that we don't recommend that people rely on the cache to handle that conversion, as Terry seems to be suggesting. -- Steven From rosuav at gmail.com Thu Mar 28 03:06:42 2013 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 28 Mar 2013 13:06:42 +1100 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: <5153A344.5050706@pearwood.info> References: <514CDBCC.4080407@gmx.de> <51539BFD.60901@pearwood.info> <5153A344.5050706@pearwood.info> Message-ID: On Thu, Mar 28, 2013 at 12:56 PM, Steven D'Aprano wrote: > On 28/03/13 12:38, Chris Angelico wrote: >> >> On Thu, Mar 28, 2013 at 12:25 PM, Steven D'Aprano >> wrote: >>> >>> We do this: >>> >>> x = Decimal(some_string) >>> y = x**3 >>> z = x.exp() >>> >>> not this: >>> >>> y = Decimal(some_string)**3 >>> z = Decimal(some_string).exp() >>> >>> hoping that the string will be in a cache and the conversion will be >>> fast. >>> So why do we do this? 
>>> >>> result = re.match(some_string, text) >>> other_result = re.match(some_string, other_text) >> >> >> Would it be better if, instead of: >> >> pat = re.compile(some_string) >> >> it were spelled: >> >> pat = re.RegExp(some_string) >> >> ? It'd match Decimal and so on, while still being exactly the same >> thing ultimately - you turn the textual regex into an object. > > > > No, you seem to have missed my point. > > I don't care whether we have a builder like re.compile() that turns a string > into a regular expression object, or we use the RegExp type constructor > directly. What I care about is that we don't recommend that people rely on > the cache to handle that conversion, as Terry seems to be suggesting. Yes, that's what I mean. If compile were renamed RegExp and the cache abolished, would people find this odd, or would they be happy to do their own compiled-regex retention? I suspect the latter. There could still be (non-caching) re.match and friends, but it'd be understood that they are less efficient for multiple usage. ChrisA From shane at umbrellacode.com Thu Mar 28 03:29:17 2013 From: shane at umbrellacode.com (Shane Green) Date: Wed, 27 Mar 2013 19:29:17 -0700 Subject: [Python-ideas] =?windows-1252?q?list=85pushed=2C_or_something?= Message-ID: I'm not sure if there's anything inherently wrong with this idea, and I am well aware how incredibly easy it is to implement as an extension of the built-ins, but, I find it very useful to have variations of list().append(obj) and set().add(obj) that return, obj. I've never come up with perfect method names; sometimes I go with past-tense versions, "added" and "appended" (or "pushed"). I often use these methods in places where performance is a factor, such as using a set() to filter repeats from an input sequence. I was thinking it may be worth considering adding it to core, or perhaps creating collections types with these features, so they are high performance and standardized. They can be particularly useful in generator recipes, etc. Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Thu Mar 28 03:42:57 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 28 Mar 2013 11:42:57 +0900 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: References: <514CDBCC.4080407@gmx.de> Message-ID: <87fvzgb5xq.fsf@uwakimon.sk.tsukuba.ac.jp> Terry Reedy writes: > With a decent re cache size, .compile seems more like an attractive > nuisance that something useful. In applications like text editors, you often have scads of ephemeral regexps (entered by users) but some you'd like to keep around (python_keyword_re, for example). I don't know if this is actually a performance hit in such applications, but like Steven d'A, I'm not comfortable in relying on a cache for performance-critical apps. It seems to me that the cache is a backstop to reduce the chance that an application repeatedly compiles a given regexp in an inner loop, but can't guarantee elimination of all such performance problems. I'm also not clear on why you consider it an attractive nuisance. It's annoying enough that people will only do it for regexps they consider important, so I doubt people will be compiling thousands of regexps that only get used once in a blue moon. 
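To illustrate the split (python_keyword_re as named above; search_buffer is a hypothetical helper):

    import keyword
    import re

    # a keeper: precompiled once, immune to whatever the cache does later
    python_keyword_re = re.compile(r"\b(?:%s)\b" % "|".join(keyword.kwlist))

    def search_buffer(user_pattern, text):
        # ephemeral, user-entered patterns just lean on the internal cache
        return re.search(user_pattern, text)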
What's the loss to having the facility available (aside from the overhead of maintenance and documentation -- "attractive nuisance" implies users with holes in their feet :-)? From tjreedy at udel.edu Thu Mar 28 05:49:24 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 28 Mar 2013 00:49:24 -0400 Subject: [Python-ideas] =?utf-8?q?list=E2=80=A6pushed=2C_or_something?= In-Reply-To: References: Message-ID: On 3/27/2013 10:29 PM, Shane Green wrote: > I'm not sure if there's anything inherently wrong with this idea, and I > am well aware how incredibly easy it is to implement as an extension of > the built-ins, but, I find it very useful to have /variations/ of > list().append(obj) and set().add(obj) that return, obj. There are other people who agree with you, but it is Guido's design decision from the beginning of Python that mutation methods do not return the object mutated. -- Terry Jan Reedy From guido at python.org Thu Mar 28 05:52:46 2013 From: guido at python.org (Guido van Rossum) Date: Wed, 27 Mar 2013 21:52:46 -0700 Subject: [Python-ideas] =?windows-1252?q?list=85pushed=2C_or_something?= In-Reply-To: References: Message-ID: Also, we shouldn't introduce multiple ways of spelling the same operation if we can help it. On Wednesday, March 27, 2013, Terry Reedy wrote: > On 3/27/2013 10:29 PM, Shane Green wrote: > >> I'm not sure if there's anything inherently wrong with this idea, and I >> am well aware how incredibly easy it is to implement as an extension of >> the built-ins, but, I find it very useful to have /variations/ of >> list().append(obj) and set().add(obj) that return, obj. >> > > There are other people who agree with you, but it is Guido's design > decision from the beginning of Python that mutation methods do not return > the object mutated. > > -- > Terry Jan Reedy > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From cs at zip.com.au Thu Mar 28 05:27:48 2013 From: cs at zip.com.au (Cameron Simpson) Date: Thu, 28 Mar 2013 15:27:48 +1100 Subject: [Python-ideas] =?utf-8?q?list=E2=80=A6pushed=2C_or_something?= In-Reply-To: References: Message-ID: <20130328042748.GA61954@cskk.homeip.net> On 27Mar2013 19:29, Shane Green wrote: | I'm not sure if there's anything inherently wrong with this idea, and I am well aware | how incredibly easy it is to implement as an extension of the built-ins, but, I find it | very useful to have variations of list().append(obj) and set().add(obj) that return, | obj. I've never come up with perfect method names; sometimes I go with past-tense | versions, "added" and "appended" (or "pushed"). I often use these methods in places | where performance is a factor, such as using a set() to filter repeats from an input | sequence. I don't suppose you could post example code? I suspect the thinking goes that functions/methods that do something shouldn't just be "identity" functions where f(x) == x. And if they are defined to be identity functions you may as well return None, since the calling code isn't learning anything new. At any rate, that would be part of my thinking, and for me it is enough to not favour the proposal without a better demo of where it is a win, or of how horribly I have misconstrued your suggestion. Cheers, -- If everyone is thinking alike, then someone isn't thinking. 
- Patton From shane at umbrellacode.com Thu Mar 28 06:11:04 2013 From: shane at umbrellacode.com (Shane Green) Date: Wed, 27 Mar 2013 22:11:04 -0700 Subject: [Python-ideas] =?windows-1252?q?list=85pushed=2C_or_something?= In-Reply-To: References: Message-ID: <51B85D9B-BFEC-4134-BECD-F8A8043B6A6B@umbrellacode.com> Actually this was the value doing the mutating, i.e, the i pushed or added, would be returned as operation's output. I see what you're saying, and I wasn't thinking of chaining so much as being able to include the operation in a expression and, in particular, a generator scenario. Ordered sequence of unique values from sequence with possible repeats [seen.added(value) for value in sequence if value not in seen] * * again, please forgive the method names, I was hoping to crowd source something better ;-) Replaces workarounds like: [seen.setdefault(value, value) for value in sequence if value not in seen] or seen = dict(sequence); [unique.pop(value) for value in sequence if value in unique] Like I said, it's incredibly simple to just extend set and list and add these in user space. Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com On Mar 27, 2013, at 9:49 PM, Terry Reedy wrote: > On 3/27/2013 10:29 PM, Shane Green wrote: >> I'm not sure if there's anything inherently wrong with this idea, and I >> am well aware how incredibly easy it is to implement as an extension of >> the built-ins, but, I find it very useful to have /variations/ of >> list().append(obj) and set().add(obj) that return, obj. > > There are other people who agree with you, but it is Guido's design decision from the beginning of Python that mutation methods do not return the object mutated. > > -- > Terry Jan Reedy > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From bruce at leapyear.org Thu Mar 28 06:28:41 2013 From: bruce at leapyear.org (Bruce Leban) Date: Wed, 27 Mar 2013 22:28:41 -0700 Subject: [Python-ideas] =?windows-1252?q?list=85pushed=2C_or_something?= In-Reply-To: <51B85D9B-BFEC-4134-BECD-F8A8043B6A6B@umbrellacode.com> References: <51B85D9B-BFEC-4134-BECD-F8A8043B6A6B@umbrellacode.com> Message-ID: On Wed, Mar 27, 2013 at 10:11 PM, Shane Green wrote: > [seen.added(value) for value in sequence if value not in seen] * > Here's an easy way to do it: >>> seen = set() >>> seq = [3,2,1,2,3,4,5,4] >>> [seen.add(v) or v for v in seq if v not in seen] [3, 2, 1, 4, 5] >>> seen {1, 2, 3, 4, 5} --- Bruce Latest blog post: Alice's Puzzle Page http://www.vroospeak.com Learn how hackers think: http://j.mp/gruyere-security -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Thu Mar 28 06:48:11 2013 From: shane at umbrellacode.com (Shane Green) Date: Wed, 27 Mar 2013 22:48:11 -0700 Subject: [Python-ideas] =?windows-1252?q?list=85pushed=2C_or_something?= In-Reply-To: References: <51B85D9B-BFEC-4134-BECD-F8A8043B6A6B@umbrellacode.com> Message-ID: <5F47CC16-F708-4823-8DE1-605BCFA653A7@umbrellacode.com> That's clever: even works for zero because it's returned by or as second false. Cool. So I suppose I have to come up with more examples now ;-) Actually, on that point, I actually think the seen.added(value) (with a better name) is quite a bit cleaner than the one using "or". 
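For instance, something along these lines (the method names -- added, pushed -- are the ones floated above and still up for bikeshedding):

    class List(list):
        def pushed(self, obj):
            self.append(obj)
            return obj

    class Set(set):
        def added(self, obj):
            self.add(obj)
            return obj

    seen = Set()
    ordered_unique = [seen.added(v) for v in [3, 2, 1, 2, 3] if v not in seen]
    # ordered_unique == [3, 2, 1]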
Clever as it is, I think someone learning the language would flinch when they saw that? :-) Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com On Mar 27, 2013, at 10:28 PM, Bruce Leban wrote: > > On Wed, Mar 27, 2013 at 10:11 PM, Shane Green wrote: > [seen.added(value) for value in sequence if value not in seen] * > > Here's an easy way to do it: > > >>> seen = set() > >>> seq = [3,2,1,2,3,4,5,4] > >>> [seen.add(v) or v for v in seq if v not in seen] > [3, 2, 1, 4, 5] > >>> seen > {1, 2, 3, 4, 5} > > > --- Bruce > Latest blog post: Alice's Puzzle Page http://www.vroospeak.com > Learn how hackers think: http://j.mp/gruyere-security -------------- next part -------------- An HTML attachment was scrubbed... URL: From bruce at leapyear.org Thu Mar 28 06:58:54 2013 From: bruce at leapyear.org (Bruce Leban) Date: Wed, 27 Mar 2013 22:58:54 -0700 Subject: [Python-ideas] =?windows-1252?q?list=85pushed=2C_or_something?= In-Reply-To: <5F47CC16-F708-4823-8DE1-605BCFA653A7@umbrellacode.com> References: <51B85D9B-BFEC-4134-BECD-F8A8043B6A6B@umbrellacode.com> <5F47CC16-F708-4823-8DE1-605BCFA653A7@umbrellacode.com> Message-ID: On Wed, Mar 27, 2013 at 10:48 PM, Shane Green wrote: > That's clever: even works for zero because it's returned by or as second > false. Cool. So I suppose I have to come up with more examples now ;-) > > Actually, on that point, I actually think the seen.added(value) (with a > better name) is quite a bit cleaner than the one using "or". Clever as it > is, I think someone learning the language would flinch when they saw that? > :-) > Yes, I suppose it's a bit obscure but you're only going to use it in a case where you value brevity over clarity, right? The C comma operator which always returns the second value, not depending that the first value is false. You can write this in Python as: (foo, bar)[1] or is there some cleaner way to write that? #define comma and False or :-) --- Bruce Latest blog post: Alice's Puzzle Page http://www.vroospeak.com Learn how hackers think: http://j.mp/gruyere-security -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Thu Mar 28 07:54:39 2013 From: shane at umbrellacode.com (Shane Green) Date: Wed, 27 Mar 2013 23:54:39 -0700 Subject: [Python-ideas] =?windows-1252?q?list=85pushed=2C_or_something?= In-Reply-To: References: <51B85D9B-BFEC-4134-BECD-F8A8043B6A6B@umbrellacode.com> <5F47CC16-F708-4823-8DE1-605BCFA653A7@umbrellacode.com> Message-ID: <9032C471-1E2C-4C95-BCA5-4CC475C1D853@umbrellacode.com> Always makes me think of my favorite C snippet: if (attack = true) { launch_nukes(); } Well, I have to admit that I'm drawing a blank on all the great use cases I had in mind, and it didn't seem to light any particular fires with the list, so I'd say this isn't a compelling enough story to take anymore time with at the moment. If a good?and not obscure?example comes to mind, I'll send it out, otherwise, thanks for the feedback! Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com On Mar 27, 2013, at 10:58 PM, Bruce Leban wrote: > > On Wed, Mar 27, 2013 at 10:48 PM, Shane Green wrote: > That's clever: even works for zero because it's returned by or as second false. Cool. So I suppose I have to come up with more examples now ;-) > > Actually, on that point, I actually think the seen.added(value) (with a better name) is quite a bit cleaner than the one using "or". Clever as it is, I think someone learning the language would flinch when they saw that? 
:-) > > Yes, I suppose it's a bit obscure but you're only going to use it in a case where you value brevity over clarity, right? > > The C comma operator which always returns the second value, not depending that the first value is false. You can write this in Python as: > > (foo, bar)[1] > > or is there some cleaner way to write that? > > #define comma and False or > :-) > > --- Bruce > Latest blog post: Alice's Puzzle Page http://www.vroospeak.com > Learn how hackers think: http://j.mp/gruyere-security > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Thu Mar 28 14:20:39 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 28 Mar 2013 14:20:39 +0100 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes References: <514CDBCC.4080407@gmx.de> <51539BFD.60901@pearwood.info> Message-ID: <20130328142039.4b829006@pitrou.net> On Thu, 28 Mar 2013 12:25:17 +1100 Steven D'Aprano wrote: > On 28/03/13 10:06, Terry Reedy wrote: > > On 3/24/2013 3:38 AM, Stefan Behnel wrote: > >> Gregory P. Smith, 24.03.2013 00:48: > >>> In the absense of profiling numbers showing otherwise, i'd rather see all > >>> forms of manual caching like the conditional checks or a keep=True go away > >>> as it's dirty and encourages premature "optimization". > >> > >> +1 > >> > >> If I had been "more aware" of the re internal cache during the last years, > >> I would have avoided at least a couple of re.compile() calls in my code, I > >> guess. > >> > >> Maybe this is something that the documentation of re.compile() can help > >> with, by telling people explicitly that this apparently cool feature of > >> pre-compiling actually has a drawback in it (startup time + a bit of memory > >> usage) and that they won't notice a runtime difference in most cases anyway. > > > > With a decent re cache size, .compile seems more like an attractive nuisance that something useful. > > > On the contrary, I think that it is the cache which is an (unattractive) nuisance. > > Like any cache, performance is only indirectly under your control. You cannot know for sure whether re.match(some_pattern, text) will be a cheap cache hit or an expensive re-compilation. CPython is full of caches so, if that's what you worry about, your problem is bigger than simply regex patterns. (your CPU is full of caches too) Regards Antoine. From shibturn at gmail.com Thu Mar 28 15:34:52 2013 From: shibturn at gmail.com (Richard Oudkerk) Date: Thu, 28 Mar 2013 14:34:52 +0000 Subject: [Python-ideas] re.compile_lazy - on first use compiled regexes In-Reply-To: <20130328142039.4b829006@pitrou.net> References: <514CDBCC.4080407@gmx.de> <51539BFD.60901@pearwood.info> <20130328142039.4b829006@pitrou.net> Message-ID: An alternative would be to use a lazy proxy. Given LazyProxy defined as below you can use pat = LazyProxy(re.compile, r"\w+") instead of pat = re.compile(r"\w+") This causes a minor slow down (~7%) if you use pat.search() in a loop. 
From python at mrabarnett.plus.com  Thu Mar 28 16:15:52 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 28 Mar 2013 15:15:52 +0000
Subject: [Python-ideas] list…pushed, or something
In-Reply-To:
References: <51B85D9B-BFEC-4134-BECD-F8A8043B6A6B@umbrellacode.com>
Message-ID: <51545EA8.8070003@mrabarnett.plus.com>

On 28/03/2013 05:28, Bruce Leban wrote:
> On Wed, Mar 27, 2013 at 10:11 PM, Shane Green wrote:
>
>     [seen.added(value) for value in sequence if value not in seen] *
>
> Here's an easy way to do it:
>
> >>> seen = set()
> >>> seq = [3,2,1,2,3,4,5,4]
> >>> [seen.add(v) or v for v in seq if v not in seen]
> [3, 2, 1, 4, 5]
> >>> seen
> {1, 2, 3, 4, 5}
>
I think I would prefer a "unique" function that yields unique items:

    def unique(items):
        seen = set()
        for item in items:
            if item not in seen:
                seen.add(item)
                yield item

    >>> seq = [3,2,1,2,3,4,5,4]
    >>> list(unique(seq))
    [3, 2, 1, 4, 5]

From ericsnowcurrently at gmail.com  Fri Mar 29 05:26:53 2013
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Thu, 28 Mar 2013 22:26:53 -0600
Subject: [Python-ideas] type.__signature__ (and Argument Clinic)
Message-ID:

We now have a useful function signature abstraction in 3.3 with PEP 362.
An outgrowth of this has been Larry Hastings's efforts on Argument
Clinic. One open question is (was?) how to give a docstring to the C
versions of __new__ and __init__.

Though in the morning I may wonder what I was thinking, here's a
possibility that popped into my head: let classes have a __signature__
attribute that is the signature for meta.__call__, as well as
cls.__new__ and cls.__init__. This makes sense since the three methods
are, together, the default factory for instances of the type. One would
expect them to have the same signature. Keeping that signature on the
class parallels how it is kept on function objects (rather than on
func.__call__).

Thoughts?

-eric
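As a point of reference for this idea: PEP 362's inspect.signature()
already honors an explicit __signature__ attribute on a callable, so the
class-level behavior can be approximated by hand today. A sketch (Point
and its parameters are invented for illustration):

    import inspect

    class Point:
        def __init__(self, x, y=0):
            self.x, self.y = x, y

    # Derived from __init__ by the PEP 362 machinery:
    print(inspect.signature(Point))    # (x, y=0)

    # An explicit __signature__ attribute takes precedence:
    Point.__signature__ = inspect.signature(lambda x, y=0, *, label="": None)
    print(inspect.signature(Point))    # (x, y=0, *, label='')
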
From shane at umbrellacode.com  Fri Mar 29 05:32:33 2013
From: shane at umbrellacode.com (Shane Green)
Date: Thu, 28 Mar 2013 21:32:33 -0700
Subject: [Python-ideas] list…pushed, or something
In-Reply-To: <51545EA8.8070003@mrabarnett.plus.com>
References: <51B85D9B-BFEC-4134-BECD-F8A8043B6A6B@umbrellacode.com>
	<51545EA8.8070003@mrabarnett.plus.com>
Message-ID:

Yes, that would be a nice function. The unique items from an ordered
sequence was just supposed to be an example of something you could do
with the concept I was going for: push/add methods that return the value
pushed/added, so a 'list.pushed(value)' could replace value inline. Of
course, it's turning out that unique items may be the best example,
which, in and of itself, isn't a very compelling argument for my
original idea, because there are better ways to solve this problem, as
you've pointed out.

On Mar 28, 2013, at 8:15 AM, MRAB wrote:

> On 28/03/2013 05:28, Bruce Leban wrote:
>> On Wed, Mar 27, 2013 at 10:11 PM, Shane Green wrote:
>>
>>     [seen.added(value) for value in sequence if value not in seen] *
>>
>> Here's an easy way to do it:
>>
>> >>> seen = set()
>> >>> seq = [3,2,1,2,3,4,5,4]
>> >>> [seen.add(v) or v for v in seq if v not in seen]
>> [3, 2, 1, 4, 5]
>> >>> seen
>> {1, 2, 3, 4, 5}
>>
> I think I would prefer a "unique" function that yields unique items:
>
>     def unique(items):
>         seen = set()
>         for item in items:
>             if item not in seen:
>                 seen.add(item)
>                 yield item
>
>     >>> seq = [3,2,1,2,3,4,5,4]
>     >>> list(unique(seq))
>     [3, 2, 1, 4, 5]

From pyideas at rebertia.com  Fri Mar 29 05:40:09 2013
From: pyideas at rebertia.com (Chris Rebert)
Date: Thu, 28 Mar 2013 21:40:09 -0700
Subject: [Python-ideas] type.__signature__ (and Argument Clinic)
In-Reply-To:
References:
Message-ID:

On Thu, Mar 28, 2013 at 9:26 PM, Eric Snow wrote:
> We now have a useful function signature abstraction in 3.3 with PEP
> 362. An outgrowth of this has been Larry Hastings's efforts on
> Argument Clinic. One open question is (was?) how to give a docstring
> to the C versions of __new__ and __init__.

In case anyone else doesn't follow python-dev and was wondering what
Argument Clinic is:
http://mail.python.org/pipermail/python-dev/2012-December/122920.html

-- Chris

From shane at umbrellacode.com  Fri Mar 29 05:55:17 2013
From: shane at umbrellacode.com (Shane Green)
Date: Thu, 28 Mar 2013 21:55:17 -0700
Subject: [Python-ideas] list…pushed, or something
In-Reply-To: <51545EA8.8070003@mrabarnett.plus.com>
References: <51B85D9B-BFEC-4134-BECD-F8A8043B6A6B@umbrellacode.com>
	<51545EA8.8070003@mrabarnett.plus.com>
Message-ID:

And I was suggesting something along these lines would be generally
useful (if not pointlessly easy):

    class Set(set):
        __slots__ = ()
        def added(self, value):
            super(Set, self).add(value)
            return value

    def unique(items):
        seen = Set()
        return (seen.added(item) for item in items if item not in seen)

On Mar 28, 2013, at 8:15 AM, MRAB wrote:

> On 28/03/2013 05:28, Bruce Leban wrote:
>> On Wed, Mar 27, 2013 at 10:11 PM, Shane Green wrote:
>>
>>     [seen.added(value) for value in sequence if value not in seen] *
>>
>> Here's an easy way to do it:
>>
>> >>> seen = set()
>> >>> seq = [3,2,1,2,3,4,5,4]
>> >>> [seen.add(v) or v for v in seq if v not in seen]
>> [3, 2, 1, 4, 5]
>> >>> seen
>> {1, 2, 3, 4, 5}
>>
> I think I would prefer a "unique" function that yields unique items:
>
>     def unique(items):
>         seen = set()
>         for item in items:
>             if item not in seen:
>                 seen.add(item)
>                 yield item
>
>     >>> seq = [3,2,1,2,3,4,5,4]
>     >>> list(unique(seq))
>     [3, 2, 1, 4, 5]
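What the added() variant buys, in a short interactive sketch (using the
Set and unique definitions above): because the method returns its
argument, the value can be used inline in a larger expression:

    >>> seq = [3, 2, 1, 2, 3, 4, 5, 4]
    >>> list(unique(seq))
    [3, 2, 1, 4, 5]
    >>> seen = Set()
    >>> [seen.added(v) * 2 for v in seq if v not in seen]
    [6, 4, 2, 8, 10]
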
From ericsnowcurrently at gmail.com  Fri Mar 29 06:34:07 2013
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Thu, 28 Mar 2013 23:34:07 -0600
Subject: [Python-ideas] type.__signature__ (and Argument Clinic)
In-Reply-To:
References:
Message-ID:

(+Mark and Larry, who might not be on this list)

On Thu, Mar 28, 2013 at 10:40 PM, Chris Rebert wrote:
> On Thu, Mar 28, 2013 at 9:26 PM, Eric Snow wrote:
>> We now have a useful function signature abstraction in 3.3 with PEP
>> 362. An outgrowth of this has been Larry Hastings's efforts on
>> Argument Clinic. One open question is (was?) how to give a docstring
>> to the C versions of __new__ and __init__.
>
> In case anyone else doesn't follow python-dev and was wondering what
> Argument Clinic is:
> http://mail.python.org/pipermail/python-dev/2012-December/122920.html

How funny. I hadn't noticed that all the discussions on Argument Clinic
have taken place on python-dev. Thanks for pointing that out, Chris.
You can also check out PEPs 436 and 437 [1][2].

-eric

[1] http://www.python.org/dev/peps/pep-0436/
[2] http://www.python.org/dev/peps/pep-0437/

From jsbueno at python.org.br  Fri Mar 29 12:40:16 2013
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Fri, 29 Mar 2013 08:40:16 -0300
Subject: [Python-ideas] list…pushed, or something
In-Reply-To:
References:
Message-ID:

On 27 March 2013 23:29, Shane Green wrote:
> I'm not sure if there's anything inherently wrong with this idea, and
> I am well aware how incredibly easy it is to implement as an extension
> of the built-ins, but I find it very useful to have variations of
> list().append(obj) and set().add(obj) that return obj. I've never come
> up with perfect method names; sometimes I go with past-tense versions,
> "added" and "appended" (or "pushed"). I often use these methods in
> places where performance is a factor, such as using a set() to filter
> repeats from an input sequence.
>
> I was thinking it may be worth considering adding it to core, or
> perhaps creating collections types with these features, so they are
> high performance and standardized. They can be particularly useful in
> generator recipes, etc.

I understand that the equivalent methods in other languages - sometimes
frameworks in those languages - allow one to chain method calls on the
parent object, so they can happily write things along the lines of:

    list().append(1).append(0).sort().append(2)

Python style, as noted elsewhere in this thread, is that methods that
perform changes to the underlying object return None, thus not allowing
such constructs on mutable objects - even though one can happily do
something like:

    image_name = url.split("/")[-1].split(".")[0]

You can easily have the former behavior if you wrap your object in a
construct that, whenever a called method would return None, returns the
original object itself. That could be placed in a utility module - and
probably there is even some "MyHacks" package on PyPI with functionality
like that.

If a naive implementation fits your needs, this one would work:

    class Chain:
        def __init__(self, obj, root=None):
            self.__obj = obj

        def __getattr__(self, attr):
            val = getattr(self.__obj, attr)
            if callable(val):
                # Remember the method and hand back the proxy, so the
                # call is routed through __call__ below.
                self.__callable = val
                return self
            return val

        def __call__(self, *args, **kw):
            val = self.__callable(*args, **kw)
            if val is None:
                # A mutator returned None: keep chaining on the proxy.
                return self
            return val

---------------

    >>> a = []
    >>> Chain(a).append(5).append(6).append(-1).sort().append(3)
    <__main__.Chain object at 0x12b6f50>
    >>> a
    [-1, 5, 6, 3]

I'd be -0 for something like that in the stdlib, though - but if it was
there, I'd look around "functools" (but it is obviously more like an
"objecttool").

js
-><-
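One caveat with a wrapper like this, since it keys off None: it cannot
distinguish a mutator's None from a method that legitimately returns
None, so such calls hand back the proxy instead. A quick sketch
(assuming the Chain class above; d is a throwaway example):

    d = {"a": 1}
    print(Chain(d).get("missing"))
    # prints a Chain instance, not None -- dict.get's None is
    # indistinguishable from a mutator's None
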
If a naive implementation fits your needs, this one would work: class Chain: def __init__(self, obj, root=None): self.__obj = obj def __getattr__(self, attr): val = getattr(self.__obj, attr) if callable(val): self.__callable = val return self return val def __call__(self, *args, **kw): val = self.__callable(*args, **kw) if val is None: return self return val --------------- >>> a = [] >>> Chain(a).append(5).append(6).append(-1).sort().append(3) <__main__.Chain object at 0x12b6f50> >>> a [-1, 5, 6, 3] I'd be -0 for something like that on the stlib, though - but if it was there, I'd look around "functools" (but it is obviously more like an "objecttool") js -><- From tjreedy at udel.edu Fri Mar 29 19:00:05 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 29 Mar 2013 14:00:05 -0400 Subject: [Python-ideas] =?utf-8?q?list=E2=80=A6pushed=2C_or_something?= In-Reply-To: References: Message-ID: On 3/29/2013 7:40 AM, Joao S. O. Bueno wrote: > I understand that the equivalent methods in other languages - sometimes > frameworks int hose languages - allow one to chain method calls on the > parent object, so they can happly write things along: > > list().append(1).append(0).sort().append(2) > > Python style, as well placed on the thread is that methods that perform > changes to the underlying object return None, Unless the method returns something else other than the underlying mutable. Examples are list.pop, set.pop, dict.pop, and dict.popitem, which return the item (or pair) removed. -- Terry Jan Reedy From shane at umbrellacode.com Fri Mar 29 20:40:28 2013 From: shane at umbrellacode.com (Shane Green) Date: Fri, 29 Mar 2013 12:40:28 -0700 Subject: [Python-ideas] =?windows-1252?q?list=85pushed=2C_or_something?= In-Reply-To: References: Message-ID: <12B7289B-11F9-4449-BB0C-2F98246652A7@umbrellacode.com> Yes, these are good points. To be clear about one thing, though, what I was suggesting behaves very differently than the mechanism most other languages may use to enable chaining. Chaining is usually based around the operations on X, returning a reference to X after completion (of course this only makes sense for mutator methods, otherwise nothing would been done or gotten). Given my recommendation: X.pushed(Y) -> X IFF (and-only-if) Y is X. However, X.pushed(Y) -> Y is always true. The idea was not to be able chain invocations, but to be able to use theses operations inline in places the plain value, Y, would have appeared. It's such a trivial matter it's almost moot point anyhow... The only time "list.puhed(x)" wou On Mar 29, 2013, at 4:40 AM, Joao S. O. Bueno wrote: > On 27 March 2013 23:29, Shane Green wrote: >> I'm not sure if there's anything inherently wrong with this idea, and I am >> well aware how incredibly easy it is to implement as an extension of the >> built-ins, but, I find it very useful to have variations of >> list().append(obj) and set().add(obj) that return, obj. I've never come up >> with perfect method names; sometimes I go with past-tense versions, "added" >> and "appended" (or "pushed"). I often use these methods in places where >> performance is a factor, such as using a set() to filter repeats from an >> input sequence. >> >> I was thinking it may be worth considering adding it to core, or perhaps >> creating collections types with these features, so they are high performance >> and standardized. They can be particularly useful in generator recipes, >> etc. 
From shane at umbrellacode.com  Fri Mar 29 20:40:28 2013
From: shane at umbrellacode.com (Shane Green)
Date: Fri, 29 Mar 2013 12:40:28 -0700
Subject: [Python-ideas] list…pushed, or something
In-Reply-To:
References:
Message-ID: <12B7289B-11F9-4449-BB0C-2F98246652A7@umbrellacode.com>

Yes, these are good points. To be clear about one thing, though, what I
was suggesting behaves very differently from the mechanism most other
languages use to enable chaining. Chaining is usually based around the
operations on X returning a reference to X after completion (of course,
this only makes sense for mutator methods; otherwise nothing would have
been done or gotten). Given my recommendation:

    X.pushed(Y) -> X iff (if and only if) Y is X.

However:

    X.pushed(Y) -> Y is always true.

The idea was not to be able to chain invocations, but to be able to use
these operations inline in places where the plain value, Y, would have
appeared. It's such a trivial matter it's almost a moot point anyhow...
The only time "list.pushed(x)" wou

On Mar 29, 2013, at 4:40 AM, Joao S. O. Bueno wrote:

> On 27 March 2013 23:29, Shane Green wrote:
>> I'm not sure if there's anything inherently wrong with this idea, and
>> I am well aware how incredibly easy it is to implement as an extension
>> of the built-ins, but I find it very useful to have variations of
>> list().append(obj) and set().add(obj) that return obj. I've never come
>> up with perfect method names; sometimes I go with past-tense versions,
>> "added" and "appended" (or "pushed"). I often use these methods in
>> places where performance is a factor, such as using a set() to filter
>> repeats from an input sequence.
>>
>> I was thinking it may be worth considering adding it to core, or
>> perhaps creating collections types with these features, so they are
>> high performance and standardized. They can be particularly useful in
>> generator recipes, etc.
>
> I understand that the equivalent methods in other languages - sometimes
> frameworks in those languages - allow one to chain method calls on the
> parent object, so they can happily write things along the lines of:
>
>     list().append(1).append(0).sort().append(2)
>
> Python style, as noted elsewhere in this thread, is that methods that
> perform changes to the underlying object return None, thus not allowing
> such constructs on mutable objects - even though one can happily do
> something like:
>
>     image_name = url.split("/")[-1].split(".")[0]
>
> You can easily have the former behavior if you wrap your object in a
> construct that, whenever a called method would return None, returns the
> original object itself. That could be placed in a utility module - and
> probably there is even some "MyHacks" package on PyPI with functionality
> like that.
>
> If a naive implementation fits your needs, this one would work:
>
>     class Chain:
>         def __init__(self, obj, root=None):
>             self.__obj = obj
>
>         def __getattr__(self, attr):
>             val = getattr(self.__obj, attr)
>             if callable(val):
>                 # Remember the method and hand back the proxy, so the
>                 # call is routed through __call__ below.
>                 self.__callable = val
>                 return self
>             return val
>
>         def __call__(self, *args, **kw):
>             val = self.__callable(*args, **kw)
>             if val is None:
>                 # A mutator returned None: keep chaining on the proxy.
>                 return self
>             return val
>
> ---------------
>
>     >>> a = []
>     >>> Chain(a).append(5).append(6).append(-1).sort().append(3)
>     <__main__.Chain object at 0x12b6f50>
>     >>> a
>     [-1, 5, 6, 3]
>
> I'd be -0 for something like that in the stdlib, though - but if it was
> there, I'd look around "functools" (but it is obviously more like an
> "objecttool").
>
> js
> -><-

From stephen at xemacs.org  Sat Mar 30 01:12:41 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 30 Mar 2013 09:12:41 +0900
Subject: [Python-ideas] list…pushed, or something
In-Reply-To: <12B7289B-11F9-4449-BB0C-2F98246652A7@umbrellacode.com>
References: <12B7289B-11F9-4449-BB0C-2F98246652A7@umbrellacode.com>
Message-ID: <87txntagp2.fsf@uwakimon.sk.tsukuba.ac.jp>

Shane Green writes:

> Given my recommendation:
>
>     X.pushed(Y) -> X iff (if and only if) Y is X.
>
> However:
>
>     X.pushed(Y) -> Y is always true.

FWIW, Steve McConnell (Code Complete) recommends that

    X.pushed(foo())

quite often should be written as

    foo_value = foo()      # but with a descriptive name!
    X.pushed(foo_value)    # sic, probably pushed -> push?

even if "foo_value" is only used once. It's not like you save anything
but one line by putting foo() inside the parentheses; it's just a name
binding that is resolved by the compiler anyway. My understanding of
Guido's decisions is that he agrees with McConnell on this point.