From alon at horev.net Fri Jun 1 12:07:28 2012
From: alon at horev.net (Alon Horev)
Date: Fri, 1 Jun 2012 13:07:28 +0300
Subject: [Python-ideas] setprofile and settrace inconsistency
Message-ID: 

Hi,

When setting a trace function with settrace, the trace function, when called with a new scope, can return another trace function or None, indicating that the inner scope should not be traced. I used settrace for some time, but calling the trace function for every line of code is a performance killer. So I moved on to setprofile, which calls a trace function on every function entry/exit. Now here's the problem: the return value from the trace function is ignored (intentionally), denying the possibility to skip tracing of 'hot' or 'not interesting' code.

I would like to propose two alternatives:
1. setprofile will not ignore the return value and will mimic settrace's behavior.
2. setprofile is just a wrapper around settrace that limits its functionality, so let's make settrace more flexible and setprofile redundant. Here's how: settrace will receive an argument called 'events', and the trace function will fire only on events contained in that list. For example: setprofile = partial(settrace, events=['call', 'return'])

I personally prefer the second.

Some context to this issue: I'm building a Python tracer - a logger that records each and every function call. In order for it to run in production systems, the overhead should be minimal. I would like to allow the user to say which functions/modules/classes to trace or skip. For example, the user will skip all math/CPU-intensive operations. Another example: the user will want to trace his Django app code but not the Django framework.

Your thoughts?

Thanks, Alon Horev
-------------- next part --------------
An HTML attachment was scrubbed... 
URL: 

From techtonik at gmail.com Fri Jun 1 17:08:21 2012
From: techtonik at gmail.com (anatoly techtonik)
Date: Fri, 1 Jun 2012 18:08:21 +0300
Subject: [Python-ideas] stdlib crowdsourcing
In-Reply-To: 
References: 
Message-ID: 

On Tue, May 29, 2012 at 9:02 AM, Nick Coghlan wrote:
> Once again, you're completely ignoring all existing knowledge and
> expertise on open collaboration and trying to reinvent the world. It's
> *not going to happen*.

It's too boring to live in a world of existing knowledge and expertise, and yes, I am not aware of any expertise on open collaboration. Any reading recommendations with concentrated knowledge that can fit my brain?

> The standard library is just the curated core, and *yes*, it's damn
> hard to get anything added to it (deliberately so). There's a place
> where anyone can post anything they want, and see if others find it
> useful: PyPI.

The major drawback of remote packages in general is that they bring back project compilation from the old days. The biggest Python advantage at all times was the "copy and run" ability.

The drawbacks of PyPI for this proposal are:
1. every function you need will require a separate upload to PyPI
2. you can't upload a function with the same stdlib name but a slightly different implementation, as it is used in different projects
3. you can't find functions that people recommend to be included into the stdlib
4. it is hard (impossible) to gather feedback on the quality of these proposals

> The standard library provides tools to upload to PyPI, and, as of 3.3,
> will even include tools to download and install from it.

I am glad 3.3 is getting virtualenv and bootstrap stuff. It would really rock if the new feature isn't set in stone right after release and gains a few UX iterations with allowed break-ability.

As for PyPI, its major drawback is security - a DNS attack for a couple of minutes, and one of your automatically deployed nodes is trojan-ready. 
I remember PyPI passwords are stored in clear text on the developer's machine, but I don't remember if anyone turned off HTTP basic authorization on PyPI to protect the passwords travelling to PyPI with every upload from interception. It would be an interesting exercise to sniff PyPI passwords over WiFi during the next conference (e.g. https://ep2012.europython.eu/) and match those to the developers' accounts on *.python.org ;)

> If you don't like our ecosystem (it's hard to tell whether or not you
> do: everything you post is about how utterly awful and unusable
> everything is, yet you're still here years later).

You're absolutely right - I like the Python ecosystem, otherwise I wouldn't stick around. It is like a vintage car - awesome, nice looking, and there is even this new twisted pyusion engine inside, but.. well - it's not for youngsters.

> If you think the PyPI UI is awful or inadequate, follow the example of
> crate.io or pythonpackage.com and *create your own*. There's far more
> to the Python universe than just core development, stop trying to
> shoehorn everything into a place where it doesn't belong.

I have absolutely no idea how the aforementioned post touches the PyPI UI. Speaking about PyPI enhancements and the ecosystem, instead of reinventing bicycles I'd rather patch the existing one. The only problem is that patches are not accepted.
https://bitbucket.org/loewis/pypi/pull-request/1/fix-imports-add-logging-to-console-in

From cenkalti at gmail.com Fri Jun 1 23:10:03 2012
From: cenkalti at gmail.com (=?UTF-8?Q?Cenk_Alt=C4=B1?=)
Date: Sat, 2 Jun 2012 00:10:03 +0300
Subject: [Python-ideas] Adding list.pluck()
Message-ID: 

Hello All,

pluck() is a beautiful function from the underscore.js library, described as "A convenient version of what is perhaps the most common use-case for map: extracting a list of property values."

http://documentcloud.github.com/underscore/#pluck

What about implementing it for Python lists? And maybe for other iterables? 
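[For concreteness, here is what the proposed behaviour amounts to in Python. This is a hypothetical free-function sketch of pluck(), not an existing stdlib API; the stooges data mirrors the underscore.js documentation example.]

```python
# Hypothetical pluck() helper: underscore.js's _.pluck(list, key)
# extracts list[i][key] for every element of the list.
def pluck(iterable, key):
    return [item[key] for item in iterable]

stooges = [{'name': 'moe', 'age': 40},
           {'name': 'larry', 'age': 50},
           {'name': 'curly', 'age': 60}]

print(pluck(stooges, 'name'))  # ['moe', 'larry', 'curly']
```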
From phd at phdru.name Fri Jun 1 23:16:30 2012 From: phd at phdru.name (Oleg Broytman) Date: Sat, 2 Jun 2012 01:16:30 +0400 Subject: [Python-ideas] Adding list.pluck() In-Reply-To: References: Message-ID: <20120601211630.GA24569@iskra.aviel.ru> On Sat, Jun 02, 2012 at 12:10:03AM +0300, Cenk Alt?? wrote: > pluck() is a beautiful function which is in underscore.js library. > Described as "A convenient version of what is perhaps the most common > use-case for map: extracting a list of property values." > > http://documentcloud.github.com/underscore/#pluck > > What about it implementing for python lists? And maybe for other iterables? Like operator.attrgetter? http://docs.python.org/library/operator.html#operator.attrgetter Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From mikegraham at gmail.com Fri Jun 1 23:18:45 2012 From: mikegraham at gmail.com (Mike Graham) Date: Fri, 1 Jun 2012 17:18:45 -0400 Subject: [Python-ideas] Adding list.pluck() In-Reply-To: References: Message-ID: On Fri, Jun 1, 2012 at 5:10 PM, Cenk Alt? wrote: > Hello All, > > pluck() is a beautiful function which is in underscore.js library. > Described as "A convenient version of what is perhaps the most common > use-case for map: extracting a list of property values." > > http://documentcloud.github.com/underscore/#pluck > > What about it implementing for python lists? And maybe for other iterables? Using a generator expression or list comprehension to do this is so easy and readable I don't see why we'd want something new in Python. Mike From alexandre.zani at gmail.com Fri Jun 1 23:25:24 2012 From: alexandre.zani at gmail.com (Alexandre Zani) Date: Fri, 1 Jun 2012 14:25:24 -0700 Subject: [Python-ideas] Adding list.pluck() In-Reply-To: <20120601211630.GA24569@iskra.aviel.ru> References: <20120601211630.GA24569@iskra.aviel.ru> Message-ID: What if it's a list of objects instead of a list of dicts? 
List comprehension already makes this easy: [i['name'] for i in l] I don't think this would add as much in python as it adds in javascript. On Fri, Jun 1, 2012 at 2:16 PM, Oleg Broytman wrote: > On Sat, Jun 02, 2012 at 12:10:03AM +0300, Cenk Alt?? wrote: >> pluck() is a beautiful function which is in underscore.js library. >> Described as "A convenient version of what is perhaps the most common >> use-case for map: extracting a list of property values." >> >> http://documentcloud.github.com/underscore/#pluck >> >> What about it implementing for python lists? And maybe for other iterables? > > ? Like operator.attrgetter? > http://docs.python.org/library/operator.html#operator.attrgetter > > Oleg. > -- > ? ? Oleg Broytman ? ? ? ? ? ?http://phdru.name/ ? ? ? ? ? ?phd at phdru.name > ? ? ? ? ? Programmers don't die, they just GOSUB without RETURN. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From cenkalti at gmail.com Fri Jun 1 23:48:04 2012 From: cenkalti at gmail.com (=?UTF-8?Q?Cenk_Alt=C4=B1?=) Date: Sat, 2 Jun 2012 00:48:04 +0300 Subject: [Python-ideas] Adding list.pluck() In-Reply-To: References: <20120601211630.GA24569@iskra.aviel.ru> Message-ID: l.pluck('name') is more readable IMO. On Sat, Jun 2, 2012 at 12:25 AM, Alexandre Zani wrote: > What if it's a list of objects instead of a list of dicts? List > comprehension already makes this easy: > > [i['name'] for i in l] > > I don't think this would add as much in python as it adds in javascript. > > On Fri, Jun 1, 2012 at 2:16 PM, Oleg Broytman wrote: >> On Sat, Jun 02, 2012 at 12:10:03AM +0300, Cenk Alt?? wrote: >>> pluck() is a beautiful function which is in underscore.js library. >>> Described as "A convenient version of what is perhaps the most common >>> use-case for map: extracting a list of property values." 
>>> >>> http://documentcloud.github.com/underscore/#pluck >>> >>> What about it implementing for python lists? And maybe for other iterables? >> >> ? Like operator.attrgetter? >> http://docs.python.org/library/operator.html#operator.attrgetter >> >> Oleg. >> -- >> ? ? Oleg Broytman ? ? ? ? ? ?http://phdru.name/ ? ? ? ? ? ?phd at phdru.name >> ? ? ? ? ? Programmers don't die, they just GOSUB without RETURN. >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From alexandre.zani at gmail.com Sat Jun 2 00:53:48 2012 From: alexandre.zani at gmail.com (Alexandre Zani) Date: Fri, 1 Jun 2012 15:53:48 -0700 Subject: [Python-ideas] Adding list.pluck() In-Reply-To: References: <20120601211630.GA24569@iskra.aviel.ru> Message-ID: I must confess that I don't find "pluck" a very intuitive name for this functionality. For me it was evocative of what pop currently does. That's an N of 1 so maybe I'm just wrong on that one. More importantly, this would make the use of a list method dependent upon the type of the contained items. (works for dicts and nothing else) That would be unprecedented for list methods and potentially confusing. What would be the behavior if the list contains non-dicts? On Fri, Jun 1, 2012 at 2:48 PM, Cenk Alt? wrote: > l.pluck('name') is more readable IMO. > > On Sat, Jun 2, 2012 at 12:25 AM, Alexandre Zani > wrote: >> What if it's a list of objects instead of a list of dicts? List >> comprehension already makes this easy: >> >> [i['name'] for i in l] >> >> I don't think this would add as much in python as it adds in javascript. >> >> On Fri, Jun 1, 2012 at 2:16 PM, Oleg Broytman wrote: >>> On Sat, Jun 02, 2012 at 12:10:03AM +0300, Cenk Alt?? 
wrote: >>>> pluck() is a beautiful function which is in underscore.js library. >>>> Described as "A convenient version of what is perhaps the most common >>>> use-case for map: extracting a list of property values." >>>> >>>> http://documentcloud.github.com/underscore/#pluck >>>> >>>> What about it implementing for python lists? And maybe for other iterables? >>> >>> ? Like operator.attrgetter? >>> http://docs.python.org/library/operator.html#operator.attrgetter >>> >>> Oleg. >>> -- >>> ? ? Oleg Broytman ? ? ? ? ? ?http://phdru.name/ ? ? ? ? ? ?phd at phdru.name >>> ? ? ? ? ? Programmers don't die, they just GOSUB without RETURN. >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> http://mail.python.org/mailman/listinfo/python-ideas >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas From mwm at mired.org Sat Jun 2 01:07:04 2012 From: mwm at mired.org (Mike Meyer) Date: Fri, 1 Jun 2012 19:07:04 -0400 Subject: [Python-ideas] Adding list.pluck() In-Reply-To: References: <20120601211630.GA24569@iskra.aviel.ru> Message-ID: <20120601190704.319502c5@bhuda.mired.org> On Sat, 2 Jun 2012 00:48:04 +0300 Cenk Alt? wrote: > l.pluck('name') is more readable IMO. Only because you already associate "pluck" with that meaning. As others said, "pluck" to me implies something like "pop". The list comprehension spelling doesn't suffer from this problem, and provides a lot more flexibility. If you don't like list comprehensions, use map and the operator module. Even if it is more readable, it's more semantic load. It's another container operator (and one that's only useful in the special case of a list of maps) people have to learn. Since it saves 0 lines of code over either of existing mechanisms, the extra load comes for no advantage. 
-1 http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From malaclypse2 at gmail.com Sat Jun 2 01:16:37 2012 From: malaclypse2 at gmail.com (Jerry Hill) Date: Fri, 1 Jun 2012 19:16:37 -0400 Subject: [Python-ideas] Adding list.pluck() In-Reply-To: References: <20120601211630.GA24569@iskra.aviel.ru> Message-ID: On Fri, Jun 1, 2012 at 5:48 PM, Cenk Alt? wrote: > l.pluck('name') is more readable IMO. That's not how the library you linked to works, as far as I can tell. Based on the sample usage at http://documentcloud.github.com/underscore/#pluck, pluck is a function taking an iterable of dictionaries and the key, so I think the python equivalent is: def pluck(iterable, key): return [item[key] for item in iterable] stooges = [{'name' : 'moe', 'age' : 40}, {'name' : 'larry', 'age' : 50}, {'name' : 'curly', 'age' : 60}] print (pluck(stooges, 'name')) >>> ['moe', 'larry', 'curly'] -- Jerry From cenkalti at gmail.com Sat Jun 2 07:25:13 2012 From: cenkalti at gmail.com (=?UTF-8?Q?Cenk_Alt=C4=B1?=) Date: Sat, 2 Jun 2012 08:25:13 +0300 Subject: [Python-ideas] Adding list.pluck() In-Reply-To: References: <20120601211630.GA24569@iskra.aviel.ru> Message-ID: If the item has definded a method "__getitem__" it will be called, else "__getattribute__" is called. I know list comprehensions are same thing but .pluck seems easier to read and write I think (no need to write a temporary variable in list comprehension, and also square brackets). Just an idea... On Sat, Jun 2, 2012 at 1:53 AM, Alexandre Zani wrote: > I must confess that I don't find "pluck" a very intuitive name for > this functionality. For me it was evocative of what pop currently > does. That's an N of 1 so maybe I'm just wrong on that one. > > More importantly, this would make the use of a list method dependent > upon the type of the contained items. 
(works for dicts and nothing > else) That would be unprecedented for list methods and potentially > confusing. What would be the behavior if the list contains non-dicts? > > On Fri, Jun 1, 2012 at 2:48 PM, Cenk Alt? wrote: >> l.pluck('name') is more readable IMO. >> >> On Sat, Jun 2, 2012 at 12:25 AM, Alexandre Zani >> wrote: >>> What if it's a list of objects instead of a list of dicts? List >>> comprehension already makes this easy: >>> >>> [i['name'] for i in l] >>> >>> I don't think this would add as much in python as it adds in javascript. >>> >>> On Fri, Jun 1, 2012 at 2:16 PM, Oleg Broytman wrote: >>>> On Sat, Jun 02, 2012 at 12:10:03AM +0300, Cenk Alt?? wrote: >>>>> pluck() is a beautiful function which is in underscore.js library. >>>>> Described as "A convenient version of what is perhaps the most common >>>>> use-case for map: extracting a list of property values." >>>>> >>>>> http://documentcloud.github.com/underscore/#pluck >>>>> >>>>> What about it implementing for python lists? And maybe for other iterables? >>>> >>>> ? Like operator.attrgetter? >>>> http://docs.python.org/library/operator.html#operator.attrgetter >>>> >>>> Oleg. >>>> -- >>>> ? ? Oleg Broytman ? ? ? ? ? ?http://phdru.name/ ? ? ? ? ? ?phd at phdru.name >>>> ? ? ? ? ? Programmers don't die, they just GOSUB without RETURN. 
>>>> _______________________________________________ >>>> Python-ideas mailing list >>>> Python-ideas at python.org >>>> http://mail.python.org/mailman/listinfo/python-ideas >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> http://mail.python.org/mailman/listinfo/python-ideas From masklinn at masklinn.net Sat Jun 2 16:54:44 2012 From: masklinn at masklinn.net (Masklinn) Date: Sat, 2 Jun 2012 16:54:44 +0200 Subject: [Python-ideas] Adding list.pluck() In-Reply-To: References: <20120601211630.GA24569@iskra.aviel.ru> Message-ID: <07102B0D-CE6E-4A61-A0FB-104D6741BFE1@masklinn.net> On 2 juin 2012, at 07:25, Cenk Alt? wrote: > If the item has definded a method "__getitem__" it will be called, > else "__getattribute__" is called. > That is compl?te insanity and goes against pretty much all existing python code. Some types make keys available as both items and attributes but I do not know of any operation which does so. > I know list comprehensions are same thing but .pluck seems easier to > read and write I think (no need to write a temporary variable in list > comprehension, and also square brackets). Just an idea... Not a useful one, if you dislike the iteration variable of comprehensions you may use 'map' with itemgetter or attrgetter. 
Not to mention they are more flexible than pluck (they can extract multiple items or attributes, and attrgetter supports "deep" dotted paths) From grosser.meister.morti at gmx.net Sat Jun 2 17:54:50 2012 From: grosser.meister.morti at gmx.net (=?UTF-8?B?TWF0aGlhcyBQYW56ZW5iw7Zjaw==?=) Date: Sat, 02 Jun 2012 17:54:50 +0200 Subject: [Python-ideas] Adding list.pluck() In-Reply-To: References: Message-ID: <4FCA374A.60909@gmx.net> There are already at least two easy ways to do this: >>> stooges=[{'name': 'moe', 'age': 40}, {'name': 'larry', 'age': 50}, {'name': 'curly', 'age': 60}] >>> [guy['name'] for guy in stooges] ['moe', 'larry', 'curly'] >>> from operator import itemgetter >>> map(itemgetter('name'),stooges) ['moe', 'larry', 'curly'] Also I'm used to such functions being called "collect" (Ruby) or "map" (Python, jQuery) and accepting a function/block as an argument. In Ruby-on-Rails it can be &:name as a shorthand for {|item| item[:name]}, which is equivalent to itemgetter('name') in Python. So if you insist of making it shorter (but less readable) you could do: >>> from operator import itemgetter as G >>> map(G('name'),stooges) ['moe', 'larry', 'curly'] On 06/01/2012 11:10 PM, Cenk Alt? wrote: > Hello All, > > pluck() is a beautiful function which is in underscore.js library. > Described as "A convenient version of what is perhaps the most common > use-case for map: extracting a list of property values." > > http://documentcloud.github.com/underscore/#pluck > > What about it implementing for python lists? And maybe for other iterables? From guido at python.org Sat Jun 2 18:06:56 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 2 Jun 2012 09:06:56 -0700 Subject: [Python-ideas] Adding list.pluck() In-Reply-To: <4FCA374A.60909@gmx.net> References: <4FCA374A.60909@gmx.net> Message-ID: Forgive the out of context drive-by comments... 
On Sat, Jun 2, 2012 at 8:54 AM, Mathias Panzenb?ck wrote: > There are already at least two easy ways to do this: > >>>> stooges=[{'name': 'moe', 'age': 40}, {'name': 'larry', 'age': 50}, >>>> {'name': 'curly', 'age': 60}] >>>> [guy['name'] for guy in stooges] > ['moe', 'larry', 'curly'] Bingo. Doesn't need improvements. >>>> from operator import itemgetter >>>> map(itemgetter('name'),stooges) > ['moe', 'larry', 'curly'] If I saw this I would have to think a lot harder before I figured what it meant. (Especially without the output example.) Let's remember KISS. -- --Guido van Rossum (python.org/~guido) From ironfroggy at gmail.com Sat Jun 2 18:28:52 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Sat, 2 Jun 2012 12:28:52 -0400 Subject: [Python-ideas] Adding list.pluck() In-Reply-To: References: Message-ID: On Fri, Jun 1, 2012 at 5:10 PM, Cenk Alt? wrote: > Hello All, > > pluck() is a beautiful function which is in underscore.js library. > Described as "A convenient version of what is perhaps the most common > use-case for map: extracting a list of property values." > > http://documentcloud.github.com/underscore/#pluck > > What about it implementing for python lists? And maybe for other iterables? This is a case where a simple list comprehension or generator expression would be a lot easier to understand than remembering what a rarely used method name does. Also, it couples two distinct interfaces, iterables and mappings, in a way that is generally frowned upon in Python. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Read my blog! I depend on your acceptance of my opinion! I am interesting! 
http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From ironfroggy at gmail.com Sat Jun 2 19:17:49 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Sat, 2 Jun 2012 13:17:49 -0400 Subject: [Python-ideas] setprofile and settrace inconsistency In-Reply-To: References: Message-ID: On Fri, Jun 1, 2012 at 6:07 AM, Alon Horev wrote: > Hi, > > When setting a trace function with settrace, the trace function when called > with a new scope can return another trace function or None, indicating the > inner scope should not be traced. > I used settrace for some time but calling the trace function for every line > of code is a performance killer. > So I moved on to setprofile, which calls a trace function every function > entry/exit. now here's the problem: the return value from the trace function > is ignored (intentionally), denying the?possibility to skip tracing of 'hot' > or 'not interesting' code. > > I would like to propose two alternatives: > 1. setprofile will not ignore the return value and mimic settrace's > behavior. > 2. setprofile is just a wrapper around settrace that limits > it's?functionality, lets make settrace more flexible so setprofile will be > redundant. here's how: settrace will recieve an argument called 'events', > the trace function will fire only on events contained in that list. for > example: setprofile = partial(settrace, events=['call', 'return']) I particularly like the additional parameter for settrace(). > I personally prefer the second. > > Some context to this issue: > I'm building a python tracer - a logger that records each and every function > call. In order for it to run in production systems, the overhead should be > minimal. I would like to allow the user to say which function/module/classes > to trace or skip, for example: the user will skip all math/cpu intensive > operations. 
another example: the user will want to trace his django app code > but not the django framework. > > your thoughts? > > ? ? ? ? ? ? ? ? ? Thanks, Alon Horev > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From ironfroggy at gmail.com Sat Jun 2 19:24:05 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Sat, 2 Jun 2012 13:24:05 -0400 Subject: [Python-ideas] stdlib crowdsourcing In-Reply-To: References: Message-ID: On Fri, Jun 1, 2012 at 11:08 AM, anatoly techtonik wrote: > On Tue, May 29, 2012 at 9:02 AM, Nick Coghlan wrote: >> Once again, you're completely ignoring all existing knowledge and >> expertise on open collaboration and trying to reinvent the world. It's >> *not going to happen*. > > It's too boring to live in a world of existing knowledge and > expertise, Frankly, this one fragment is enough to stop me reading further. Who wants to learn from the vast and broad experience when you could simply randomize the rules of reality through ignorance and stubbornness? I sound fickle, because I am. > and yes, I am not aware of any open collaboration stuff > expertise. Any reading recommendations with concentrated knowledge > that can fit my brain? > >> The standard library is just the curated core, and *yes*, it's damn >> hard to get anything added to it (deliberately so). There's a place >> where anyone can post anything they want, and see if others find it >> useful: PyPI. > > The major drawbacks of remote packages in general is that it bring > back project compilation from the old days. The biggest Python > advantage at all times was "copy and run" ability. > > The drawbacks of PyPI for this proposal are: > 1. 
every function you need will require a separate upload to PyPI > 2. you can't upload function with the same stdlib name, but slightly > different implementation as it is used in different projects > 3. you can't find functions that people recommend to be included into stdlib > 4. it is hard (impossible) to gather feedback on the quality of these proposals > >> The standard library provides tools to upload to PyPI, and, as of 3.3, >> will even include tools to download and install from it. > > I am glad 3.3 is giving virtualenv and bootstrap stuff. It would > really rock, if the new feature won't be settled in stone right after > release and will gain a few UX iterations with allowed break-ability. > > As for PyPI, the major drawback of it is security - DNS attack for a > couple of minutes, and one of your automatically deployed nodes is > trojan ready. I remember PyPI password are stored in clear-text on > developer's machine, but I don't remember if anyone turned off HTTP > basic authorization on PyPI to protect passwords travelling to PyPI > with every upload from intercepting. It would be an interesting > exercise to sniff PyPI passwords over WiFi during next conference > (i.e. https://ep2012.europython.eu/) and match those to the > developer's accounts on *.python.org ;) > >> If you don't like our ecosystem (it's hard to tell whether or not you >> do: everything you post is about how utterly awful and unusable >> everything is, yet you're still here years later). > > You're absolutely right - I like the Python ecosystem, otherwise I > wouldn't stick there. It is like a vintage car - awesome, nice > looking, and there is even this new twisted pyusion engine inside, > but.. well - it's not for youngsters. > >> If you think the PyPI UI is awful or inadequate, follow the example of >> crate.io or pythonpackage.com and *create your own*. 
There's far more >> to the Python universe than just core development, stop trying to >> shoehorn everything into a place where it doesn't belong. > > I have absolutely no idea how aforementioned post touches PyPI UI. > Speaking about PyPI enhancements and ecosystem, instead of reinventing > bicycles I'd rather patch existing one. The only problem is that > patches are not accepted. > https://bitbucket.org/loewis/pypi/pull-request/1/fix-imports-add-logging-to-console-in > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From storchaka at gmail.com Sat Jun 2 20:01:53 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 02 Jun 2012 21:01:53 +0300 Subject: [Python-ideas] Adding list.pluck() In-Reply-To: References: <4FCA374A.60909@gmx.net> Message-ID: On 02.06.12 19:06, Guido van Rossum wrote: > On Sat, Jun 2, 2012 at 8:54 AM, Mathias Panzenb?ck > wrote: >>>>> from operator import itemgetter >>>>> map(itemgetter('name'),stooges) >> ['moe', 'larry', 'curly'] > > If I saw this I would have to think a lot harder before I figured what > it meant. (Especially without the output example.) And this is not true in Python 3. 
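[The Python 3 behaviour being pointed out above can be shown concretely; the stooges data is borrowed from earlier in the thread.]

```python
from operator import itemgetter

stooges = [{'name': 'moe', 'age': 40},
           {'name': 'larry', 'age': 50},
           {'name': 'curly', 'age': 60}]

# In Python 3, map() returns a lazy iterator, not a list:
names = map(itemgetter('name'), stooges)
print(isinstance(names, list))  # False
print(list(names))              # ['moe', 'larry', 'curly']
```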
From anikom15 at gmail.com Sat Jun 2 20:50:21 2012 From: anikom15 at gmail.com (Westley =?iso-8859-1?Q?Mart=EDnez?=) Date: Sat, 2 Jun 2012 11:50:21 -0700 Subject: [Python-ideas] Adding list.pluck() In-Reply-To: References: <4FCA374A.60909@gmx.net> Message-ID: <20120602185021.GA3249@kubrick> On Sat, Jun 02, 2012 at 09:01:53PM +0300, Serhiy Storchaka wrote: > On 02.06.12 19:06, Guido van Rossum wrote: > >On Sat, Jun 2, 2012 at 8:54 AM, Mathias Panzenb?ck > > wrote: > >>>>>from operator import itemgetter > >>>>>map(itemgetter('name'),stooges) > >>['moe', 'larry', 'curly'] > > > >If I saw this I would have to think a lot harder before I figured what > >it meant. (Especially without the output example.) > > And this is not true in Python 3. > > map returns a generator in Python 3. From grosser.meister.morti at gmx.net Sat Jun 2 21:54:56 2012 From: grosser.meister.morti at gmx.net (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=) Date: Sat, 02 Jun 2012 21:54:56 +0200 Subject: [Python-ideas] Adding list.pluck() In-Reply-To: <20120602185021.GA3249@kubrick> References: <4FCA374A.60909@gmx.net> <20120602185021.GA3249@kubrick> Message-ID: <4FCA6F90.1070109@gmx.net> On 06/02/2012 08:50 PM, Westley Mart?nez wrote: > On Sat, Jun 02, 2012 at 09:01:53PM +0300, Serhiy Storchaka wrote: >> On 02.06.12 19:06, Guido van Rossum wrote: >>> On Sat, Jun 2, 2012 at 8:54 AM, Mathias Panzenb?ck >>> wrote: >>>>>> >from operator import itemgetter >>>>>>> map(itemgetter('name'),stooges) >>>> ['moe', 'larry', 'curly'] >>> >>> If I saw this I would have to think a lot harder before I figured what >>> it meant. (Especially without the output example.) >> >> And this is not true in Python 3. >> >> > > map returns a generator in Python 3. Yes, yes. I opened a Python 2 shell to write the example code. Python 2 is still the default in most (all?) Linux distributions. To get a list from that just wrap list() around it. 
I consider this behaviour (that map returns a generator) in fact superior to what's available in other languages. You can then pass that to whatever constructor you like (e.g. set() or tuple()) or take only some of the values and then stop without calculating (and allocating) it all. -panzi From g.rodola at gmail.com Mon Jun 4 01:09:31 2012 From: g.rodola at gmail.com (=?ISO-8859-1?Q?Giampaolo_Rodol=E0?=) Date: Mon, 4 Jun 2012 01:09:31 +0200 Subject: [Python-ideas] Expose Linux-specific APIs in resource module Message-ID: >From "man getrlimit" we have 5 linux-specific constants which are currently not exposed by resource module: RLIMIT_MSGQUEUE RLIMIT_NICE RLIMIT_RTPRIO RLIMIT_RTTIME RLIMIT_SIGPENDING Also, we have prlimit(), which is useful to get/set resources in a per-process fashion based on process PID. If desirable I can submit a patch for this. --- Giampaolo http://code.google.com/p/pyftpdlib/ http://code.google.com/p/psutil/ http://code.google.com/p/pysendfile/ From techtonik at gmail.com Mon Jun 4 11:47:48 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Mon, 4 Jun 2012 12:47:48 +0300 Subject: [Python-ideas] shutil.run (Was: shutil.runret and shutil.runout) In-Reply-To: References: <20120522163053.684b43d0@bhuda.mired.org> <4FBD965A.4040801@pearwood.info> Message-ID: On Thu, May 24, 2012 at 6:24 AM, geremy condra wrote: > On Wed, May 23, 2012 at 7:00 PM, Steven D'Aprano > wrote: >> >> anatoly techtonik wrote: >> >>> I am all ears how to make shutil.run() more secure. Right now I must >>> confess that I don't even realize.how serious is this problems, so if >>> anyone can came up with a real-world example with explanation of >>> security concern that could be copied "as-is" into documentation, it >>> will surely be appreciated not only by me. >> >> >> Start here: >> >> http://cwe.mitre.org/top25/index.html >> >> Code injection attacks include two of the top three security >> vulnerabilities, over even buffer overflows. 
>> One sub-category of code injection:
>>
>> OS Command Injection
>> http://cwe.mitre.org/data/definitions/78.html

Great links. Thanks. Are they still too generic to be placed in the docs?

>
> I talked about this in my pycon talk this year. It's easy to avoid and
> disastrous to get wrong. Please don't do it this way.

Sorry, don't have too much time to watch it right now. Any specific
slides, ideas or excerpts?
--
anatoly t.

From debatem1 at gmail.com Tue Jun 5 08:00:34 2012
From: debatem1 at gmail.com (geremy condra)
Date: Mon, 4 Jun 2012 23:00:34 -0700
Subject: [Python-ideas] shutil.run (Was: shutil.runret and shutil.runout)
In-Reply-To: 
References: <20120522163053.684b43d0@bhuda.mired.org> <4FBD965A.4040801@pearwood.info>
Message-ID: 

On Mon, Jun 4, 2012 at 2:47 AM, anatoly techtonik wrote:
> On Thu, May 24, 2012 at 6:24 AM, geremy condra wrote:
> > On Wed, May 23, 2012 at 7:00 PM, Steven D'Aprano
> > wrote:
> >>
> >> anatoly techtonik wrote:
> >>
> >>> I am all ears how to make shutil.run() more secure. Right now I must
> >>> confess that I don't even realize how serious these problems are, so if
> >>> anyone can come up with a real-world example with an explanation of the
> >>> security concern that could be copied "as-is" into documentation, it
> >>> will surely be appreciated not only by me.
> >>
> >>
> >> Start here:
> >>
> >> http://cwe.mitre.org/top25/index.html
> >>
> >> Code injection attacks include two of the top three security
> >> vulnerabilities, over even buffer overflows.
> >>
> >> One sub-category of code injection:
> >>
> >> OS Command Injection
> >> http://cwe.mitre.org/data/definitions/78.html
>
> Great links. Thanks. Are they still too generic to be placed in the docs?
>
> >
> > I talked about this in my pycon talk this year. It's easy to avoid and
> > disastrous to get wrong. Please don't do it this way.
>
> Sorry, don't have too much time to watch it right now. Any specific
> slides, ideas or excerpts?
> The main idea was just that by combining a bit of awareness of common security anti-patterns (like this one) with a good test regimen and some script kiddie tools you can protect yourself from a lot of common vulnerabilities without being a security guru. I demonstrated how that process works on something fairly similar to this, but if you're interested in more details I'm happy to blather on or dredge up my slides. Geremy Condra -- > anatoly t. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Tue Jun 5 08:14:52 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 5 Jun 2012 16:14:52 +1000 Subject: [Python-ideas] shutil.run (Was: shutil.runret and shutil.runout) In-Reply-To: References: <20120522163053.684b43d0@bhuda.mired.org> <4FBD965A.4040801@pearwood.info> Message-ID: <20120605061451.GA17873@ando> On Mon, Jun 04, 2012 at 11:00:34PM -0700, geremy condra wrote: > The main idea was just that by combining a bit of awareness of common > security anti-patterns (like this one) with a good test regimen and some > script kiddie tools you can protect yourself from a lot of common > vulnerabilities without being a security guru. I demonstrated how that > process works on something fairly similar to this, but if you're interested > in more details I'm happy to blather on or dredge up my slides. I am interested in more details. Would this make a good How (Not) To for the documentation? 
-- Steven From debatem1 at gmail.com Tue Jun 5 08:45:43 2012 From: debatem1 at gmail.com (geremy condra) Date: Mon, 4 Jun 2012 23:45:43 -0700 Subject: [Python-ideas] shutil.run (Was: shutil.runret and shutil.runout) In-Reply-To: <20120605061451.GA17873@ando> References: <20120522163053.684b43d0@bhuda.mired.org> <4FBD965A.4040801@pearwood.info> <20120605061451.GA17873@ando> Message-ID: On Mon, Jun 4, 2012 at 11:14 PM, Steven D'Aprano wrote: > On Mon, Jun 04, 2012 at 11:00:34PM -0700, geremy condra wrote: > > > The main idea was just that by combining a bit of awareness of common > > security anti-patterns (like this one) with a good test regimen and some > > script kiddie tools you can protect yourself from a lot of common > > vulnerabilities without being a security guru. I demonstrated how that > > process works on something fairly similar to this, but if you're > interested > > in more details I'm happy to blather on or dredge up my slides. > > I am interested in more details. Would this make a good How (Not) To for > the documentation? > Combined with some other material I have on hand it might. Only problem would be that I don't really know my way around Sphinx- if there are any doc wizards on hand to help with formatting we could probably make a pretty quick job of it. Geremy Condra > > > -- > Steven > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From ncoghlan at gmail.com Tue Jun 5 09:08:29 2012
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 5 Jun 2012 17:08:29 +1000
Subject: [Python-ideas] shutil.run (Was: shutil.runret and shutil.runout)
In-Reply-To: 
References: <20120522163053.684b43d0@bhuda.mired.org> <4FBD965A.4040801@pearwood.info> <20120605061451.GA17873@ando>
Message-ID: 

On Tue, Jun 5, 2012 at 4:45 PM, geremy condra wrote:
> Combined with some other material I have on hand it might. Only problem
> would be that I don't really know my way around Sphinx - if there are any doc
> wizards on hand to help with formatting we could probably make a pretty
> quick job of it.

Yep, if you can provide a plain text version, we can take it from there.
I suggest attaching it to http://bugs.python.org/issue13515 (which is
about taking a more consistent and holistic approach to documenting
security considerations in the library reference without having modules
like subprocess stuck as a wall of red security warning notices)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From rurpy at yahoo.com Tue Jun 5 19:20:01 2012
From: rurpy at yahoo.com (Rurpy)
Date: Tue, 5 Jun 2012 10:20:01 -0700 (PDT)
Subject: [Python-ideas] changing sys.stdout encoding
Message-ID: <1338916801.8871.YahooMailClassic@web161506.mail.bf1.yahoo.com>

In my first foray into Python3 I've encountered this problem:
I work in a multi-language environment. I've written a number
of tools, mostly command-line, that generate output on stdout.
Because these tools and their output are used by various people
in varying environments, the tools all have an --encoding option
to provide output that meets the needs and preferences of the
output's ultimate consumers.
In converting them to Python3, I found the best (if not very
pleasant) way to do this in Python3 was to put something like
this near the top of each tool[*1]:

  import codecs
  sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer)

What I want to be able to put there instead is:

  sys.stdout.set_encoding (opts.encoding)

The former I found on the internet -- there is zero probability
I could have figured that out from the Python docs. It is obscure
to anyone (who has like me generally only needed to deal with
.encode() and .decode()) who hasn't encountered it before or dealt
much with the codecs module. It is excessively complex for what is
conceptually a simple and straight-forward operation. It requires
the import of the codecs module in programs that otherwise don't
need it [*2], and the reading of the codecs docs (not a shining
example of clarity themselves) to understand it. In short it is
butt ugly relative to what I generally get in Python.

Would it be feasible to provide something like .set_encoding()
on textio streams? (Or make .encoding a writeable property? It
seems to intentionally be non-writeable for some reason, but is
that reason really unavoidable?)

If doing this for textio in general is too hard, then what about
encapsulating the codecs stuff above in a sys.set_encoding()
function? Needing to change the encoding of a sys.std* stream is
not an uncommon need and a user should not have to go through the
codecs dance above to do so IMO.

----
[*1] There are other ways to change stdout's encoding but they
all have problems AFAICT. PYTHONIOENCODING can't easily be
changed dynamically within the program. Reopening stdout as binary,
or using the binary interface to text stdout, requires an explicit
encode call at each write site. Overloading print() is obscure
because it requires the reader to notice print was overloaded.
[*2] I don't mean the actual import of the codecs module which occurs anyway; I mean the extra visual and cognitive noise introduced by the presence of the import statement in the source. From stephen at xemacs.org Tue Jun 5 21:37:16 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 06 Jun 2012 04:37:16 +0900 Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: <1338916801.8871.YahooMailClassic@web161506.mail.bf1.yahoo.com> References: <1338916801.8871.YahooMailClassic@web161506.mail.bf1.yahoo.com> Message-ID: <87zk8ha6tv.fsf@uwakimon.sk.tsukuba.ac.jp> Rurpy writes: > It is excessively complex for what is conceptually a simple and > straight-forward operation. The operation is not conceptually straightforward. The problem is that you can't just change the encoding of an open stream, encodings are generally stateful. The straightforward way to deal with this issue is to close the stream and reinitialize it. Your proposed .set_encoding() method implies something completely different about what's going on. I wouldn't object to a method with the semantics of reinitialization, but it should have a name implying reinitialization. It probably should also error if the stream is open and has been written to. > Needing to change the encoding of a sys.std* stream is not an > uncommon need and a user should not have to go through the > codecs dance above to do so IMO. I suspect needing to *change* the encoding of an open stream is generally quite rare. Needing to *initialize* the std* streams with an appropriate codec is common. That's why it doesn't so much matter that PYTHONIOENCODING can't be changed within a program. I agree that use of PYTHONIOENCODING is pretty awkward. 
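The statefulness mentioned above is easy to see with an incremental encoder (a sketch using iso2022-jp, one of the stateful codecs in the standard library):

```python
import codecs

# iso2022-jp shifts between character sets with escape sequences, so the
# encoder carries state from one call to the next.
enc = codecs.getincrementalencoder('iso2022-jp')()
part1 = enc.encode('日本')          # may leave the stream in a shifted state
part2 = enc.encode('', final=True)  # final=True flushes back to the initial state

# Only the concatenation is a complete, valid byte sequence:
assert part1 + part2 == '日本'.encode('iso2022-jp')
```

Dropping that pending state mid-stream, as a naive set_encoding() would, can leave bytes that no single decoder can make sense of.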
From amauryfa at gmail.com Tue Jun 5 23:22:27 2012
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Tue, 5 Jun 2012 23:22:27 +0200
Subject: [Python-ideas] changing sys.stdout encoding
In-Reply-To: <87zk8ha6tv.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1338916801.8871.YahooMailClassic@web161506.mail.bf1.yahoo.com> <87zk8ha6tv.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: 

2012/6/5 Stephen J. Turnbull

> I wouldn't object to a method with the semantics of reinitialization,
> but it should have a name implying reinitialization. It probably
> should also error if the stream is open and has been written to.
>

What do you think of the following method TextIOWrapper.reset_encoding?
(the assert statements should certainly be replaced by some IOError)

::

    def reset_encoding(self, encoding, errors='strict'):
        if self._decoder:
            # No decoded chars awaiting read
            assert self._decoded_chars_used == len(self._decoded_chars)
            # Nothing in the input buffer
            buf, flag = self._decoder.getstate()
            assert buf == b''
        if self._encoder:
            # Nothing in the output buffer
            buf = self._encoder.encode('', final=True)
            assert buf == b''
        # Reset the decoders
        self._decoder = None
        self._encoder = None
        # Now change the encoding
        self._encoding = encoding
        self._errors = errors

--
Amaury Forgeot d'Arc
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From victor.stinner at gmail.com Wed Jun 6 01:34:00 2012
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 6 Jun 2012 01:34:00 +0200
Subject: [Python-ideas] changing sys.stdout encoding
In-Reply-To: <1338916801.8871.YahooMailClassic@web161506.mail.bf1.yahoo.com>
References: <1338916801.8871.YahooMailClassic@web161506.mail.bf1.yahoo.com>
Message-ID: 

2012/6/5 Rurpy :
> In my first foray into Python3 I've encountered this problem:
> I work in a multi-language environment. I've written a number
> of tools, mostly command-line, that generate output on stdout.
> Because these tools and their output are used by various people
> in varying environments, the tools all have an --encoding option
> to provide output that meets the needs and preferences of the
> output's ultimate consumers.

What happens if the specified encoding is different than the encoding
of the console? Mojibake?

If the output is used as the input of another program, does the
other program use the same encoding?

In my experience, using an encoding different than the locale encoding
for input/output (stdout, environment variables, command line
arguments, etc.) causes various issues. So I'm curious about your use
cases.

> In converting them to Python3, I found the best (if not very
> pleasant) way to do this in Python3 was to put something like
> this near the top of each tool[*1]:
>
> import codecs
> sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer)

In Python 3, you should use io.TextIOWrapper instead of
codecs.StreamWriter. It's more efficient and has fewer bugs.

> What I want to be able to put there instead is:
>
> sys.stdout.set_encoding (opts.encoding)

I don't think that your use case merits a new method on
io.TextIOWrapper: replacing sys.stdout does work and should be used
instead. TextIOWrapper is generic and your use case is specific to
sys.std* streams.

It would be surprising to change the encoding of an arbitrary file
after it is opened. At least, I don't see the use case.

For example, tokenize.open() opens a Python source code file with the
right encoding. It starts by reading the file in binary mode to detect
the encoding, and then uses TextIOWrapper to get a text file without
having to reopen the file. It would be possible to start with a text
file and then change the encoding, but it would be less elegant.

> sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer)

You should also flush sys.stdout (and maybe also sys.stdout.buffer)
before replacing it.
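Spelled out, the io.TextIOWrapper replacement being recommended above looks something like this (a sketch; demonstrated against an in-memory buffer so the effect is checkable, with the sys.stdout form in comments):

```python
import io

# For real use (flush the old wrapper before abandoning it):
#   sys.stdout.flush()
#   sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding=opts.encoding)

# The same wrapping, demonstrated on an in-memory binary buffer:
buf = io.BytesIO()
out = io.TextIOWrapper(buf, encoding='latin-1')
out.write('café')
out.flush()
assert buf.getvalue() == b'caf\xe9'  # text was encoded on the way out
```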
> It requires the import of the codecs module in programs that other- > wise don't need it [*2], and the reading of the codecs docs (not > a shining example of clarity themselves) to understand it. It's maybe difficult to change the encoding of sys.stdout at runtime because it is NOT a good idea :-) > Needing to change the encoding of a sys.std* stream is not an > uncommon need and a user should not have to go through the > codecs dance above to do so IMO. Replacing sys.std* works but has issues: output written before the replacement is encoded to a different encoding for example. The best way is to change your locale encoding (using LC_ALL, LC_CTYPE or LANG environment variable on UNIX), or simply to set PYTHONIOENCODING environment variable. > [*1] There are other ways to change stdout's encoding but they > ?all have problems AFAICT. ?PYTHONIOENCODING can't easily be > ?changed dynamically within program. Ah? Detect if PYTHONIOENCODING is present (or if sys.stdout.encoding is the requested encoding), if not: restart the program with PYTHONIOENCODING=encoding. > ?Overloading print() is obscure > ?because it requires reader to notice print was overloaded. Why not writing the output into a file, instead of stdout? Victor From python at mrabarnett.plus.com Wed Jun 6 01:56:55 2012 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 06 Jun 2012 00:56:55 +0100 Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: References: <1338916801.8871.YahooMailClassic@web161506.mail.bf1.yahoo.com> Message-ID: <4FCE9CC7.3080705@mrabarnett.plus.com> On 06/06/2012 00:34, Victor Stinner wrote: > 2012/6/5 Rurpy: >> In my first foray into Python3 I've encountered this problem: >> I work in a multi-language environment. I've written a number >> of tools, mostly command-line, that generate output on stdout. 
>> Because these tools and their output are used by various people >> in varying environments, the tools all have an --encoding option >> to provide output that meets the needs and preferences of the >> output's ultimate consumers. > > What happens if the specified encoding is different than the encoding > of the console? Mojibake? > > If the output is used as in the input of another program, does the > other program use the same encoding? > > In my experience, using an encoding different than the locale encoding > for input/output (stdout, environment variables, command line > arguments, etc.) causes various issues. So I'm curious of your use > cases. > >> In converting them to Python3, I found the best (if not very >> pleasant) way to do this in Python3 was to put something like >> this near the top of each tool[*1]: >> >> import codecs >> sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer) > > In Python 3, you should use io.TextIOWrapper instead of > codecs.StreamWriter. It's more efficient and has less bugs. > >> What I want to be able to put there instead is: >> >> sys.stdout.set_encoding (opts.encoding) > > I don't think that your use case merit a new method on > io.TextIOWrapper: replacing sys.stdout does work and should be used > instead. TextIOWrapper is generic and your use case if specific to > sys.std* streams. > > It would be surprising to change the encoding of an arbitrary file > after it is opened. At least, I don't see the use case. > [snip] And if you _do_ want multiple encodings in a file, it's clearer to open the file as binary and then explicitly encode to bytes and write _that_ to the file. From stephen at xemacs.org Wed Jun 6 05:28:57 2012 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Wed, 06 Jun 2012 12:28:57 +0900 Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: References: <1338916801.8871.YahooMailClassic@web161506.mail.bf1.yahoo.com> <87zk8ha6tv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87wr3l9kzq.fsf@uwakimon.sk.tsukuba.ac.jp> Amaury Forgeot d'Arc writes: > 2012/6/5 Stephen J. Turnbull > > > I wouldn't object to a method with the semantics of reinitialization, > > but it should have a name implying reinitialization. It probably > > should also error if the stream is open and has been written to. > > > > What do you think of the following method TextIOWrapper.reset_encoding? > (the assert statements should certainly be replaced by some > IOError) I think that it's an attractive nuisance because it doesn't close the stream, and therefore permits changing the encoding without any warning partway through the stream. There are two reasonable (for a very generous definition of "reasonable") ways to handle multiple scripts in one stream: Unicode and ISO 2022. Simply changing encodings in the middle is a recipe for disaster in the absence of a higher-level protocol for signaling this change (that's the role ISO 2022 fulfils, but it is detested by almost everybody...). If you want to do that kind of thing, the "import codecs; sys.stdout = ..." idiom is available, but I don't see a need to make it convenient. But the OP's request is pretty clearly not for a generic .set_encoding(), it's for a more convenient way to initialize the stream for users. Aside to Victor: at least on Mac OS X, I find that Python 3.2 (current MacPorts, I can investigate further if you need it) doesn't respect the language environment as I would expect it to. "LC_ALL=ja_JP.UTF8 python32" will give me an out-of-range Unicode error if I try to input Japanese using "import sys; sys.stdin.readline()" -- I have to use "PYTHONIOENCODING=UTF8" to get useful behavior. 
There may also be cases where multiple users with different language
needs are working at the same workstation. For both of these cases a
command-line option to initialize the encoding would be convenient.

From ncoghlan at gmail.com Wed Jun 6 07:49:16 2012
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 6 Jun 2012 15:49:16 +1000
Subject: [Python-ideas] changing sys.stdout encoding
In-Reply-To: <87wr3l9kzq.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1338916801.8871.YahooMailClassic@web161506.mail.bf1.yahoo.com> <87zk8ha6tv.fsf@uwakimon.sk.tsukuba.ac.jp> <87wr3l9kzq.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: 

On Wed, Jun 6, 2012 at 1:28 PM, Stephen J. Turnbull wrote:
> For both of these cases a command-line option to initialize the
> encoding would be convenient.

Before adding yet-another-command-line-option, the cases where the
existing environment variable support can't be used from the command
line, but a new option could be, should be clearly enumerated.

$ python3
Python 3.2.1 (default, Jul 11 2011, 18:54:42)
[GCC 4.6.1 20110627 (Red Hat 4.6.1-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.encoding
'UTF-8'
>>>
$ PYTHONIOENCODING=latin-1 python3
Python 3.2.1 (default, Jul 11 2011, 18:54:42)
[GCC 4.6.1 20110627 (Red Hat 4.6.1-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.encoding
'latin-1'
>>>

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From rurpy at yahoo.com Wed Jun 6 08:05:35 2012
From: rurpy at yahoo.com (Rurpy)
Date: Tue, 5 Jun 2012 23:05:35 -0700 (PDT)
Subject: [Python-ideas] changing sys.stdout encoding
In-Reply-To: <87zk8ha6tv.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <1338962735.37241.YahooMailClassic@web161503.mail.bf1.yahoo.com>

On 06/05/2012 01:37 PM, Stephen J.
Turnbull wrote:
> Rurpy writes:
>
> > It is excessively complex for what is conceptually a simple and
> > straight-forward operation.
>
> The operation is not conceptually straightforward. The problem is
> that you can't just change the encoding of an open stream, encodings
> are generally stateful. The straightforward way to deal with this
> issue is to close the stream and reinitialize it. Your proposed
> .set_encoding() method implies something completely different about
> what's going on.

I'm not sure why stateful matters. When you change encoding you
discard whatever state exists and start with the new encoder in its
initial state. If there is a partially en/decoded character then
wouldn't you do the same thing you'd do if the same condition arose
at EOF?

> I wouldn't object to a method with the semantics of reinitialization,
> but it should have a name implying reinitialization. It probably
> should also error if the stream is open and has been written to.
>
> > Needing to change the encoding of a sys.std* stream is not an
> > uncommon need and a user should not have to go through the
> > codecs dance above to do so IMO.
>
> I suspect needing to *change* the encoding of an open stream is
> generally quite rare. Needing to *initialize* the std* streams with
> an appropriate codec is common. That's why it doesn't so much matter
> that PYTHONIOENCODING can't be changed within a program.

You are correct that my current concern is reinitializing the
encoding(s) of the sys.std* streams prior to doing any operations
with them. I thought that changing the encoding at any point would
be a straight-forward generalization.
However I have in the past encountered programs that output mixed
encodings in two contexts: generating test data (I think it was for
automatic detection and extraction of information), and bundling
multiple differently-encoded data sets in one package that were
pulled apart again downstream. That both uses probably could have
been designed better is irrelevant; a hypothetical python programmer's
job would have been to produce a python program that would fit into
the existing processes. However I don't want to dwell on this because
it is not my main concern now; I thought I would just mention it for
the record.

> I agree that use of PYTHONIOENCODING is pretty awkward.

From rurpy at yahoo.com Wed Jun 6 08:14:26 2012
From: rurpy at yahoo.com (Rurpy)
Date: Tue, 5 Jun 2012 23:14:26 -0700 (PDT)
Subject: [Python-ideas] changing sys.stdout encoding
In-Reply-To: 
Message-ID: <1338963266.96156.YahooMailClassic@web161502.mail.bf1.yahoo.com>

On 06/05/2012 11:49 PM, Nick Coghlan wrote:
> On Wed, Jun 6, 2012 at 1:28 PM, Stephen J. Turnbull wrote:
>> For both of these cases a command-line option to initialize the
>> encoding would be convenient.

A Python interpreter command line option? That would not particularly
help my use case much.

> Before adding yet-another-command-line-option, the cases where the
> existing environment variable support can't be used from the command
> line, but a new option could be, should be clearly enumerated.
>
> $ python3
> Python 3.2.1 (default, Jul 11 2011, 18:54:42)
> [GCC 4.6.1 20110627 (Red Hat 4.6.1-1)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> import sys
>>>> sys.stdout.encoding
> 'UTF-8'
>>>>
> $ PYTHONIOENCODING=latin-1 python3
> Python 3.2.1 (default, Jul 11 2011, 18:54:42)
> [GCC 4.6.1 20110627 (Red Hat 4.6.1-1)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> import sys
>>>> sys.stdout.encoding
> 'latin-1'

I don't think that works on Windows.
From rurpy at yahoo.com Wed Jun 6 08:17:18 2012
From: rurpy at yahoo.com (Rurpy)
Date: Tue, 5 Jun 2012 23:17:18 -0700 (PDT)
Subject: [Python-ideas] changing sys.stdout encoding
Message-ID: <1338963438.55650.YahooMailClassic@web161501.mail.bf1.yahoo.com>

On 06/05/2012 05:34 PM, Victor Stinner wrote:
> 2012/6/5 Rurpy :
>> In my first foray into Python3 I've encountered this problem:
>> I work in a multi-language environment. I've written a number
>> of tools, mostly command-line, that generate output on stdout.
>> Because these tools and their output are used by various people
>> in varying environments, the tools all have an --encoding option
>> to provide output that meets the needs and preferences of the
>> output's ultimate consumers.
>
> What happens if the specified encoding is different than the encoding
> of the console? Mojibake?

When output is directed to the console, yes. Would one expect
something else?

> If the output is used as the input of another program, does the
> other program use the same encoding?

Yes of course (when not misused). That's why they have --encoding
options. (Obviously details vary depending on requirements of the
various tools.)

> In my experience, using an encoding different than the locale encoding
> for input/output (stdout, environment variables, command line
> arguments, etc.) causes various issues. So I'm curious about your use
> cases.

I gave the use case in my original post:

+ I work in a multi-language environment. I've written a number
+ of tools, mostly command-line, that generate output on stdout.
+ Because these tools and their output are used by various people
+ in varying environments, the tools all have an --encoding option
+ to provide output that meets the needs and preferences of the
+ output's ultimate consumers.

They are often used like:

  ./extractor.py --encoding=euc-jp dataset >somefile

And of course some tools require something similar for stdin
encodings.
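Wired up with argparse, the --encoding option described above might look like this (a sketch; the option handling shown is illustrative, not the actual tools' code):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--encoding', default=None,
                    help='encoding for output written to stdout')

# Normally parse_args() reads sys.argv[1:]; an explicit list is used here.
opts = parser.parse_args(['--encoding', 'euc-jp'])
print(opts.encoding)  # euc-jp

# The tool would then rewrap stdout before writing any output:
#   sys.stdout.flush()
#   sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding=opts.encoding)
```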
>> In converting them to Python3, I found the best (if not very
>> pleasant) way to do this in Python3 was to put something like
>> this near the top of each tool[*1]:
>>
>> import codecs
>> sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer)
>
> In Python 3, you should use io.TextIOWrapper instead of
> codecs.StreamWriter. It's more efficient and has fewer bugs.

Thanks, I'll do that. But surely this is a strong argument for
encapsulating the ability to change (or reinitialize) the std*
encodings. I did a fair amount of searching on the internet (many
orders of magnitude more time than it would have taken to look up
sys.stdout.set_encoding() in the documentation) and *still* ended up
with a suboptimal solution.

>> What I want to be able to put there instead is:
>>
>> sys.stdout.set_encoding (opts.encoding)
>
> I don't think that your use case merits a new method on
> io.TextIOWrapper: replacing sys.stdout does work and should be used
> instead. TextIOWrapper is generic and your use case is specific to
> sys.std* streams.
>
> It would be surprising to change the encoding of an arbitrary file
> after it is opened. At least, I don't see the use case.

I gave a couple that I encountered in the past, in my response to
Stephen Turnbull. However, now I am more concerned with just resetting
the encoding at the beginning of the program.

> For example, tokenize.open() opens a Python source code file with the
> right encoding. It starts by reading the file in binary mode to detect
> the encoding, and then uses TextIOWrapper to get a text file without
> having to reopen the file. It would be possible to start with a text
> file and then change the encoding, but it would be less elegant.

That's a rather different use case than mine, yes?

>> sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer)
>
> You should also flush sys.stdout (and maybe also sys.stdout.buffer)
> before replacing it.
> >> It requires the import of the codecs module in programs that other- >> wise don't need it [*2], and the reading of the codecs docs (not >> a shining example of clarity themselves) to understand it. > > It's maybe difficult to change the encoding of sys.stdout at runtime > because it is NOT a good idea :-) Why would that be? My tools already do that, they meet their usability requirements and I have noticed no ill effects. The code (except for the piece I am complaining about) is about as simple and obvious as it is possible to get. Am I missing something? >> Needing to change the encoding of a sys.std* stream is not an >> uncommon need and a user should not have to go through the >> codecs dance above to do so IMO. > > Replacing sys.std* works but has issues: output written before the > replacement is encoded to a different encoding for example. The best > way is to change your locale encoding (using LC_ALL, LC_CTYPE or LANG > environment variable on UNIX), or simply to set PYTHONIOENCODING > environment variable. Those solutions are not only NOT the best solution (IMO) -- they are completely unacceptable. If I had to build my programs as shell scripts that manipulate environment variables before calling my Python program, I would dump Python for some other language. >> [*1] There are other ways to change stdout's encoding but they >> all have problems AFAICT. PYTHONIOENCODING can't easily be >> changed dynamically within program. > > Ah? Detect if PYTHONIOENCODING is present (or if sys.stdout.encoding > is the requested encoding), if not: restart the program with > PYTHONIOENCODING=encoding. For what I need to do (print() to sys.stdout with a different encoding than what Python guessed I'd want), your proposal seems absurdly convoluted to me. sys.stdout is set to encoding A. I want it to write using encoding B. The obvious, simplest, most desirable solution (barring technical difficulties) is just change the encoding. 
>> Overloading print() is obscure >> because it requires reader to notice print was overloaded. > > Why not writing the output into a file, instead of stdout? Because the interface for these tools already exists and the users of the tools are happy with them the way they are. And even if that weren't the case, it is not the role of a general purpose programming language to say a standard convention such as file redirection should be relegated to second-class status simply because the programmer needs a different output encoding than the language designers thought he would. From pyideas at rebertia.com Wed Jun 6 08:32:24 2012 From: pyideas at rebertia.com (Chris Rebert) Date: Tue, 5 Jun 2012 23:32:24 -0700 Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: <1338963266.96156.YahooMailClassic@web161502.mail.bf1.yahoo.com> References: <1338963266.96156.YahooMailClassic@web161502.mail.bf1.yahoo.com> Message-ID: On Tue, Jun 5, 2012 at 11:14 PM, Rurpy wrote: > On 06/05/2012 11:49 PM, Nick Coghlan wrote: >> On Wed, Jun 6, 2012 at 1:28 PM, Stephen J. Turnbull wrote: >> Before adding yet-another-command-line-option, the cases where the >> existing environment variable support can't be used from the command >> line, but a new option could be, should be clearly enumerated. >> >> $ python3 >> Python 3.2.1 (default, Jul 11 2011, 18:54:42) >> [GCC 4.6.1 20110627 (Red Hat 4.6.1-1)] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. >>>>> import sys >>>>> sys.stdout.encoding >> 'UTF-8' >>>>> >> $ PYTHONIOENCODING=latin-1 python3 >> Python 3.2.1 (default, Jul 11 2011, 18:54:42) >> [GCC 4.6.1 20110627 (Red Hat 4.6.1-1)] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. >>>>> import sys >>>>> sys.stdout.encoding >> 'latin-1' > > I don't think that works on Windows. You just need to use the "set" command/built-in (http://ss64.com/nt/set.html ; or the PowerShell equivalent) to set the environment variable. 
It's 1 extra line. Blame Windows for not being POSIXy enough. Cheers, Chris From rurpy at yahoo.com Wed Jun 6 09:09:34 2012 From: rurpy at yahoo.com (Rurpy) Date: Wed, 6 Jun 2012 00:09:34 -0700 (PDT) Subject: [Python-ideas] changing sys.stdout encoding Message-ID: <1338966574.75723.YahooMailClassic@web161506.mail.bf1.yahoo.com> On 06/05/2012 05:56 PM, MRAB wrote: > On 06/06/2012 00:34, Victor Stinner wrote: >> 2012/6/5 Rurpy: >>> In my first foray into Python3 I've encountered this problem: >>> I work in a multi-language environment. I've written a number >>> of tools, mostly command-line, that generate output on stdout. >>> Because these tools and their output are used by various people >>> in varying environments, the tools all have an --encoding option >>> to provide output that meets the needs and preferences of the >>> output's ultimate consumers. >> >> What happens if the specified encoding is different than the encoding >> of the console? Mojibake? >> >> If the output is used as in the input of another program, does the >> other program use the same encoding? >> >> In my experience, using an encoding different than the locale encoding >> for input/output (stdout, environment variables, command line >> arguments, etc.) causes various issues. So I'm curious of your use >> cases. >> >>> In converting them to Python3, I found the best (if not very >>> pleasant) way to do this in Python3 was to put something like >>> this near the top of each tool[*1]: >>> >>> import codecs >>> sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer) >> >> In Python 3, you should use io.TextIOWrapper instead of >> codecs.StreamWriter. It's more efficient and has less bugs. >> >>> What I want to be able to put there instead is: >>> >>> sys.stdout.set_encoding (opts.encoding) >> >> I don't think that your use case merit a new method on >> io.TextIOWrapper: replacing sys.stdout does work and should be used >> instead. 
>> TextIOWrapper is generic and your use case is specific to sys.std* streams.
>>
>> It would be surprising to change the encoding of an arbitrary file after it is opened. At least, I don't see the use case.
>>
> [snip]
>
> And if you _do_ want multiple encodings in a file, it's clearer to open the file as binary and then explicitly encode to bytes and write _that_ to the file.

But is it really? The following is very simple and the level of python expertise required is minimal. It (would) work fine with redirection. One could substitute any other ordinary open (for write) text file for sys.stdout. [off the top of my head]

text = 'This is %s text: ??????????'
sys.stdout.set_encoding ('sjis')
print (text % 'sjis')
sys.stdout.set_encoding ('euc-jp')
print (text % 'euc-jp')
sys.stdout.set_encoding ('iso2022-jp')
print (text % 'iso2022-jp')

As for your suggestion, how do I reopen sys.stdout in binary mode? I don't need to do that often and don't know off the top of my head. (And it's too late for me to look it up.) And what happens to redirected output when I close and reopen the stream? I can open a regular filename instead. But remember to make the last two opens with "a" rather than "w". And don't forget the "\n" at the end of the text line. Could you show me a code example of your suggestion for comparison?

Disclaimer: As I said before, I am not particularly advocating for a set_encoding() method -- my primary suggestion is a programmatic way to change the sys.std* encodings prior to first use. Here I am just questioning the claim that a set_encoding() method would not be clearer than existing alternatives.
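For comparison, Rurpy's hypothetical set_encoding() calls map fairly directly onto the io.TextIOWrapper re-wrapping that Victor recommends. A minimal sketch, where set_stream_encoding is an invented helper name rather than anything in the stdlib:

```python
import io
import sys

def set_stream_encoding(stream, encoding):
    """Re-wrap a text stream's binary layer with a new encoding.

    Hypothetical helper illustrating the io.TextIOWrapper approach;
    not a stdlib API.
    """
    stream.flush()          # push out anything buffered under the old encoding
    buf = stream.detach()   # take the binary layer; the old wrapper won't close it
    return io.TextIOWrapper(buf, encoding=encoding, line_buffering=True)

# Demonstrated on an in-memory stand-in for sys.stdout:
raw = io.BytesIO()
out = set_stream_encoding(io.TextIOWrapper(raw, encoding='utf-8'), 'latin-1')
out.write('caf\u00e9\n')
out.flush()
assert raw.getvalue() == b'caf\xe9\n'   # latin-1 bytes, not UTF-8

# At program startup the real stream would be replaced the same way:
# sys.stdout = set_stream_encoding(sys.stdout, opts.encoding)
```

detach() hands over the binary layer so the discarded wrapper cannot close it when it is garbage-collected, and flushing first avoids losing text buffered under the old encoding.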
From rurpy at yahoo.com Wed Jun 6 09:36:39 2012 From: rurpy at yahoo.com (Rurpy) Date: Wed, 6 Jun 2012 00:36:39 -0700 (PDT) Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: Message-ID: <1338968199.47997.YahooMailClassic@web161502.mail.bf1.yahoo.com>

On 06/06/2012 12:32 AM, Chris Rebert wrote:
> On Tue, Jun 5, 2012 at 11:14 PM, Rurpy wrote:
>> On 06/05/2012 11:49 PM, Nick Coghlan wrote:
>>> On Wed, Jun 6, 2012 at 1:28 PM, Stephen J. Turnbull wrote: [...]
>>> $ PYTHONIOENCODING=latin-1 python3 [...]
>> I don't think that works on Windows.
>
> You just need to use the "set" command/built-in (http://ss64.com/nt/set.html ; or the PowerShell equivalent) to set the environment variable. It's 1 extra line. Blame Windows for not being POSIXy enough.

There's a lot more than that I blame Windows for. :-) There's another extra line to restore the environment to its original setting too. And when you forget to do that, remember to straighten out the output of the next python program you run. Also, does not PYTHONIOENCODING affect all three streams? That would rule it out of consideration in my use case. But even if not, I'm sorry, compared with running a single command with an encoding option, I think messing with environment variables is not really a workable solution. About the closest I see to doing this in practice would be to wrap each python program up in a .bat script. This is really a case of the Python tail wagging the application dog.

From tarek at ziade.org Wed Jun 6 09:56:17 2012 From: tarek at ziade.org (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Wed, 06 Jun 2012 09:56:17 +0200 Subject: [Python-ideas] Supporting already opened sockets in our socket-based server classes Message-ID: <4FCF0D21.2070208@ziade.org>

Hello

What about allowing all our socket servers -- from SocketServer to WSGIServer -- to run with an existing socket?
The use case is to make it easier to write applications that use the pre-fork model to run several processes against the same socket. Basically:

- the main process creates a socket, binds it and listens to it
- the main process forks some subprocesses and passes them the socket fd value
- each subprocess recreates a socket object using socket.fromfd() -- so it does not bind it
- each subprocess can accept() connections on the socket

I have a working prototype here: https://github.com/tarekziade/chaussette/blob/master/chaussette/server.py (don't look at the code -- I made it quickly, just as a proof of concept)

What I am proposing is the following syntax: if the host passed to the class is of the form:

fd://12

The class will try to create a socket object against the file descriptor 12, and will not bind() it neither accept() it.

How does that sound? If people like the idea I can try to build a patch for 3.x, and I can certainly release a backport for 2.x

Cheers Tarek

From stephen at xemacs.org Wed Jun 6 10:39:22 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 06 Jun 2012 17:39:22 +0900 Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: <1338968199.47997.YahooMailClassic@web161502.mail.bf1.yahoo.com> References: <1338968199.47997.YahooMailClassic@web161502.mail.bf1.yahoo.com> Message-ID: <87pq9cal6t.fsf@uwakimon.sk.tsukuba.ac.jp>

Rurpy writes:
> But even if not, I'm sorry, compared with running a single
> command with an encoding option, I think messing with
> environment variables is not really a workable solution.

You have a workable 2-line solution, which you posted. It's ugly and hard to find, and it should be, to discourage people from thinking it's something they might *want* to do. But they shouldn't; people in multilingual environments should be using UTF-8 externally unless they have really really special needs (and even then they should probably be using UTF-8 embedded in markup that serves those needs).
> This is really case of the Python tail wagging the application dog. If you need to do it often, just make a function out of it. It doesn't need to be a built-in. From stephen at xemacs.org Wed Jun 6 10:26:21 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 06 Jun 2012 17:26:21 +0900 Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: <1338962735.37241.YahooMailClassic@web161503.mail.bf1.yahoo.com> References: <87zk8ha6tv.fsf@uwakimon.sk.tsukuba.ac.jp> <1338962735.37241.YahooMailClassic@web161503.mail.bf1.yahoo.com> Message-ID: <87r4tsalsi.fsf@uwakimon.sk.tsukuba.ac.jp> Rurpy writes: > I'm not sure why stateful matters. When you change encoding > you discard whatever state exists How do you know what *I* want to do? Silently discarding buffer contents would suck. > If there is a partially en/decoded character then wouldn't do the > same thing you'd do if the same condition arose at EOF? Again speaking for *myself*, almost certainly not. On input, if it happens *before* EOF it's incomplete input, and I should wait for it to be completed. If it happens on output, there's a bug somewhere, and I probably want to do some kind of error recovery. > However I have in the past encountered mixed encoding outputting > programs in two contexts; generating test data (i think is was > for automatic detection and extraction of information), and > bundling multiple differently-encoded data sets in one package > that were pulled apart again downstream. > > That both uses probably could have been designed better is irrelevant; > a hypothetical python programmer's job would have been to produce > a python program that would fit into the the existing processes. No, it's not irrelevant that it's bad design. Python should not go out of its way to cater to bad design, if bad design can be worked around with existing facilities. 
Here there are at least two ways to do it: the method of changing sys.std*'s text encoding that you posted, and switching sys.std* to binary and doing explicit encoding and decoding of strings to be input or output. I have also encountered mixed encoding, in my students' filesystems (it was not uncommon to see /home/j.r.exchangestudent/KOI8-R/SHIFT_JIS and similar). That doesn't mean it should be made easier to generate!

From solipsis at pitrou.net Wed Jun 6 14:28:50 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 06 Jun 2012 14:28:50 +0200 Subject: [Python-ideas] Supporting already opened sockets in our socket-based server classes In-Reply-To: <4FCF0D21.2070208@ziade.org> References: <4FCF0D21.2070208@ziade.org> Message-ID:

Le 06/06/2012 09:56, Tarek Ziadé a écrit :
> > What I am proposing is the following syntax: > > if the host passed to the class is of the form: > > fd://12 > > The class will try to create a socket object against the file descriptor > 12, and will not bind() it neither accept() it.

Passing a pseudo-URL where a host name is expected sounds like a bad idea. Also, I don't understand the "neither accept() it" part. Surely you need to accept() incoming connections, so perhaps you mean "neither listen() it"? (also, I'm not sure calling listen() another time is a problem)

Regards Antoine.

From tarek at ziade.org Wed Jun 6 17:23:15 2012 From: tarek at ziade.org (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Wed, 06 Jun 2012 17:23:15 +0200 Subject: [Python-ideas] Supporting already opened sockets in our socket-based server classes In-Reply-To: References: <4FCF0D21.2070208@ziade.org> Message-ID: <4FCF75E3.2030805@ziade.org>

On 6/6/12 2:28 PM, Antoine Pitrou wrote:
> Le 06/06/2012 09:56, Tarek Ziadé a écrit : >> >> What I am proposing is the following syntax: >> >> if the host passed to the class is of the form: >> >> fd://12 >> >> The class will try to create a socket object against the file descriptor >> 12, and will not bind() it neither accept() it. > > Passing a pseudo-URL where a host name is expected sounds like a bad idea.

Well, unix sockets are using this convention to point paths to unix sockets, e.g. unix:///some/path

In general, the URI scheme seems widely used out there: https://en.wikipedia.org/wiki/URI_scheme

What do you propose? Another option?

> Also, I don't understand the "neither accept() it" part. Surely you > need to accept() incoming connections, so perhaps you mean "neither > listen() it"?

Yeah that was a typo -- I do listen() before I fork.

> (also, I'm not sure calling listen() another time is a problem)

I don't think so, but the usual pattern I have seen is to call listen() before the forking.

> > Regards > > Antoine. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas

From solipsis at pitrou.net Wed Jun 6 19:05:39 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 06 Jun 2012 19:05:39 +0200 Subject: [Python-ideas] Supporting already opened sockets in our socket-based server classes In-Reply-To: <4FCF75E3.2030805@ziade.org> References: <4FCF0D21.2070208@ziade.org> <4FCF75E3.2030805@ziade.org> Message-ID:

Le 06/06/2012 17:23, Tarek Ziadé a écrit :
> > Well, unix sockets are using this convention to point paths to unix > sockets. > > e.g. unix:///some/path

Which unix sockets? In socketserver?

> in general, the URI scheme seems widely used out there, > > https://en.wikipedia.org/wiki/URI_scheme

My point is that if the parameter is currently a hostname, it isn't a URI (AFAIK). Starting to mix both concepts could quickly become confusing.

> What do you propose? Another option?

I think that's better indeed.
Regards Antoine. From mwm at mired.org Wed Jun 6 19:46:18 2012 From: mwm at mired.org (Mike Meyer) Date: Wed, 6 Jun 2012 13:46:18 -0400 Subject: [Python-ideas] Supporting already opened sockets in our socket-based server classes In-Reply-To: <4FCF75E3.2030805@ziade.org> References: <4FCF0D21.2070208@ziade.org> <4FCF75E3.2030805@ziade.org> Message-ID: <20120606134618.2cb9613e@bhuda.mired.org> On Wed, 06 Jun 2012 17:23:15 +0200 Tarek Ziad? wrote: > On 6/6/12 2:28 PM, Antoine Pitrou wrote: > > Le 06/06/2012 09:56, Tarek Ziad? a ?crit : > >> > >> What I am proposing is the following syntax: > >> > >> if the host passed to the class is of the form: > >> > >> fd://12 > >> > >> The class will try to create a socket object against the file descriptor > >> 12, and will not bind() it neither accept() it. > > > > Passing a pseudo-URL where a host name is expected sounds like a bad idea. > > Well, unix sockets are using this convention to point paths to unix sockets. > > e.g. unix:///some/path I think what you're trying to achieve has merit, but you're doing it in the wrong place. Using a URL-like string instead of a host name? Really? So how about a new subclass, "PreForkedTCPServer", that takes the file descriptor instead of the host/port pair when created? You'd probably want to tweak the class tree somewhat, but that seems like a more palatable API for what you're trying to do. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. 
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From tarek at ziade.org Wed Jun 6 23:45:18 2012 From: tarek at ziade.org (=?UTF-8?B?VGFyZWsgWmlhZMOp?=) Date: Wed, 06 Jun 2012 23:45:18 +0200 Subject: [Python-ideas] Supporting already opened sockets in our socket-based server classes In-Reply-To: <20120606134618.2cb9613e@bhuda.mired.org> References: <4FCF0D21.2070208@ziade.org> <4FCF75E3.2030805@ziade.org> <20120606134618.2cb9613e@bhuda.mired.org> Message-ID: <4FCFCF6E.4050808@ziade.org> On 6/6/12 7:46 PM, Mike Meyer wrote: > On Wed, 06 Jun 2012 17:23:15 +0200 > Tarek Ziad? wrote: > >> On 6/6/12 2:28 PM, Antoine Pitrou wrote: >>> Le 06/06/2012 09:56, Tarek Ziad? a ?crit : >>>> What I am proposing is the following syntax: >>>> >>>> if the host passed to the class is of the form: >>>> >>>> fd://12 >>>> >>>> The class will try to create a socket object against the file descriptor >>>> 12, and will not bind() it neither accept() it. >>> Passing a pseudo-URL where a host name is expected sounds like a bad idea. >> Well, unix sockets are using this convention to point paths to unix sockets. >> >> e.g. unix:///some/path > I think what you're trying to achieve has merit, but you're doing it > in the wrong place. Using a URL-like string instead of a host name? > Really? > > So how about a new subclass, "PreForkedTCPServer", that takes the file > descriptor instead of the host/port pair when created? You'd probably > want to tweak the class tree somewhat, but that seems like a more > palatable API for what you're trying to do. Yeah that makes sense. will try this - thanks for the feedback > > Howdy! Was teaching a new user to Python the ropes a short while ago and ran into an interesting headspace problem: the for/else syntax fails the obviousness and consistency tests. When used in an if/else block the conditional code is executed if the conditional passes, and the else block is executed if the conditional fails. 
Compared to for loops where the for code is repeated and the else code executed if we "naturally fall off the loop". (The new user's reaction was "why the hoek would I ever use for/else?") I forked Python 3.3 to experiment with an alternate implementation that follows the logic of pass/fail implied by if/else: (and to refactor the stdlib, but that's a different issue ;) for x in range(20): if x > 10: break else: pass # we had no values to iterate finally: pass # we naturally fell off the loop It abuses finally (to avoid tying up a potentially common word as a reserved word like "done") but makes possible an important distinction without having to perform potentially expensive length calculations (which may not even be possible!) on the value being iterated: that is, handling the case where there were no values in the collection or returned by the generator. Templating engines generally implement this type of structure. Of course this type of breaking change in semantics puts this idea firmly into Python 4 land. I'll isolate the for/else/finally code from my fork and post a patch this week-end, hopefully. ? Alice. From steve at pearwood.info Thu Jun 7 01:45:36 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 07 Jun 2012 09:45:36 +1000 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: References: Message-ID: <4FCFEBA0.7040009@pearwood.info> Alice Bevan?McGregor wrote: > Howdy! > > Was teaching a new user to Python the ropes a short while ago and ran > into an interesting headspace problem: the for/else syntax fails the > obviousness and consistency tests. When used in an if/else block the > conditional code is executed if the conditional passes, and the else > block is executed if the conditional fails. Compared to for loops where > the for code is repeated and the else code executed if we "naturally > fall off the loop". 
(The new user's reaction was "why the hoek would I > ever use for/else?")

Yes, I love for/else and while/else but regret the name. The else is conceptually unlike the else in if/else, and leads to the common confusion that the else suite runs if the iterable is empty.

> I forked Python 3.3 to experiment with an alternate implementation that > follows the logic of pass/fail implied by if/else: (and to refactor the > stdlib, but that's a different issue ;) > > for x in range(20): > if x > 10: break > else: > pass # we had no values to iterate > finally: > pass # we naturally fell off the loop

+10000 :)

> It abuses finally (to avoid tying up a potentially common word as a > reserved word like "done") but makes possible an important distinction > without having to perform potentially expensive length calculations > (which may not even be possible!) on the value being iterated: that is, > handling the case where there were no values in the collection or > returned by the generator. > > Templating engines generally implement this type of structure. Of > course this type of breaking change in semantics puts this idea firmly > into Python 4 land.

Sadly, yes. Where were you when Python 3.0 was still being planned? :)

> I'll isolate the for/else/finally code from my fork and post a patch > this week-end, hopefully. > > -- Alice.

Many thanks.

-- Steven

From bruce at leapyear.org Thu Jun 7 01:58:50 2012 From: bruce at leapyear.org (Bruce Leban) Date: Wed, 6 Jun 2012 16:58:50 -0700 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: References: Message-ID:

If we could go back in time I would completely agree. But since we can't, flipping the meaning of else would be too error inducing and therefore not at all likely. So at risk of bike shedding I would suggest:

for ...
[ else not: ]
else [ finally ] :

If a context-sensitive keyword would work I'd go for something more like:

for ...
[ else empty: ]
else [ no match ] :

This would not introduce any incompatibilities.
--- Bruce (from my phone) On Jun 6, 2012 4:31 PM, "Alice Bevan?McGregor" wrote: > Howdy! > > Was teaching a new user to Python the ropes a short while ago and ran into > an interesting headspace problem: the for/else syntax fails the obviousness > and consistency tests. When used in an if/else block the conditional code > is executed if the conditional passes, and the else block is executed if > the conditional fails. Compared to for loops where the for code is > repeated and the else code executed if we "naturally fall off the loop". > (The new user's reaction was "why the hoek would I ever use for/else?") > > I forked Python 3.3 to experiment with an alternate implementation that > follows the logic of pass/fail implied by if/else: (and to refactor the > stdlib, but that's a different issue ;) > > for x in range(20): > if x > 10: break > else: > pass # we had no values to iterate > finally: > pass # we naturally fell off the loop > > It abuses finally (to avoid tying up a potentially common word as a > reserved word like "done") but makes possible an important distinction > without having to perform potentially expensive length calculations (which > may not even be possible!) on the value being iterated: that is, handling > the case where there were no values in the collection or returned by the > generator. > > Templating engines generally implement this type of structure. Of course > this type of breaking change in semantics puts this idea firmly into Python > 4 land. > > I'll isolate the for/else/finally code from my fork and post a patch this > week-end, hopefully. > > ? Alice. > > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From python at mrabarnett.plus.com Thu Jun 7 02:15:18 2012 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 07 Jun 2012 01:15:18 +0100 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: References: Message-ID: <4FCFF296.5090404@mrabarnett.plus.com> On 07/06/2012 00:20, Alice Bevan?McGregor wrote: > Howdy! > > Was teaching a new user to Python the ropes a short while ago and ran > into an interesting headspace problem: the for/else syntax fails the > obviousness and consistency tests. When used in an if/else block the > conditional code is executed if the conditional passes, and the else > block is executed if the conditional fails. Compared to for loops > where the for code is repeated and the else code executed if we > "naturally fall off the loop". (The new user's reaction was "why the > hoek would I ever use for/else?") > I find the easiest way to think of it is imagine you're searching a list. If you find what you're looking for you break, else you do something else. From cmjohnson.mailinglist at gmail.com Thu Jun 7 02:44:04 2012 From: cmjohnson.mailinglist at gmail.com (Carl M. Johnson) Date: Wed, 6 Jun 2012 14:44:04 -1000 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: References: Message-ID: <3392757C-81F5-4AE8-B3DC-303EA5693066@gmail.com> On Jun 6, 2012, at 1:58 PM, Bruce Leban wrote: > If a context-sensitive keyword would work I'd go for something more like > > for ... > [ else empty: ] > else [ no match ] : > > This would not introduce any incompatibilities. Since None is now a keyword, you could say "else if None" but that might be confusing, since None is different than empty. 
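MRAB's search-a-list model is easy to make concrete. A minimal sketch (first_even is an invented example, not code from the thread); note, per Carl's point, that an empty input lands in the same else suite as an unsuccessful search:

```python
def first_even(numbers):
    """Return the first even number, or None if the search fails."""
    for n in numbers:
        if n % 2 == 0:
            break          # found it: the else suite is skipped
    else:
        return None        # loop finished without break
    return n

assert first_even([1, 3, 4]) == 4
assert first_even([1, 3, 5]) is None
assert first_even([]) is None   # empty input takes the same else path
```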
From ncoghlan at gmail.com Thu Jun 7 02:53:22 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 7 Jun 2012 10:53:22 +1000 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: References: Message-ID: On Thu, Jun 7, 2012 at 9:58 AM, Bruce Leban wrote: > If we could go back in time I would completely agree. But since we can't, > flipping meaning of else would be too error inducing and therefore not at > all likely. The meaning of the "else:" clause on for and while loops is actually much closer to the sense in "try/except/else" sense than it is to the sense in "if/else". Consider the following: for x in range(20): if x > 10: break else: # Reached the end of the loop As an approximate short hand for: class BreakLoop(Exception): pass try: for x in range(20): if x > 10: raise BreakLoop except BreakLoop: pass else: # Reached the end of the loop It's not implemented anything like that (and the analogy doesn't hold in many other respects), but in terms of the semantics of the respective else clauses it's an exact match. Part of the problem is that the "else:" clause on while loops is often explained as follows (and I've certainly been guilty of this), which I now think exacerbates the confusion rather than reducing it: The following code: x = 0 while x < 10: x += 1 if x == y: break else: # Made it to 10 Can be seen as equivalent to: x = 0 while 1: if x < 10: pass else: # Made it to 10 x += 1 if x == y: break This actually ends up reinforcing the erroneous connection to if statements, when we really need to be encouraging people to think of this clause in terms of try statements, with "break" playing the role of an exception being raised. So I think what we actually have is a documentation problem where we need to be actively encouraging the "while/else", "for/else" -> "try/except/else" link and discouraging any attempts to think of this construct in terms of if statements (as that is a clear recipe for confusion). 
If anything were to change at the language level, my preference would be to further reinforce the try/except/else connection by allowing an "except break" clause: for x in range(20): if x > 10: break except break: # Bailed out early else: # Reached the end of the loop To critique the *specific* proposal presented at the start of the thread, there are three main problems with it: 1. It doesn't match the expected semantics of a "finally:" clause. In try/finally the finally clause executes regardless of how the suite execution is terminated (whether via an exception, reaching the end of the suite, or leaving the suite early via a return, break or continue control flow statement). That is explicitly not the case here (as a loop's else clause only executes in the case of normal loop termination - which precisely matches the semantics of the else clause in try/except/else) 2. As Bruce pointed out, the meaning of the else: clause on loops can't be changed as it would break backwards compatibility with existing code 3. The post doesn't explain how the proposed change in semantics also makes sense for while loops Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From andre.roberge at gmail.com Thu Jun 7 03:02:32 2012 From: andre.roberge at gmail.com (Andre Roberge) Date: Wed, 6 Jun 2012 22:02:32 -0300 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: References: Message-ID: On Wed, Jun 6, 2012 at 9:53 PM, Nick Coghlan wrote: > On Thu, Jun 7, 2012 at 9:58 AM, Bruce Leban wrote: > > If we could go back in time I would completely agree. But since we can't, > > flipping meaning of else would be too error inducing and therefore not at > > all likely. 
> > SNIP > > If anything were to change at the language level, my preference would > be SNIP My preference would be for a new keyword: nobreak This would work well with for/else and while/else which would become for/nobreak and while/nobreak I think that anyone reading while ... .... nobreak: some statements would (more) immediately understand that "some statements" are going to be executed if no break occurred in the above block. But I doubt that something like this will ever be considered even though it could be introduced now without breaking any code (other than that which uses "nobreak" as a variable ... which should be rare) by making it first a duplicate of the for/else and while/else construction which would be slowly deprecated. Just my 0.02$ ... Andr? -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Thu Jun 7 03:27:14 2012 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 07 Jun 2012 02:27:14 +0100 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: References: Message-ID: <4FD00372.2050804@mrabarnett.plus.com> On 07/06/2012 02:02, Andre Roberge wrote: > On Wed, Jun 6, 2012 at 9:53 PM, Nick Coghlan > wrote: > > On Thu, Jun 7, 2012 at 9:58 AM, Bruce Leban > wrote: > > If we could go back in time I would completely agree. But since > we can't, > > flipping meaning of else would be too error inducing and > therefore not at > > all likely. > > SNIP > > > If anything were to change at the language level, my preference would > be > > SNIP > > My preference would be for a new keyword: nobreak > > This would work well with for/else and while/else which would become > for/nobreak and while/nobreak > > I think that anyone reading > > while ... > .... > nobreak: > some statements > > would (more) immediately understand that "some statements" are going to > be executed if no break occurred in the above block. 
> > But I doubt that something like this will ever be considered even though > it could be introduced now without breaking any code (other than that > which uses "nobreak" as a variable ... which should be rare) by making > it first a duplicate of the for/else and while/else construction which > would be slowly deprecated. > How about "not break"? :-) From donspauldingii at gmail.com Thu Jun 7 03:59:15 2012 From: donspauldingii at gmail.com (Don Spaulding) Date: Wed, 6 Jun 2012 20:59:15 -0500 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: <4FCFF296.5090404@mrabarnett.plus.com> References: <4FCFF296.5090404@mrabarnett.plus.com> Message-ID: On Wed, Jun 6, 2012 at 7:15 PM, MRAB wrote: > On 07/06/2012 00:20, Alice Bevan?McGregor wrote: > >> Howdy! >> >> Was teaching a new user to Python the ropes a short while ago and ran >> into an interesting headspace problem: the for/else syntax fails the >> obviousness and consistency tests. When used in an if/else block the >> conditional code is executed if the conditional passes, and the else >> block is executed if the conditional fails. Compared to for loops >> where the for code is repeated and the else code executed if we >> "naturally fall off the loop". (The new user's reaction was "why the >> hoek would I ever use for/else?") >> >> I find the easiest way to think of it is imagine you're searching a > list. If you find what you're looking for you break, else you do > something else. I think the problem is that "break" doesn't sound like a positive, it sounds like a negative, and indeed it means we effectively *ignore* the rest of the list. So when you get to the "else" it's like an English double-negative, awkward to understand. Perhaps even more because you're effectively else-ing the break, not the for, so the indentation level even seems off. Backwards-compatibility issues aside, renaming "else" to "finally" sounds like a really great idea. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From andre.roberge at gmail.com Thu Jun 7 04:01:36 2012 From: andre.roberge at gmail.com (Andre Roberge) Date: Wed, 6 Jun 2012 23:01:36 -0300 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: References: <4FCFF296.5090404@mrabarnett.plus.com> Message-ID: On Wed, Jun 6, 2012 at 10:59 PM, Don Spaulding wrote: > > > On Wed, Jun 6, 2012 at 7:15 PM, MRAB wrote: > >> On 07/06/2012 00:20, Alice Bevan?McGregor wrote: >> >>> Howdy! >>> >>> Was teaching a new user to Python the ropes a short while ago and ran >>> into an interesting headspace problem: the for/else syntax fails the >>> obviousness and consistency tests. When used in an if/else block the >>> conditional code is executed if the conditional passes, and the else >>> block is executed if the conditional fails. Compared to for loops >>> where the for code is repeated and the else code executed if we >>> "naturally fall off the loop". (The new user's reaction was "why the >>> hoek would I ever use for/else?") >>> >>> I find the easiest way to think of it is imagine you're searching a >> list. If you find what you're looking for you break, else you do >> something else. > > > I think the problem is that "break" doesn't sound like a positive, it > sounds like a negative, and indeed it means we effectively *ignore* the > rest of the list. So when you get to the "else" it's like an English > double-negative, awkward to understand. Perhaps even more because you're > effectively else-ing the break, not the for, so the indentation level even > seems off. > > Backwards-compatibility issues aside, renaming "else" to "finally" sounds > like a really great idea. > No: "finally" implies that it is going to be done at the end of the block; the "else" clause is *not* executed if a break occurs - hence it has a different semantics. 
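The distinction is easy to check directly: a finally suite runs on any exit from its block, while a loop's else suite is skipped on break. A small sketch illustrating both behaviors:

```python
events = []

# try/finally: the finally suite runs whether or not we leave early.
for leave_early in (False, True):
    try:
        if leave_early:
            raise RuntimeError('bail out')
    except RuntimeError:
        pass
    finally:
        events.append('finally')

# for/else: the else suite runs only when the loop was NOT broken.
for leave_early in (False, True):
    for x in range(3):
        if leave_early:
            break
    else:
        events.append('else')

assert events == ['finally', 'finally', 'else']
```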
> > > _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
> >
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rurpy at yahoo.com Thu Jun 7 04:34:05 2012
From: rurpy at yahoo.com (Rurpy)
Date: Wed, 6 Jun 2012 19:34:05 -0700 (PDT)
Subject: [Python-ideas] changing sys.stdout encoding
In-Reply-To: <87pq9cal6t.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <1339036445.8918.YahooMailClassic@web161502.mail.bf1.yahoo.com>

On 06/06/2012 02:39 AM, Stephen J. Turnbull wrote:
> Rurpy writes:
>
> > But even if not, I'm sorry, compared with running a single
> > command with an encoding option, I think messing with
> > environment variables is not really a workable solution.
>
> You have a workable 2-line solution, which you posted.

Please don't misunderstand why I posted... as you say, my code now works fine and I understand how to handle this problem when I encounter it in the future. I took the time to post here because it took an inordinate amount of effort to find a solution to a legitimate need (your opinion to the contrary notwithstanding), and the resulting code, which should have been trivially simple and obvious, wasn't.

It is a minor issue, but the end result of experiences like this, although infrequent, is often "WTF, why is this simple and reasonable thing so hard to do?". And after a few such experiences some programmers will start to wonder if maybe Python is not really an industrial-strength language -- one in which they can be effective all the time, even when the problem falls outside the 95% demographic. (And I am not talking about things totally out of Python's scope like high performance computing or systems programming.)
But they shouldn't; people in multilingual environments should be using UTF-8 externally unless they have really, really special needs (and even then they should probably be using UTF-8 embedded in markup that serves those needs).

I wanted to do it because it was the correct design choice. The suggestion that redesigning an entire existing technical and personnel infrastructure to use utf-8 is a better choice is, well, never mind. It is not the place of language designers to intentionally make it hard to solve legitimate problems. There *are* other encodings in the world, there will be for some time to come, and some programmers will sometimes have to deal with that. Non-utf-8 encodings are not so evil (except in the minds of some zealots) that working with them conveniently should be made difficult. (I am reminded of the Unix zealots of days past who refused to deal with Windows line endings.) The way I chose to deal with the encoding requirements I had was the correct way. It's unfortunate that Python makes it uglier than it should be.

The discussion seems to be going off topic for this list. I understand there is no support here for providing a non-obscure, programmatic way of changing the encoding of the standard streams at program startup, and that's fine, it was a suggestion. Thank you all for the feedback.

From anikom15 at gmail.com Thu Jun 7 04:45:14 2012
From: anikom15 at gmail.com (Westley Martínez)
Date: Wed, 6 Jun 2012 19:45:14 -0700
Subject: [Python-ideas] for/else statements considered harmful
In-Reply-To: References: Message-ID: <20120607024514.GA13028@kubrick>

On Wed, Jun 06, 2012 at 07:20:07PM -0400, Alice Bevan-McGregor wrote:
>
> for x in range(20):
>     if x > 10: break
> else:
>     pass # we had no values to iterate
> finally:
>     pass # we naturally fell off the loop
>

-1 for me. The idea that finally is executed only when we naturally fall off the loop is weird.
finally suggests that it will always be executed, like in a try/finally clause. I think the naming of else is weird but can be understood. If a change is a must I believe else should keep its semantics and simply be renamed except, but I am +0 on that. All in all the use cases would be extremely rare, if they exist at all. I've never actually seen a for/else or while/else block.

From nathan at cmu.edu Thu Jun 7 06:29:46 2012
From: nathan at cmu.edu (Nathan Schneider)
Date: Wed, 6 Jun 2012 21:29:46 -0700
Subject: [Python-ideas] for/else statements considered harmful
In-Reply-To: References: Message-ID:

On Wed, Jun 6, 2012 at 5:53 PM, Nick Coghlan wrote:
> On Thu, Jun 7, 2012 at 9:58 AM, Bruce Leban wrote:
>> If we could go back in time I would completely agree. But since we can't,
>> flipping the meaning of else would be too error inducing and therefore not at
>> all likely.
>
> The meaning of the "else:" clause on for and while loops is actually
> much closer to the "try/except/else" sense than it is to the
> "if/else" sense.
>
> Consider the following:
>
>     for x in range(20):
>         if x > 10:
>             break
>     else:
>         # Reached the end of the loop
>
> As an approximate shorthand for:
>
>     class BreakLoop(Exception): pass
>
>     try:
>         for x in range(20):
>             if x > 10:
>                 raise BreakLoop
>     except BreakLoop:
>         pass
>     else:
>         # Reached the end of the loop
>
> It's not implemented anything like that (and the analogy doesn't hold
> in many other respects), but in terms of the semantics of the
> respective else clauses it's an exact match.
>
> Part of the problem is that the "else:" clause on while loops is often
> explained as follows (and I've certainly been guilty of this), which I
> now think exacerbates the confusion rather than reducing it:
>
> The following code:
>
>     x = 0
>     while x < 10:
>         x += 1
>         if x == y:
>             break
>     else:
>         # Made it to 10
>
> Can be seen as equivalent to:
>
>     x = 0
>     while 1:
>         if x < 10:
>             pass
>         else:
>             # Made it to 10
>         x += 1
>         if x == y:
>             break
>
> This actually ends up reinforcing the erroneous connection to if
> statements, when we really need to be encouraging people to think of
> this clause in terms of try statements, with "break" playing the role
> of an exception being raised.
>
> So I think what we actually have is a documentation problem where we
> need to be actively encouraging the "while/else", "for/else" ->
> "try/except/else" link and discouraging any attempts to think of this
> construct in terms of if statements (as that is a clear recipe for
> confusion).
>
> If anything were to change at the language level, my preference would
> be to further reinforce the try/except/else connection by allowing an
> "except break" clause:
>
>     for x in range(20):
>         if x > 10:
>             break
>     except break:
>         # Bailed out early
>     else:
>         # Reached the end of the loop

I like this proposal, or perhaps

    while ...:
        ...
    with break:
        # Bailed out early
    else:
        # Reached the end of the loop

...which avoids any conceptual baggage associated with exception handling, at some risk of making people think of context managers. For what it's worth, I don't use the loop version of 'else' to avoid confusing myself (or the reader of my code). But in my experience the use case 'else' is intended to solve is probably less common than (a) checking whether the loop was ever entered, and (b) checking from within the loop body whether it is the first iteration.

Nathan

> To critique the *specific* proposal presented at the start of the
> thread, there are three main problems with it:
>
> 1. It doesn't match the expected semantics of a "finally:" clause.
In try/finally, the finally clause executes regardless of how the suite's execution is terminated (whether via an exception, reaching the end of the suite, or leaving the suite early via a return, break, or continue control flow statement). That is explicitly not the case here (a loop's else clause only executes on normal loop termination, which precisely matches the semantics of the else clause in try/except/else).
2. As Bruce pointed out, the meaning of the else: clause on loops can't be changed, as it would break backwards compatibility with existing code.
3. The post doesn't explain how the proposed change in semantics would also make sense for while loops.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

_______________________________________________
Python-ideas mailing list
Python-ideas at python.org
http://mail.python.org/mailman/listinfo/python-ideas

From tjreedy at udel.edu Thu Jun 7 06:31:30 2012
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 07 Jun 2012 00:31:30 -0400
Subject: [Python-ideas] for/else statements considered harmful
In-Reply-To: References: Message-ID:

On 6/6/2012 7:20 PM, Alice Bevan-McGregor wrote:
> Howdy!
>
> Was teaching a new user to Python the ropes a short while ago and ran
> into an interesting headspace problem: the for/else syntax fails the
> obviousness and consistency tests.

I disagree. The else clause is executed when the condition (explicit in while loops, implicit in for loops) is false. Consider the following implementation of while loops in a lower-level pseudo-python:

    label startloop
    if condition:
        do_something()
        goto startloop
    else:
        do_else_stuff()

This is *exactly* equivalent to

    while condition:
        do_something()
    else:
        do_else_stuff()

In fact, the absolute goto is how while is implemented in assembler languages, including CPython bytecode. If one converts a for-loop to a while-loop, you will see the same thing.
CPython bytecode for for-loops is a little more condensed, with a higher level FOR_ITER code. It tries to get the next item if there is one and catches the exception and jumps if not. (It also handles and hides the fact that there are two iterator protocols.) But still, an absolute 'goto startloop' jump back up to FOR_ITER is added to the end of the 'if next' suite, just as with while-loops. -- Terry Jan Reedy From p.f.moore at gmail.com Thu Jun 7 08:27:36 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 7 Jun 2012 07:27:36 +0100 Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: <1339036445.8918.YahooMailClassic@web161502.mail.bf1.yahoo.com> References: <87pq9cal6t.fsf@uwakimon.sk.tsukuba.ac.jp> <1339036445.8918.YahooMailClassic@web161502.mail.bf1.yahoo.com> Message-ID: On 7 June 2012 03:34, Rurpy wrote: > It is a minor issue but the end result of experiences > like this, although infrequent, is often "WTF, why is > this simple and reasonable thing so hard to do?". ?And > after a few times some programmers will start to wonder > if maybe Python is not really an industrial-strength > language -- one that they can be effective all the time, > even when the problem falls outside the 95% demographic. > (And I am not talking about things totally out of > python's scope like high performance computing or > systems programming.) One suggestion, which would probably shed some light on whether this should be viewed as something "simple and reasonable", would be to do some research on how the same task would be achieved in other languages. I have no experience to contribute but my intuition says that this could well be hard on other languages too. Would you be willing to do some web searches to look for solutions in (say) Java, or C#, or Ruby? In theory, it shouldn't take long (as otherwise you can conclude that the solution is obscure to the same extent that it is with Python). 
Even better, if those other languages do have a simple solution, it may suggest an approach that would be appropriate for Python. Paul. From stephen at xemacs.org Thu Jun 7 09:12:26 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 07 Jun 2012 16:12:26 +0900 Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: <1339036445.8918.YahooMailClassic@web161502.mail.bf1.yahoo.com> References: <87pq9cal6t.fsf@uwakimon.sk.tsukuba.ac.jp> <1339036445.8918.YahooMailClassic@web161502.mail.bf1.yahoo.com> Message-ID: <87fwa7a945.fsf@uwakimon.sk.tsukuba.ac.jp> Rurpy writes: > I took the time to post here because it took an inordinate > amount of effort to find a solution to a legitimate need > (your opinion to the contrary not withstanding) I don't think I said the need was illegitimate, if I did I apologize, and I certainly don't believe it is (I'm an economist by trade -- de gustibus non est disputandum). I just don't think it's necessary for Python to try to address the problem, because the problem is somebody else's bad design at root. And I don't think it would be wise to try to do it in a very general way, because it's very hard to do that at the general level of the language. > I understand there is no support here for providing a non- > obscure, programmatic way of changing the encoding of the > standard streams at program startup You're wrong. There is *some* support for that. It just has to be done safely, and that means that a generic .set_encoding() method that can be called after I/O has been performed probably isn't going to happen. And it might not happen at the core level, since a 3-line function can do the job, it might make just as much sense to put up a package on PyPI. 
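For reference, the kind of small helper being alluded to above can be sketched in Python 3 along the following lines. This is an illustrative guess at the approach (the `rewrap` name is made up here, and this is not the code Rurpy actually posted); it relies on `io.TextIOWrapper` and `detach()`, and must run before anything is written through the old wrapper:

```python
import io

def rewrap(stream, encoding):
    """Return a new text wrapper around stream's binary layer,
    using the given encoding. Call before any output is written."""
    # detach() hands back the underlying binary buffer and disables
    # the old wrapper, so it can't flush or close the buffer later.
    return io.TextIOWrapper(stream.detach(), encoding=encoding,
                            line_buffering=True)

# At program startup one would do, e.g.:
#     sys.stdout = rewrap(sys.stdout, 'latin-1')
# Demonstrated here on an in-memory stream instead of the real stdout:
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding='ascii')
fake_stdout = rewrap(fake_stdout, 'latin-1')
print('caf\u00e9', file=fake_stdout)
fake_stdout.flush()
assert raw.getvalue() == 'caf\u00e9\n'.encode('latin-1')
```

The real-stdout caveat is the one discussed in the thread: it is only safe at startup, before the first write, which is why a generic `set_encoding()` method callable at any time is a harder sell.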
From ubershmekel at gmail.com Thu Jun 7 09:23:05 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Thu, 7 Jun 2012 10:23:05 +0300 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: References: Message-ID: We had quite a lengthy discussion on for/else in October 2009 http://mail.python.org/pipermail/python-ideas/2009-October/thread.html#5924 Guido mentioned: > I would not have the feature at all if I had to do it over. I would *not* > choose another keyword. But I don't see the same level of danger in it that > some here see. > I am also against adding a syntax warning for this [[loops with else but > without break]]. It belongs in pylint etc. http://mail.python.org/pipermail/python-ideas/2009-October/006157.html Personally I'd prefer "if not break:" over "else:" but as we're stuck where we are today I'm just going to encourage people not to use the construct at all. Yuval -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeanpierreda at gmail.com Thu Jun 7 09:57:35 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Thu, 7 Jun 2012 03:57:35 -0400 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: References: Message-ID: On Thu, Jun 7, 2012 at 3:23 AM, Yuval Greenfield wrote: > Personally I'd prefer "if not break:" over "else:" but as we're stuck where > we are today I'm just going to encourage people not to use the construct at > all. Why shouldn't people use for-else? 
-- Devin From ubershmekel at gmail.com Thu Jun 7 10:31:22 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Thu, 7 Jun 2012 11:31:22 +0300 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: References: Message-ID: On Thu, Jun 7, 2012 at 10:57 AM, Devin Jeanpierre wrote: > On Thu, Jun 7, 2012 at 3:23 AM, Yuval Greenfield > wrote: > > Personally I'd prefer "if not break:" over "else:" but as we're stuck > where > > we are today I'm just going to encourage people not to use the construct > at > > all. > > Why shouldn't people use for-else? > > -- Devin > For-else/while-else are confusing. During the previous discussion even the construct's proponents have fallen to its misleading nature. The word "else" alone just doesn't fit its role here no matter how intricate and carefully constructed an example is given to explain its nature or rationale. I believe using for/else will cause you and maintainers of your code to make more mistakes. Yuval -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeanpierreda at gmail.com Thu Jun 7 11:28:49 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Thu, 7 Jun 2012 05:28:49 -0400 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: References: Message-ID: On Thu, Jun 7, 2012 at 4:31 AM, Yuval Greenfield wrote: > On Thu, Jun 7, 2012 at 10:57 AM, Devin Jeanpierre > wrote: >> >> On Thu, Jun 7, 2012 at 3:23 AM, Yuval Greenfield >> wrote: >> > Personally I'd prefer "if not break:" over "else:" but as we're stuck >> > where >> > we are today I'm just going to encourage people not to use the construct >> > at >> > all. >> >> Why shouldn't people use for-else? >> >> -- Devin > > > I believe using for/else will cause you and maintainers of your code to make > more mistakes. I don't follow. What mistakes would people make? Why would they make them? 
Also, are you worried about people that read the documentation and know what for-else does, or the people that don't or haven't read this documentation? It's good practice to, when reading source code of an unfamiliar language, try to read up on things you haven't seen yet -- although sometimes context seems good enough. If you are afraid that this is someplace that context _seems_ good enough, but actually _isn't_, that would be something to worry about (although I don't feel that way). -- Devin From stephen at xemacs.org Thu Jun 7 12:50:14 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 07 Jun 2012 19:50:14 +0900 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: References: Message-ID: <87aa0f9z15.fsf@uwakimon.sk.tsukuba.ac.jp> Devin Jeanpierre writes: > On Thu, Jun 7, 2012 at 4:31 AM, Yuval Greenfield wrote: > > I believe using for/else will cause you and maintainers of your > > code to make more mistakes. > > I don't follow. What mistakes would people make? Why would they > make them? There was a long thread about a year ago on this list, where a couple of less experienced programmers and even a couple of people who have long since proven themselves reliable, gave code examples that obviously hadn't been tested. There's a summary at: http://grokbase.com/t/python/python-ideas/09abg9k5fc/summary-of-for-else-threads The reason they make such mistakes is that there's a strong association of "else" with "if-then-else", and for many people that seems to be somewhere between totally useless and actively misleading. 
For me, there are a number of reasonable mnemonics, a couple given in this thread, but IIRC the only idiom I found really plausible was def search_in_iterable(key, iter): for item in iter: if item == key: return some_function_of(item) else: return not_found_default From ubershmekel at gmail.com Thu Jun 7 13:01:18 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Thu, 7 Jun 2012 14:01:18 +0300 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: <87aa0f9z15.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87aa0f9z15.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Thu, Jun 7, 2012 at 1:50 PM, Stephen J. Turnbull wrote: > def search_in_iterable(key, iter): > for item in iter: > if item == key: > return some_function_of(item) > else: > return not_found_default > > You don't need the "else" there. An equivalent: def search_in_iterable(key, iter): for item in iter: if item == key: return some_function_of(item) return not_found_default I'm not sure I understood what you meant but I'll assume that by "plausible"/"reasonable" you meant that it's a good example as to how for/else is misleading. Devin Jeanpierre Wrote: > Also, are you worried about people that read the documentation and > know what for-else does, or the people that don't or haven't read this > documentation? On this issue I'm worried about all sentient programmers. Yuval -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeanpierreda at gmail.com Thu Jun 7 14:01:52 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Thu, 7 Jun 2012 08:01:52 -0400 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: <87aa0f9z15.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87aa0f9z15.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Thu, Jun 7, 2012 at 6:50 AM, Stephen J. 
Turnbull wrote: > There was a long thread about a year ago on this list, where a couple > of less experienced programmers and even a couple of people who have > long since proven themselves reliable, gave code examples that > obviously hadn't been tested. This is disappointing. for-else is simple, even if it has an ambiguous name. > The reason they make such mistakes is that there's a strong > association of "else" with "if-then-else", and for many people that > seems to be somewhere between totally useless and actively misleading. I know it's really bad form to shift goalposts, but I can't help but offer an alternative hypothesis: What if it isn't that else is confusing, but that use of else is rare? People have lots of silly beliefs about things they never use, or haven't used in a very long time. > For me, there are a number of reasonable mnemonics, a couple given in > this thread, but IIRC the only idiom I found really plausible was I think of "else" as a collective/delayed else to the if statement in the body of the loop (which is almost always present). This only works for for loops though. Pretty much every single for-else has almost exactly the same form, though, so... it's pretty easy to use specialized models like that. :) -- Devin From ncoghlan at gmail.com Thu Jun 7 15:04:29 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 7 Jun 2012 23:04:29 +1000 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: References: <87aa0f9z15.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Thu, Jun 7, 2012 at 10:01 PM, Devin Jeanpierre wrote: > On Thu, Jun 7, 2012 at 6:50 AM, Stephen J. Turnbull wrote: >> The reason they make such mistakes is that there's a strong >> association of "else" with "if-then-else", and for many people that >> seems to be somewhere between totally useless and actively misleading. 
> > I know it's really bad form to shift goalposts, but I can't help but > offer an alternative hypothesis: What if it isn't that else is > confusing, but that use of else is rare? People have lots of silly > beliefs about things they never use, or haven't used in a very long > time. FWIW, I just added the following paragraph to the relevant section of the Python tutorial in 2.7, 3.2 and 3.3: ================= When used with a loop, the ``else`` clause has more in common with the ``else`` clause of a :keyword:`try` statement than it does that of :keyword:`if` statements: a :keyword:`try` statement's ``else`` clause runs when no exception occurs, and a loop's ``else`` clause runs when no ``break`` occurs. For more on the :keyword:`try` statement and exceptions, see :ref:`tut-handling`. ================= The new text should appear in the respective online versions as part of the next daily docs rebuild. It may not help much, but it won't hurt, and the "exceptional else" is a much better parallel than trying to make loop else clauses fit the "conditional else" mental model. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From alice at gothcandy.com Thu Jun 7 15:06:38 2012 From: alice at gothcandy.com (=?utf-8?Q?Alice_Bevan=E2=80=93McGregor?=) Date: Thu, 7 Jun 2012 09:06:38 -0400 Subject: [Python-ideas] for/else statements considered harmful References: Message-ID: So the subject of the thread seems to hold true. Average developers are confused by the current semantic (a problem that needs more than abstract p-code to correct) to the point of actively avoiding use of the structure. I agree, however, that breaking all existing code is probably bad. 
;) On 2012-06-07 00:53:22 +0000, Nick Coghlan said:

>     for x in range(20):
>         if x > 10:
>             break
>     except break:
>         # Bailed out early
>     else:
>         # Reached the end of the loop

Seems a not insignificant number of readers got fixated on the alternate keyword for the current behaviour of else (finally in my example) and ignored or misinterpreted the -really important part- of being able to detect if the loop was skipped (no iterations performed; else in my example).

Being able to have a block executed if the loop is never entered is vitally important so you can avoid expensive or potentially impossible length checks on the iterator before the loop. Take this example:

    sock = lsock.accept()
    for chunk in iter(partial(sock.recv, 4096), ''):
        pass # do something with the chunk
    else:
        pass # no data received before client hangup!

Using a temporary variable to simulate this is... unfortunate.

    sock = lsock.accept()
    has_data = False
    for chunk in iter(partial(sock.recv, 4096), ''):
        has_data = True
        pass # do something with the chunk

    if not has_data:
        pass # no data received before client hangup!

empty would be a good keyword to preserve the existing meaning of else, but I'm pretty sure that's a fairly common variable name. :/

- Alice.

From ethan at stoneleaf.us Thu Jun 7 15:14:36 2012
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 07 Jun 2012 06:14:36 -0700
Subject: [Python-ideas] for/else statements considered harmful
In-Reply-To: References: <87aa0f9z15.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4FD0A93C.6090502@stoneleaf.us>

Devin Jeanpierre wrote:
> On Thu, Jun 7, 2012 at 6:50 AM, Stephen J. Turnbull wrote:
>> The reason they make such mistakes is that there's a strong
>> association of "else" with "if-then-else", and for many people that
>> seems to be somewhere between totally useless and actively misleading.
> > I know it's really bad form to shift goalposts, but I can't help but > offer an alternative hypothesis: What if it isn't that else is > confusing, but that use of else is rare? People have lots of silly > beliefs about things they never use, or haven't used in a very long > time. I use the for/else and while/else constructs, and still get them wrong -- the association with if/else is very strong for me, and my usage pattern is more along the lines of "if this iterable was empty at the start...". I appreciate the correlation with except/else, and the failed search idea -- those should help me keep these straight even before my tests fail. ;) ~Ethan~ From ncoghlan at gmail.com Thu Jun 7 15:36:08 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 7 Jun 2012 23:36:08 +1000 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: References: Message-ID: On Thu, Jun 7, 2012 at 11:06 PM, Alice Bevan?McGregor wrote: > On 2012-06-07 00:53:22 +0000, Nick Coghlan said: >> >> ? ?for x in range(20): >> ? ? ? ?if x > 10: >> ? ? ? ? ? ?break >> ? ?except break: >> ? ? ? ?# Bailed out early >> ? ?else: >> ? ? ? ?# Reached the end of the loop > > > Seems a not insignifigant number of readers got fixated on the alternate > keyword for the current behaviour of else (finally in my example) and > ignored or misinterpreted the -really important part- of being able to > detect if the loop was skipped (no iterations performed; else in my > example). > > Being able to have a block executed if the loop is never entered is vitally > important so you can avoid expensive or potentially impossible length checks > on the iterator before the loop. ?Take this example: > > ? sock = lsock.accept() > ? for chunk in iter(partial(sock.recv, 4096), ''): > ? ? ? pass # do something with the chunk > ? else: > ? ? ? pass # no data recieved before client hangup! > > Using a temporary varable to simulate this is? unfortunate. > > ? sock = lsock.accept() > ? has_data = False > ? 
for chunk in iter(partial(sock.recv, 4096), ''):
>         has_data = True
>         pass # do something with the chunk
>
>     if not has_data:
>         pass # no data received before client hangup!
>
> empty would be a good keyword to preserve the existing meaning of else, but
> I'm pretty sure that's a fairly common variable name. :/

Yeah, it's usually fairly important on here to separate out "this is the problem I see" from "this is a proposed solution". Getting agreement on the former is usually easier than the latter, since there are so many additional constraints that come into play when it comes to considering solutions. And if we can't even reach agreement that a problem needs to be solved, then talking about solution details isn't especially productive (although it can be fun to speculate about the possibilities anyway).

FWIW, I usually solve this particular problem with for loops by using the iteration variable itself to hold a sentinel value:

    sock = lsock.accept()
    chunk = None
    for chunk in iter(partial(sock.recv, 4096), ''):
        pass # do something with the chunk

    if chunk is None:
        pass # no data received before client hangup!

If "None" is a possible value in the iterable, then I'll use a dedicated sentinel value instead:

    var = sentinel = object()
    for var in iterable:
        ...
    if var is sentinel:
        ...

I've never found either of those constructs ugly enough to particularly want dedicated syntax to replace it, and the availability of this approach is what makes it especially difficult to push for dedicated syntactic support (since all that can really be saved is the assignment that sets up the sentinel value).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From ethan at stoneleaf.us Thu Jun 7 15:21:24 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 07 Jun 2012 06:21:24 -0700 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: References: Message-ID: <4FD0AAD4.3070108@stoneleaf.us> Alice Bevan?McGregor wrote: > Being able to have a block executed if the loop is never entered is > vitally important so you can avoid expensive or potentially impossible > length checks on the iterator before the loop. Take this example: > > sock = lsock.accept() > for chunk in iter(partial(sock.recv, 4096), ''): > pass # do something with the chunk > else: > pass # no data recieved before client hangup! This is, indeed, the usual way I try to use these contructs... > Using a temporary varable to simulate this is? unfortunate. > > sock = lsock.accept() > has_data = False > for chunk in iter(partial(sock.recv, 4096), ''): > has_data = True > pass # do something with the chunk > > if not has_data: > pass # no data recieved before client hangup! and this is how I usually work around it. :( ~Ethan~ From arnodel at gmail.com Thu Jun 7 16:00:39 2012 From: arnodel at gmail.com (Arnaud Delobelle) Date: Thu, 7 Jun 2012 15:00:39 +0100 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: References: Message-ID: On 7 June 2012 00:20, Alice Bevan?McGregor wrote: > Howdy! > > Was teaching a new user to Python the ropes a short while ago and ran into > an interesting headspace problem: the for/else syntax fails the obviousness > and consistency tests. ?When used in an if/else block the conditional code > is executed if the conditional passes, and the else block is executed if the > conditional fails. ?Compared to for loops where the for code is repeated and > the else code executed if we "naturally fall off the loop". 
?(The new user's > reaction was "why the hoek would I ever use for/else?") My solution: don't talk about a for/else construct, but talk about a for/break/else block instead. Then the semantics become obvious again. > I forked Python 3.3 to experiment with an alternate implementation that > follows the logic of pass/fail implied by if/else: (and to refactor the > stdlib, but that's a different issue ;) > > ? for x in range(20): > ? ? ? if x > 10: break > ? else: > ? ? ? pass # we had no values to iterate > ? finally: > ? ? ? pass # we naturally fell off the loop > > It abuses finally (to avoid tying up a potentially common word as a reserved > word like "done") but makes possible an important distinction without having > to perform potentially expensive length calculations (which may not even be > possible!) on the value being iterated: that is, handling the case where > there were no values in the collection or returned by the generator. I think your use of finally is as unfortunate as the current use of else: usually, finally is *always* executed, irrespective of what happened in the try block. Your new use goes against that. 
Cheers,
-- Arnaud

From ethan at stoneleaf.us  Thu Jun 7 15:52:49 2012
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 07 Jun 2012 06:52:49 -0700
Subject: [Python-ideas] for/else statements considered harmful
In-Reply-To:
References:
Message-ID: <4FD0B231.6010502@stoneleaf.us>

Nick Coghlan wrote:
> On Thu, Jun 7, 2012 at 11:06 PM, Alice Bevan–McGregor wrote:
>> On 2012-06-07 00:53:22 +0000, Nick Coghlan said:
>>>     for x in range(20):
>>>         if x > 10:
>>>             break
>>>     except break:
>>>         # Bailed out early
>>>     else:
>>>         # Reached the end of the loop
>>
>> Seems a not insignificant number of readers got fixated on the alternate
>> keyword for the current behaviour of else (finally in my example) and
>> ignored or misinterpreted the -really important part- of being able to
>> detect if the loop was skipped (no iterations performed; else in my
>> example).
>>
>> Being able to have a block executed if the loop is never entered is vitally
>> important so you can avoid expensive or potentially impossible length checks
>> on the iterator before the loop.  Take this example:
>>
>>     sock = lsock.accept()
>>     for chunk in iter(partial(sock.recv, 4096), ''):
>>         pass # do something with the chunk
>>     else:
>>         pass # no data received before client hangup!
>>
>> Using a temporary variable to simulate this is... unfortunate.
>>
>>     sock = lsock.accept()
>>     has_data = False
>>     for chunk in iter(partial(sock.recv, 4096), ''):
>>         has_data = True
>>         pass # do something with the chunk
>>
>>     if not has_data:
>>         pass # no data received before client hangup!
>>
>> empty would be a good keyword to preserve the existing meaning of else, but
>> I'm pretty sure that's a fairly common variable name. :/
>
> Yeah, it's usually fairly important on here to separate out "this is
> the problem I see" from "this is a proposed solution".
> Getting agreement on the former is usually easier than the latter, since
> there are so many additional constraints that come into play when it comes
> to considering solutions. And if we can't even reach agreement that a
> problem needs to be solved, then talking about solution details isn't
> especially productive (although it can be fun to speculate about the
> possibilities anyway).
>
> FWIW, I usually solve this particular problem with for loops by using
> the iteration variable itself to hold a sentinel value:
>
>     sock = lsock.accept()
>     chunk = None
>     for chunk in iter(partial(sock.recv, 4096), ''):
>         pass # do something with the chunk
>
>     if chunk is None:
>         pass # no data received before client hangup!
>
> If "None" is a possible value in the iterable, then I'll use a
> dedicated sentinel value instead:
>
>     var = sentinel = object()
>     for var in iterable:
>         ...
>     if var is sentinel:
>         ...
>
> I've never found either of those constructs ugly enough to
> particularly want dedicated syntax to replace it, and the availability
> of this approach is what makes it especially difficult to push for
> dedicated syntactic support (since all that can really be saved is the
> assignment that sets up the sentinel value).

This seems like a good work-around (meaning: I'll definitely use it,
thanks!), but it does not address the confusion issues.

I think the main problem with the current while/else, for/else is
two-fold: 1) we have two failure states (empty from the start, and
desired result not met), and 2) even though the else is more similar to
the else in try/except/else, it is formatted *just like* the if/else.

Perhaps the solution is to enhance for and while with except?

    sock = lsock.accept()
    for chunk in iter(partial(sock.recv, 4096), ''):
        pass # do something with the chunk
    except:
        pass # no data received before client hangup!
    else:
        pass # wrap-up processing on chunks

~Ethan~

From mwm at mired.org  Thu Jun 7 17:30:11 2012
From: mwm at mired.org (Mike Meyer)
Date: Thu, 7 Jun 2012 11:30:11 -0400
Subject: [Python-ideas] for/else statements considered harmful
In-Reply-To: <4FD0B231.6010502@stoneleaf.us>
References: <4FD0B231.6010502@stoneleaf.us>
Message-ID: <20120607113011.69e3d5e5@bhuda.mired.org>

On Thu, 07 Jun 2012 06:52:49 -0700 Ethan Furman wrote:
> I think the main problem with the current while/else, for/else is
> two-fold: 1) we have two failure states (empty from the start, and
> desired result not met), and 2) even though the else is more similar to
> the else in try/except/else, it is formatted *just like* the if/else.

I'd say we have 1.5 failure states, because the desired result is not
met in both cases. In my experience, the general case (that else
handles) is more common than the special case of the iterator being
empty.

> Perhaps the solution is to enhance for and while with except?
>
>     sock = lsock.accept()
>     for chunk in iter(partial(sock.recv, 4096), ''):
>         pass # do something with the chunk
>     except:
>         pass # no data received before client hangup!
>     else:
>         pass # wrap-up processing on chunks

Calling it "wrap-up processing" seems likely to cause people to think
about it as meaning "finally". But if the else clause is not executed
if the except clause is (as done by try/except/else), then there's no
longer an easy way to describe it.

It seems like adding an except would change the conditions under which
the else clause is executed (unlike try/except/else), as otherwise
there's no easy way to capture the current behavior, where else is
executed whenever there are no chunks left to process. But that kind
of thing seems like a way to introduce bugs.

http://www.mired.org/
Independent Software developer/SCM consultant, email for more information.
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org

From alice at gothcandy.com  Thu Jun 7 17:52:10 2012
From: alice at gothcandy.com (=?utf-8?Q?Alice_Bevan=E2=80=93McGregor?=)
Date: Thu, 7 Jun 2012 11:52:10 -0400
Subject: [Python-ideas] for/else statements considered harmful
References: <4FD0B231.6010502@stoneleaf.us> <20120607113011.69e3d5e5@bhuda.mired.org>
Message-ID:

On 2012-06-07 15:30:11 +0000, Mike Meyer said:
> Calling it "wrap-up processing" seems likely to cause people to think
> about it as meaning "finally". But if the else clause is not executed
> if the except clause is (as done by try/except/else), then there's no
> longer an easy way to describe it.
>
> It seems like adding an except would change the conditions under which
> the else clause is executed (unlike try/except/else), as otherwise
> there's no easy way to capture the current behavior, where else is
> executed whenever there are no chunks left to process. But that kind
> of thing seems like a way to introduce bugs.

Well, how about:

    for <var> in <iterable>:
        pass # process each
    except:  # no arguments!
        pass # nothing to process
    else:
        pass # fell through
    finally:
        pass # regardless of break/fallthrough/empty

Now for loops perfectly match try/except/else/finally! >:D  (Like
exception handling, finally would be called even with an inner return
from any of the prior sections.)

-- Alice.
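Alice's four-clause proposal separates three outcomes that today's Python can only distinguish with the sentinel trick discussed earlier in the thread. A minimal sketch of that emulation (the helper name `classify_loop` and its `stop_when` predicate are invented here for illustration, not part of any proposal):

```python
def classify_loop(iterable, stop_when=None):
    """Report which of the proposed clauses would run:
    'empty' -- the iterable yielded nothing (the proposed except clause),
    'break' -- we bailed out early,
    'done'  -- we naturally fell off the loop (the current else clause)."""
    sentinel = object()
    item = sentinel
    for item in iterable:
        if stop_when is not None and stop_when(item):
            return "break"
    if item is sentinel:
        return "empty"
    return "done"

print(classify_loop([]))                           # empty
print(classify_loop(range(5)))                     # done
print(classify_loop(range(20), lambda x: x > 10))  # break
```

The `item = sentinel` assignment before the loop is exactly the boilerplate the proposal aims to eliminate.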
From guido at python.org  Thu Jun 7 18:26:06 2012
From: guido at python.org (Guido van Rossum)
Date: Thu, 7 Jun 2012 09:26:06 -0700
Subject: [Python-ideas] for/else statements considered harmful
In-Reply-To:
References: <87aa0f9z15.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Thu, Jun 7, 2012 at 6:04 AM, Nick Coghlan wrote:
> FWIW, I just added the following paragraph to the relevant section of
> the Python tutorial in 2.7, 3.2 and 3.3:
>
> =================
> When used with a loop, the ``else`` clause has more in common with the
> ``else`` clause of a :keyword:`try` statement than it does that of
> :keyword:`if` statements: a :keyword:`try` statement's ``else`` clause runs
> when no exception occurs, and a loop's ``else`` clause runs when no ``break``
> occurs. For more on the :keyword:`try` statement and exceptions, see
> :ref:`tut-handling`.
> =================

I like this. Let's not change the syntax.

-- 
--Guido van Rossum (python.org/~guido)

From mwm at mired.org  Thu Jun 7 18:29:01 2012
From: mwm at mired.org (Mike Meyer)
Date: Thu, 7 Jun 2012 12:29:01 -0400
Subject: [Python-ideas] for/else statements considered harmful
In-Reply-To:
References: <4FD0B231.6010502@stoneleaf.us> <20120607113011.69e3d5e5@bhuda.mired.org>
Message-ID: <20120607122901.6c8bfe42@bhuda.mired.org>

On Thu, 7 Jun 2012 11:52:10 -0400 Alice Bevan–McGregor wrote:
> On 2012-06-07 15:30:11 +0000, Mike Meyer said:
> > Calling it "wrap-up processing" seems likely to cause people to think
> > about it as meaning "finally". But if the else clause is not executed
> > if the except clause is (as done by try/except/else), then there's no
> > longer an easy way to describe it.
> >
> > It seems like adding an except would change the conditions under which
> > the else clause is executed (unlike try/except/else), as otherwise
> > there's no easy way to capture the current behavior, where else is
> > executed whenever there are no chunks left to process.
> > But that kind of thing seems like a way to introduce bugs.
>
> Well, how about:
>
>     for <var> in <iterable>:
>         pass # process each
>     except:  # no arguments!
>         pass # nothing to process
>     else:
>         pass # fell through
>     finally:
>         pass # regardless of break/fallthrough/empty
>
> Now for loops perfectly match try/except/else/finally! >:D  (Like
> exception handling, finally would be called even with an inner return
> from any of the prior sections.)

For for (and don't forget while) loops, finally is pointless. It's the
same as code after the loop. For try, finally runs even if there's an
exception, which isn't true of that code.

http://www.mired.org/
Independent Software developer/SCM consultant, email for more information.
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org

From python at mrabarnett.plus.com  Thu Jun 7 18:32:45 2012
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 07 Jun 2012 17:32:45 +0100
Subject: [Python-ideas] for/else statements considered harmful
In-Reply-To:
References: <4FD0B231.6010502@stoneleaf.us> <20120607113011.69e3d5e5@bhuda.mired.org>
Message-ID: <4FD0D7AD.4030408@mrabarnett.plus.com>

On 07/06/2012 16:52, Alice Bevan–McGregor wrote:
> On 2012-06-07 15:30:11 +0000, Mike Meyer said:
>> Calling it "wrap-up processing" seems likely to cause people to think
>> about it as meaning "finally". But if the else clause is not executed
>> if the except clause is (as done by try/except/else), then there's no
>> longer an easy way to describe it.
>>
>> It seems like adding an except would change the conditions under which
>> the else clause is executed (unlike try/except/else), as otherwise
>> there's no easy way to capture the current behavior, where else is
>> executed whenever there are no chunks left to process. But that kind
>> of thing seems like a way to introduce bugs.
>
> Well, how about:
>
>     for <var> in <iterable>:
>         pass # process each
>     except:  # no arguments!
>         pass # nothing to process
>     else:
>         pass # fell through
>     finally:
>         pass # regardless of break/fallthrough/empty
>
> Now for loops perfectly match try/except/else/finally! >:D  (Like
> exception handling, finally would be called even with an inner return
> from any of the prior sections.)

Is the "finally" clause really necessary? Is it just the same as
putting it after the loop?

From python at mrabarnett.plus.com  Thu Jun 7 18:45:22 2012
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 07 Jun 2012 17:45:22 +0100
Subject: [Python-ideas] for/else statements considered harmful
In-Reply-To: <4FD0D7AD.4030408@mrabarnett.plus.com>
References: <4FD0B231.6010502@stoneleaf.us> <20120607113011.69e3d5e5@bhuda.mired.org> <4FD0D7AD.4030408@mrabarnett.plus.com>
Message-ID: <4FD0DAA2.5000800@mrabarnett.plus.com>

On 07/06/2012 17:32, MRAB wrote:
> On 07/06/2012 16:52, Alice Bevan–McGregor wrote:
>> On 2012-06-07 15:30:11 +0000, Mike Meyer said:
>>> Calling it "wrap-up processing" seems likely to cause people to think
>>> about it as meaning "finally". But if the else clause is not executed
>>> if the except clause is (as done by try/except/else), then there's no
>>> longer an easy way to describe it.
>>>
>>> It seems like adding an except would change the conditions under which
>>> the else clause is executed (unlike try/except/else), as otherwise
>>> there's no easy way to capture the current behavior, where else is
>>> executed whenever there are no chunks left to process. But that kind
>>> of thing seems like a way to introduce bugs.
>>
>> Well, how about:
>>
>>     for <var> in <iterable>:
>>         pass # process each
>>     except:  # no arguments!
>>         pass # nothing to process
>>     else:
>>         pass # fell through
>>     finally:
>>         pass # regardless of break/fallthrough/empty
>>
>> Now for loops perfectly match try/except/else/finally! >:D  (Like
>> exception handling, finally would be called even with an inner return
>> from any of the prior sections.)
>>
> Is the "finally" clause really necessary?
> Is it just the same as putting it after the loop?
>
I've just noticed your remark about the finally clause being run even
if there's a return. I can't say I like that; that's the job of
try...finally.

From guido at python.org  Thu Jun 7 18:57:00 2012
From: guido at python.org (Guido van Rossum)
Date: Thu, 7 Jun 2012 09:57:00 -0700
Subject: [Python-ideas] for/else statements considered harmful
In-Reply-To: <4FD0DAA2.5000800@mrabarnett.plus.com>
References: <4FD0B231.6010502@stoneleaf.us> <20120607113011.69e3d5e5@bhuda.mired.org> <4FD0D7AD.4030408@mrabarnett.plus.com> <4FD0DAA2.5000800@mrabarnett.plus.com>
Message-ID:

On Thu, Jun 7, 2012 at 9:45 AM, MRAB wrote:
> On 07/06/2012 17:32, MRAB wrote:
>> On 07/06/2012 16:52, Alice Bevan–McGregor wrote:
>>> On 2012-06-07 15:30:11 +0000, Mike Meyer said:
>>>> Calling it "wrap-up processing" seems likely to cause people to think
>>>> about it as meaning "finally". But if the else clause is not executed
>>>> if the except clause is (as done by try/except/else), then there's no
>>>> longer an easy way to describe it.
>>>>
>>>> It seems like adding an except would change the conditions under which
>>>> the else clause is executed (unlike try/except/else), as otherwise
>>>> there's no easy way to capture the current behavior, where else is
>>>> executed whenever there are no chunks left to process. But that kind
>>>> of thing seems like a way to introduce bugs.
>>>
>>> Well, how about:
>>>
>>>     for <var> in <iterable>:
>>>         pass # process each
>>>     except:  # no arguments!
>>>         pass # nothing to process
>>>     else:
>>>         pass # fell through
>>>     finally:
>>>         pass # regardless of break/fallthrough/empty
>>>
>>> Now for loops perfectly match try/except/else/finally! >:D  (Like
>>> exception handling, finally would be called even with an inner return
>>> from any of the prior sections.)
>>>
>> Is the "finally" clause really necessary?
>> Is it just the same as putting it after the loop?
>>
> I've just noticed your remark about the finally clause being run even
> if there's a return. I can't say I like that; that's the job of
> try...finally.

You can stop right there. This design is not going anywhere.

-- 
--Guido van Rossum (python.org/~guido)

From alice at gothcandy.com  Thu Jun 7 20:04:11 2012
From: alice at gothcandy.com (=?utf-8?Q?Alice_Bevan=E2=80=93McGregor?=)
Date: Thu, 7 Jun 2012 14:04:11 -0400
Subject: [Python-ideas] for/else statements considered harmful
References: <4FD0B231.6010502@stoneleaf.us> <20120607113011.69e3d5e5@bhuda.mired.org> <20120607122901.6c8bfe42@bhuda.mired.org>
Message-ID:

On 2012-06-07 16:29:01 +0000, Mike Meyer said:
> On Thu, 7 Jun 2012 11:52:10 -0400 Alice Bevan–McGregor wrote:
>
>> Now for loops perfectly match try/except/else/finally! >:D  (Like
>> exception handling, finally would be called even with an inner return
>> from any of the prior sections.)
>
> For for (and don't forget while) loops, finally is pointless. It's the
> same as code after the loop. For try, finally runs even if there's an
> exception, which isn't true of that code.

I really should use parentheses less, as obviously people don't read
the content between them. (Not just you, I'm afraid! ;^) If it weren't
a useful feature (for/empty) I'm unsure as to why so many template
engines implement it even though in most of them you _can_ utilize a
sentinel value; at least, in the ones that allow embedded Python code.

Alas, the BDFL has spoken, however. (Getting shot down was not
unexpected despite the occasional +1000. ;)
From python at mrabarnett.plus.com Thu Jun 7 20:18:12 2012 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 07 Jun 2012 19:18:12 +0100 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: References: <4FD0B231.6010502@stoneleaf.us> <20120607113011.69e3d5e5@bhuda.mired.org> <20120607122901.6c8bfe42@bhuda.mired.org> Message-ID: <4FD0F064.2050102@mrabarnett.plus.com> On 07/06/2012 19:04, Alice Bevan?McGregor wrote: > On 2012-06-07 16:29:01 +0000, Mike Meyer said: > >> On Thu, 7 Jun 2012 11:52:10 -0400 >> Alice Bevan?McGregor wrote: >> >>> Now for loops perfectly match try/except/else/finally!>:D (Like >>> exception handling, finally would be called even with an inner return >>> from any of the prior sections.) >> >> For for (and don't forget while) loops, finally is pointless. It's the >> same as code after the loop. For try, finally runs even if there's an >> exception, which isn't true of that code. > > I really should use parenthesis less as obviously people don't read the > content between them. (Not just you, I'm afraid! ;^) If it weren't a > useful feature (for/empty) I'm unsure as to why so many template > engines implement it even though in most of them you _can_ utilize a > sentinel value; at least, in the ones that allow embedded Python code. > > Alas, the BDFL has spoken, however. (Getting shot down was not > unexpected despite the occasional +1000. ;) > It was the comment about the "finally" clause always running which was the problem, not about running a clause when there was an empty sequence. From rurpy at yahoo.com Thu Jun 7 22:48:24 2012 From: rurpy at yahoo.com (Rurpy) Date: Thu, 7 Jun 2012 13:48:24 -0700 (PDT) Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: <87fwa7a945.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1339102104.17621.YahooMailClassic@web161503.mail.bf1.yahoo.com> On 06/07/2012 01:12 AM, Stephen J. 
Turnbull wrote:
> Rurpy writes:
>
> > I took the time to post here because it took an inordinate
> > amount of effort to find a solution to a legitimate need
> > (your opinion to the contrary notwithstanding)
>
> I don't think I said the need was illegitimate, if I did I apologize,
> and I certainly don't believe it is (I'm an economist by trade -- de
> gustibus non est disputandum).
>
> I just don't think it's necessary for Python to try to address the
> problem, because the problem is somebody else's bad design at root.

I don't understand that argument. The world is full of bad design that
Python has to address: daylight savings time, calendars, floating-point
(according to some). Good/bad design is not even constant and changes
with time. There is still a telnetlib module in stdlib despite the
existence of ssh. I suspect the vast majority of programmers are
interested in a language that allows them to *effectively* get done
what they need to, whether they are working on the latest agile TDD
REST server, or modifying some legacy text files. What I for one
*don't* need is to have my programming language enforcing its idea of
CS political correctness on me.

Secondly, the disparity in ease of use of an alternate encoding on
sys.stdout is not really between utf8 and non-utf8, it is between a
default encoding (which may be non-utf8), and the encoding I wish to
use. So one can't really attribute it to a desire to improve the world
by making non-utf8 harder to use!
And even were I to accept your argument, Python is inconsistent: when I
open a file explicitly there is only a slight penalty for opening a
non-default-encoded file (the need to explicitly give an encoding):

    f = open("myfile", "w")  # my default utf8 encoding
    print("text string", file=f)

vs

    f = open("myfile", "w", encoding="sjis")  # non-utf8
    print("text string", file=f)

But for sys.stdout, the penalty for using an alternate encoding is to
google around for a solution (which may not be optimal as Victor
Stinner pointed out) and then read about codecs and the StreamWriter
wrapper, textio wrappers and the .buffer attribute. And the reading
part is then repeated by all those (at the same level of python
expertise) who read the program.

All I can do is repeat what I said before: non-utf8 encodings exist and
are widely used. That's a simple fact. Sample some .jp web sites and
look at the ratio of shift-jis web pages to utf-8 web pages for
example. utf-8 is an encoding. shift-jis is an encoding. Sure, I
understand that utf-8 is preferable and I will use it when possible.
The fact that I am writing shift-jis means that utf-8 *isn't* possible
in this case. Since utf-8 and shift-jis are both encodings and are
equivalent from a coding viewpoint (a simple choice of which codec to
use) the discrepancy in ease of use between the two in the case of
writing to the standard streams is not justifiable and should be
corrected if possible.

> And I don't think it would be wise to try to do it in a very general
> way, because it's very hard to do that at the general level of the
> language.

But is it? Or are you referring to switching encoding on-the-fly?
(see below).

> > I understand there is no support here for providing a non-
> > obscure, programmatic way of changing the encoding of the
> > standard streams at program startup
>
> You're wrong. There is *some* support for that.
> It just has to be done safely, and that means that a generic
> .set_encoding() method that can be called after I/O has been performed
> probably isn't going to happen.

There are two sub-threads in this discussion:
1) Providing a more convenient and discoverable way to programmatically
change the encoding of std* streams before first use.
2) Changing the encoding used on the std* stream or any textio stream
on the fly as a generalization of (1).

I thought I made clear I was advocating for (1) and not (2) when I
earlier wrote in reply to you:

> You are correct that my current concern is reinitializing
> the encoding(s) of the sys.std* streams prior to doing any
> operations with them.

and to MRAB:

> Disclaimer: As I said before, I am not particularly
> advocating for a set_encoding() method -- my
> primary suggestion is a programmatic way to change the
> sys.std* encodings prior to first use.

As for (2), you have pointed out some potential issues with switching
encodings midstream. I don't understand how codecs work in Python
sufficiently yet to either agree or disagree with you. I have however
questioned some of the statements made regarding its difficulty (and am
holding my opinion open until I understand the issues better), but I am
not (as I've stated) advocating for it now. Sorry if I failed to make
the distinction clearer. My use of .set_encoding() as a placeholder for
both ideas probably contributed to the confusion.

> And it might not happen at the core level, since a 3-line function can
> do the job, it might make just as much sense to put up a package on
> PyPI.

I wasn't suggesting a change to the core level (if by that you mean the
interpreter). I was asking if some way could be provided (easier and
more reliable than googling around for a magic incantation) to change
the encoding of one or more of the already-open-when-my-program-starts
sys.std* streams.
I presume that would be a standard library change (in either the io or
sys modules) and offered a .set_encoding() method as a placeholder for
discussion.

I hardly think it is worth the effort, for either the producer or
consumers, of putting a 3-line function on PyPI. Nor would such a
solution address the discoverability and ease-of-use problems I am
complaining about.

An inferior and bare-minimum way to address this would be to at least
add a note about how to change the encoding to the sys.std*
documentation. That encourages cargo-cult programming and doesn't
address the WTF effect but it is at least better than the current state
of affairs.

From mwm at mired.org  Thu Jun 7 23:00:47 2012
From: mwm at mired.org (Mike Meyer)
Date: Thu, 7 Jun 2012 17:00:47 -0400
Subject: [Python-ideas] changing sys.stdout encoding
In-Reply-To: <1339102104.17621.YahooMailClassic@web161503.mail.bf1.yahoo.com>
References: <87fwa7a945.fsf@uwakimon.sk.tsukuba.ac.jp> <1339102104.17621.YahooMailClassic@web161503.mail.bf1.yahoo.com>
Message-ID:

On Thu, Jun 7, 2012 at 4:48 PM, Rurpy wrote:
> I suspect the vast majority of
> programmers are interested in a language that allows
> them to *effectively* get done what they need to, whether
> they are working on the latest agile TDD REST server, or
> modifying some legacy text files.

Others have raised the question this begs to have answered: how do
other programming languages deal with wanting to change the encoding of
the standard IO streams? Can you show us how they do things that's so
much easier than what Python does?

> And even were I to accept your argument, Python is
> inconsistent: when I open a file explicitly there is
> only a slight penalty for opening a non-default-encoded
> file (the need to explicitly give an encoding):

The proper encoding for the standard IO streams is generally a property
of the environment, and hence is set in the environment. You have a use
case where that's not the case.
The argument is that your use case isn't common enough to justify
changing the standard library. Can you provide evidence to the
contrary? Other languages that make setting the encoding on the
standard streams easy, or applications outside of those built for your
system that have a "--encoding" type flag?

> I wasn't suggesting a change to the core level (if by that
> you mean the interpreter). I was asking if some way could
> be provided (easier and more reliable than googling around
> for a magic incantation) to change the encoding of one
> or more of the already-open-when-my-program-starts sys.std*
> streams. I presume that would be a standard library change
> (in either the io or sys modules) and offered a .set_encoding()
> method as a placeholder for discussion.

Why presume that this needs a change in the library? The method is
straightforward, if somewhat ugly. Is there any reason it can't just be
documented, instead of added to the library? Changing the library would
require a similar documentation change.

Message-ID: <1339102933.79273.YahooMailClassic@web161506.mail.bf1.yahoo.com>

On 06/07/2012 12:27 AM, Paul Moore wrote:
> One suggestion, which would probably shed some light on whether this
> should be viewed as something "simple and reasonable", would be to do
> some research on how the same task would be achieved in other
> languages.

Yes, that is a good idea. If I decide to reraise this suggestion at
some point, I will try to do as you suggest.

> I have no experience to contribute but my intuition says
> that this could well be hard on other languages too.

Again, I have yet to be convinced this is hard. I am very sceptical it
is hard in the case of streams before they've been written or read.
Replacing sys.stdout with a wrapper that encodes with the alternate
encoding clearly works -- it just needs to be encapsulated so the user
doesn't need to figure out all the details in order to use it.
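The encapsulation Rurpy describes can be sketched with Python 3's io layering. The helper name `rewrap` is invented here, and `detach()` is used so the old text wrapper releases the underlying byte buffer instead of closing it; this is a sketch, not an endorsed stdlib API:

```python
import io

def rewrap(text_stream, encoding):
    """Rebuild a text stream over its own byte buffer with a new encoding.
    Meant to be called before any I/O has happened, e.g.
        sys.stdout = rewrap(sys.stdout, "shift_jis")"""
    return io.TextIOWrapper(text_stream.detach(), encoding=encoding,
                            line_buffering=True)

# Demonstrated on a stand-in stream rather than the real sys.stdout:
raw = io.BytesIO()
out = rewrap(io.TextIOWrapper(raw, encoding="utf-8"), "shift_jis")
print("テスト", file=out)
out.flush()
assert raw.getvalue().startswith("テスト".encode("shift_jis"))
```

Without `detach()`, garbage collection of the old wrapper could close the shared buffer out from under the new one, which is one of the subtleties the thread complains users currently have to discover by googling.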
> Would you be
> willing to do some web searches to look for solutions in (say) Java,
> or C#, or Ruby? In theory, it shouldn't take long (as otherwise you
> can conclude that the solution is obscure to the same extent that it
> is with Python).
>
> Even better, if those other languages do have a simple solution, it
> may suggest an approach that would be appropriate for Python.

From ncoghlan at gmail.com  Thu Jun 7 23:45:55 2012
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 8 Jun 2012 07:45:55 +1000
Subject: [Python-ideas] changing sys.stdout encoding
In-Reply-To: <1339102933.79273.YahooMailClassic@web161506.mail.bf1.yahoo.com>
References: <1339102933.79273.YahooMailClassic@web161506.mail.bf1.yahoo.com>
Message-ID:

The interpreter uses the standard streams internally, and they're one
of the first things created during interpreter startup. User provided
code doesn't start running until well after they're initialised. If
user level code doesn't want those streams, it needs to replace them
with something else.

Cheers, Nick.

-- 
Sent from my phone, thus the relative brevity :)

On Jun 8, 2012 7:03 AM, "Rurpy" wrote:
> On 06/07/2012 12:27 AM, Paul Moore wrote:
> > One suggestion, which would probably shed some light on whether this
> > should be viewed as something "simple and reasonable", would be to do
> > some research on how the same task would be achieved in other
> > languages.
>
> Yes, that is a good idea. If I decide to reraise this
> suggestion at some point, I will try to do as you suggest.
>
> > I have no experience to contribute but my intuition says
> > that this could well be hard on other languages too.
>
> Again, I have yet to be convinced this is hard. I am
> very sceptical it is hard in the case of streams before
> they've been written or read.
> Replacing sys.stdout
> with a wrapper that encodes with the alternate encoding
> clearly works -- it just needs to be encapsulated so the
> user doesn't need to figure out all the details in order
> to use it.
>
> > Would you be
> > willing to do some web searches to look for solutions in (say) Java,
> > or C#, or Ruby? In theory, it shouldn't take long (as otherwise you
> > can conclude that the solution is obscure to the same extent that it
> > is with Python).
> >
> > Even better, if those other languages do have a simple solution, it
> > may suggest an approach that would be appropriate for Python.
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

From rurpy at yahoo.com  Fri Jun 8 02:14:29 2012
From: rurpy at yahoo.com (Rurpy)
Date: Thu, 7 Jun 2012 17:14:29 -0700 (PDT)
Subject: [Python-ideas] changing sys.stdout encoding
Message-ID: <1339114469.94308.YahooMailClassic@web161502.mail.bf1.yahoo.com>

On 06/07/2012 03:45 PM, Nick Coghlan wrote:
> The interpreter uses the standard streams internally, and
> they're one of the first things created during interpreter
> startup. User provided code doesn't start running until well
> after they're initialised.

In other words, the stream objects referenced by sys.std* are opened
before the user code runs?

But if there are no operations on those streams until my user code
runs, they are still in the same state they were after they were
initialized, yes?

So if one wanted to provide an "only before first use" set_encoding()
function, why couldn't that function reexecute the codecs part of the
initialization code a second time? Of course there would need to be
some sort of flag that it could use to verify the stream was still in
its initial state.
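The flag-guarded set_encoding() Rurpy describes can be sketched entirely in user code. Everything below is hypothetical (CPython offers no such hook, and the "still in initial state" flag is tracked by the helper itself rather than by the interpreter, which is exactly the part that would need core support):

```python
import io

class StdStreams:
    """Toy stand-in for the sys module's stream bookkeeping."""
    def __init__(self, stdout):
        self.stdout = stdout
        self._initial_stdout = stdout  # the "still in initial state" flag

    def set_encoding(self, encoding):
        """Rebuild stdout with `encoding`, but only if it has not
        already been replaced."""
        if self.stdout is not self._initial_stdout:
            raise RuntimeError("stdout already replaced; too late to re-encode")
        self.stdout = io.TextIOWrapper(self.stdout.detach(),
                                       encoding=encoding,
                                       line_buffering=True)

# Exercise it on a stand-in for the interpreter-created stdout:
raw = io.BytesIO()
streams = StdStreams(io.TextIOWrapper(raw, encoding="utf-8"))
streams.set_encoding("shift_jis")
print("text string", file=streams.stdout)
streams.stdout.flush()
```

Note the guard only detects "already replaced", not "already written to"; reliably detecting the latter is the part Rurpy's proposal would need the interpreter to track.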
> If user level code doesn't want those streams, it needs to
> replace them with something else.

Yes, this is what the code I googled up does:

    import codecs
    sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer)

But that code is not obvious to someone who has been able to do all his
encoded IO (with the exception of sys.stdout) using just the encoding
parameter of open(). Hence my question if something like a
set_encoding() method/function that would work on sys.stdout is
feasible. I don't see an answer to that in your statement above.

From nathan at cmu.edu  Fri Jun 8 02:59:31 2012
From: nathan at cmu.edu (Nathan Schneider)
Date: Thu, 7 Jun 2012 17:59:31 -0700
Subject: [Python-ideas] changing sys.stdout encoding
In-Reply-To: <1339114469.94308.YahooMailClassic@web161502.mail.bf1.yahoo.com>
References: <1339114469.94308.YahooMailClassic@web161502.mail.bf1.yahoo.com>
Message-ID:

On Thu, Jun 7, 2012 at 5:14 PM, Rurpy wrote:
> On 06/07/2012 03:45 PM, Nick Coghlan wrote:
>> The interpreter uses the standard streams internally, and
>> they're one of the first things created during interpreter
>> startup. User provided code doesn't start running until well
>> after they're initialised.
>
> In other words, the stream objects referenced by sys.std*
> are opened before the user code runs?
>
> But if there are no operations on those streams until my
> user code runs, they are still in the same state they were
> after they were initialized, yes?
>
> So if one wanted to provide an "only before first use"
> set_encoding() function, why couldn't that function reexecute
> the codecs part of the initialization code a second time?
> Of course there would need to be some sort of flag that it
> could use to verify the stream was still in its initial state.
>
>> If user level code doesn't want those streams, it needs to
>> replace them with something else.
> > Yes, this is what the code I googled up does: > import codecs > sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer) What if codecs contained convenience methods for stdin and stdout? I.e. the above could be written more simply as

import codecs
codecs.encode_stdout(opts.encoding)

This is much more memorable than the current option, and would also make life easier when working with fileinput (whose openhook argument can be set to control encoding of input *file* streams, but when it falls back to stdin this preference is ignored). > But that code is not obvious to someone who has been able to do > all his encoded IO (with the exception of sys.stdout) using just > the encoding parameter of open(). Hence my question if something like a set_encoding() method/function that would work on > sys.stdout is feasible. I don't see an answer to that in your > statement above. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From ncoghlan at gmail.com Fri Jun 8 03:01:26 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 8 Jun 2012 11:01:26 +1000 Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: <1339114469.94308.YahooMailClassic@web161502.mail.bf1.yahoo.com> References: <1339114469.94308.YahooMailClassic@web161502.mail.bf1.yahoo.com> Message-ID: On Fri, Jun 8, 2012 at 10:14 AM, Rurpy wrote: > On 06/07/2012 03:45 PM, Nick Coghlan wrote: >> If user level code doesn't want those streams, it needs to >> replace them with something else. > > Yes, this is what the code I googled up does: > import codecs > sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer) > But that code is not obvious to someone who has been able to do > all his encoded IO (with the exception of sys.stdout) using just > the encoding parameter of open().
Hence my question if something like a set_encoding() method/function that would work on > sys.stdout is feasible. I don't see an answer to that in your > statement above. Right, I was only trying to explain why the standard streams are a special case - because they're also used by the interpreter, and it makes the startup process much simpler if the interpreter retains complete control over the way they're initialised (it's already complicated by the fact we need to get something half-usable in place as sys.stderr so that error reporting is possible while initialising them properly). It then becomes an application level operation to replace them if desired. We can (and do) make the internal standard stream initialisation configurable, but it then becomes a UI design problem to get something that balances flexibility against complexity. PYTHONIOENCODING (in association with OS utilities that make it possible to set an environment variable for a specific process invocation, as well as support in the subprocess module for passing a tailored environment to subprocesses) is our current solution. The interpreter design aims, first and foremost, to provide a simple and straightforward experience in POSIX environments that use UTF-8 everywhere (since that's the most sane approach available for migrating from a previously ASCII-based computing world). Windows is a bit trickier (due to the internal use of UTF-16 APIs and the lack of POSIX-style support for temporarily setting an environment variable when invoking a process from the shell), but correctly supporting that environment is also a very high priority. The fallback behaviours when these situations do not apply are designed to work best on systems that are, at least somewhat *locally* consistent. The real world is complex. Eventually, our answer has to be "handle it at the application level, there are too many variations for us to support it directly at the interpreter level".
Currently, any standard stream encoding related problem that can't be handled with PYTHONIOENCODING is just such a situation. We know it sucks for multi-encoding environments, but those are a nightmare for a lot of reasons and are the main drivers behind the industry-wide effort to standardise on Unicode text handling, including universal encodings like UTF-8. So now we're down to the question of how much complexity we're willing to tolerate in the interpreter specifically for the sake of environments where: 1. The automatic standard stream encoding calculation gives the wrong answer 2. The PYTHONIOENCODING override is insufficient 3. The application being executed isn't already handling the problem 4. A -m executable helper module (or directly executable helper script) can't be used to initialise the standard streams correctly before continuing on to execute the requested application via the runpy module And the answer is "not much". About the only likely way forward I can see for streamlining this situation would be to treat this as another use case for http://bugs.python.org/issue14803, which proposes the ability to run snippets of Python code prior to execution of __main__. I do agree that "create a new IO object that is like this old IO object but with these settings changed" could probably do with a better official API, but such an API needs to be designed with a respect for the issues associated with changing encodings "on the fly" and ask serious questions about whether or not we should be encouraging that practice by making it easier than it is already. I thought I had posted a tracker issue to that effect, but I can't find it now. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From yselivanov.ml at gmail.com Fri Jun 8 04:57:29 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 7 Jun 2012 22:57:29 -0400 Subject: [Python-ideas] functools.partial Message-ID: Hello, While I was working on adding support for 'functools.partial' in PEP 362, I discovered that it doesn't do any sanity check on passed arguments upon creation. Example:

    def foo(a):
        pass

    p = partial(foo, 1, 2, 3)  # this line will execute

    p()  # this line will fail

Is it a bug? Or is it a feature, because we deliberately don't do any checks because of performance issues? If the latter - I think it should be at least documented. - Yury From cmjohnson.mailinglist at gmail.com Fri Jun 8 05:14:42 2012 From: cmjohnson.mailinglist at gmail.com (Carl M. Johnson) Date: Thu, 7 Jun 2012 17:14:42 -1000 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: References: <4FD0B231.6010502@stoneleaf.us> <20120607113011.69e3d5e5@bhuda.mired.org> Message-ID: On Jun 7, 2012, at 5:52 AM, Alice Bevan-McGregor wrote:

> Well, how about:
>
> for <item> in <iterable>:
>     pass # process each
> except: # no arguments!
>     pass # nothing to process
> else:
>     pass # fell through
> finally:
>     pass # regardless of break/fallthrough/empty

Finally is redundant, but what about an `except break:` as the opposite of `else`? From ncoghlan at gmail.com Fri Jun 8 05:40:25 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 8 Jun 2012 13:40:25 +1000 Subject: [Python-ideas] functools.partial In-Reply-To: References: Message-ID: On Fri, Jun 8, 2012 at 12:57 PM, Yury Selivanov wrote: > Hello, > > While I was working on adding support for 'functools.partial' in PEP 362, > I discovered that it doesn't do any sanity check on passed arguments > upon creation. > > Example:
>
>     def foo(a):
>         pass
>
>     p = partial(foo, 1, 2, 3)  # this line will execute
>
>     p()  # this line will fail
>
> Is it a bug?
Or is it a feature, because we deliberately don't do any checks > because of performance issues? If the latter - I think it should be at least > documented. Partly the latter, but also a matter of "this is hard to do, so we don't even try". There are many other "lazy execution" APIs with the same problem - they accept an arbitrary underlying callable, but you don't find out until you try to call it that the arguments don't match the parameters. This leads to errors being raised far away from the code that actually introduced the error. If you dig up some of the older PEP 362 discussions, you'll find that allowing developers to reduce this problem over time is the main reason the Signature.bind() method was added to the PEP. While I wouldn't recommend it for the base partial type, I could easily see someone using PEP 362 to create a "checked partial" that ensures arguments are valid as they get passed in rather than leaving the validation until the call is actually made. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From niki.spahiev at gmail.com Fri Jun 8 09:23:36 2012 From: niki.spahiev at gmail.com (Niki Spahiev) Date: Fri, 08 Jun 2012 10:23:36 +0300 Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: References: <87fwa7a945.fsf@uwakimon.sk.tsukuba.ac.jp> <1339102104.17621.YahooMailClassic@web161503.mail.bf1.yahoo.com> Message-ID: On 8.06.2012 00:00, Mike Meyer wrote: > The proper encoding for the standard IO streams is generally a > property of the environment, and hence is set in the environment. You > have a use case where that's not the case. The argument is that your > use case isn't common enough to justify changing the standard library. > Can you provide evidence to the contrary? Other languages that make > setting the encoding on the standard streams easy, or applications > outside of those built for your system that have a "--encoding" type > flag? Mercurial: ...
--debug enable debugging output --debugger start debugger --encoding ENCODE set the charset encoding (default: UTF-8) --encodingmode MODE set the charset encoding mode (default: strict) --traceback always print a traceback on exception ... Niki From stephen at xemacs.org Fri Jun 8 10:04:27 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 08 Jun 2012 17:04:27 +0900 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: References: <87aa0f9z15.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87wr3i8c1g.fsf@uwakimon.sk.tsukuba.ac.jp> Yuval Greenfield writes: > On Thu, Jun 7, 2012 at 1:50 PM, Stephen J. Turnbull wrote: > > > def search_in_iterable(key, iter): > > for item in iter: > > if item == key: > > return some_function_of(item) > > else: > > return not_found_default > > > > > You don't need the "else" there. An equivalent: *You* don't need it. *I* like it, because it expresses the fact that returning a default is a necessary complement to the for loop. While this is something of a TOOWTDI violation, there are cases where else is needed to express the semantics, as well (eg, if the first return statement is replaced by "process(item); break"). From techtonik at gmail.com Fri Jun 8 10:16:34 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 8 Jun 2012 11:16:34 +0300 Subject: [Python-ideas] Isolated (?transactional) exec (?subinterpreter) calls Message-ID: Hi, Having a lot of ideas is a curse, because I can barely follow up on them, but I try - I really read replies, just don't have enough energy to answer immediately (as it usually requires some time for research). Here is another one that ripes too long to become rotten: Make exec(code[, globals[, locals]]) calls consistent, optionally isolated from parent environment and transactional. 
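The namespace side of this idea can be sketched with a small wrapper; exec_transactional() below is a hypothetical helper name, and it only makes the globals dict transactional - imports, I/O and other side effects are untouched, which is the genuinely hard part of the proposal:

```python
def exec_transactional(code, env):
    """Run code against a copy of env; merge changes back only on success.

    Only the namespace is transactional: side effects such as file
    writes or module imports are not rolled back.
    """
    snapshot = dict(env)  # work on a snapshot of the environment
    try:
        exec(code, snapshot)
    except Exception:
        return False  # discard all modifications
    snapshot.pop("__builtins__", None)  # injected by exec(), not user state
    env.clear()
    env.update(snapshot)  # commit the modifications back
    return True

env = {"x": 1}
assert exec_transactional("x = x + 1", env) is True
assert env == {"x": 2}
assert exec_transactional("x = x + undefined_name", env) is False
assert env == {"x": 2}  # the failed run left the environment untouched
```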
Consistent:
- it should not matter if the code is executed with the command line interpreter or from exec(); code should not need to be modified to run successfully under exec() if it already runs successfully in an interpreter session
  http://bugs.python.org/issue13557
  http://bugs.python.org/issue14049
  http://bugs.python.org/issue1167300
  http://bugs.python.org/issue991196

Optionally isolated from parent environment:
- a feature to execute a user script in a snapshot of the current environment, with a choice whether to merge its modifications back into the environment or not
- real user story: read system configuration settings, where optional detection rules are written in Python (Blender/SCons build scripts) - autodetection probes can affect the environment while detection takes place, which can lead to more problems later (think of virtualenv at the Python process level with a defined data exchange protocol through globals/locals variables)

Transactional:
- well, if it is isolated - it is already transactional
- an ability to discard results if an error or an exception occurs inside exec() - getting back to the state right before exec.

-- anatoly t. From simon.sapin at kozea.fr Fri Jun 8 09:42:50 2012 From: simon.sapin at kozea.fr (Simon Sapin) Date: Fri, 08 Jun 2012 09:42:50 +0200 Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: References: <87fwa7a945.fsf@uwakimon.sk.tsukuba.ac.jp> <1339102104.17621.YahooMailClassic@web161503.mail.bf1.yahoo.com> Message-ID: <4FD1ACFA.3000707@kozea.fr> Le 08/06/2012 09:23, Niki Spahiev a écrit : > Mercurial: > ... > --debug enable debugging output > --debugger start debugger > --encoding ENCODE set the charset encoding (default: UTF-8) > --encodingmode MODE set the charset encoding mode (default: strict) > --traceback always print a traceback on exception > ... From the man page: > HGENCODING > This overrides the default locale setting detected by Mercurial.
> This setting is used to convert data including usernames, > changeset descriptions, tag names, and branches. This setting > can be overridden with the --encoding command-line option. I don't know if this affects standard IO. -- Simon Sapin From ncoghlan at gmail.com Fri Jun 8 11:04:55 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 8 Jun 2012 19:04:55 +1000 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses Message-ID: (context for python-ideas: my recently checked in changes to the tutorial, that added the final paragraph to http://docs.python.org/tutorial/controlflow.html#break-and-continue-statements-and-else-clauses-on-loops) On Fri, Jun 8, 2012 at 5:29 PM, Stephen J. Turnbull wrote: > Note: reply-to set to python-ideas. > > Nick Coghlan writes: > > > The inaccuracies in the analogy are why this is in the tutorial, not the > > language reference. All 3 else clauses are really their own thing. > > Nick, for the purpose of the tutorial, actually there are 4 else > clauses: you need to distinguish *while* from *for*. It was much > easier for me to get confused about *for*. The only thing I'm trying to do with the tutorial update is to encourage beginners to start thinking in terms of try/except/else when they first encounter for/break/else and while/break/else. That's it. Yes, ultimately once people fully understand how it works under the hood (including the loop-and-a-half construct for infinite while loops), they'll realise it's actually closely related to conditionals as well, but anyone that places too much weight on the following obvious parallel is going to be confused for a long time. After all:

if iterable:
    ...
else:
    ...

is *very* similar in appearance to:

for x in iterable:
    ...
else:
    ...
The point of the tutorial update is to give readers a slight nudge towards thinking of the latter as: for x in iterable: ... except break: # Implicit in the semantics of loops pass else: ... Would it be worth adding the "except break:" clause to the language just to make it crystal clear what is actually going on? I don't think so, but it's still a handy way to explain the semantics while gently steering people away from linking for/else and if/else too closely. I actually agree all of the else clauses really *are* quite closely related (hence the consistent use of the same keyword), but the relationship is *not* the intuitively obvious one that comes to mind when you just look at the similarity in the concrete syntax specifically of for/else and if/else. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From jeanpierreda at gmail.com Fri Jun 8 11:14:45 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Fri, 8 Jun 2012 05:14:45 -0400 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: <87wr3i8c1g.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87aa0f9z15.fsf@uwakimon.sk.tsukuba.ac.jp> <87wr3i8c1g.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Fri, Jun 8, 2012 at 4:04 AM, Stephen J. Turnbull wrote: > Yuval Greenfield writes: > ?> On Thu, Jun 7, 2012 at 1:50 PM, Stephen J. Turnbull wrote: > ?> > ?> > ? ?def search_in_iterable(key, iter): > ?> > ? ? ? ?for item in iter: > ?> > ? ? ? ? ? ?if item == key: > ?> > ? ? ? ? ? ? ? ?return some_function_of(item) > ?> > ? ? ? ?else: > ?> > ? ? ? ? ? ?return not_found_default > ?> > > ?> > > ?> You don't need the "else" there. An equivalent: > > *You* don't need it. ?*I* like it, because it expresses the fact that > returning a default is a necessary complement to the for loop. I've never been sure of what is good style here. 
It's comparable to these two things: def foo(): if bar(): return baz return quux def foo2(): if bar(): return baz else: return quux Is there some well-accepted rule of which to use? -- Devin From rob.cliffe at btinternet.com Fri Jun 8 11:44:55 2012 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Fri, 08 Jun 2012 10:44:55 +0100 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: References: Message-ID: <4FD1C997.9040003@btinternet.com> On 08/06/2012 10:04, Nick Coghlan wrote: > (context for python-ideas: my recently checked in changes to the > tutorial, that added the final paragraph to > http://docs.python.org/tutorial/controlflow.html#break-and-continue-statements-and-else-clauses-on-loops) > > On Fri, Jun 8, 2012 at 5:29 PM, Stephen J. Turnbull wrote: >> Note: reply-to set to python-ideas. >> >> Nick Coghlan writes: >> >> > The inaccuracies in the analogy are why this is in the tutorial, not the >> > language reference. All 3 else clauses are really their own thing. >> >> Nick, for the purpose of the tutorial, actually there are 4 else >> clauses: you need to distinguish *while* from *for*. It was much >> easier for me to get confused about *for*. > The only thing I'm trying to do with the tutorial update is to > encourage beginners to be start thinking in terms of try/except/else > when they first encounter for/break/else and while/break/else. That's > it. > > Yes, ultimately once people fully understand how it works under the > hood (including the loop-and-a-half construct for infinite while > loops), they'll release it's actually closely related to conditionals > as well, but anyone that places too much weight on the following > obvious parallel is going to be confused for a long time. After all: > > if iterable: > ... > else: > ... > > is *very* similar in appearance to: > > for x in iterable: > ... > else: > ... 
> > I believe that parallel is 99% of the reason why people get confused > about the meaning of the latter. > > The point of the tutorial update is to give readers a slight nudge > towards thinking of the latter as: > > for x in iterable: > ... > except break: # Implicit in the semantics of loops > pass > else: > ... > > Would it be worth adding the "except break:" clause to the language > just to make it crystal clear what is actually going on? I don't think > so, but it's still a handy way to explain the semantics while gently > steering people away from linking for/else and if/else too closely. I > actually agree all of the else clauses really *are* quite closely > related (hence the consistent use of the same keyword), but the > relationship is *not* the intuitively obvious one that comes to mind > when you just look at the similarity in the concrete syntax > specifically of for/else and if/else. > > Cheers, > Nick. > I think a better scheme would be to have more meaningful keywords or keyword-combinations, e.g. for x in iterable: # do stuff ifempty: # or perhaps ifnoiter: (also applicable to while loops) # do stuff #ifbreak: # do stuff #ifnobreak: # do stuff which would give all the flexibility while making it reasonably clear what was happening. Rob Cliffe From stephen at xemacs.org Fri Jun 8 13:11:56 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 08 Jun 2012 20:11:56 +0900 Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: <1339102104.17621.YahooMailClassic@web161503.mail.bf1.yahoo.com> References: <87fwa7a945.fsf@uwakimon.sk.tsukuba.ac.jp> <1339102104.17621.YahooMailClassic@web161503.mail.bf1.yahoo.com> Message-ID: <87vcj283cz.fsf@uwakimon.sk.tsukuba.ac.jp> Rurpy writes: > Python is inconsistent: Yup, and I said there is support for dealing with that inconsistency. At least I'm +1 and Nick's +0.5. So let's talk about what to do about it. 
Nick has a pretty good channel on the BFDL, and since he doesn't seem to like an addition to the stdlib here, it may not go far. But I don't see a reason to rule out stdlib changes yet. As far as I'm concerned, there are three reasonable proposals: > > [S]ince a 3-line function can do the job, it might make just as > > much sense to put up a package on PyPI. > I hardly think it is worth the effort, for either the producer > or consumers, of putting a 3-line function on PyPI. Nor would > such a solution address the discoverability and ease-of-use > problems I am complaining about. Agreed that it's pretty weak, but it's not clear that other solutions will be much better in practice. Discoverability depends on documentation, which can be written and improved. I think "ease of use" is way off-target. > I presume that would be a standard library change (in either the io > or sys modules) and offered a .set_encoding() method as a > placeholder for discussion. Changing the stdlib is not a panacea. In particular, it can't be applied to older Pythons. I'm also not convinced (cf. Nick's post) that there's enough value-added and a good name for the restricted functionality we know we can provide. > An inferior and bare minimum way to address this would be to at > least add a note about how to change the encoding to the sys.std* > documentation. That encourages cargo-cult programming and doesn't > address the WTF effect but it is at least better than the current > state of affairs. IMO, this may be the best, but again I doubt it can be added to older versions. As for the "cargo cult" and "WTF" issues, I have little sympathy for either. The real WTF problem is that multi-encoding environments are inherently complex and irregular (ie, a WTF waiting to happen), and Python can't fix that. 
It's very unlikely that typical programmers will bother to understand what happens "under the hood" of a stdlib function/method, so that is no better than cargo-cult programming (and cargo-cult at least has the advantage that what is being done is explicit, allowing programmers who understand textio but not encodings to figure out what's happening). From ncoghlan at gmail.com Fri Jun 8 13:53:03 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 8 Jun 2012 21:53:03 +1000 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: <4FD1C997.9040003@btinternet.com> References: <4FD1C997.9040003@btinternet.com> Message-ID: On Fri, Jun 8, 2012 at 7:44 PM, Rob Cliffe wrote: > I think a better scheme would be to have more meaningful keywords or > keyword-combinations, e.g. >
> for x in iterable:
>     # do stuff
> ifempty:  # or perhaps ifnoiter: (also applicable to while loops)
>     # do stuff
> #ifbreak:
>     # do stuff
> #ifnobreak:
>     # do stuff
>
> which would give all the flexibility while making it reasonably clear what > was happening. The way to be clear would actually be to drop the feature altogether (as Guido has noted in the past). Then TOOWTDI becomes:

x = _no_data = object()
result = _not_found = object()
for x in iterable:
    if acceptable(x):
        result = x
        break
if x is _no_data:
    # No data!
if result is _not_found:
    # Nothing interesting!
# Found a result, process it

That's never going to happen in Python though, due to backwards compatibility requirements. FWIW, I wrote an essay summarising some of the thoughts presented in these threads: http://readthedocs.org/docs/ncoghlan_devs-python-notes/en/latest/python_concepts/break_else.html Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From rob.cliffe at btinternet.com Fri Jun 8 14:05:38 2012 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Fri, 08 Jun 2012 13:05:38 +0100 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: References: <4FD1C997.9040003@btinternet.com> Message-ID: <4FD1EA92.3070200@btinternet.com> On 08/06/2012 12:53, Nick Coghlan wrote: > On Fri, Jun 8, 2012 at 7:44 PM, Rob Cliffe wrote: >> I think a better scheme would be to have more meaningful keywords or >> keyword-combinations, e.g. >> >> for x in iterable: >> # do stuff >> ifempty: # or perhaps ifnoiter: (also applicable to while loops) >> # do stuff >> #ifbreak: >> # do stuff >> #ifnobreak: >> # do stuff >> >> which would give all the flexibility while making it reasonably clear what >> was happening. > The way to be clear would actually be to drop the feature altogether > (as Guido has noted in the past). Then TOOWTDI becomes: > > x = _no_data = object() > result = _not_found = object() > for x in iterable: > if acceptable(x): > result = x > break > if x is _no_data: > # No data! > if result is _not_found: > # Nothing interesting! > # Found a result, process it > > That's never going to happen in Python though, due to backwards > compatibility requirements. > > FWIW, I wrote an essay summarising some of the thoughts presented in > these threads: > http://readthedocs.org/docs/ncoghlan_devs-python-notes/en/latest/python_concepts/break_else.html > > Cheers, > Nick. > Fair enough, but I think my more compact versions are more readable. 
Rob From ncoghlan at gmail.com Fri Jun 8 14:12:47 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 8 Jun 2012 22:12:47 +1000 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: <4FD1EA92.3070200@btinternet.com> References: <4FD1C997.9040003@btinternet.com> <4FD1EA92.3070200@btinternet.com> Message-ID: On Fri, Jun 8, 2012 at 10:05 PM, Rob Cliffe wrote: > Fair enough, but I think my more compact versions are more readable. At the expense of adding 3 new keywords to the language for something that can already be handled with ordinary variable assignments. They would add no real expressive power to the language, so they just become another special case for newcomers to learn. Not a good trade-off. With the benefit of hindsight, we can also see that supporting the "else" clause on loops wasn't a good trade-off either (given the confusion it can cause). However, since the mistake has already been made, a lot of code out in the wild relies on it and those of us that quite like the construct are used to having it available, it's not worth the hassle of deprecating and removing it. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From stephen at xemacs.org Fri Jun 8 14:53:18 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 08 Jun 2012 21:53:18 +0900 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: <4FD1C997.9040003@btinternet.com> References: <4FD1C997.9040003@btinternet.com> Message-ID: <87obou7yo1.fsf@uwakimon.sk.tsukuba.ac.jp> Rob Cliffe writes: > On 08/06/2012 10:04, Nick Coghlan wrote: > > Would it be worth adding the "except break:" clause to the language > > just to make it crystal clear what is actually going on? -1 I understood what "except break:" was supposed to mean when I read it the first time, but now I don't any more. 
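The semantics being argued over can be pinned down with a small experiment: the loop's else clause runs only on normal exhaustion, and is skipped not just by break but by anything that leaves the loop early, such as an exception (loop_else() is an illustrative name, not anything from the thread):

```python
def loop_else(mode):
    """Report whether the loop's else clause ran for a given exit path."""
    log = []
    try:
        for i in range(3):
            if mode == "break" and i == 1:
                break
            if mode == "raise" and i == 1:
                raise ValueError("bail out")
        else:
            log.append("else ran")
    except ValueError:
        pass
    return log

assert loop_else("normal") == ["else ran"]  # loop exhausted: else runs
assert loop_else("break") == []             # break skips the else clause
assert loop_else("raise") == []             # an exception also bypasses it
```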
> > I don't think so, but it's still a handy way to explain the > > semantics while gently steering people away from linking for/else > > and if/else too closely. My main point about documentation is that for/else and if/else should not be linked directly, but rather via while/else. > I think a better scheme would be to have more meaningful keywords or > keyword-combinations, e.g. > > for x in iterable: > # do stuff > ifempty: # or perhaps ifnoiter: (also applicable to while loops) > # do stuff > #ifbreak: > # do stuff > #ifnobreak: > # do stuff > > which would give all the flexibility while making it reasonably clear > what was happening. Sure, but that's way overboard for something that's only rarely useful. From amauryfa at gmail.com Fri Jun 8 15:00:00 2012 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Fri, 8 Jun 2012 15:00:00 +0200 Subject: [Python-ideas] Isolated (?transactional) exec (?subinterpreter) calls In-Reply-To: References: Message-ID: 2012/6/8 anatoly techtonik > Optionally isolated from parent environment: > - a feature to execute user script in a snapshot of current > environment and have > a choice whenever to merge its modifications back to environment or not > It would be a really interesting feature, but seems very difficult to implement. Do you have the slightest idea how this would work? What about global state, environment variables, threads, and all kinds of side-effects? Or are you thinking about a solution based on the multiprocessing module? -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ned at nedbatchelder.com Fri Jun 8 15:13:56 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Fri, 08 Jun 2012 09:13:56 -0400 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: References: Message-ID: <4FD1FA94.7090707@nedbatchelder.com> Just to add another attempt at explaining the for/else confusion, the analogy that keeps it straight in my mind is that the "else" is really paired with the "if .. break" inside the loop: http://nedbatchelder.com/blog/201110/forelse.html --Ned. On 6/8/2012 5:04 AM, Nick Coghlan wrote: > (context for python-ideas: my recently checked in changes to the > tutorial, that added the final paragraph to > http://docs.python.org/tutorial/controlflow.html#break-and-continue-statements-and-else-clauses-on-loops) > > On Fri, Jun 8, 2012 at 5:29 PM, Stephen J. Turnbull wrote: >> Note: reply-to set to python-ideas. >> >> Nick Coghlan writes: >> >> > The inaccuracies in the analogy are why this is in the tutorial, not the >> > language reference. All 3 else clauses are really their own thing. >> >> Nick, for the purpose of the tutorial, actually there are 4 else >> clauses: you need to distinguish *while* from *for*. It was much >> easier for me to get confused about *for*. > The only thing I'm trying to do with the tutorial update is to > encourage beginners to be start thinking in terms of try/except/else > when they first encounter for/break/else and while/break/else. That's > it. > > Yes, ultimately once people fully understand how it works under the > hood (including the loop-and-a-half construct for infinite while > loops), they'll release it's actually closely related to conditionals > as well, but anyone that places too much weight on the following > obvious parallel is going to be confused for a long time. After all: > > if iterable: > ... > else: > ... > > is *very* similar in appearance to: > > for x in iterable: > ... > else: > ... 
> > I believe that parallel is 99% of the reason why people get confused > about the meaning of the latter. > > The point of the tutorial update is to give readers a slight nudge > towards thinking of the latter as: > > for x in iterable: > ... > except break: # Implicit in the semantics of loops > pass > else: > ... > > Would it be worth adding the "except break:" clause to the language > just to make it crystal clear what is actually going on? I don't think > so, but it's still a handy way to explain the semantics while gently > steering people away from linking for/else and if/else too closely. I > actually agree all of the else clauses really *are* quite closely > related (hence the consistent use of the same keyword), but the > relationship is *not* the intuitively obvious one that comes to mind > when you just look at the similarity in the concrete syntax > specifically of for/else and if/else. > > Cheers, > Nick. > From jacek.masiulaniec at gmail.com Fri Jun 8 15:21:59 2012 From: jacek.masiulaniec at gmail.com (Jacek Masiulaniec) Date: Fri, 8 Jun 2012 14:21:59 +0100 Subject: [Python-ideas] SysLogHandler: gratuitous data loss Message-ID: Hello, In logging.handlers, SysLogHandler defaults to localhost:514. http://docs.python.org/py3k/library/logging.handlers.html#logging.handlers.SysLogHandler http://docs.python.org/library/logging.handlers.html#logging.handlers.SysLogHandler In practice, there are systems out there that offer local syslog service via additional endpoints, for example: /dev/log /var/run/syslog Some systems even ship with UDP endpoint disabled by default, in which case Python's default is to drop data despite the availability of these other endpoints. The /dev/log path in particular is so commonplace that many system-level utils default to it. Other languages' syslog libraries provide support for it, too. [1] [2] I propose a change to SysLogHandler's default behavior: 1) Try connect(2) against the socket files. 
2) Use localhost:514 as a fallback. I believe it's possible to change this interface while remaining backwards-compatible. Thoughts? Jacek [1] http://hackage.haskell.org/packages/archive/hslogger/1.0.7/doc/html/src/System-Log-Handler-Syslog.html#openlog [2] http://golang.org/src/pkg/log/syslog/syslog_unix.go From solipsis at pitrou.net Fri Jun 8 16:55:17 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 08 Jun 2012 16:55:17 +0200 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: References: Message-ID: Le 08/06/2012 11:04, Nick Coghlan a écrit : > > The only thing I'm trying to do with the tutorial update is to > encourage beginners to start thinking in terms of try/except/else > when they first encounter for/break/else and while/break/else. That's > it. I don't see why you're trying to draw that analogy, since a loop has nothing in common with a try block. For the record, when I was a Python beginner, I had zero problem understanding the for/else construct, and it even struck me as very useful ("oh, they've thought about a clean and easy way to write search-and-break loops"). I don't think it's useful to think of beginners as people having comprehension problems. Besides, if you don't understand something up front, there's always the possibility to come back later. Regards Antoine. From steve at pearwood.info Fri Jun 8 17:06:23 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 09 Jun 2012 01:06:23 +1000 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: References: Message-ID: <4FD214EF.6070808@pearwood.info> Nick Coghlan wrote: > for x in iterable: > ... > except break: # Implicit in the semantics of loops > pass > else: > ... > > Would it be worth adding the "except break:" clause to the language > just to make it crystal clear what is actually going on?
I don't think > so, but it's still a handy way to explain the semantics while gently I agree that it is *not* worthwhile. The main reason is that "except break" would add a new and different form of confusion (or at least complication): what happens when you return or raise from inside the loop rather than break? If "except break" *only* executes after a break (like it says!) that opens the door to "except return" and "except raise". Bleh. I really don't think we need this level of complication in loops. But if "except break" runs on *any* early exit from the loop (break, return or raise), then the name is misleading and confusing and we now have a new and exciting education problem to replace the old one. (Albeit probably a simpler problem.) > steering people away from linking for/else and if/else too closely. I > actually agree all of the else clauses really *are* quite closely > related (hence the consistent use of the same keyword), but the I'm not so sure that they are that close, except in the trivial sense of having two alternatives, "A happens, otherwise B happens". In the case of for/else, the A is implied (a break, return or raise), which makes it rather different from if/else where both alternatives are explicit. Despite the similarity with try/else, I think it is quite a stretch to link the semantics of for/else with the word "else". It simply is not a good choice of keyword. If it were, we wouldn't be having this discussion. Although it would have cost an additional keyword, I think that for/else and while/else should have been written as for/then and while/then, since that accurately describes what they do (unless you're Dutch *wink*). for x in seq: ... then: ... There would be no implication that the "then" clause is executed *instead of* the loop part, instead the natural implication is that it is executed *after* the loop. 
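Steven's for/then is hypothetical syntax; in current Python, the "run this after the loop completes" behaviour he describes is what else already provides. A sketch of the search-and-break idiom his "then" would rename:

```python
# The search-and-break idiom that for/else (Steven's hypothetical "then") covers.
def first_multiple(seq, n):
    for x in seq:
        if x % n == 0:
            found = x
            break        # the post-loop ("then") clause is skipped
    else:
        found = None     # reached only when the loop ran to completion
    return found

print(first_multiple([3, 5, 8], 4))   # 8
print(first_multiple([3, 5, 9], 4))   # None
```

Under the "then" reading, the second clause is simply what happens after the loop finishes on its own, with no temptation to read it as "if the loop didn't run".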
(And then we could have introduced an "else" clause to do what people expect the else clause to do, namely run if the loop doesn't run.) -- Steven From steve at pearwood.info Fri Jun 8 17:08:33 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 09 Jun 2012 01:08:33 +1000 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: References: <87aa0f9z15.fsf@uwakimon.sk.tsukuba.ac.jp> <87wr3i8c1g.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4FD21571.4060806@pearwood.info> Devin Jeanpierre wrote: > I've never been sure of what is good style here. It's comparable to > these two things: > > def foo(): > if bar(): > return baz > return quux > > def foo2(): > if bar(): > return baz > else: > return quux > > Is there some well-accepted rule of which to use? Not in my opinion. Due to laziness (why write an extra line that isn't necessary?), I tend to prefer the first version, but have been known to also use the second version on some occasions. -- Steven From g.rodola at gmail.com Fri Jun 8 17:53:04 2012 From: g.rodola at gmail.com (Giampaolo Rodolà) Date: Fri, 8 Jun 2012 17:53:04 +0200 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: References: Message-ID: 2012/6/8 Antoine Pitrou : > Le 08/06/2012 11:04, Nick Coghlan a écrit : > >> >> The only thing I'm trying to do with the tutorial update is to >> encourage beginners to start thinking in terms of try/except/else >> when they first encounter for/break/else and while/break/else. That's >> it. > > > I don't see why you're trying to draw that analogy, since a loop has nothing > in common with a try block. For the record, when I was a Python beginner, I > had zero problem understanding the for/else construct, and it even struck me > as very useful ("oh, they've thought about a clean and easy way to write > search-and-break loops").
> > I don't think it's useful to think of beginners as people having > comprehension problems. Besides, if you don't understand something up front, > there's always the possibility to come back later. > > Regards > > Antoine. +1. I also didn't have problems while I was learning python, and always found for/else very expressive as a statement. for/else is not immediately clear, meaning it is mandatory to read the doc in order to understand what it does and what to expect, but once you do that then you're done. --- Giampaolo http://code.google.com/p/pyftpdlib/ http://code.google.com/p/psutil/ http://code.google.com/p/pysendfile/ From simon.sapin at kozea.fr Fri Jun 8 17:52:44 2012 From: simon.sapin at kozea.fr (Simon Sapin) Date: Fri, 08 Jun 2012 17:52:44 +0200 Subject: [Python-ideas] Isolated (?transactional) exec (?subinterpreter) calls In-Reply-To: References: Message-ID: <4FD21FCC.9030009@kozea.fr> Le 08/06/2012 15:00, Amaury Forgeot d'Arc a écrit : > It would be a really interesting feature, but seems very difficult to > implement. > Do you have the slightest idea how this would work? > What about global state, environment variables, threads, and all kinds > of side-effects? > > Or are you thinking about a solution based on the multiprocessing module? Without any kind of guarantee about side-effects, one could start by making shallow or deep copies of the namespaces passed to exec.
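Simon's shallow-copy idea can be sketched in a few lines. Note this isolates only name bindings, not mutations of shared objects or other side effects, which is exactly the caveat Amaury raises; the helper name below is illustrative, not a proposed API.

```python
def isolated_exec(code, env):
    """Run code against a shallow copy of env and return the copy."""
    sandbox = dict(env)   # name bindings are isolated from the original
    exec(code, sandbox)
    return sandbox        # the caller decides whether to merge back

env = {"x": 1}
result = isolated_exec("x = x + 1; y = 2", env)
print(env["x"])                   # 1 -- the original namespace is untouched
print(result["x"], result["y"])   # 2 2

# Merging back is then an explicit, separate choice:
# env.update((k, v) for k, v in result.items() if k != "__builtins__")
```

(exec with a single globals dict also injects a `__builtins__` key into the sandbox, which is why a merge step would want to filter it out.)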
-- Simon Sapin From tjreedy at udel.edu Fri Jun 8 18:31:57 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 08 Jun 2012 12:31:57 -0400 Subject: [Python-ideas] Isolated (?transactional) exec (?subinterpreter) calls In-Reply-To: References: Message-ID: On 6/8/2012 4:16 AM, anatoly techtonik wrote: > - it should not matter if the code is executed with command line > interpreter or from exec(), > code should not be modified to successfully run in exec if it > successfully runs in interpreter session > http://bugs.python.org/issue13557 > http://bugs.python.org/issue14049 > http://bugs.python.org/issue1167300 > http://bugs.python.org/issue991196 These 4 duplicate issues are all about misuse of exec. The code in each *does* run if exec is passed just one namespace instead of two. When people pass two separate namespaces, the code executes as if embedded in a class definition. Since 'docs at python' never applied my suggested doc patch on 13557, I will take a stab at it. > Optionally isolated from parent environment: > - a feature to execute user script in a snapshot of current > environment and have > a choice whether to merge its modifications back to environment or not You can do at least some of that now. -- Terry Jan Reedy From guido at python.org Fri Jun 8 19:37:17 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 8 Jun 2012 10:37:17 -0700 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: <4FD21571.4060806@pearwood.info> References: <87aa0f9z15.fsf@uwakimon.sk.tsukuba.ac.jp> <87wr3i8c1g.fsf@uwakimon.sk.tsukuba.ac.jp> <4FD21571.4060806@pearwood.info> Message-ID: On Fri, Jun 8, 2012 at 8:08 AM, Steven D'Aprano wrote: > Devin Jeanpierre wrote: > >> I've never been sure of what is good style here. It's comparable to >> these two things: >> >> def foo(): >> if bar(): >> return baz >> return quux >> >> def foo2(): >> if bar(): >> return baz >> else: >> return quux >> >> Is there some well-accepted rule of which to use? > Not in my opinion. Due to laziness (why write an extra line that isn't > necessary?), I tend to prefer the first version, but have been known to also > use the second version on some occasions. It's indeed a very subtle choice, and for simple examples it usually doesn't much matter. I tend to like #1 better if the "then" block is small (especially an error exit or some other "early return" like a cache hit) and the "else" block is more substantial -- it saves an indentation level. (I also sometimes reverse the sense of the test just to get the smaller block first, for this reason.) When there are a bunch of elif clauses each ending with return (e.g. emulating a switch) I think it makes more sense to use "else" for the final clause, for symmetry. So maybe my gut rule is that if the clauses are roughly symmetrical, use the else, but if there is significant asymmetry, don't bother. -- --Guido van Rossum (python.org/~guido) From mwm at mired.org Fri Jun 8 21:53:05 2012 From: mwm at mired.org (Mike Meyer) Date: Fri, 8 Jun 2012 15:53:05 -0400 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: <87obou7yo1.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4FD1C997.9040003@btinternet.com> <87obou7yo1.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20120608155305.5b40e87c@bhuda.mired.org> On Fri, 08 Jun 2012 21:53:18 +0900 "Stephen J. Turnbull" wrote: > My main point about documentation is that for/else and if/else should > not be linked directly, but rather via while/else. Right. That was the most enlightening comment I saw in this thread. Writing the if/else and while/else out as: if condition: # code to run if condition is true else: # code to run if condition is false while condition: # code to run while condition is true else: # code to run when condition is false Seems obvious enough to me.
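Mike's while/else reading can be checked mechanically; in this sketch (an illustration, not code from the thread) the else clause fires exactly when the condition is found false, and is skipped when a break fires first:

```python
# while/else: the else runs when "n > 0" is found false, not after a break.
def countdown(n, stop_at=None):
    events = []
    while n > 0:
        if n == stop_at:
            events.append("break")
            break                        # the else clause is skipped
        events.append(n)
        n -= 1
    else:
        events.append("n > 0 is now false")
    return events

print(countdown(3))              # [3, 2, 1, 'n > 0 is now false']
print(countdown(3, stop_at=2))   # [3, 'break']
```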
For is a little bit harder, but still straightforward if you think about it in terms of the while. for x in iterable: # code to run while there are objects left in iterable else: # code to run when there are no objects left in iterable. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From jeanpierreda at gmail.com Fri Jun 8 22:08:12 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Fri, 8 Jun 2012 16:08:12 -0400 Subject: [Python-ideas] functools.partial In-Reply-To: References: Message-ID: On Thu, Jun 7, 2012 at 11:40 PM, Nick Coghlan wrote: > If you dig up some of the older PEP 362 discussions, you'll find that > allowing developers to reduce this problem over time is the main > reason the Signature.bind() method was added to the PEP. While I > wouldn't recommend it for the base partial type, ... Why not? It seems like a good idea all around. -- Devin From ubershmekel at gmail.com Sat Jun 9 00:34:53 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sat, 9 Jun 2012 01:34:53 +0300 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: References: Message-ID: On Fri, Jun 8, 2012 at 12:04 PM, Nick Coghlan wrote: > (context for python-ideas: my recently checked in changes to the > tutorial, that added the final paragraph to > > http://docs.python.org/tutorial/controlflow.html#break-and-continue-statements-and-else-clauses-on-loops > ) > > If we're on that subject then I think this > Loop statements may have an else clause; it is executed when the loop terminates through exhaustion of the list (with for) or when the condition becomes false (with while), but not when the loop is terminated by a break statement. Doesn't hit the "break" nail on the head fast and hard enough in my opinion.
I'd replace it with something like: > Loop statements may have an else clause; it is executed immediately after the loop but is skipped if the loop was terminated by a break statement. Yuval -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Sat Jun 9 01:59:46 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 08 Jun 2012 19:59:46 -0400 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: <4FD21571.4060806@pearwood.info> References: <87aa0f9z15.fsf@uwakimon.sk.tsukuba.ac.jp> <87wr3i8c1g.fsf@uwakimon.sk.tsukuba.ac.jp> <4FD21571.4060806@pearwood.info> Message-ID: > Devin Jeanpierre wrote: > >> I've never been sure of what is good style here. It's comparable to >> these two things: >> >> def foo(): >> if bar(): >> return baz >> return quux >> >> def foo2(): >> if bar(): >> return baz >> else: >> return quux >> >> Is there some well-accepted rule of which to use? The rule I have adopted is to omit unneeded after-if else to separate preamble stuff -- argument checking -- from the core algorithm, but leave it when branching is an essential part of the algorithm. My idea is that if the top level structure of the algorithm is an alternation, then the code should say so without the reader having to examine the contents of the branch. Example: floating-point square root def fsqrt(x): if not isinstance(x, float): raise TypeError elif x < 0: raise ValueError # Now we are ready for the real algorithm if x > 1.0: return fsqrt(1/x) else: # iterate return result Omission of elses can definitely be taken too far. 
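Terry's fsqrt outline can be fleshed out into a runnable sketch. Two pieces here are this editor's fillers, not part of his outline: the Newton iteration stands in for his "# iterate" placeholder, and the recursive branch takes the reciprocal of the recursive call (the outline's bare fsqrt(1/x) would otherwise compute sqrt(1/x)).

```python
def fsqrt(x):
    # Preamble: argument checking, with the after-if elses omitted per Terry's rule.
    if not isinstance(x, float):
        raise TypeError("fsqrt requires a float")
    elif x < 0:
        raise ValueError("fsqrt requires a non-negative value")
    # Now we are ready for the real algorithm
    if x == 0.0:
        return 0.0
    if x > 1.0:
        return 1.0 / fsqrt(1.0 / x)   # reduce to the 0 < x <= 1 case
    else:  # iterate: Newton's method on r**2 - x = 0
        r = 1.0
        for _ in range(60):
            r = 0.5 * (r + x / r)
        return r

print(fsqrt(4.0))   # 2.0
```

The explicit else on the branching step illustrates his second point: the alternation is the top-level structure of the algorithm, so the code says so.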
There is in the C codebase code roughly with this outline: if expression: # about 15 lines with at least 2 nested ifs (with else omitted) # and at least 3 codepaths ending in return calculation for else, but with else omitted It takes far longer for each reader to examine the if block to determine that the following block is really an else block than it would have taken one writer to just put in the "} else {" Also, some editors allow collapsing of indented blocks, but one cannot do that if else is omitted. Of course, it is routine to omit unneeded else after loops. -- Terry Jan Reedy From tjreedy at udel.edu Sat Jun 9 02:15:27 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 08 Jun 2012 20:15:27 -0400 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: References: Message-ID: On 6/8/2012 6:34 PM, Yuval Greenfield wrote: > > Loop statements may have an else clause; it is executed immediately > after the loop but is skipped if the loop was terminated by a break > statement. As I said in my reply on pydev, that is misleading. The else clause executes if and when the loop condition is false. Period. Simple rule. It will not execute if the loop is exited by break OR if the loop is exited by return OR if the loop is exited by raise OR if the loop never exits. (OR if the loop is aborted by external factors.) As far as else is concerned, there is nothing special about break exits compared to return or raise exits. But Nick's doc addition and your alternative imply otherwise. One could read Nick's statement and your paraphrase as suggesting that the else will be executed if the loop is exited by return (like the finally of try) or raise (like the except of try). And that is wrong.
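Terry's rule -- the else runs only when the loop runs to completion, never on break, return, or raise -- can be demonstrated directly (a small illustration, not code from the thread):

```python
# Only the exhausted loop reaches the else clause; break, return and raise all skip it.
def exit_via(mode):
    log = []
    try:
        for i in range(3):
            if mode == "break":
                break
            if mode == "return":
                return log
            if mode == "raise":
                raise RuntimeError(mode)
        else:
            log.append("else ran")
    except RuntimeError:
        pass
    return log

print(exit_via("exhaust"))   # ['else ran']
print(exit_via("break"))     # []
print(exit_via("return"))    # []
print(exit_via("raise"))     # []
```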
-- Terry Jan Reedy From ncoghlan at gmail.com Sat Jun 9 02:17:24 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 9 Jun 2012 10:17:24 +1000 Subject: [Python-ideas] functools.partial In-Reply-To: References: Message-ID: On Jun 9, 2012 6:08 AM, "Devin Jeanpierre" wrote: > > On Thu, Jun 7, 2012 at 11:40 PM, Nick Coghlan wrote: > > If you dig up some of the older PEP 362 discussions, you'll find that > > allowing developers to reduce this problem over time is the main > > reason the Signature.bind() method was added to the PEP. While I > > wouldn't recommend it for the base partial type, ... > > Why not? It seems like a good idea all around. Speed, complexity and backwards compatibility. With a layered API, users can choose whether they want to do early checks or not. If we build it in, you can't avoid it when you prefer the delayed error to checking the arguments twice. Cheers, Nick. -- Sent from my phone, thus the relative brevity :) > > -- Devin -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Jun 9 02:24:30 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 9 Jun 2012 10:24:30 +1000 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: References: Message-ID: On Jun 9, 2012 10:16 AM, "Terry Reedy" wrote: > > On 6/8/2012 6:34 PM, Yuval Greenfield wrote: > >> > Loop statements may have an else clause; it is executed immediately >> after the loop but is skipped if the loop was terminated by a break >> statement. > > > As I said in my reply on pydev, that is misleading. The else clause executes if and when the loop condition is false. Period. Simple rule. > > It will not execute if the loop is exited by break OR if the loop is exited by return OR if the loop is exited by raise OR if the loop never exits. (OR is the loop is aborted by external factors.) 
As far as else is concerned, there is nothing special about break exits compared to return or raise exits. > > But Nick's doc addition and your alternative imply otherwise. One could read Nick's statement and your paraphrase as suggesting that the else will by executed if the loop is exited by return (like the finally of try) or raise (like the except of try). And that is wrong. An else clause on a try statement doesn't execute in any of those cases either. I'm not assuming beginners are idiots, I'm assuming they're making a perfectly logical connection that happens to be wrong. Cheers, Nick. -- Sent from my phone, thus the relative brevity :) > > -- > Terry Jan Reedy > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Jun 9 04:31:27 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 09 Jun 2012 12:31:27 +1000 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: References: Message-ID: <4FD2B57F.5030401@pearwood.info> Terry Reedy wrote: > On 6/8/2012 6:34 PM, Yuval Greenfield wrote: > >> > Loop statements may have an else clause; it is executed immediately >> after the loop but is skipped if the loop was terminated by a break >> statement. > > As I said in my reply on pydev, that is misleading. Why is it misleading? It is *incomplete* insofar as it assumes the reader understands that (in the absence of try...finally) a return or raise will immediately exit the current function regardless of where in the function that return/raise happens to be. I think that's a fair assumption to make. Other than that, Yuval's description seems both correct and simple to me. It precisely matches the semantics of for/else and while/else without introducing any additional complexity. 
The only thing it doesn't do is rationalise why the keyword is called "else" instead of a less confusing name. > The else clause > executes if and when the loop condition is false. Period. Simple rule. What is "the loop condition" in a for-loop? If you mean "when the iterable is false (empty)", that's simply incorrect, and is precisely the common error that many people make. If on the other hand you are talking about the reader mentally converting a for-loop to an imaginary while-loop in their head, I hardly call that "simple". It wouldn't be simple even if for/else loops actually were implemented internally as a while loop. If you mean something else, I have no idea what that could possibly be. > It will not execute if the loop is exited by break OR if the loop is > exited by return OR if the loop is exited by raise OR if the loop never > exits. (OR if the loop is aborted by external factors.) As far as else > is concerned, there is nothing special about break exits compared to > return or raise exits. Right. Do we really need to explicitly document all of that under for/else? Surely we are allowed to assume a certain basic level of understanding of Python semantics -- not every page of the docs has to cover the fundamentals. This kind of reminds me of the scene in "Red Dwarf" where Holly the computer is explaining to Lister that he is the last surviving crew member of the ship and that everyone else is dead. Paraphrasing: Holly: They're all dead. Everybody's dead, Dave. Lister: Peterson isn't, is he? Holly: Everybody's dead, Dave! Lister: Not Chen! Holly: Yes, Chen. Everyone. Everybody's dead, Dave! Lister: Rimmer? Holly: He's dead, Dave. Everybody is dead. Everybody is dead, Dave. Lister: Wait. Are you trying to tell me everybody's dead? Yes Dave, a return will exit a for-loop without executing the code that follows.
*wink* -- Steven From rurpy at yahoo.com Sat Jun 9 05:39:34 2012 From: rurpy at yahoo.com (Rurpy) Date: Fri, 8 Jun 2012 20:39:34 -0700 (PDT) Subject: [Python-ideas] changing sys.stdout encoding Message-ID: <1339213174.4337.YahooMailClassic@web161506.mail.bf1.yahoo.com> On 06/07/2012 03:00 PM, Mike Meyer wrote: > On Thu, Jun 7, 2012 at 4:48 PM, Rurpy wrote: >> I suspect the vast majority of >> programmers are interested in a language that allows >> them to *effectively* get done what they need to, whether >> they are working of the latest agile TTD REST server, or >> modifying some legacy text files. > > Others have raised the question this begs to have answered: how do > other programming languages deal with wanting to change the encoding > of the standard IO streams? Can you show us how they do things that's > so much easier than what Python does? This is how it seems to be done in Perl: binmode(STDOUT, ":encoding(sjis)"); which seems quite a bit simpler than Python. I don't know if it meets your "so much easier" criterion. A quick trial showed that it works as advertised when called before any output. The description of binmode() in "man perlfunc" sounds like encoding can be changed on-the-fly but my attempt to do so had no effect, so I don't know if I'm misinterpreting the text or wrote bad Perl code (haven't used it in ages and not interested in relearning it right now.) TCL appears to have on-the-fly encoding changes: | encoding system ?encoding? | Set the system encoding to encoding. If encoding is omitted | then the command returns the current system encoding. The system | encoding is used whenever Tcl passes strings to system calls. http://www.tcl.tk/man/tcl8.4/TclCmd/encoding.htm I'll see if I can find out about some other languages if there continues to be any interest. 
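For comparison with the Perl binmode call above, the usual Python 3 spelling rewraps the stream's underlying binary buffer before first use. The sketch below runs against an in-memory buffer so it is self-contained, but the same wrapping applies to sys.stdout.buffer:

```python
import io

raw = io.BytesIO()  # stand-in for sys.stdout.buffer
out = io.TextIOWrapper(raw, encoding="shift_jis", newline="\n")

print("カタカナ", file=out)   # text is encoded as Shift JIS on the way out
out.flush()
assert raw.getvalue() == "カタカナ\n".encode("shift_jis")

# The real-stream version, roughly equivalent to Perl's binmode:
# sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="shift_jis")
```

This is the moving part that Perl hides behind a single builtin: in Python the text layer and the byte layer are separate objects, and "changing the encoding" means building a new text layer over the same bytes.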
>> And even were I to accept your argument, Python is >> inconsistent: when I open a file explicitly there is >> only a slight penalty for opening a non-default-encoded >> file (the need the explicitly give an encoding): > > The proper encoding for the standard IO streams is generally a > property of the environment, and hence is set in the environment. "Proper encoding"? If you said, "Proper default encoding" I'd agree with you. And I'd buy your claim if no one had ever invented output redirection and if print output always went to a console with a (relatively) fixed encoding. But that is not the case. > You > have a use case where that's not the case. The argument is that your > use case isn't common enough to justify changing the standard library. > Can you provide evidence to the contrary? How exactly do you suggest one accurately quantify "commonness"? And what is the threshold for justification? It seems to me the strongest argument is the credibility one that I already made: 1) Programs that accept data input on stdin and write data on stdout have a long history and are widely used. I hope this is self evident. 2) Encodings other than utf-8 are widely used. I pointed to the commonness of non-utf8 encoding in Japanese web pages. Additionally, Google for "ftp readme ? site:.jp" turns up lots of text files. Once past the first few pages of Google results (where the web pages are mostly utf8) hardly any utf8 files are to be found. 3) An effect of globalization is that many more programmers today are dealing with files that have non-native encoding that come from or go to customers, vendors, partners and colleagues in other parts of the world. The number of encodings in wide use even within a single country (again Japan: utf8, sjis, euc-jp, iso-2022-jp) implies pretty strongly that tools for use only in that region will often need multi-encoding capabilities. I think connecting the dots above leads to a pretty high-probability conclusion.
> Other languages that make > setting the encoding on the standard streams easy, or applications > outside of those built for your system that have a "--encoding" type > flag? iconv, recode and their ilk are obvious examples of applications. >> I wasn't suggesting a change to the core level (if by that >> you mean to the interpreter). I was asking if some way could >> be provided that is easier and more reliable than googling >> around for a magic incantation) to change the encoding of one >> or more of the already-open-when-my-program-starts sys.std* >> streams. I presume that would be a standard library change >> (in either the io or sys modules) and offered a .set_encoding() >> method as a placeholder for discussion. > > Why presume that this needs a change in the library? The method is > straightforward, if somewhat ugly. Is there any reason it can't just > be documented, instead of added to the library? Changing the library > would require a similar documentation change. Did you miss the paragraph right below the one you quote? The one in which I said, >> An inferior and bare minimum way to address this would be to >> at least add a note about how to change the encoding to the >> sys.std* documentation. That encourages cargo-cult programming >> and doesn't address the WTF effect but it is at least better >> than the current state of affairs. From rurpy at yahoo.com Sat Jun 9 05:47:36 2012 From: rurpy at yahoo.com (Rurpy) Date: Fri, 8 Jun 2012 20:47:36 -0700 (PDT) Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: Message-ID: <1339213656.41576.YahooMailClassic@web161505.mail.bf1.yahoo.com> On 06/07/2012 06:59 PM, Nathan Schneider wrote: > On Thu, Jun 7, 2012 at 5:14 PM, Rurpy wrote: >> On 06/07/2012 03:45 PM, Nick Coghlan wrote: [...] >>> level code doesn't want those streams, it needs to >>> replace them with something else. 
>> Yes, this is what the code I googled up does: >> import codecs >> sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer) > What if codecs contained convenience methods for stdin and stdout? > I.e. the above could be written more simply as > > import codecs > codecs.encode_stdout(opts.encoding) > > This is much more memorable than the current option, and would also > make life easier when working with fileinput (whose openhook argument > can be set to control encoding of input *file* streams, but when it > falls back to stdin this preference is ignored). How ironic. In Python2 I hated having to import codecs and use codecs.open() (the only thing I ever used from the codecs module) rather than just having an encoding parameter on open(). But it seems like it might be a reasonable thing to do. I'm sure there will be opinions. :-). It's not just sys.stdout though, the same issue exists with sys.stdin and sys.stderr so one might want either three functions, or one function that takes the stream as a parameter. From rurpy at yahoo.com Sat Jun 9 05:57:01 2012 From: rurpy at yahoo.com (Rurpy) Date: Fri, 8 Jun 2012 20:57:01 -0700 (PDT) Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: Message-ID: <1339214221.27166.YahooMailClassic@web161503.mail.bf1.yahoo.com> On 06/07/2012 07:01 PM, Nick Coghlan wrote: > On Fri, Jun 8, 2012 at 10:14 AM, Rurpy wrote: >> On 06/07/2012 03:45 PM, Nick Coghlan wrote: >>> If user level code doesn't want those streams, it needs to >>> replace them with something else. >> >> Yes, this is what the code I googled up does: >> import codecs >> sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer) >> But that code is not obvious to someone who has been able to do >> all his encoded IO (with the exception of sys.stdout) using just >> the encoding parameter of open(). Hence my question if something >> like a set_encoding() method/function that would work on >> sys.stdout is feasible.
I don't see an answer to that in your >> statement above. First, thanks for the detailed response. > Right, I was only trying to explain why the standard streams are a > special case - because they're also used by the interpreter, and it > makes the startup process much simpler if the interpreter retains > complete control over the way they're initialised (it's already > complicated by the fact we need to get something half-usable in place > as sys.stderr so that error reporting is possible while initialising > them properly). It then becomes an application level operation to > replace them if desired. OK, I can see that as a use-case design principle. I still don't see any hard technical reason why the same streams could not be kept and simply allow their encodings to be reset if they haven't been used yet. In other words, does that principle provide sufficient value to compensate for ruling out several possible solutions based on modifying the current stream rather than rewrapping it? > We can (and do) make the internal standard stream initialisation > configurable, but it then becomes a UI design problem to get something > that balances flexibility against complexity. PYTHONIOENCODING (in > association with OS utilities that make it possible to set an > environment variable for a specific process invocation, as well as > support in the subprocess module for passing a tailored environment to > subprocesses) is our current solution. > > The interpreter design aims, first and foremost, to provide a simple > and straightforward experience in POSIX environments that use UTF-8 > everywhere (since that's the most sane approach available for > migrating from a previously ASCII-based computing world). Windows is a > bit trickier (due to the internal use of UTF-16 APIs and the lack of > POSIX-style support for temporarily setting an environment variable > when invoking a process from the shell), but correctly supporting that > environment is also a very high priority.
The fallback behaviours when > these situations do not apply are designed to work best on systems > that are, at least somewhat *locally* consistent. But networks, shared file systems, email, etc have all blurred the concept of localness. Just because I am running my program on a Unix machine does not mean I may not need to write files with '\r\n' line endings. Perhaps another way to view it is that Python is wrongly subsuming part of the problem space into the system space. The need to read or write disparate encodings is a function of the problem being addressed (which includes how problem data is encoded just as much as whether it is formatted as CSV or as labeled name-value pairs); it's not really a function of my local system environment. > The real world is complex. Eventually, our answer has to be "handle it > at the application level, there are too many variations for us to > support it directly at the interpreter level". Currently, any standard > stream encoding related problem that can't be handled with > PYTHONIOENCODING is just such a situation. We know it sucks for > multi-encoding environments, but those are a nightmare for a lot of > reasons and are the main drivers behind the industry-wide effort to > standardise on Unicode text handling, including universal encodings > like UTF-8. I think "nightmare" is a little too strong. PITA maybe, particularly before one's gotten tools and environment worked out. Eventually one can get used to seeing Windows path separators displayed as yen signs in cmd.exe windows. :-) I think of it as just another annoyance imposed by the real world -- like making sure backups run exactly once a night even across DST changes. > So now we're down to the question of how much complexity we're willing > to tolerate in the interpreter specifically for the sake of > environments where: > 1. The automatic standard stream encoding calculation gives the wrong answer > 2. The PYTHONIOENCODING override is insufficient > 3.
The application being executed isn't already handling the problem
> 4. A -m executable helper module (or directly executable helper
> script) can't be used to initialise the standard streams correctly
> before continuing on to execute the requested application via the
> runpy module

In the options you give above, it seems to me that all of them (except 3, and maybe 4; I use -m only for pdb) share an implicit assumption that there is a single encoding that needs to be determined. But that is wrong. There are three streams and each of those streams may need a different encoding. Python gets this in the case of explicitly opened files... no one would dream of having a sys.encoding setting replace the open(encoding=...) parameter. What Python is missing is that the same applies to stdin, stdout and stderr. PYTHONIOENCODING is fine for what it is; it is just not meant for my particular issue. My proposal was simply to allow your option (3) to address this. (Or more accurately, that it addresses this on a near equal footing to explicitly opened streams for reasons of both ease of use and Python API consistency.)

> And the answer is "not much". About the only likely way forward I can
> see for streamlining this situation would be to treat this as another
> use case for http://bugs.python.org/issue14803, which proposes the
> ability to run snippets of Python code prior to execution of __main__.

That (IIUC) would not be workable for my problem.

    ./myprog.py -e sjis,sjis [other options...]

is acceptable. Something like:

    python -C 'sys.stdin=...; sys.stdout=...' myprog.py [other options...]

would not be. And since you mentioned it above, nor would:

    python -m setstdin_sjis -m setstdout_sjis myprog.py [other options...]
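The per-stream idea behind that hypothetical "-e sjis,sjis" option -- stdin and stdout encodings chosen independently, applied at application level before any I/O happens -- could look roughly like this. This is only a sketch under stated assumptions: `rewrap` and `apply_stream_encodings` are invented names, not stdlib or proposed APIs, and rewrapping is only safe before the first read or write.

```python
import io
import sys

def rewrap(stream, encoding):
    # Hypothetical helper: build a fresh text stream over the same
    # underlying binary buffer, with a different encoding.  Only safe
    # before anything has been read from or written to *stream*.
    return io.TextIOWrapper(stream.buffer, encoding=encoding)

def apply_stream_encodings(spec):
    # spec is e.g. "sjis,sjis": the stdin encoding, then the stdout
    # encoding -- each stream gets its own setting, which is the point.
    in_enc, out_enc = spec.split(",")
    sys.stdin = rewrap(sys.stdin, in_enc)
    sys.stdout = rewrap(sys.stdout, out_enc)
```

An application would call `apply_stream_encodings` while parsing its command line, before emitting any output.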
> I do agree that "create a new IO object that is like this old IO
> object but with these settings changed" could probably do with a
> better official API, but such an API needs to be designed with
> respect for the issues associated with changing encodings "on the fly"
> and ask serious questions about whether or not we should be
> encouraging that practice by making it easier than it is already. I
> thought I had posted a tracker issue to that effect, but I can't find
> it now.

I think that being unable to easily change stream encoding before first use is orders of magnitude more important than being unable to change them on-the-fly. I mentioned the latter only because I thought it might fall out naturally from fixing the first problem, and might occasionally be useful. (I mentioned a couple of cases I've encountered but even I, who am very much in favor of generality, have to admit I think the uses are rare.) I acknowledge though that even a before-first-use api (which I think could be implemented before an on-the-fly one) would have to take the possible later existence of the latter into account.

From rurpy at yahoo.com Sat Jun 9 06:07:07 2012
From: rurpy at yahoo.com (Rurpy)
Date: Fri, 8 Jun 2012 21:07:07 -0700 (PDT)
Subject: [Python-ideas] changing sys.stdout encoding
In-Reply-To: <87vcj283cz.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <1339214827.45788.YahooMailClassic@web161505.mail.bf1.yahoo.com>

On 06/08/2012 05:11 AM, Stephen J. Turnbull wrote:
> Rurpy writes:
>
> > Python is inconsistent:
>
> Yup, and I said there is support for dealing with that inconsistency.
> At least I'm +1 and Nick's +0.5.
>
> So let's talk about what to do about it. Nick has a pretty good
> channel on the BDFL, and since he doesn't seem to like an addition to
> the stdlib here, it may not go far. But I don't see a reason to rule
> out stdlib changes yet.
> > As far as I'm concerned, there are three reasonable proposals:

Which were (summarizing, please correct if wrong)

1) A package on PyPI containing a function like

       import codecs

       def rewrap_stream_with_new_encoding(old_stream, encoding):
           new_stream = codecs.getwriter(encoding)(old_stream.buffer)
           return new_stream

   (or maybe three functions for each of the std* streams, without the
   'old_stream' parameter?)

2) Modify standard lib. Add something like a .reset_encoding() method to io.TextIOWrapper? (Name and functionality to be bikeshedded to death.)

3) Modify the standard lib documentation (I assume for sys.std* as described below)

Also

4?) Nathan Schneider suggested a hybrid (1) and (2): put the function in the codecs module.

> > > [S]ince a 3-line function can do the job, it might make just as
> > > much sense to put up a package on PyPI.
>
> > I hardly think it is worth the effort, for either the producer
> > or consumers, of putting a 3-line function on PyPI. Nor would
> > such a solution address the discoverability and ease-of-use
> > problems I am complaining about.
>
> Agreed that it's pretty weak, but it's not clear that other solutions
> will be much better in practice.

If (and when) I had the problem of figuring out how to change sys.stdout encoding PyPI would be (and was) the last place I'd look. It is just not the kind of problem one looks to a package to solve. Rather like looking in PyPI if you want to capitalize a string. Where I would look is where I did:

* The Python docs io module.
* Then the sys module docs for std*. They say how to change the buffering and how to change to binary. They also say how the default encoding is determined. For this reason, this is where I would put any note about changing the encoding.
* Finally the internet.
* Had I not found an answer there I would have posted to c.l.p.

I don't think I'd have looked on PyPI unless something explicitly pointed me there.
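Proposal (1)'s three-line function can be exercised without touching the real sys.stdout by wrapping an in-memory buffer. A sketch: the `fake_stdout` stand-in below is purely for illustration, and only the write direction (codecs.getwriter) is shown.

```python
import codecs
import io

def rewrap_stream_with_new_encoding(old_stream, encoding):
    # The 3-line function under discussion: wrap the underlying binary
    # buffer of an existing text stream with a writer for a new encoding.
    new_stream = codecs.getwriter(encoding)(old_stream.buffer)
    return new_stream

# Stand-in for sys.stdout: an ASCII-only text stream over a bytes buffer.
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="ascii")

utf8_out = rewrap_stream_with_new_encoding(fake_stdout, "utf-8")
utf8_out.write("caf\u00e9")   # encodable now; the ASCII wrapper would refuse it
```

Note the rewrapped stream writes straight through to the same buffer, so any output pending on the old wrapper should be flushed first.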
> Discoverability depends on > documentation, which can be written and improved. Documentation where? > I think "ease of use" is way off-target. I would think ease of use would always be a consideration in any api change users were exposed to. Or are you saying some api's should be discouraged and making them hard to use is better than a "not recommended" note in the documentation? If so I suspect we'll just have to agree to disagree on that. And in this case I don't even see any reason to disrecommend it -- writing to sys.stdout is the best answer in the circumstances I've described. > > I presume that would be a standard library change (in either the io > > or sys modules) and offered a .set_encoding() method as a > > placeholder for discussion. > > Changing the stdlib is not a panacea. In particular, it can't be > applied to older Pythons. I'm also not convinced (cf. Nick's post) > that there's enough value-added and a good name for the restricted > functionality we know we can provide. Nothing is ever a panacea. It seems like it could be the cleanest, nicest (long term) solution but clearly the most difficult. > > An inferior and bare minimum way to address this would be to at > > least add a note about how to change the encoding to the sys.std* > > documentation. That encourages cargo-cult programming and doesn't > > address the WTF effect but it is at least better than the current > > state of affairs. > > IMO, this may be the best, but again I doubt it can be added to older > versions. Does it need to be? I'd have thought this would just be a doc issue on the tracker (although perhaps getting agreement of the wording would be hard?) > As for the "cargo cult" and "WTF" issues, I have little sympathy for > either. The real WTF problem is that multi-encoding environments are > inherently complex and irregular (ie, a WTF waiting to happen), and > Python can't fix that. 
But the WTF comes not from multi-encoding (in which case it would have occurred when the problem requirements were received) but from observing that doing the necessary output to a file is easy as pie, but doing the same to stdout (another file) isn't. Python can avoid making a less than ideal situation (multi-encoding) worse by not making it harder than necessary to do what needs to be done.

> It's very unlikely that typical programmers
> will bother to understand what happens "under the hood" of a stdlib
> function/method, so that is no better than cargo-cult programming

The point though is that programmers don't need to look under the hood -- the fact that something is in stdlib means (at least ideally) it is documented as a black box. What goes in, what comes out, the relationship between the two and any side effects are all concisely, fully and accurately described (again, in an ideal world). But with a code snippet and a comment that says, "use this to change the encoding of sys.stdout", the programmer has to figure out everything himself. (Of course that's not totally bad -- I know a lot more about text IO streams than I did 3 days ago. :-) Sure, you could document the code snippet as well as a packaged function, but that's stretching our ideal world well past the breaking point -- it doesn't happen. :-)

> (and
> cargo-cult at least has the advantage that what is being done is
> explicit, allowing programmers who understand textio but not encodings
> to figure out what's happening).

True it's a double edged sword but I prefer to use code packaged in stdlib. If I didn't I would cut and paste from there and I don't :-) Also, there are programmers who understand encoding but not textio (I'm one) but I'll concede we are probably a minority.
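For readers climbing the same learning curve: the text IO machinery this thread keeps circling is layered, and the encoding lives only in the top layer. A minimal illustration, using an in-memory buffer in place of a real file purely for demonstration:

```python
import io

# The usual Python 3 text stream stack: a TextIOWrapper (str <-> bytes,
# owning the encoding and newline settings) on top of a binary buffer.
raw = io.BytesIO()          # stand-in for the buffered binary layer
text = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")

text.write("caf\u00e9\n")
text.flush()                # the text layer buffers until flushed
```

The encoding is a property of the wrapper, not of the buffer underneath -- which is why "changing the encoding" of sys.stdout amounts to replacing or rewrapping that top layer.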
From ncoghlan at gmail.com Sat Jun 9 08:58:21 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 9 Jun 2012 16:58:21 +1000 Subject: [Python-ideas] functools.partial In-Reply-To: References: Message-ID: (Added list back to recipients) On Sat, Jun 9, 2012 at 10:58 AM, Devin Jeanpierre wrote: > On Fri, Jun 8, 2012 at 8:17 PM, Nick Coghlan wrote: >> Speed, complexity and backwards compatibility. With a layered API, users can >> choose whether they want to do early checks or not. If we build it in, you >> can't avoid it when you prefer the delayed error to checking the arguments >> twice. > > Then maybe the layered API belongs in the stdlib. What's the use of > base partial type, except as a micro-optimization? functools.partial will still be used to change the signature of a callable, the same as it has been ever since it was added. The layered API runs afoul of "not every three line function needs to be in the standard library". It's better to add the base API that is difficult for third parties to provide (in this case, inspect.signature and Signature objects) and let specific use cases emerge naturally over time, rather than trying to guess the *exact* patterns in advance. > On the other hand, I'm not so sure about your complexity argument. > It's hard to argue against "this isn't worth our time", if that's what > you're saying. But if you're saying it's too complicated now, > shouldn't Signature.bind help with that? I'm saying it makes *functools.partial* more complex, because we're asking it to do more. We would also be making it impossible to use without checking every signature twice. Current uses will be a mix of cases where the lack of early checking is annoying (but tolerable) and cases where it is undesirable. Adding the checking directly to the base API means we're assuming that early checking is desirable for *every* use case, and that's unlikely to be true. 
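The "layered API" being debated -- early argument checking stacked on top of the existing functools.partial -- is something a third party could build in a few lines once Signature objects exist. A hedged sketch: `checked_partial` is a hypothetical name, not a proposed stdlib function, and it pays the cost Nick mentions of inspecting the arguments up front.

```python
import functools
import inspect

def checked_partial(func, *args, **kwargs):
    # Validate the supplied arguments against func's signature now, so a
    # bad partial application fails at construction time rather than at
    # call time.  Plain functools.partial skips this check entirely.
    inspect.signature(func).bind_partial(*args, **kwargs)
    return functools.partial(func, *args, **kwargs)
```

With plain functools.partial, `partial(f, 1, 2, 3, 4)` for a two-argument `f` only fails when the partial is finally called; here it fails immediately, at the price of evaluating the binding twice.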
Most importantly though, if we leave the status quo in place for now, we can change our minds later if we still think it's a good idea. If we charge ahead and add early checking everywhere immediately, then we're quite likely to do more harm than good. We're in this for the long haul, and 2014 really isn't that far away in the context of programming language evolution.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From arnodel at gmail.com Sat Jun 9 09:46:25 2012
From: arnodel at gmail.com (Arnaud Delobelle)
Date: Sat, 9 Jun 2012 08:46:25 +0100
Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses
In-Reply-To: References: Message-ID:

Hi,

(sent from my phone)

On Jun 8, 2012 11:35 PM, "Yuval Greenfield" wrote:
>
> On Fri, Jun 8, 2012 at 12:04 PM, Nick Coghlan wrote:
>>
>> (context for python-ideas: my recently checked in changes to the
>> tutorial, that added the final paragraph to
>> http://docs.python.org/tutorial/controlflow.html#break-and-continue-statements-and-else-clauses-on-loops )
>>
>
> If we're on that subject then I think this
>
> > Loop statements may have an else clause; it is executed when the loop terminates through exhaustion of the list (with for) or when the condition becomes false (with while), but not when the loop is terminated by a break statement.
>
> Doesn't hit the "break" nail on the head fast and hard enough in my opinion. I'd replace it with something like:
>
> > Loop statements may have an else clause; it is executed immediately after the loop but is skipped if the loop was terminated by a break statement.
>

Yes. This is why I've been suggesting for a while that we call these constructs for/break/else and while/break/else. As Terry says, this is not the whole truth but you'd have to have a warped mind not to extrapolate the correct behaviour when there is a return or raise in the loop body.
Arnaud
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com Sat Jun 9 11:55:09 2012
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 9 Jun 2012 19:55:09 +1000
Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding)
Message-ID:

So, after much digging, it appears the *right* way to replace a standard stream in Python 3 after application start is to do the following:

    sys.stdin = open(sys.stdin.fileno(), 'r', )
    sys.stdout = open(sys.stdout.fileno(), 'w', )
    sys.stderr = open(sys.stderr.fileno(), 'w', )

Ditto for the other standard streams. It seems it already *is* as simple as with any other file, we just collectively forgot about:

1. The fact open() accepts file descriptors directly in Python 3
2. The fact that text streams still report the underlying file descriptor correctly

*That* is something we can happily advertise in the standard library docs. If you could check to make sure it works properly for your use case and then file a docs bug at bugs.python.org to get it added to the std streams documentation, that would be very helpful.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From p.f.moore at gmail.com Sat Jun 9 13:00:37 2012
From: p.f.moore at gmail.com (Paul Moore)
Date: Sat, 9 Jun 2012 12:00:37 +0100
Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding)
In-Reply-To: References: Message-ID:

On 9 June 2012 10:55, Nick Coghlan wrote:
> So, after much digging, it appears the *right* way to replace a
> standard stream in Python 3 after application start is to do the
> following:
>
>     sys.stdin = open(sys.stdin.fileno(), 'r', )
>     sys.stdout = open(sys.stdout.fileno(), 'w', )
>     sys.stderr = open(sys.stderr.fileno(), 'w', )
>
> Ditto for the other standard streams.
It seems it already *is* as
> simple as with any other file, we just collectively forgot about:

One minor point - if sys.stdout is redirected, *and* you have already written to sys.stdout, this resets the file pointer. With test.py as

    import sys
    print("Hello!")
    sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
    print("Hello!")

test.py >a gives one line in a, not two (tested on Windows, Unix may be different). And changing to "a" doesn't resolve this...

Of course, the actual use case is to change the encoding before anything is written - so maybe a small note saying "don't do this" is enough. But it's worth mentioning before we get the bug report saying "Python lost my data" :-)

Paul.

From p.f.moore at gmail.com Sat Jun 9 15:00:03 2012
From: p.f.moore at gmail.com (Paul Moore)
Date: Sat, 9 Jun 2012 14:00:03 +0100
Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding)
In-Reply-To: References: Message-ID:

On 9 June 2012 12:00, Paul Moore wrote:
> On 9 June 2012 10:55, Nick Coghlan wrote:
>> So, after much digging, it appears the *right* way to replace a
>> standard stream in Python 3 after application start is to do the
>> following:
>>
>>     sys.stdin = open(sys.stdin.fileno(), 'r', )
>>     sys.stdout = open(sys.stdout.fileno(), 'w', )
>>     sys.stderr = open(sys.stderr.fileno(), 'w', )
>>
>> Ditto for the other standard streams. It seems it already *is* as
>> simple as with any other file, we just collectively forgot about:
>
> One minor point - if sys.stdout is redirected, *and* you have already
> written to sys.stdout, this resets the file pointer. With test.py as
>
> import sys
> print("Hello!")
> sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
> print("Hello!")
>
> test.py >a gives one line in a, not two (tested on Windows, Unix may
> be different). And changing to "a" doesn't resolve this...

Ignore me - you need to flush stdout before reopening it, is all.
Dumb mistake, sorry for the noise :-(

Paul.

From jeanpierreda at gmail.com Sat Jun 9 16:01:50 2012
From: jeanpierreda at gmail.com (Devin Jeanpierre)
Date: Sat, 9 Jun 2012 10:01:50 -0400
Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses
In-Reply-To: <4FD2B57F.5030401@pearwood.info>
References: <4FD2B57F.5030401@pearwood.info>
Message-ID:

On Fri, Jun 8, 2012 at 10:31 PM, Steven D'Aprano wrote:
> Why is it misleading? It is *incomplete* insofar as it assumes the reader
> understands that (in the absence of try...finally) a return or raise will
> immediately exit the current function regardless of where in the function
> that return/raise happens to be. I think that's a fair assumption to make.

How can the reader understand that, when the reader doesn't know that return or raise exist yet? The assumption that the reader understands basic Python is unreasonable. This is the tutorial.

As I understand the objection, it is misleading in that it puts the focus on the wrong thing. It says "it's skipped by a break", as if that were special. It's skipped by a lot of things that aren't mentioned; the really interesting thing is when it _isn't_ skipped, which is glossed over. It is implied that this happens whenever it is exited by anything other than break, but of course that isn't true, and you have to think "well, what about return and raise?" However, as mentioned above, no student will ever think about return and raise, because those constructs have not been introduced yet. I wonder if they will just internalize "except not when left by break"? That would be awful!

Anyway, I'm not really an expert on writing technical documentation. I would expect that it's better to not force the reader to remember information and think about implications, if we can say flat-out exactly what happens. Even if they can do it successfully, surely it is annoying? If you want to mention break up-front, why not reverse the clause order?
Currently the phrasing is this:

    Loop statements may have an else clause; it is executed when the loop terminates through exhaustion of the list (with for) or when the condition becomes false (with while), but not when the loop is terminated by a break statement.

It could also be (something like) this:

    Loop statements may have an else clause; it is not executed when the loop is terminated by a break statement; it is only executed when the loop terminates through exhaustion of the list (with for) or when the condition becomes false (with while).

This reads backwards to me, because the clarifying information is listed before the main fact. Also it's a terrible sentence (my fault, the original was long but didn't have three independent clauses). But hey. Or you could split it up into two sentences:

    Loop statements may have an else clause, which is only executed when the loop terminates through exhaustion of the list (with for) or when the condition becomes false (with while). The else clause is *not* executed when the loop is terminated by a ``break``, or any other control flow construct you will see.

And so on. Lots of room to play around with how the information gets across without sacrificing the core fact of how else works. If that core fact is unworkable and does more harm than good, then I guess it has to go, though. Lying-to-children is a well-worn and useful didactic technique.

-- Devin

From zuo at chopin.edu.pl Sat Jun 9 18:01:13 2012
From: zuo at chopin.edu.pl (Jan Kaliszewski)
Date: Sat, 9 Jun 2012 18:01:13 +0200
Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses
In-Reply-To: References: Message-ID: <20120609160113.GE2587@chopin.edu.pl>

Nick Coghlan dixit (2012-06-08, 19:04):

> for x in iterable:
>     ...
> except break:  # Implicit in the semantics of loops
>     pass
> else:
>     ...
>
> Would it be worth adding the "except break:" clause to the language
> just to make it crystal clear what is actually going on?
I don't think
> so, but it's still a handy way to explain the semantics while gently
> steering people away from linking for/else and if/else too closely.

IMHO a better option would be a separate keyword, e.g. 'broken':

    for x in iterable:
        ...
    broken:
        ...
    else:
        ...

And not only to make the 'else' more understandable. I found, in a few situations, that such a 'broken' clause would be really useful, making my code easier to read and maintain. There were some relatively complex, parsing-related, code structures...

    stopped = False
    for x in iterable:
        ...
        if condition1:
            stopped = True
            break
        ...
        if condition2:
            stopped = True
            break
        ...
        if condition3:
            stopped = True
            break
        ...
    if stopped:
        do_foo()
    else:
        do_bar()

It would have been nice to be able to do:

    for x in iterable:
        ...
        if condition1:
            break
        ...
        if condition2:
            break
        ...
        if condition3:
            break
        ...
    broken:
        do_foo()
    else:
        do_bar()

Cheers.
*j

From python at mrabarnett.plus.com Sat Jun 9 18:42:53 2012
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 09 Jun 2012 17:42:53 +0100
Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding)
In-Reply-To: References: Message-ID: <4FD37D0D.60807@mrabarnett.plus.com>

On 09/06/2012 12:00, Paul Moore wrote:
> On 9 June 2012 10:55, Nick Coghlan wrote:
>> So, after much digging, it appears the *right* way to replace a
>> standard stream in Python 3 after application start is to do the
>> following:
>>
>>     sys.stdin = open(sys.stdin.fileno(), 'r',)
>>     sys.stdout = open(sys.stdout.fileno(), 'w',)
>>     sys.stderr = open(sys.stderr.fileno(), 'w',)
>>
>> Ditto for the other standard streams. It seems it already *is* as
>> simple as with any other file, we just collectively forgot about:
>
> One minor point - if sys.stdout is redirected, *and* you have already
> written to sys.stdout, this resets the file pointer.
With test.py as
>
> import sys
> print("Hello!")
> sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
> print("Hello!")
>
> test.py >a gives one line in a, not two (tested on Windows, Unix may
> be different). And changing to "a" doesn't resolve this...
>
> Of course, the actual use case is to change the encoding before
> anything is written - so maybe a small note saying "don't do this" is
> enough. But it's worth mentioning before we get the bug report saying
> "Python lost my data" :-)
>

I find that this:

    print("Hello!")
    sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
    print("Hello!")

prints the string "Hello!\r\r\n", but this:

    print("Hello!")
    sys.stdout.flush()
    sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
    print("Hello!")

prints the string "Hello!\r\nHello!\r\r\n". I had hoped that the flush would be enough, but apparently not.

From bruce at leapyear.org Sat Jun 9 18:44:15 2012
From: bruce at leapyear.org (Bruce Leban)
Date: Sat, 9 Jun 2012 09:44:15 -0700
Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses
In-Reply-To: References: Message-ID:

On Fri, Jun 8, 2012 at 3:34 PM, Yuval Greenfield wrote:
>
> Loop statements may have an else clause; it is executed when the loop
> terminates through exhaustion of the list (with for) or when the condition
> becomes false (with while), but not when the loop is terminated by a break
> statement.

I don't think talking about exhaustion of the list is the simplest way to think about this. Isn't it the distinction whether the loop exits at the bottom or in the middle?

On Sat, Jun 9, 2012 at 12:46 AM, Arnaud Delobelle wrote:
> As Terry says, this is not the whole truth but you'd have to have a warped
> mind not to extrapolate the correct behaviour when there is a return or
> raise in the loop body.
>

If we can express this in a way that is the whole truth that's better.
And leaving out a very common scenario like return in a loop and a less common one like raise. Asking readers of technical documentation to extrapolate frequently leads to incorrect assumptions. Go read the docs on msdn if you don't agree with that. Here's my take:

    Loop statements may have an else clause which is executed when the loop exits normally (control flows off the bottom of the loop). If the loop exits from the middle (through break, return, raise or something else), then the else is not executed. It may help to think of the else as being paired with an "if ... break" in the middle of the loop. If the break is not executed then the else will be.

Likewise I would reword the comparison to try. In particular I would remove the negative reference to if as I think that's misleading.

    The else clause of a loop can also be thought of as similar to the else clause of a try statement. A try statement's else clause runs when no exception, break or return occurs and the try exits normally, and a loop's else clause runs when no break or return occurs and the loop exits normally. For more on the try statement and exceptions, see Handling Exceptions.

Note that this corrects the error in the current docs which says "a try statement's else clause runs when no exception occurs" which is not true if you exit the try via break or return.

--- Bruce
Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From python at mrabarnett.plus.com Sat Jun 9 18:49:48 2012
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 09 Jun 2012 17:49:48 +0100
Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses
In-Reply-To: <20120609160113.GE2587@chopin.edu.pl>
References: <20120609160113.GE2587@chopin.edu.pl>
Message-ID: <4FD37EAC.2080808@mrabarnett.plus.com>

On 09/06/2012 17:01, Jan Kaliszewski wrote:
> Nick Coghlan dixit (2012-06-08, 19:04):
>
>> for x in iterable:
>>     ...
>> except break:  # Implicit in the semantics of loops
>>     pass
>> else:
>>     ...
>>
>> Would it be worth adding the "except break:" clause to the language
>> just to make it crystal clear what is actually going on? I don't think
>> so, but it's still a handy way to explain the semantics while gently
>> steering people away from linking for/else and if/else too closely.
>
> IMHO a better option would be a separate keyword, e.g. 'broken':
>
>     for x in iterable:
>         ...
>     broken:
>         ...
>     else:
>         ...
>
> And not only to make the 'else' more understandable. I found, in
> a few situations, that such a 'broken' clause would be really useful,
> making my code easier to read and maintain. There were some relatively
> complex, parsing-related, code structures...
>
>     stopped = False
>     for x in iterable:
>         ...
>         if condition1:
>             stopped = True
>             break
>         ...
>         if condition2:
>             stopped = True
>             break
>         ...
>         if condition3:
>             stopped = True
>             break
>         ...
>     if stopped:
>         do_foo()
>     else:
>         do_bar()
>
[snip]

That can be re-written as:

    stopped = True
    for x in iterable:
        ...
        if condition1:
            break
        ...
        if condition2:
            break
        ...
        if condition3:
            break
        ...
    else:
        stopped = False
    if stopped:
        do_foo()
    else:
        do_bar()

From steve at pearwood.info Sat Jun 9 19:48:28 2012
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 10 Jun 2012 03:48:28 +1000
Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses
In-Reply-To: References: Message-ID: <4FD38C6C.1020602@pearwood.info>

Bruce Leban wrote:
> On Fri, Jun 8, 2012 at 3:34 PM, Yuval Greenfield
> wrote:
>
>>> Loop statements may have an else clause; it is executed when the loop
>> terminates through exhaustion of the list (with for) or when the condition
>> becomes false (with while), but not when the loop is terminated by a break
>> statement.
>
> I don't think talking about exhaustion of the list is the simplest way to
> think about this. Isn't it the distinction whether the loop exits at the
> bottom or in the middle?

[Aside: I believe that isn't Yuval's description above. As I understand it, he is quoting the current docs.]

Loops exit at the top, not the bottom. This is most obvious when you think about a while loop:

    while condition:
        ...

Of course you have to be at the top of the loop for the while to check condition, not the bottom. For-loops are not quite so obvious, but execution has to return back to the top of the loop in order to check whether or not the sequence is exhausted. Whether or not it is the *simplest* way to think about for/else, talking about exhaustion of the list (iterable) is correct.

> On Sat, Jun 9, 2012 at 12:46 AM, Arnaud Delobelle wrote:
>
>> As Terry says, this is not the whole truth but you'd have to have a warped
>> mind not to extrapolate the correct behaviour when there is a return or
>> raise in the loop body.
>>
>
> If we can express this in a way that is the whole truth that's better. And
> leaving out a very common scenario like return in a loop and a less common
> one like raise. Asking readers of technical documentation to extrapolate
> frequently leads to incorrect assumptions.
Go read the docs on msdn if you > don't agree with that. Should we ask readers to extrapolate what happens when the for loop variable is a keyword (e.g. "for None in sequence"), or explicitly mention what happens? Should we ask readers to extrapolate what happens when the for loop sequence doesn't actually exist, or explicitly tell them that they get a NameError and the loop doesn't run? Should we do this for every single function? Frankly, you cannot avoid asking readers to extrapolate, because there is an infinite number of things that they could do, and you cannot possibly document them all. -- Steven From steve at pearwood.info Sat Jun 9 19:49:43 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 10 Jun 2012 03:49:43 +1000 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: References: <4FD2B57F.5030401@pearwood.info> Message-ID: <4FD38CB7.9070804@pearwood.info> Devin Jeanpierre wrote: > On Fri, Jun 8, 2012 at 10:31 PM, Steven D'Aprano wrote: >> Why is it misleading? It is *incomplete* insofar as it assumes the reader >> understands that (in the absence of try...finally) a return or raise will >> immediately exit the current function regardless of where in the function >> that return/raise happens to be. I think that's a fair assumption to make. > > How can the reader understand that, when the reader doesn't know that > return or raise exist yet? The assumption that the reader understands > basic Python is unreasonable. This is the tutorial. If the reader doesn't know that return or raise exist, they are hardly going to draw conclusions about the behaviour of for/else when a return or raise is reached. If they do know about return and raise, they should understand that return and raise skip everything, not just for/else. > As I understand the objection, it is misleading in that it puts the > focus on the wrong thing. It says "it's skipped by a break", as if > that were special. 
It's skipped by a lot of things that aren't
> mentioned, the really interesting thing is when it _isn't_ skipped,
> which is glossed over.

I don't think it is glossed over at all. You cut out my quote of Yuval's description, here it is again:

    Loop statements may have an else clause; it is executed immediately after the loop but is skipped if the loop was terminated by a break statement.

The "really interesting thing" is the first thing about the else clause mentioned: it is executed immediately after the loop. The break statement *really is special* and deserves to be singled out for mention. The break statement is the only way to exit the *entire* for-loop construct (including the else) without exiting the entire function, or halting execution.

> It is implied that this happens whenever it is
> exited by anything other than break, but of course that isn't true,

I strongly disagree that it implies anything of the sort. We should assume the readers are beginners, but not idiots. Ignoring try/finally blocks, which are special, we can assume that the reader has (or will have once they actually learn about functions and exceptions) a correct understanding of the behaviour of return and raise.

- If the loop is exited by a return, *nothing* following the return is executed. That includes the else block.
- If execution is halted by an exception (including raise), *nothing* following the exception is executed. That includes the else block.
- If execution is halted by an external event that halts or interrupts the Python process, *nothing* following executes. That includes the else block.
- If the loop never completes, *nothing* following the loop executes. That includes the else block.

To continue the analogy with the "Red Dwarf" quote I made earlier:

"What about assignments after the loop?"
"No, they aren't executed. Nothing is executed."
"Well what about print statements?"
"No Dave, print statements aren't executed. Nothing is executed."
"How about the len() function?" "No Dave, nothing is executed." "What, not even the else clause?" Unless we think that the average beginner to Python is as dumb as Dave Lister from Red Dwarf, I don't think we need worry that they will imagine that for/else blocks behave like try/finally. Somehow we've gone from trying to fix an actual, real-life problem where people assume that the else block executes if the loop sequence is empty, to arguing how best to solve the entirely hypothetical problem that people might imagine that else blocks have the special behaviour of try/finally. -- Steven From zachary.ware+pyideas at gmail.com Sat Jun 9 20:16:03 2012 From: zachary.ware+pyideas at gmail.com (Zachary Ware) Date: Sat, 9 Jun 2012 13:16:03 -0500 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: References: <4FD2B57F.5030401@pearwood.info> <4FD38CB7.9070804@pearwood.info> Message-ID: I've had a thought on this topic; how would it be to completely leave else out of the if, for, and while sections, then give else its own section explaining exactly how it works in each situation where it is applicable? I'd be happy to write up a sample later this evening if this thought isn't completely shot down :) As a side note, I didn't even know there was a while...else construct until I saw this discussion. I'd heard of for...else, but not with while. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From storchaka at gmail.com Sat Jun 9 22:02:19 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 09 Jun 2012 23:02:19 +0300 Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding) In-Reply-To: References: Message-ID: <4FD3ABCB.5080800@gmail.com> On 09.06.12 12:55, Nick Coghlan wrote: > So, after much digging, it appears the *right* way to replace a > standard stream in Python 3 after application start is to do the > following: > > sys.stdin = open(sys.stdin.fileno(), 'r',) > sys.stdout = open(sys.stdout.fileno(), 'w',) > sys.stderr = open(sys.stderr.fileno(), 'w',) sys.stdin = io.TextIOWrapper(sys.stdin.detach(), ) sys.stdout = io.TextIOWrapper(sys.stdout.detach(), ) ... None of these methods are not guaranteed to work if the input or output have occurred before. From zuo at chopin.edu.pl Sat Jun 9 22:07:03 2012 From: zuo at chopin.edu.pl (Jan Kaliszewski) Date: Sat, 9 Jun 2012 22:07:03 +0200 Subject: [Python-ideas] BindError as a built-in TypeError subclass (on the margin of PEP 362 discussion) Message-ID: <20120609200703.GF2587@chopin.edu.pl> Suggestion ========== I think that BindError proposed in PEP 362 could be a built-in TypeError subclass, raised whenever given arguments do not match a given callable: 1. while using Signature().bind(...) [as proposed in PEP 362], and also 2. while using inspect.getcallargs(...) and *also* 3. while doing *any* call. Rationale ========= The present behaviour (ad 2. and 3.), i.e. raising TypeError, makes it hard to differentiate call-argument-related errors from other TypeError occurrences. Raising BindError (or ArgumentError? the actual name is disputable of course), being a TypeError instance, instead -- would made easier implementing test suites, RPC mechanisms etc. Cheers. 
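[Editor's note: the distinction Jan is asking for can be sketched in user code with PEP 362's inspect.signature. The `BindError` class and `check_call` helper below are illustrative only — they are not the PEP's actual implementation.]

```python
import inspect

class BindError(TypeError):
    """Illustrative stand-in for the proposed TypeError subclass."""

def check_call(func, *args, **kwargs):
    """Call func, raising BindError when the arguments don't bind."""
    try:
        inspect.signature(func).bind(*args, **kwargs)
    except TypeError as exc:
        # Binding failed: wrong argument count, duplicate/missing
        # keywords, etc. -- distinguishable from other TypeErrors.
        raise BindError(str(exc)) from exc
    return func(*args, **kwargs)

def add(a, b):
    return a + b
```

With this in place, `check_call(add, 1, 2, 3)` raises `BindError`, while `check_call(add, 1, None)` still raises a plain `TypeError` from inside the call body — exactly the separation the proposal is after.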
*j From zuo at chopin.edu.pl Sat Jun 9 22:14:33 2012 From: zuo at chopin.edu.pl (Jan Kaliszewski) Date: Sat, 9 Jun 2012 22:14:33 +0200 Subject: [Python-ideas] BindError as a built-in TypeError subclass (on the margin of PEP 362 discussion) In-Reply-To: <20120609200703.GF2587@chopin.edu.pl> References: <20120609200703.GF2587@chopin.edu.pl> Message-ID: <20120609201433.GG2587@chopin.edu.pl> Jan Kaliszewski dixit (2012-06-09, 22:07): > Raising BindError (or ArgumentError? the actual name is disputable of > course), being a TypeError instance, instead -- would made easier > implementing test suites, RPC mechanisms etc. Erratum: s/TypeError instance/TypeError subclass/, sorry. *j From breamoreboy at yahoo.co.uk Sat Jun 9 23:22:41 2012 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sat, 09 Jun 2012 22:22:41 +0100 Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding) In-Reply-To: <4FD3ABCB.5080800@gmail.com> References: <4FD3ABCB.5080800@gmail.com> Message-ID: On 09/06/2012 21:02, Serhiy Storchaka wrote: > > None of these methods are not guaranteed to work if the input or output > have occurred before. That's a double negative so I'm not sure what you meant to say. Can you please rephrase it. I assume that English is not your native language, so I'll let you off :) -- Cheers. Mark Lawrence. From jeanpierreda at gmail.com Sun Jun 10 01:33:21 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Sat, 9 Jun 2012 19:33:21 -0400 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: <4FD38CB7.9070804@pearwood.info> References: <4FD2B57F.5030401@pearwood.info> <4FD38CB7.9070804@pearwood.info> Message-ID: On Sat, Jun 9, 2012 at 1:49 PM, Steven D'Aprano wrote: > - If the loop is exited by a return, *nothing* following the return is > executed. That includes the else block. 
> > - If execution is halted by an exception (including raise), *nothing* > following the exception is executed. That includes the else block. > > - If execution is halted by an external event that halts or interrupts the > Python process, *nothing* following executes. That includes the else block. > > - If the loop never completes, *nothing* following the loop executes. That > includes the else block. > > To continue the analogy with the "Red Dwarf" quote I made earlier: Please stop mocking your own writing. I wrote nothing like the above. I said that maybe we should be specific and correct with when else is called. I didn't say that we should be exhaustive for when it is not. In fact, the example explanations I gave were not exhaustive. -- Devin From steve at pearwood.info Sun Jun 10 02:52:02 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 10 Jun 2012 10:52:02 +1000 Subject: [Python-ideas] BindError as a built-in TypeError subclass (on the margin of PEP 362 discussion) In-Reply-To: <20120609200703.GF2587@chopin.edu.pl> References: <20120609200703.GF2587@chopin.edu.pl> Message-ID: <4FD3EFB2.1010800@pearwood.info> Jan Kaliszewski wrote: > The present behaviour (ad 2. and 3.), i.e. raising TypeError, makes it > hard to differentiate call-argument-related errors from other TypeError > occurrences. > > Raising BindError (or ArgumentError? the actual name is disputable of > course), being a TypeError instance, instead -- would made easier > implementing test suites, RPC mechanisms etc. +1 Since this will be an error that beginners see (frequently), I suggest ArgumentError is more friendly than BindError. 
-- Steven From ncoghlan at gmail.com Sun Jun 10 04:26:17 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 10 Jun 2012 12:26:17 +1000 Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding) In-Reply-To: <4FD3ABCB.5080800@gmail.com> References: <4FD3ABCB.5080800@gmail.com> Message-ID: Calling detach() on the standard streams is a bad idea - the interpreter uses the originals internally, and calling detach() breaks them. -- Sent from my phone, thus the relative brevity :) On Jun 10, 2012 6:03 AM, "Serhiy Storchaka" wrote: > On 09.06.12 12:55, Nick Coghlan wrote: > >> So, after much digging, it appears the *right* way to replace a >> standard stream in Python 3 after application start is to do the >> following: >> >> sys.stdin = open(sys.stdin.fileno(), 'r',) >> sys.stdout = open(sys.stdout.fileno(), 'w',) >> sys.stderr = open(sys.stderr.fileno(), 'w',) >> > > sys.stdin = io.TextIOWrapper(sys.stdin.detach(), ) > sys.stdout = io.TextIOWrapper(sys.stdout.detach(), ) > ... > > None of these methods are not guaranteed to work if the input or output > have occurred before. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From steve at pearwood.info Sun Jun 10 05:03:46 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 10 Jun 2012 13:03:46 +1000 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: References: <4FD2B57F.5030401@pearwood.info> <4FD38CB7.9070804@pearwood.info> Message-ID: <4FD40E92.7020104@pearwood.info> Devin Jeanpierre wrote: [...] > Please stop mocking your own writing. I wrote nothing like the above. > > I said that maybe we should be specific and correct with when else is called.
I didn't say that we should be exhaustive for when it is not. > In fact, the example explanations I gave were not exhaustive. You explicitly worried that users will conclude that the else block will run "except not when left by break", and stated that the description given earlier implies that for/else behaves like try/finally (i.e. that the else clause is *only* skipped on a break, but not return or raise). There is no evidence that users somehow get the impression that for/else behaves like try/finally, and I find it completely implausible that they will do so in the future. If I'm wrong, the docs can be revised, but until then, in my opinion worrying about this is a documentation case of YAGNI. The current documentation for for/else is already specific and correct. The real-life problem Nick is trying to solve is that many people think that the else clause implies that it behaves like if/else, and Nick is trying to nudge users to think of try/else instead. I think that's a worthy goal. Worrying about users reading the tutorial and concluding that for/else will run when you exit with a return, not so much. -- Steven From jeanpierreda at gmail.com Sun Jun 10 05:28:04 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Sat, 9 Jun 2012 23:28:04 -0400 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: <4FD40E92.7020104@pearwood.info> References: <4FD2B57F.5030401@pearwood.info> <4FD38CB7.9070804@pearwood.info> <4FD40E92.7020104@pearwood.info> Message-ID: On Sat, Jun 9, 2012 at 11:03 PM, Steven D'Aprano wrote: > There is no evidence that users somehow get the impression that for/else > behaves like try/finally, and I find it completely implausible that they > will do so in the future. If I'm wrong, the docs can be revised, but until > then, in my opinion worrying about this is a documentation case of YAGNI. > > The current documentation for for/else is already specific and correct. 
The > real-life problem Nick is trying to solve is that many people think that the > else clause implies that it behaves like if/else, and Nick is trying to > nudge users to think of try/else instead. I think that's a worthy goal. > Worrying about users reading the tutorial and concluding that for/else will > run when you exit with a return, not so much. You are confused. A) I was arguing in favor of the current documentation, written by Nick Coghlan. You were arguing in favor of Yuval's thing. You appear to have forgotten this, and are now agreeing with me. B) Obviously there is no empirical evidence for anything, because Yuval's thing is unpublished, and the current documentation was added two days ago to the dev branch of the docs. -- Devin From rurpy at yahoo.com Sun Jun 10 06:22:03 2012 From: rurpy at yahoo.com (Rurpy) Date: Sat, 9 Jun 2012 21:22:03 -0700 (PDT) Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding) Message-ID: <1339302123.29306.YahooMailClassic@web161505.mail.bf1.yahoo.com> On 06/09/2012 08:26 PM, Nick Coghlan wrote: > Calling detach() on the standard streams is a bad idea - the > interpreter uses the originals internally, and calling detach() > breaks them. The documentation for sys.std* specifically describes using detach() on the standard streams: | To write or read binary data from/to the standard | streams, use the underlying binary buffer. and gives example code. The only caveat mentioned is that detach() "can raise AttributeError or io.UnsupportedOperation" if the stream has been replaced with something that does not support detach().
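[Editor's note: a pattern that sidesteps detach() entirely is to build a fresh text wrapper over the same file descriptor with closefd=False, so the original stream object stays valid and the descriptor survives garbage collection of either wrapper. The `reopen_stream` helper below is an illustrative sketch, not stdlib API, and as noted elsewhere in the thread it cannot recover data already sitting in a stream's internal buffers, so it is only safe before real I/O has happened on that stream.]

```python
import io

def reopen_stream(stream, encoding, errors="strict"):
    """Return a new text stream over stream's file descriptor.

    closefd=False means neither wrapper owns the descriptor, so
    garbage-collecting the old stream cannot close it out from
    under the new one.
    """
    stream.flush()  # push out anything buffered in the old wrapper
    raw = open(stream.fileno(), "wb", closefd=False)
    return io.TextIOWrapper(raw, encoding=encoding, errors=errors,
                            line_buffering=True)

# typical use: sys.stdout = reopen_stream(sys.stdout, "utf-8")
```

Because the old stream is flushed first and the descriptor is never closed, the original object (e.g. sys.__stdout__) keeps working alongside the replacement.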
From yselivanov.ml at gmail.com Sun Jun 10 06:36:36 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sun, 10 Jun 2012 00:36:36 -0400 Subject: [Python-ideas] BindError as a built-in TypeError subclass (on the margin of PEP 362 discussion) In-Reply-To: <20120609200703.GF2587@chopin.edu.pl> References: <20120609200703.GF2587@chopin.edu.pl> Message-ID: On 2012-06-09, at 4:07 PM, Jan Kaliszewski wrote: > Suggestion > ========== > > I think that BindError proposed in PEP 362 could be a built-in TypeError > subclass, raised whenever given arguments do not match a given callable: > > 1. while using Signature().bind(...) [as proposed in PEP 362], > > and also > > 2. while using inspect.getcallargs(...) > > and *also* > > 3. while doing *any* call. > > Rationale > ========= > > The present behaviour (ad 2. and 3.), i.e. raising TypeError, makes it > hard to differentiate call-argument-related errors from other TypeError > occurrences. > > Raising BindError (or ArgumentError? the actual name is disputable of > course), being a TypeError instance, instead -- would made easier > implementing test suites, RPC mechanisms etc. That's how it is currently implemented - BindError(TypeError). I'll mention this in the PEP. - Yury From solipsis at pitrou.net Sun Jun 10 09:17:02 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 10 Jun 2012 09:17:02 +0200 Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding) In-Reply-To: References: <4FD3ABCB.5080800@gmail.com> Message-ID: Le 10/06/2012 04:26, Nick Coghlan a écrit : > Calling detach() on the standard streams is a bad idea - the interpreter > uses the originals internally, and calling detach() breaks them. Where does it do that? The interpreter certainly shouldn't hardwire the original objects internally. Moreover, your snippet is wrong because if someone replaces the streams for a second time, garbage collecting the previous streams will close the file descriptors.
You should use closefd=False. Regards Antoine. From pyideas at rebertia.com Sun Jun 10 09:32:41 2012 From: pyideas at rebertia.com (Chris Rebert) Date: Sun, 10 Jun 2012 00:32:41 -0700 Subject: [Python-ideas] BindError as a built-in TypeError subclass (on the margin of PEP 362 discussion) In-Reply-To: <4FD3EFB2.1010800@pearwood.info> References: <20120609200703.GF2587@chopin.edu.pl> <4FD3EFB2.1010800@pearwood.info> Message-ID: On Sat, Jun 9, 2012 at 5:52 PM, Steven D'Aprano wrote: > Jan Kaliszewski wrote: > >> The present behaviour (ad 2. and 3.), i.e. raising TypeError, makes it >> hard to differentiate call-argument-related errors from other TypeError >> occurrences. >> >> Raising BindError (or ArgumentError? the actual name is disputable of >> course), being a TypeError instance, instead -- would made easier >> implementing test suites, RPC mechanisms etc. > > +1 > > Since this will be an error that beginners see (frequently), I suggest > ArgumentError is more friendly than BindError. I note that Ruby also has an ArgumentError, which it raises both for calls with an incorrect number of arguments and in cases when Python would raise ValueError. Cheers, Chris From steve at pearwood.info Sun Jun 10 14:00:30 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 10 Jun 2012 22:00:30 +1000 Subject: [Python-ideas] BindError as a built-in TypeError subclass (on the margin of PEP 362 discussion) In-Reply-To: References: <20120609200703.GF2587@chopin.edu.pl> <4FD3EFB2.1010800@pearwood.info> Message-ID: <4FD48C5E.6000102@pearwood.info> Chris Rebert wrote: >> Since this will be an error that beginners see (frequently), I suggest >> ArgumentError is more friendly than BindError. > > I note that Ruby also has an ArgumentError, which it raises both for > calls with an incorrect number of arguments and in cases when Python > would raise ValueError. Even if I wanted to replace ValueError with ArgumentError (and I don't), we couldn't due to backward compatibility. 
(Although I suppose ArgumentError could inherit from both TypeError and ValueError.) My concept is that errors due to the wrong argument count, duplicate or missing keyword arguments, etc. which currently raise TypeError could raise ArgumentError, a subclass, instead. That will make distinguishing between "passed the wrong number of arguments" from "passed the wrong type of argument" easier. -- Steven From steve at pearwood.info Sun Jun 10 14:04:09 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 10 Jun 2012 22:04:09 +1000 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: References: <4FD2B57F.5030401@pearwood.info> <4FD38CB7.9070804@pearwood.info> <4FD40E92.7020104@pearwood.info> Message-ID: <4FD48D39.10507@pearwood.info> Devin Jeanpierre wrote: > On Sat, Jun 9, 2012 at 11:03 PM, Steven D'Aprano wrote: >> There is no evidence that users somehow get the impression that for/else >> behaves like try/finally, and I find it completely implausible that they >> will do so in the future. If I'm wrong, the docs can be revised, but until >> then, in my opinion worrying about this is a documentation case of YAGNI. >> >> The current documentation for for/else is already specific and correct. The >> real-life problem Nick is trying to solve is that many people think that the >> else clause implies that it behaves like if/else, and Nick is trying to >> nudge users to think of try/else instead. I think that's a worthy goal. >> Worrying about users reading the tutorial and concluding that for/else will >> run when you exit with a return, not so much. > > You are confused. Perhaps I am. > A) I was arguing in favor of the current documentation, written by > Nick Coghlan. You were arguing in favor of Yuval's thing. You appear > to have forgotten this, and are now agreeing with me. The context which has been lost is that Terry Reedy objected to Yuval's description of for/else. 
I replied to Terry's objection, disagreeing, and you replied to me, (apparently) disagreeing with my reply. Do you blame me for thinking you were agreeing with Terry? I think that our positions are probably closer than our disagreements might suggest. > B) Obviously there is no empirical evidence for anything, because > Yuval's thing is unpublished, and the current documentation was added > two days ago to the dev branch of the docs. We have anecdotal evidence that many people expect that for/else will execute the else clause when the for loop is empty. We have no anecdotal evidence, or any other evidence, that anyone expects that the else clause runs if you return out of the loop. -- Steven From masklinn at masklinn.net Sun Jun 10 15:05:53 2012 From: masklinn at masklinn.net (Masklinn) Date: Sun, 10 Jun 2012 15:05:53 +0200 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? Message-ID: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> The standard library already provides for cryptographic hashes (hashlib) and MACs (hmac). One issue which exists, and has been repeatedly outlined after several breaches of straight-hashed databases (salted and unsalted) last week, is that many developers do not know: 1. straight hashes are not sufficient to store passwords securely in case of database breach 2. salted passwords, while mitigating rainbow table attacks, aren't enough to mitigate brute-force attacks. (in case of database breach, the goal being to protect password plaintexts from being found and matched to a user identity in case users re-use passwords across services, as it would allow attackers to access all services used by the user).
The best solution to these currently is *mandatory* salting (of specified minimum strength) and adaptive workload which can be tuned higher to keep up with Moore's law (especially as most hashing functions tend to be very fast and embarrassingly parallelizable, two undesirable properties in the face of brute-forcing of the plaintext). Therefore, I would suggest either adding a new module (name tbd) or adding new constructors to hashlib. * All password-hashing functions listed below should recommend a strong salt (the PBKDF2 specification recommends 64 bits, we could go further) by erroring out (ValueError) if the conditions are not met unless a `weak_salt=True` parameter is provided. I think this would be sufficient to hint at the importance of salt to users, and to drive them to "the right thing". The salt should also be mandated non-empty, providing an empty salt should generate an error in all cases. * All password-hashing functions should require a `workload` parameter with documentary recommendation. A default value might make sense in the short run (ensure the functions are used with an acceptably high workload), but those defaults would be set in stone for users *not* setting their own load factor. This module (or addition) should provide, if possible: * PBKDF2, recommending a load factor of above 10000. The recommended load factor in RFC 2898 (PKCS #5) is 1000, but the specification is 12 years old. Extrapolating on that original load factor using Moore's law (the load factor has a linear relation to the amount of computation in PBKDF2 as it's the number of hashing iterations), the stdlib could recommend a load factor of 64000 (6 doublings).
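[Editor's note: the linear relation between the load factor and the work performed is easy to see in code. Below is a from-scratch sketch of RFC 2898's PBKDF2 built only on the existing hmac and hashlib modules; the parameter names are illustrative, not a proposed stdlib API.]

```python
import hashlib
import hmac
import struct

def pbkdf2(password, salt, iterations, dklen=32, digest=hashlib.sha256):
    """Sketch of RFC 2898 PBKDF2-HMAC; `iterations` is the workload knob."""
    mac = hmac.new(password, digestmod=digest)

    def prf(data):
        m = mac.copy()
        m.update(data)
        return m.digest()

    derived = b""
    block = 1
    while len(derived) < dklen:
        # U1 = PRF(password, salt || INT_32_BE(block))
        u = prf(salt + struct.pack(">I", block))
        acc = int.from_bytes(u, "big")
        # U2 .. Uc, XOR-accumulated: cost scales linearly with `iterations`
        for _ in range(iterations - 1):
            u = prf(u)
            acc ^= int.from_bytes(u, "big")
        derived += acc.to_bytes(len(u), "big")
        block += 1
    return derived[:dklen]
```

Doubling `iterations` doubles an attacker's cost per password guess, which is exactly why the recommended value should track Moore's law as argued above.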
As with hmac, it should be possible to configure the digest constructor (PKCS #5 specifies HMAC-SHA1 as the default PRF) * bcrypt, the bcrypt C library is BSD-licensed and open-source so it could be added pretty directly, there is already a wrapper called "py-bcrypt" (under ISC/BSD licence)[1] * scrypt is younger and has been looked at less than the previous two[0], but from my readings (of articles on it, I am no cryptographer) it seems to have no overt issue and combines load-adaptive CPU-hardness with load-adaptive memory-hardness (PBKDF2 and bcrypt both work in constant space) making it significantly more resistant to massively parallel brute-forcing arrays (GPGPU or custom ASIC). It is available under a 2-clause BSD license as are the existing Python bindings I could find[2], but has a hard dependency on OpenSSL which may prevent its usage. I think these would make Python users safe by lowering the cost of using these functions and by demonstrating ways to safely store passwords up-front. They could be augmented with a note in hashlib indicating that they are to be preferred for password hashing. [0] especially PBKDF2, still the most conservatively safe choice [1] http://code.google.com/p/py-bcrypt/ [2] http://pypi.python.org/pypi/scrypt/ From ncoghlan at gmail.com Sun Jun 10 15:16:24 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 10 Jun 2012 23:16:24 +1000 Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding) In-Reply-To: References: <4FD3ABCB.5080800@gmail.com> Message-ID: On Sun, Jun 10, 2012 at 5:17 PM, Antoine Pitrou wrote: > Le 10/06/2012 04:26, Nick Coghlan a écrit : > >> Calling detach() on the standard streams is a bad idea - the interpreter >> uses the originals internally, and calling detach() breaks them. > > Where does it do that? The interpreter certainly shouldn't hardwire the > original objects internally. At the very least, sys.__std(in/out/err)__.
Doing "sys.stderr = io.TextIOWrapper(sys.stderr.detach(), line_buffering=True)" also seems to suppress display of exception tracebacks at the interactive prompt (perhaps the default except hook is using a cached reference?). I believe Py_FatalError and other APIs that are used deep in the interpreter won't respect the module level setting. Basically, it's dangerous to use detach() on a stream where you don't hold the sole reference, and the safest approach with the standard streams is to assume that other code is holding references to them. Detaching the standard streams is just as likely to cause problems as closing them. > Moreover, your snippet is wrong because if someone replaces the streams for > a second time, garbage collecting the previous streams will close the file > descriptors. > You should use closefd=False. True, although that nicety is all the more reason to encapsulate this idiom in a new IOBase.reopen() method:

    def reopen(self, mode=None, buffering=-1, encoding=None,
               errors=None, newline=None, closefd=False):
        if mode is None:
            mode = getattr(self, 'mode', 'r')
        return open(self.fileno(), mode, buffering,
                    encoding, errors, newline, closefd)

Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jun 10 15:22:52 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 10 Jun 2012 23:22:52 +1000 Subject: [Python-ideas] BindError as a built-in TypeError subclass (on the margin of PEP 362 discussion) In-Reply-To: <4FD48C5E.6000102@pearwood.info> References: <20120609200703.GF2587@chopin.edu.pl> <4FD3EFB2.1010800@pearwood.info> <4FD48C5E.6000102@pearwood.info> Message-ID: On Sun, Jun 10, 2012 at 10:00 PM, Steven D'Aprano wrote: > My concept is that errors due to the wrong argument count, duplicate or > missing keyword arguments, etc. which currently raise TypeError could raise > ArgumentError, a subclass, instead.
> That will make distinguishing between "passed the wrong number of arguments" > from "passed the wrong type of argument" easier. This is actually why I prefer "BindError" to the name "ArgumentError". The former is explicit about what has gone wrong: the supplied arguments could not be bound to the parameters expected by the supplied callable. "ArgumentError", on the other hand, could easily refer to any of: - failing to bind the supplied arguments to the expected parameters (currently TypeError, will be BindError when using PEP 362) - one or more of the arguments is of the wrong type (currently TypeError) - one or more of the arguments has an unacceptable value (currently ValueError) While I don't think the PEP should be held up over it, the idea of making BindError a builtin exception and also raising it in the interpreter's internal parameter binding code is certainly an interesting idea to explore in the future. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jun 10 15:36:43 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 10 Jun 2012 23:36:43 +1000 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: <4FD48D39.10507@pearwood.info> References: <4FD2B57F.5030401@pearwood.info> <4FD38CB7.9070804@pearwood.info> <4FD40E92.7020104@pearwood.info> <4FD48D39.10507@pearwood.info> Message-ID: On Sun, Jun 10, 2012 at 10:04 PM, Steven D'Aprano wrote: > We have anecdotal evidence that many people expect that for/else will > execute the else clause when the for loop is empty. > > We have no anecdotal evidence, or any other evidence, that anyone expects > that the else clause runs if you return out of the loop. Right.
We also need to remember that this entire discussion started with a complaint regarding an apparent internal inconsistency in the language, because the else clauses on if statements and loops don't mean exactly the same thing. When you read the tutorial, it introduces the first two forms together, but the third form (try/except/else) doesn't show up until a later chapter on exception handling. This was quite possibly one of the factors leading people to make a perfectly reasonable intuitive leap that happens to be wrong. All my docs addition is designed to do is discourage readers from making that incorrect intuitive leap. They will still need to learn how the else clauses interact with other constructs, like exceptions and early returns, but those details aren't relevant to building a fence across the tempting-but-wrong path from "if /else" to "for x in /else". It's a tricky educational problem to be sure, and if it wasn't for backwards compatibility requirements, there would be a strong temptation to just drop the else clause from loops entirely. The versions that use sentinel values instead aren't *that* complicated, and have the virtue of being explicit. However, that's not going to happen (it would break too much code without a sufficiently compelling justification), so making small tweaks to the relevant tutorial docs (that will hopefully be picked up by Python instructors and other learning and teaching resources) is a reasonable way forward. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From simon.sapin at kozea.fr Sun Jun 10 16:17:22 2012 From: simon.sapin at kozea.fr (Simon Sapin) Date: Sun, 10 Jun 2012 16:17:22 +0200 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions?
In-Reply-To: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> Message-ID: <4FD4AC72.3070206@kozea.fr> Le 10/06/2012 15:05, Masklinn a écrit : > The standard library already provides for cryptographic hashes (hashlib) > and MACs (hmac). > > [snip] > > Therefore, I would suggest either adding a new module (name tbd) or > adding new constructors to hashlib. PBKDF2 can be implemented in 15 lines of code based on the hmac and hashlib modules: https://github.com/mitsuhiko/python-pbkdf2/blob/master/pbkdf2.py Although the code is short, it is easy to get wrong. So I think it would be nice to have in the stdlib, tested once and for all. Also, PBKDF2 is a well-defined spec that will not change (or it will be called PBKDF3 or something) which I think makes it a good fit for the stdlib. I would suggest to have Armin's implementation (linked above) included as-is, but it's probably too late for 3.3. -- Simon Sapin From storchaka at gmail.com Sun Jun 10 16:34:08 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 10 Jun 2012 17:34:08 +0300 Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding) In-Reply-To: References: <4FD3ABCB.5080800@gmail.com> Message-ID: On 10.06.12 00:22, Mark Lawrence wrote: > On 09/06/2012 21:02, Serhiy Storchaka wrote: >> None of these methods are not guaranteed to work if the input or output >> have occurred before. > > That's a double negative so I'm not sure what you meant to say. Can you > please rephrase it. I assume that English is not your native language, > so I'll let you off :) open(sys.stdin.fileno()) is not guaranteed to work if the input or output have occurred before. And io.TextIOWrapper(sys.stdin.detach()) is not guaranteed to work if the input or output have occurred before. sys.stdin's internal buffer can contain characters that have been read but not yet consumed. sys.stdin.buffer's internal buffer can contain bytes that have been read but not yet consumed.
With a multibyte encoding, sys.stdin.decoder's internal buffer can contain an incomplete multibyte character. From storchaka at gmail.com Sun Jun 10 16:45:02 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 10 Jun 2012 17:45:02 +0300 Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding) In-Reply-To: References: <4FD3ABCB.5080800@gmail.com> Message-ID: On 10.06.12 05:26, Nick Coghlan wrote: > Calling detach() on the standard streams is a bad idea - the interpreter > uses the originals internally, and calling detach() breaks them. If the interpreter uses the standard streams, it uses the raw C streams (FILE *) stdin/stdout/etc. Calling open(sys.stdin.fileno()) bypasses the internal buffering in sys.stdin, sys.stdin.buffer, sys.stdin.decoder and the raw C stdin (if it is used at a lower level), and can lose and break multibyte characters. From ncoghlan at gmail.com Sun Jun 10 17:28:20 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 11 Jun 2012 01:28:20 +1000 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? In-Reply-To: <4FD4AC72.3070206@kozea.fr> References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> Message-ID: On Mon, Jun 11, 2012 at 12:17 AM, Simon Sapin wrote: > Le 10/06/2012 15:05, Masklinn a écrit : >> The standard library already provides for cryptographic hashes (hashlib) >> and MACs (hmac). >> >> [snip] >> >> >> Therefore, I would suggest either adding a new module (name tbd) or >> adding new constructors to hashlib. > > PBKDF2 can be implemented in 15 lines of code based on the hmac and hashlib > modules: > > https://github.com/mitsuhiko/python-pbkdf2/blob/master/pbkdf2.py > > Although the code is short, it is easy to get wrong. So I think it would be > nice to have in the stdlib, tested once and for all.
> > Also, PBKDF2 is a well-defined spec that will not change (or it will be > called PBKDF3 or something) which I think makes it a good fit for the > stdlib. > > I would suggest having Armin's implementation (linked above) included > as-is, but it's probably too late for 3.3. It's cutting it very fine relative to the beta feature freeze (which is in a couple of weeks), but it could still make it in as a very reasonable addition to the standard library. The hmac module has already been enhanced with a "secure_compare" function for 3.3 to perform string and byte sequence comparisons that don't leak as much information about the expected result under timing attacks (it still leaks the expected length, but beyond that the running time of the comparison should be constant for a given digest length). Since the PBKDF2 key derivation requires hmac, and hmac depends on hashlib (to provide the default hash algorithm for hmac.HMAC), I believe the best way to expedite this would be to: 1. Create an issue on bugs.python.org proposing just the binary version of pbkdf2 as an enhancement to hmac 2. Attach a patch that updates Lib/hmac.py, Lib/test/test_hmac.py and Doc/library/hmac.rst accordingly (this will likely require changes to work with bytes rather than 2.x strings) 3. Add a "min_salt_len" parameter to discourage short salt values (rather than the "weak_salt" boolean flag suggested by Masklinn) 4. Post to python-dev proposing the addition of that function for Python 3 Having needed a key derivation function myself not that long ago, and with the recent high profile password database breaches Masklinn noted, this seems like a very reasonable addition to me. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Sun Jun 10 17:36:25 2012 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Mon, 11 Jun 2012 00:36:25 +0900 Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: <1339214827.45788.YahooMailClassic@web161505.mail.bf1.yahoo.com> References: <87vcj283cz.fsf@uwakimon.sk.tsukuba.ac.jp> <1339214827.45788.YahooMailClassic@web161505.mail.bf1.yahoo.com> Message-ID: <87hauj89hi.fsf@uwakimon.sk.tsukuba.ac.jp> Rurpy writes: > Or are you saying some api's should be discouraged and making them > hard to use is better than a "not recommended" note in the > documentation? No, I'm saying "explicit is better than implicit". It's not hard to use the explicit idiom, and it makes it clear that there are two *different* kinds of problem that could occur, which would be concealed by an API. From ncoghlan at gmail.com Sun Jun 10 17:44:08 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 11 Jun 2012 01:44:08 +1000 Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding) In-Reply-To: References: <4FD3ABCB.5080800@gmail.com> Message-ID: On Mon, Jun 11, 2012 at 12:34 AM, Serhiy Storchaka wrote: > On 10.06.12 00:22, Mark Lawrence wrote: >> >> On 09/06/2012 21:02, Serhiy Storchaka wrote: >>> >>> None of these methods are not guaranteed to work if the input or output >>> have occurred before. >> >> >> That's a double negative so I'm not sure what you meant to say. Can you >> please rephrase it. I assume that English is not your native language, >> so I'll let you off :) > > > open(sys.stdin.fileno()) is not guaranteed to work if the input or output > have occurred before. And io.TextIOWrapper(sys.stdin.detach()) is not > guaranteed to work if the input or output have occurred before. sys.stdin > internal buffer can contains read by not used characters. sys.stdin.buffer > internal buffer can contains read by not used bytes. With multibyte encoding > sys.stdin.decoder internal buffer can contains uncompleted multibyte > character. 
Right, but the point of this discussion is to document the cleanest available way for an application to change these settings at *application start* (e.g. to support an "--encoding" parameter). Yes, there are potential issues if you use any of these mechanisms while there is data in the buffers, but that's a much harder problem and not one we're trying to solve here. Regardless, the advantage of the "open + fileno" idiom is that it works for *any* level of change. If you want to force your streams to unbuffered binary IO rather than merely changing the encoding: sys.stdin = open(sys.stdin.fileno(), 'rb', buffering=0, closefd=False) sys.stdout = open(sys.stdout.fileno(), 'wb', buffering=0, closefd=False) sys.stderr = open(sys.stderr.fileno(), 'wb', buffering=0, closefd=False) Keep them as text, but force them to permissive utf-8, no matter how the interpreter originally created them: sys.stdin = open(sys.stdin.fileno(), 'r', encoding="utf-8", errors="surrogateescape", closefd=False) sys.stdout = open(sys.stdout.fileno(), 'w', encoding="utf-8", errors="surrogateescape", closefd=False) sys.stderr = open(sys.stderr.fileno(), 'w', encoding="utf-8", errors="surrogateescape", closefd=False) This approach also has the advantage of leaving sys.__std(in/out/err)__ in a somewhat usable state. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ubershmekel at gmail.com Sun Jun 10 17:50:36 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sun, 10 Jun 2012 18:50:36 +0300 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: References: <4FD2B57F.5030401@pearwood.info> <4FD38CB7.9070804@pearwood.info> <4FD40E92.7020104@pearwood.info> <4FD48D39.10507@pearwood.info> Message-ID: I hope this isn't too off-topic, but is the tutorial supposed to exhaustively explain the python language?
Because if not, then the for-else/while-else clause may be a good thing to move to an appendix. Yuval -------------- next part -------------- An HTML attachment was scrubbed... URL: From masklinn at masklinn.net Sun Jun 10 17:52:44 2012 From: masklinn at masklinn.net (Masklinn) Date: Sun, 10 Jun 2012 17:52:44 +0200 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? In-Reply-To: References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> Message-ID: <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> On 2012-06-10, at 17:28 , Nick Coghlan wrote: > On Mon, Jun 11, 2012 at 12:17 AM, Simon Sapin wrote: >> On 10/06/2012 15:05, Masklinn wrote: >>> >>> The standard library already provides for cryptographic hashes (hashlib) >>> and MACs (hmac). >>> >>> [snip] >>> >>> >>> Therefore, I would suggest either adding a new module (name tbd) or >>> adding new constructors to hashlib. >> >> >> PBKDF2 can be implemented in 15 lines of code based on the hmac and hashlib >> modules: >> >> https://github.com/mitsuhiko/python-pbkdf2/blob/master/pbkdf2.py >> >> Although the code is short, it is easy to get wrong. So I think it would be >> nice to have in the stdlib, tested once and for all. >> >> Also, PBKDF2 is a well-defined spec that will not change (or it will be >> called PBKDF3 or something) which I think makes it a good fit for the >> stdlib. >> >> I would suggest having Armin's implementation (linked above) included >> as-is, but it's probably too late for 3.3. > > It's cutting it very fine relative to the beta feature freeze (which > is in a couple of weeks), but it could still make it in as a very > reasonable addition to the standard library.
> > The hmac module has already been enhanced with a "secure_compare" > function for 3.3 to perform string and byte sequence comparisons that > don't leak as much information about the expected result under timing > attacks (it still leaks the expected length, but beyond that the > running time of the comparison should be constant for a given digest > length). > > Since the PBKDF2 key derivation requires hmac, and hmac depends on > hashlib (to provide the default hash algorithm for hmac.HMAC), I > believe the best way to expedite this would be to: > > 1. Create an issue on bugs.python.org proposing just the binary > version of pbkdf2 as an enhancement to hmac Although it makes sense from a dependency POV, I'm not sure it's the best place to put it as people in need of knowing about PBKDF2 would be more likely to be browsing hashlib, and, more importantly, PBKDF2 isn't a MAC, the usage of hmac underlying it being mostly incidental. If PBKDF2 alone is added, I think putting it in its own module (parallel to hmac) would be cleaner, *that* can be deprecated if more cryptographic hashes of that style (e.g. bcrypt, scrypt) are added later on in the style of md5 -> hashlib. > 2. Attach a patch that updates Lib/hmac.py, Lib/test/test_hmac.py and > Doc/library/hmac.rst accordingly (this will likely require changes to > work with bytes rather than 2.x strings) > 3. Adds a "min_salt_len" parameter to discourage short salt values > (rather than the "weak_salt" boolean flag suggested by Masklinn) > 4. Post to python-dev proposing the addition of that function for Python 3 > Having needed a key derivation function myself not that long ago, and > with the recent high profile password database breaches Masklinn > noted, this seems like a very reasonable addition to me.
From python at mrabarnett.plus.com Sun Jun 10 18:04:17 2012 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 10 Jun 2012 17:04:17 +0100 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: References: <4FD2B57F.5030401@pearwood.info> <4FD38CB7.9070804@pearwood.info> <4FD40E92.7020104@pearwood.info> <4FD48D39.10507@pearwood.info> Message-ID: <4FD4C581.1040607@mrabarnett.plus.com> On 10/06/2012 16:50, Yuval Greenfield wrote: > I hope this isn't too off-topic, but is the tutorial supposed to > exhaustively explain the python language? > > Because if not, then the for-else/while-else clause may be a good thing > to move to an appendix. > The for-else/while-else clause is part of the core language, so it should be explained. From ncoghlan at gmail.com Sun Jun 10 18:04:50 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 11 Jun 2012 02:04:50 +1000 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: References: <4FD2B57F.5030401@pearwood.info> <4FD38CB7.9070804@pearwood.info> <4FD40E92.7020104@pearwood.info> <4FD48D39.10507@pearwood.info> Message-ID: On Mon, Jun 11, 2012 at 1:50 AM, Yuval Greenfield wrote: > I hope this isn't too off-topic, but is the tutorial supposed to > exhaustively explain the python language? > > Because if not, then the for-else/while-else clause may be a good thing to > move to an appendix. It's supposed to arm people well enough to cope with at least *reading* most code they're likely to encounter. Since for/else is the idiomatic way to write a search loop, even beginners really should learn how to read it. For more esoteric stuff like metaclasses where the philosophy of "If you're wondering whether or not you need it, you don't need it" applies, then the tutorial can safely skip it. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From ncoghlan at gmail.com Sun Jun 10 18:11:04 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 11 Jun 2012 02:11:04 +1000 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? In-Reply-To: <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> Message-ID: On Mon, Jun 11, 2012 at 1:52 AM, Masklinn wrote: > On 2012-06-10, at 17:28 , Nick Coghlan wrote: >> 1. Create an issue on bugs.python.org proposing just the binary >> version of pbkdf2 as an enhancement to hmac > > Although it makes sense from a dependency POV, I'm not sure it's the > best place to put it as people in need of knowing about PBKDF2 would > be more likely to be browsing hashlib, and, more importantly, PBKDF2 > isn't a MAC, the usage of hmac underlying it being mostly incidental. > > If PBKDF2 alone is added, I think putting it in its own module > (parallel to hmac) would be cleaner, *that* can be deprecated if > more cryptographic hashes of that style (e.g. bcrypt, scrypt) are > added later on in the style of md5 -> hashlib. Yeah, you're probably right. Either a new module, or else in "getpass" (either way, with a cross-reference from hashlib). Wherever it ends up, it should also reference hmac.secure_compare for a comparison function that doesn't allow timing attacks to progressively discover the expected hash. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From ubershmekel at gmail.com Sun Jun 10 18:13:38 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sun, 10 Jun 2012 19:13:38 +0300 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: <4FD4C581.1040607@mrabarnett.plus.com> References: <4FD2B57F.5030401@pearwood.info> <4FD38CB7.9070804@pearwood.info> <4FD40E92.7020104@pearwood.info> <4FD48D39.10507@pearwood.info> <4FD4C581.1040607@mrabarnett.plus.com> Message-ID: On Sun, Jun 10, 2012 at 7:04 PM, MRAB wrote: > On 10/06/2012 16:50, Yuval Greenfield wrote: > >> I hope this isn't too off-topic, but is the tutorial supposed to >> exhaustively explain the python language? >> >> Because if not, then the for-else/while-else clause may be a good thing >> to move to an appendix. >> >> The for-else/while-else clause is part of the core language, so it > should be explained. > If we want a dust of a chance to deprecate for-else/while-else in python 6, circa 2031, then we should at least move it to the back of the tutorial. I'm not suggesting to completely delete the text, just to nudge it to the end. This clause is most definitely not a common pattern in python. Personally I've never seen it in the wild and most pythonistas I've spoken with have never heard of the construct. Yuval -------------- next part -------------- An HTML attachment was scrubbed... URL: From bruce at leapyear.org Sun Jun 10 18:23:13 2012 From: bruce at leapyear.org (Bruce Leban) Date: Sun, 10 Jun 2012 09:23:13 -0700 Subject: [Python-ideas] Nudging beginners towards a more accurate mental model for loop else clauses In-Reply-To: <4FD38CB7.9070804@pearwood.info> References: <4FD2B57F.5030401@pearwood.info> <4FD38CB7.9070804@pearwood.info> Message-ID: On Sat, Jun 9, 2012 at 10:48 AM, Steven D'Aprano wrote: > Loops exit at the top, not the bottom. This is most obvious when you think > about a while loop: > > while condition: > ... 
> > > Of course you have to be at the top of the loop for the while to check > condition, not the bottom. For-loops are not quite so obvious, but > execution has to return back to the top of the loop in order to check > whether or not the sequence is exhausted. > > Whether or not it is the *simplest* way to think about for/else, talking > about exhaustion of the list (iterable) is correct. > If you want to talk about exhaustion of the list then you need to talk differently about the while loop. Documentation is usually written for non-experts. When I taught intro to programming, the mental model that most students had was nowhere near as strong as most people on this list. The concept 'loop exits normally' would be much easier for them to understand. On Sat, Jun 9, 2012 at 10:49 AM, Steven D'Aprano wrote: > Ignoring try/finally blocks, which are special, we can assume that the > reader has (or will have once they actually learn about functions and > exceptions) a correct understanding of the behaviour of return and raise. > > - If the loop is exited by a return, *nothing* following the return is > executed. That includes the else block. > > - If execution is halted by an exception (including raise), *nothing* > following the exception is executed. That includes the else block. > > - If execution is halted by an external event that halts or interrupts the > Python process, *nothing* following executes. That includes the else block. > > - If the loop never completes, *nothing* following the loop executes. That > includes the else block. > You've written four different ways of saying 'loop does not exit normally' vs. saying once 'loop exits normally'. When you emphasize *nothing* above, it strongly suggests they all mean the same thing. If you *don't* ignore try/finally, then they don't. 
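The behaviour being debated here is easiest to see in the idiomatic search loop itself. A minimal sketch for readers skimming the thread (an editorial illustration, not code from any of the posters; the function name is invented):

```python
def find_index(items, target):
    """Return the index of target in items, or -1 if it is absent."""
    for i, item in enumerate(items):
        if item == target:
            break  # early exit: the else clause below is skipped
    else:
        # runs only when the loop exhausts items without hitting break,
        # i.e. when the loop "exits normally"
        i = -1
    return i

print(find_index(["a", "b", "c"], "b"))  # 1
print(find_index(["a", "b", "c"], "z"))  # -1
```

The else block is exactly the "loop exits normally" case: break, return, and a raised exception all bypass it.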
I don't think documentation needs to cover every case, but if you're going to write stuff in bold letters (or italic or whatever), then readers expect you're covering all the bases and not ignoring special cases. That may not be your intent but that's the way people read things. Again, docs are written for non-experts. Holly: He's dead, Dave. Everybody is dead. Everybody is dead, Dave. Lister: Wait. Are you trying to tell me everybody's dead? Holly: Yup. Well, except for Dracula who was executing a try/finally. He's undead and probably going to kill you too. But I didn't want to bother you with that minor detail. :-) --- Bruce Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Sun Jun 10 18:41:14 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 11 Jun 2012 01:41:14 +0900 Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding) In-Reply-To: References: <4FD3ABCB.5080800@gmail.com> Message-ID: <87fwa386hh.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > On Mon, Jun 11, 2012 at 12:34 AM, Serhiy Storchaka wrote: > > open(sys.stdin.fileno()) is not guaranteed to work if the input or output > > have occurred before. [...] > Right, but the point of this discussion is to document the cleanest > available way for an application to change these settings at > *application start* (e.g. to support an "--encoding" parameter). Yes, > there are potential issues if you use any of these mechanisms while > there is data in the buffers, +1 The OP's problem is a real one. His use case (the "--encoding" parameter) seems to be the most likely one in production use, so the loss of buffered data issue should rarely come up. Changing encodings on the fly offers plenty of ways to lose data besides incomplete buffers, anyway.
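The byte-level effect of such a switch can be demonstrated with an in-memory buffer standing in for the real file descriptor (a sketch only; real code would use the open(sys.stdout.fileno(), ...) recipe discussed above, and the two encodings here are chosen arbitrarily):

```python
import io

buf = io.BytesIO()

# First wrapper: latin-1, standing in for the interpreter-chosen encoding.
out = io.TextIOWrapper(buf, encoding="latin-1", newline="\n")
out.write("caf\u00e9\n")
out.flush()  # flush before switching, exactly as in the std-stream recipe

# detach() flushes and hands back the underlying buffer, so the old
# wrapper can no longer close it; then re-wrap with the new encoding.
raw = out.detach()
out = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
out.write("caf\u00e9\n")
out.flush()

result = buf.getvalue()  # b'caf\xe9\ncaf\xc3\xa9\n'
```

Nick's caution about detach() on the *real* standard streams still applies; with closefd=False and open(fileno) no detach is needed there.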
I am a little concerned with MRAB's report that import sys print("hello") sys.stdout.flush() sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8') print("hello") doesn't work as expected, though. (It does work for me on Mac OS X, both as above -- of course there are no '\r's in the output -- and with 'print("hello", end="\r\n")'.) From storchaka at gmail.com Sun Jun 10 18:43:51 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 10 Jun 2012 19:43:51 +0300 Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding) In-Reply-To: References: <4FD3ABCB.5080800@gmail.com> Message-ID: On 10.06.12 18:44, Nick Coghlan wrote: > This approach also has the advantage of leaving > sys.__std(in/out/err)__ in a somewhat usable state. And then sys.std* and sys.__std*__ have their own inconsistent buffers. From greg at krypto.org Sun Jun 10 19:56:46 2012 From: greg at krypto.org (Gregory P. Smith) Date: Sun, 10 Jun 2012 10:56:46 -0700 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? In-Reply-To: References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> Message-ID: On Sun, Jun 10, 2012 at 9:11 AM, Nick Coghlan wrote: > On Mon, Jun 11, 2012 at 1:52 AM, Masklinn wrote: > > On 2012-06-10, at 17:28 , Nick Coghlan wrote: > >> 1. Create an issue on bugs.python.org proposing just the binary > >> version of pbkdf2 as an enhancement to hmac > > > > Although it makes sense from a dependency POV, I'm not sure it's the > > best place to put it as people in need of knowing about PBKDF2 would > > be more likely to be browsing hashlib, and ? more importantly ? PBKDF2 > > isn't a MAC, the usage of hmac underlying it being mostly incidental. > > > > If PBKDF2 alone is added, I think putting it in its own module > > (parallel to hmac) would be cleaner, *that* can be deprecated if > > more cryptographic hashes of that style (e.g. 
bcrypt, scrypt) are > > added later on in the style of md5 -> hashlib. > > Yeah, you're probably right. Either a new module, or else in "getpass" > (either way, with a cross-reference from hashlib). > > Wherever it ends up, it should also reference hmac.secure_compare for > a comparison function that doesn't allow timing attacks to > progressively discover the expected hash. > > I'd just stick it in hmac myself but getpass was also a good suggestion. Cross-reference to it from the docs of all three, as the real goal of adding pbkdf2 is to advertise it to users so that they might use it rather than something more naive. hashlib itself should be kept pure as is for standard low level hash algorithms. It can't have a dependency on anything else. Even if this doesn't make it into the stdlib in time for 3.3, feel free to update the getpass, hmac and/or hashlib docs to point to the pbkdf2 module externally as a suggestion for passphrase/secret hashing. -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From simon.sapin at kozea.fr Sun Jun 10 20:04:17 2012 From: simon.sapin at kozea.fr (Simon Sapin) Date: Sun, 10 Jun 2012 20:04:17 +0200 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? In-Reply-To: References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> Message-ID: <4FD4E1A1.6030409@kozea.fr> On 10/06/2012 19:56, Gregory P. Smith wrote: >> Yeah, you're probably right. Either a new module, or else in "getpass" >> (either way, with a cross-reference from hashlib). > > I'd just stick it in hmac myself but getpass was also a good suggestion. I disagree. The getpass module is about terminal control, it has nothing to do with hashing. PBKDF2 or other adaptive hashes do not belong there.
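Wherever it ends up, the binary core under discussion really is tiny. A simplified sketch built only on hmac and hashlib, in the spirit of Armin's linked module (the function name and defaults here are illustrative, not a proposed stdlib API):

```python
import hashlib
import hmac
import struct

def pbkdf2_bin(password, salt, iterations=1000, keylen=24, hashfunc=hashlib.sha1):
    """Derive keylen bytes from password and salt (both bytes), per RFC 2898."""
    mac = hmac.new(password, None, hashfunc)

    def prf(data):
        h = mac.copy()  # copying is cheaper than re-keying HMAC each time
        h.update(data)
        return h.digest()

    derived = b""
    block = 1
    while len(derived) < keylen:
        # U1 = PRF(password, salt || INT(block)); Uj = PRF(password, Uj-1)
        u = prf(salt + struct.pack(">I", block))
        acc = int.from_bytes(u, "big")
        for _ in range(iterations - 1):
            u = prf(u)
            acc ^= int.from_bytes(u, "big")  # T = U1 xor U2 xor ... xor Uc
        derived += acc.to_bytes(mac.digest_size, "big")
        block += 1
    return derived[:keylen]

# RFC 6070 test vector: PBKDF2-HMAC-SHA1("password", "salt", c=1, dkLen=20)
print(pbkdf2_bin(b"password", b"salt", 1, 20).hex())
# 0c60c80f961f0e71f3a9b524af6012062fe037a6
```

A real stdlib version would pair this with a storage format and a constant-time comparison (the hmac.secure_compare mentioned earlier) for verification.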
-- Simon Sapin From masklinn at masklinn.net Sun Jun 10 20:11:15 2012 From: masklinn at masklinn.net (Masklinn) Date: Sun, 10 Jun 2012 20:11:15 +0200 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? In-Reply-To: <4FD4E1A1.6030409@kozea.fr> References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> <4FD4E1A1.6030409@kozea.fr> Message-ID: <8C7C9869-CB87-4E98-AD4E-AACD83375676@masklinn.net> On 2012-06-10, at 20:04 , Simon Sapin wrote: > On 10/06/2012 19:56, Gregory P. Smith wrote: >>> Yeah, you're probably right. Either a new module, or else in "getpass" >>> (either way, with a cross-reference from hashlib). >> >> I'd just stick it in hmac myself but getpass was also a good suggestion. > > I disagree. The getpass module is about terminal control, it has nothing to do with hashing. PBKDF2 or other adaptive hashes do not belong there. It seems there are as many opinions on the subject as there are people (which was to be expected when there's no code yet). I'll try to get something done first (unless somebody else wants to), and discussion of its exact location in the stdlib can be bikeshedded in -dev if and when that point/paint is reached. From python at mrabarnett.plus.com Sun Jun 10 20:12:55 2012 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 10 Jun 2012 19:12:55 +0100 Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding) In-Reply-To: <87fwa386hh.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4FD3ABCB.5080800@gmail.com> <87fwa386hh.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4FD4E3A7.6010506@mrabarnett.plus.com> On 10/06/2012 17:41, Stephen J. Turnbull wrote: > Nick Coghlan writes: > > On Mon, Jun 11, 2012 at 12:34 AM, Serhiy Storchaka wrote: > > > > open(sys.stdin.fileno()) is not guaranteed to work if the input or output > > > have occurred before. > [...]
> > > Right, but the point of this discussion is to document the cleanest > > available way for an application to change these settings at > > *application start* (e.g. to support an "--encoding" parameter). Yes, > > there are potential issues if you use any of these mechanisms while > > there is data in the buffers, > > +1 > > The OP's problem is a real one. His use case (the "--encoding" > parameter) seems to be the most likely one in production use, so the > loss of buffered data issue should rarely come up. Changing encodings > on the fly offers plenty of ways to lose data besides incomplete > buffers, anyway. > > I am a little concerned with MRAB's report that > > import sys > print("hello") > sys.stdout.flush() > sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8') > print("hello") > > doesn't work as expected, though. (It does work for me on Mac OS X, > both as above -- of course there are no '\r's in the output -- and > with 'print("hello", end="\r\n")'.) > That's actually Python 3.1. From Python 3.2 it's slightly different, but still not quite right: Python 3.1: "hello\r\nhello\r\r\n" Python 3.2: "hello\nhello\r\n" Python 3.3.0a4: "hello\nhello\r\n" All on Windows. From simon.sapin at kozea.fr Sun Jun 10 20:24:43 2012 From: simon.sapin at kozea.fr (Simon Sapin) Date: Sun, 10 Jun 2012 20:24:43 +0200 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? In-Reply-To: <8C7C9869-CB87-4E98-AD4E-AACD83375676@masklinn.net> References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> <4FD4E1A1.6030409@kozea.fr> <8C7C9869-CB87-4E98-AD4E-AACD83375676@masklinn.net> Message-ID: <4FD4E66B.7080802@kozea.fr> Le 10/06/2012 20:11, Masklinn a ?crit : > [...] when there's no code yet > I'll try to get something done first There is code, with tests. 
Here is the link I posted earlier in this thread: https://github.com/mitsuhiko/python-pbkdf2/blob/master/pbkdf2.py -- Simon Sapin From p.f.moore at gmail.com Sun Jun 10 20:34:04 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 10 Jun 2012 19:34:04 +0100 Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding) In-Reply-To: <4FD4E3A7.6010506@mrabarnett.plus.com> References: <4FD3ABCB.5080800@gmail.com> <87fwa386hh.fsf@uwakimon.sk.tsukuba.ac.jp> <4FD4E3A7.6010506@mrabarnett.plus.com> Message-ID: On 10 June 2012 19:12, MRAB wrote: > On 10/06/2012 17:41, Stephen J. Turnbull wrote: >> I am a little concerned with MRAB's report that >> >> ? ? import sys >> ? ? print("hello") >> ? ? sys.stdout.flush() >> ? ? sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8') >> ? ? print("hello") >> >> doesn't work as expected, though. ?(It does work for me on Mac OS X, >> both as above -- of course there are no '\r's in the output -- and >> with 'print("hello", end="\r\n")'.) >> > That's actually Python 3.1. From Python 3.2 it's slightly different, > but still not quite right: > > Python 3.1: ? ? "hello\r\nhello\r\r\n" > Python 3.2: ? ? "hello\nhello\r\n" > Python 3.3.0a4: "hello\nhello\r\n" > > All on Windows. Not here (Win 7 32-bit): PS D:\Data> type t.py import sys print("Hello!") sys.stdout.flush() sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8') print("Hello!") PS D:\Data> py -3.2 t.py | od -c 0000000 H e l l o ! \r \n H e l l o ! \r \n 0000020 Paul. From masklinn at masklinn.net Sun Jun 10 20:35:35 2012 From: masklinn at masklinn.net (Masklinn) Date: Sun, 10 Jun 2012 20:35:35 +0200 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? 
In-Reply-To: <4FD4E66B.7080802@kozea.fr> References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> <4FD4E1A1.6030409@kozea.fr> <8C7C9869-CB87-4E98-AD4E-AACD83375676@masklinn.net> <4FD4E66B.7080802@kozea.fr> Message-ID: <25BE8DE9-B5DF-48B9-B4DF-E6C82033C3E1@masklinn.net> On 2012-06-10, at 20:24 , Simon Sapin wrote: > On 10/06/2012 20:11, Masklinn wrote: >> [...] when there's no code yet >> I'll try to get something done first > > There is code, with tests. Here is the link I posted earlier in this thread: > > https://github.com/mitsuhiko/python-pbkdf2/blob/master/pbkdf2.py Yes, I've seen it, but 1. I'll need to talk to Armin about using that code (which is why I CC'd him to the list when I responded to Nick's response to your comment), or have him do it, I don't think anybody is going to take his code without even asking for consent and try to push it into the stdlib 2. The interface is simple, but painful. Just look at the comment at the top: 3. Store ``algorithm$salt:costfactor$hash`` in the database so that you can upgrade later easily to a different algorithm if you need one. For instance ``PBKDF2-256$thesalt:10000$deadbeef...``. if we know what's supposed to be done, how about just doing it and returning *that*? If it goes into the stdlib, I'd like to have something non-cryptographers can use easily, correctly and without making mistakes. Then there's the issue of implementing the equality test, extracting stuff from that storage string on subsequent auths to test for matches. It should be possible to do all that in a single user-facing operation, no munging about in user's code. 3. The test suite needs to be converted to the stdlib's format 4.
The documentation needs to be written From p.f.moore at gmail.com Sun Jun 10 20:36:03 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 10 Jun 2012 19:36:03 +0100 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? In-Reply-To: <4FD4E66B.7080802@kozea.fr> References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> <4FD4E1A1.6030409@kozea.fr> <8C7C9869-CB87-4E98-AD4E-AACD83375676@masklinn.net> <4FD4E66B.7080802@kozea.fr> Message-ID: On 10 June 2012 19:24, Simon Sapin wrote: > Le 10/06/2012 20:11, Masklinn a ?crit : >> >> [...] when there's no code yet >> >> I'll try to get something done first > > > There is code, with tests. Here is the link I posted earlier in this thread: > > https://github.com/mitsuhiko/python-pbkdf2/blob/master/pbkdf2.py To use that would need Armin's approval and support. So far he's not commented here. Paul. From python at mrabarnett.plus.com Sun Jun 10 21:01:21 2012 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 10 Jun 2012 20:01:21 +0100 Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding) In-Reply-To: References: <4FD3ABCB.5080800@gmail.com> <87fwa386hh.fsf@uwakimon.sk.tsukuba.ac.jp> <4FD4E3A7.6010506@mrabarnett.plus.com> Message-ID: <4FD4EF01.1070603@mrabarnett.plus.com> On 10/06/2012 19:34, Paul Moore wrote: > On 10 June 2012 19:12, MRAB wrote: >> On 10/06/2012 17:41, Stephen J. Turnbull wrote: >>> I am a little concerned with MRAB's report that >>> >>> import sys >>> print("hello") >>> sys.stdout.flush() >>> sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8') >>> print("hello") >>> >>> doesn't work as expected, though. (It does work for me on Mac OS X, >>> both as above -- of course there are no '\r's in the output -- and >>> with 'print("hello", end="\r\n")'.) >>> >> That's actually Python 3.1. 
From Python 3.2 it's slightly different, >> but still not quite right: >> >> Python 3.1: "hello\r\nhello\r\r\n" >> Python 3.2: "hello\nhello\r\n" >> Python 3.3.0a4: "hello\nhello\r\n" >> >> All on Windows. > > Not here (Win 7 32-bit): > > PS D:\Data> type t.py > import sys > print("Hello!") > sys.stdout.flush() > > sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8') > print("Hello!") > PS D:\Data> py -3.2 t.py | od -c > 0000000 H e l l o ! \r \n H e l l o ! \r \n > 0000020 > I'm using Windows XP Pro (32-bit), initially sys.stdout.encoding == "cp1252". From p.f.moore at gmail.com Sun Jun 10 22:07:00 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 10 Jun 2012 21:07:00 +0100 Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding) In-Reply-To: <4FD4EF01.1070603@mrabarnett.plus.com> References: <4FD3ABCB.5080800@gmail.com> <87fwa386hh.fsf@uwakimon.sk.tsukuba.ac.jp> <4FD4E3A7.6010506@mrabarnett.plus.com> <4FD4EF01.1070603@mrabarnett.plus.com> Message-ID: On 10 June 2012 20:01, MRAB wrote: > On 10/06/2012 19:34, Paul Moore wrote: >> >> On 10 June 2012 19:12, MRAB ?wrote: >>> >>> ?On 10/06/2012 17:41, Stephen J. Turnbull wrote: >>>> >>>> ?I am a little concerned with MRAB's report that >>>> >>>> ? ? ?import sys >>>> ? ? ?print("hello") >>>> ? ? ?sys.stdout.flush() >>>> ? ? ?sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8') >>>> ? ? ?print("hello") >>>> >>>> ?doesn't work as expected, though. ?(It does work for me on Mac OS X, >>>> ?both as above -- of course there are no '\r's in the output -- and >>>> ?with 'print("hello", end="\r\n")'.) >>>> >>> ?That's actually Python 3.1. From Python 3.2 it's slightly different, >>> ?but still not quite right: >>> >>> ?Python 3.1: ? ? "hello\r\nhello\r\r\n" >>> ?Python 3.2: ? ? "hello\nhello\r\n" >>> ?Python 3.3.0a4: "hello\nhello\r\n" >>> >>> ?All on Windows. 
>> >> >> Not here (Win 7 32-bit): >> >> PS D:\Data> ?type t.py >> import sys >> print("Hello!") >> sys.stdout.flush() >> >> sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8') >> print("Hello!") >> PS D:\Data> ?py -3.2 t.py | od -c >> 0000000 ? H ? e ? l ? l ? o ? ! ?\r ?\n ? H ? e ? l ? l ? o ? ! ?\r ?\n >> 0000020 >> > I'm using Windows XP Pro (32-bit), initially sys.stdout.encoding == > "cp1252". PS D:\Data> py -3 -c "import sys; print(sys.stdout.encoding)" cp850 This is at the console (Powershell) - are you running from within something like idle, or a GUI environment? Paul. From jeanpierreda at gmail.com Sun Jun 10 22:16:41 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Sun, 10 Jun 2012 16:16:41 -0400 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? In-Reply-To: References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> <4FD4E1A1.6030409@kozea.fr> <8C7C9869-CB87-4E98-AD4E-AACD83375676@masklinn.net> <4FD4E66B.7080802@kozea.fr> Message-ID: On Sun, Jun 10, 2012 at 2:36 PM, Paul Moore wrote: > To use that would need Armin's approval and support. So far he's not > commented here. Only if you want a different license than 3-clause BSD. P.S. I love this thread. Great suggestion. 
:) -- Devin From python at mrabarnett.plus.com Sun Jun 10 22:28:14 2012 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 10 Jun 2012 21:28:14 +0100 Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding) In-Reply-To: References: <4FD3ABCB.5080800@gmail.com> <87fwa386hh.fsf@uwakimon.sk.tsukuba.ac.jp> <4FD4E3A7.6010506@mrabarnett.plus.com> <4FD4EF01.1070603@mrabarnett.plus.com> Message-ID: <4FD5035E.7080106@mrabarnett.plus.com> On 10/06/2012 21:07, Paul Moore wrote: > On 10 June 2012 20:01, MRAB wrote: >> On 10/06/2012 19:34, Paul Moore wrote: >>> >>> On 10 June 2012 19:12, MRAB wrote: >>>> >>>> On 10/06/2012 17:41, Stephen J. Turnbull wrote: >>>>> >>>>> I am a little concerned with MRAB's report that >>>>> >>>>> import sys >>>>> print("hello") >>>>> sys.stdout.flush() >>>>> sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8') >>>>> print("hello") >>>>> >>>>> doesn't work as expected, though. (It does work for me on Mac OS X, >>>>> both as above -- of course there are no '\r's in the output -- and >>>>> with 'print("hello", end="\r\n")'.) >>>>> >>>> That's actually Python 3.1. From Python 3.2 it's slightly different, >>>> but still not quite right: >>>> >>>> Python 3.1: "hello\r\nhello\r\r\n" >>>> Python 3.2: "hello\nhello\r\n" >>>> Python 3.3.0a4: "hello\nhello\r\n" >>>> >>>> All on Windows. >>> >>> >>> Not here (Win 7 32-bit): >>> >>> PS D:\Data> type t.py >>> import sys >>> print("Hello!") >>> sys.stdout.flush() >>> >>> sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8') >>> print("Hello!") >>> PS D:\Data> py -3.2 t.py | od -c >>> 0000000 H e l l o ! \r \n H e l l o ! \r \n >>> 0000020 >>> >> I'm using Windows XP Pro (32-bit), initially sys.stdout.encoding == >> "cp1252". > > PS D:\Data> py -3 -c "import sys; print(sys.stdout.encoding)" > cp850 > > This is at the console (Powershell) - are you running from within > something like idle, or a GUI environment? > It's at the system command prompt. 
When I redirect the script's stdout to a file (on the command line using ">output.txt") I get those 15 bytes from Python 3.2. Your output appears to be 32 bytes (the second line starts with "0000020"). From p.f.moore at gmail.com Sun Jun 10 22:38:14 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 10 Jun 2012 21:38:14 +0100 Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding) In-Reply-To: <4FD5035E.7080106@mrabarnett.plus.com> References: <4FD3ABCB.5080800@gmail.com> <87fwa386hh.fsf@uwakimon.sk.tsukuba.ac.jp> <4FD4E3A7.6010506@mrabarnett.plus.com> <4FD4EF01.1070603@mrabarnett.plus.com> <4FD5035E.7080106@mrabarnett.plus.com> Message-ID: On 10 June 2012 21:28, MRAB wrote: > On 10/06/2012 21:07, Paul Moore wrote: >> >> On 10 June 2012 20:01, MRAB ?wrote: >>> >>> ?On 10/06/2012 19:34, Paul Moore wrote: >>>> >>>> >>>> ?On 10 June 2012 19:12, MRAB ? ?wrote: >>>>> >>>>> >>>>> ? On 10/06/2012 17:41, Stephen J. Turnbull wrote: >>>>>> >>>>>> >>>>>> ? I am a little concerned with MRAB's report that >>>>>> >>>>>> ? ? ? import sys >>>>>> ? ? ? print("hello") >>>>>> ? ? ? sys.stdout.flush() >>>>>> ? ? ? sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8') >>>>>> ? ? ? print("hello") >>>>>> >>>>>> ? doesn't work as expected, though. ?(It does work for me on Mac OS X, >>>>>> ? both as above -- of course there are no '\r's in the output -- and >>>>>> ? with 'print("hello", end="\r\n")'.) >>>>>> >>>>> ? That's actually Python 3.1. From Python 3.2 it's slightly different, >>>>> ? but still not quite right: >>>>> >>>>> ? Python 3.1: ? ? "hello\r\nhello\r\r\n" >>>>> ? Python 3.2: ? ? "hello\nhello\r\n" >>>>> ? Python 3.3.0a4: "hello\nhello\r\n" >>>>> >>>>> ? All on Windows. >>>> >>>> >>>> >>>> ?Not here (Win 7 32-bit): >>>> >>>> ?PS D:\Data> ? 
?type t.py >>>> ?import sys >>>> ?print("Hello!") >>>> ?sys.stdout.flush() >>>> >>>> ?sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8') >>>> ?print("Hello!") >>>> ?PS D:\Data> ? ?py -3.2 t.py | od -c >>>> ?0000000 ? H ? e ? l ? l ? o ? ! ?\r ?\n ? H ? e ? l ? l ? o ? ! ?\r ?\n >>>> ?0000020 >>>> >>> ?I'm using Windows XP Pro (32-bit), initially sys.stdout.encoding == >>> ?"cp1252". >> >> >> PS D:\Data> ?py -3 -c "import sys; print(sys.stdout.encoding)" >> cp850 >> >> This is at the console (Powershell) - are you running from within >> something like idle, or a GUI environment? >> > It's at the system command prompt. When I redirect the script's stdout to a > file > (on the command line using ">output.txt") I get those 15 bytes from Python > 3.2. > > Your output appears to be 32 bytes (the second line starts with > "0000020"). Well spotted - PowerShell does funny things with Unicode in pipes, I'd forgotten. Indeed, I get the same output as you from cmd. Odd. Paul From ben+python at benfinney.id.au Mon Jun 11 00:12:29 2012 From: ben+python at benfinney.id.au (Ben Finney) Date: Mon, 11 Jun 2012 08:12:29 +1000 Subject: [Python-ideas] BindError as a built-in TypeError subclass (on the margin of PEP 362 discussion) References: <20120609200703.GF2587@chopin.edu.pl> <4FD3EFB2.1010800@pearwood.info> <4FD48C5E.6000102@pearwood.info> Message-ID: <87oboqzuia.fsf@benfinney.id.au> Nick Coghlan writes: > On Sun, Jun 10, 2012 at 10:00 PM, Steven D'Aprano wrote: > > My concept is that errors due to the wrong argument count, duplicate > > or missing keyword arguments, etc. which currently raise TypeError > > could raise ArgumentError, a subclass, instead. > > > > That will make distinguishing between "passed the wrong number of > > arguments" from "passed the wrong type of argument" easier. > > This is actually why I prefer "BindError" to the name "ArgumentError". 
> > The former is explicit about what has gone wrong: the supplied > arguments could not be bound to the parameters expected by the > supplied callable. ?ArgumentBindError?, then? -- \ ?Airports are ugly. Some are very ugly. Some attain a degree of | `\ ugliness that can only be the result of a special effort.? | _o__) ?Douglas Adams, _The Long Dark Tea-Time Of The Soul_ | Ben Finney From acarter at cs.hmc.edu Mon Jun 11 00:42:53 2012 From: acarter at cs.hmc.edu (Andrew Carter) Date: Sun, 10 Jun 2012 15:42:53 -0700 Subject: [Python-ideas] Saving state in list/generator comprehension Message-ID: Forgive me for any problems in this e-mail as I'm new to this mailing list. I thought it might be nice to be able to somehow save a state in list/generator comprehensions, a side effect of this (although not the intended goal) is it would make reduce feasible in a clean manner as the final result would just be the state. One mechanism I can think of is to overload the with/as keyword for use inside of list/generator comprehensions, and using the previous result as state I believe the change to the grammar in python3k would be comp_iter : comp_for | comp_if | comp_with comp_with: 'with' testlist 'as' testlist So something in the form of [expr for i in iterable with initializer as accumulator] would resolve to something like result = [] accumulator = initializer for i in iterable: accumulator = expr result.append(accumulator) return result For instance reduce could be defined as (assuming all 3 arguments are required) reduce = lambda function, iterable, initializer : ([initializer] + [function(accumulator, i) for i in iterable with initializer as accumulator])[-1] Breaking this down, the "with initializer as accumulator" statement means that when the list comprehension begins accumulator=initializer, then after each iteration, accumulator = function(accumulator, i), so with the function f, list [i1,i2,i3,...], and initial value i0, the resulting list of "[function(accumulator, 
i) for i in iterable with initializer as accumulator]" would be [f(i0,i1), f(f(i0,i1),i2), f(f(f(i0,i1),i2),i3),...], or in left associative infix form with the f = "+" operator, [i0+i1,i0+i1+i2,i0+i1+i2+i3,...]. Consing (effectively) initializer to the beginning of the list ensures clean behavior for empty lists, and indexing [-1] gets the last element which is really the only element that matters. Consider a slightly more complex example of a Fibonacci generator, one might define it as follows, def fibs(): a, b = 1, 0 while True: a, b = b, a + b yield b Using the with statement, it would require two generator comprehensions fibs = lambda : (b for a,b in (b, a+b for i in itertools.cycle((None,)) with a,b = 0,1)) The inner generator comprehension (b, a+b for i in itertools.repeat(None) with a,b = 0,1) creates an infinite generator of tuples which are consecutive Fibonacci numbers, the outer list comprehension strips off the unneeded "state". Some of the pros of doing it this way is that because with/as are already keywords in python backwards compatibility shouldn't be an issue, but if one is just mapping with state then an extra list/generator comprehension block is needed to strip the state from the intermediate list. I apologize if similar ideas have already been discussed. -Andrew Carter p.s. Is there a built-in way to get the last element from a generator (perhaps even with a default) a quick google search did not reveal one? -------------- next part -------------- An HTML attachment was scrubbed... URL: From zuo at chopin.edu.pl Mon Jun 11 01:16:29 2012 From: zuo at chopin.edu.pl (Jan Kaliszewski) Date: Mon, 11 Jun 2012 01:16:29 +0200 Subject: [Python-ideas] Weak-referencing/weak-proxying of (bound) methods Message-ID: <20120610231629.GA1792@chopin.edu.pl> Hello, Today, I encountered a surprising bug in my code which creates some weakref.proxies to instance methods... 
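A minimal runnable sketch of the pitfall being described here -- a weak reference taken on a bound method dies immediately, because every attribute access builds a fresh method object (sketch only; `weakref.WeakMethod`, which addresses this directly, was added later, in Python 3.4):

```python
import weakref

class A:
    def method(self):
        return "hello"

a = A()

# a.method builds a brand-new bound-method object on every attribute
# access, so the temporary passed to weakref.ref() is collected at once.
r = weakref.ref(a.method)
print(r())                    # None -- already dead (CPython refcounting)

# Two consecutive accesses give distinct objects:
print(a.method is a.method)   # False

# Since Python 3.4, weakref.WeakMethod handles this by weakly
# referencing the instance and function separately and re-binding:
wm = weakref.WeakMethod(a.method)
print(wm()())                 # 'hello' -- stays alive as long as `a` does
```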
The actual Python behaviour related to the issue can be illustrated with the following example:

    >>> import weakref
    >>> class A:
    ...     def method(self): print(self)
    ...
    >>> A.method
    <function method at 0xb7203a04>
    >>> a = A()
    >>> a.method
    <bound method A.method of <__main__.A object at 0xb7206cac>>
    >>> r = weakref.ref(a.method)    # creating a weak reference
    >>> r                            # ...but it appears to be dead
    <weakref at 0xb71ed594; dead>
    >>> w = weakref.proxy(a.method)  # the same with a weak proxy
    >>> w
    <weakproxy at 0xb71ed5dc to NoneType at 0x8265560>
    >>> w()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ReferenceError: weakly-referenced object no longer exists

This behaviour is perfectly correct -- but still surprising, especially for people who know little about method creation machinery, descriptors etc. I think it would be nice to make this 'trap' less painful -- for example, by doing one or both of the following:

1. Describe and explain this behaviour in the weakref module documentation.

2. Provide (in functools?) a type-and-decorator that does the same as func_descr_get() (transforms a function into a method) *plus* caches the created method (e.g. at the instance object). A prototype implementation:

    import types

    class InstanceCachedMethod(object):

        def __init__(self, func):
            self.func = func
            self.instance_attr_name = '__{0}_method_ref'.format(func.__name__)

        def __get__(self, instance, owner):
            if instance is None:
                return self.func
            try:
                return getattr(instance, self.instance_attr_name)
            except AttributeError:
                method = types.MethodType(self.func, instance)
                setattr(instance, self.instance_attr_name, method)
                return method

A simplified version that reuses the func.__name__ (works well as long as func.__name__ is the actual instance attribute name...):

    class InstanceCachedMethod(object):

        def __init__(self, func):
            self.func = func

        def __get__(self, instance, owner):
            if instance is None:
                return self.func
            method = types.MethodType(self.func, instance)
            setattr(instance, self.func.__name__, method)
            return method

Both versions work well with weakref.proxy()/ref() objects:

    >>> class B:
    ...     @InstanceCachedMethod
    ...     def method(self): print(self)
    ...
    >>> B.method
    <function method at 0xb7203b6c>
    >>> b = B()
    >>> b.method
    <bound method B.method of <__main__.B object at 0xb7206ccc>>
    >>> r = weakref.ref(b.method)
    >>> r
    <weakref at 0xb71ed614; to 'method' at 0xb7205d04>
    >>> w = weakref.proxy(b.method)
    >>> w
    <weakproxy at 0xb71ed644 to method at 0xb7205d04>
    >>> w()
    <__main__.B object at 0xb7206ccc>

What do you think about it?

Cheers.
*j

From pyideas at rebertia.com Mon Jun 11 03:12:36 2012 From: pyideas at rebertia.com (Chris Rebert) Date: Sun, 10 Jun 2012 18:12:36 -0700 Subject: [Python-ideas] BindError as a built-in TypeError subclass (on the margin of PEP 362 discussion) In-Reply-To: <4FD48C5E.6000102@pearwood.info> References: <20120609200703.GF2587@chopin.edu.pl> <4FD3EFB2.1010800@pearwood.info> <4FD48C5E.6000102@pearwood.info> Message-ID: On Sun, Jun 10, 2012 at 5:00 AM, Steven D'Aprano wrote: > Chris Rebert wrote: > >>> Since this will be an error that beginners see (frequently), I suggest >>> ArgumentError is more friendly than BindError. >> >> I note that Ruby also has an ArgumentError, which it raises both for >> calls with an incorrect number of arguments and in cases when Python >> would raise ValueError. > > Even if I wanted to replace ValueError with ArgumentError (and I don't), we > couldn't due to backward compatibility. > > (Although I suppose ArgumentError could inherit from both TypeError and > ValueError.) > > My concept is that errors due to the wrong argument count, duplicate or > missing keyword arguments, etc. which currently raise TypeError could raise > ArgumentError, a subclass, instead. > > That will make distinguishing between "passed the wrong number of arguments" > from "passed the wrong type of argument" easier. You seem to have misinterpreted the intent behind my post. I'm in no way arguing that ValueError and "ArgumentBindingError" should be conflated. I'm pointing out that another very similar language (Ruby) has an error of the same name with a very similar purpose (which it also distinguishes from its TypeError), thus providing further validation of the use case for the proposed ArgumentBindingError.
Cheers, Chris From steve at pearwood.info Mon Jun 11 04:01:38 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 11 Jun 2012 12:01:38 +1000 Subject: [Python-ideas] Saving state in list/generator comprehension In-Reply-To: References: Message-ID: <4FD55182.4040808@pearwood.info> Andrew Carter wrote: > Forgive me for any problems in this e-mail as I'm new to this mailing list. > > I thought it might be nice to be able to somehow save a state in > list/generator comprehensions, > a side effect of this (although not the intended goal) is it would make > reduce feasible in a clean manner as the final result would just be the > state. reduce already exists; in Python 2, it is a built-in available at all times, in Python 3 it has been banished to the functools module. What is your use-case for this? "Saving state" is a means to an end. The beauty of list comprehensions and generator expressions is that they are intentionally quite simple and limited. If you need something more complex, write a function or generator. Not everything has to be a (very-long and unreadable) one-linear. reduce already exists, but if it didn't, you could write it quite easily. Here's a version with optional starting value which yields the intermediate results: import itertools _MISSING = object() # sentinel value def foldl(func, iterable, start=_MISSING): # foldr is left as an exercise :-) if start is _MISSING: it = iter(iterable) else: it = itertools.chain([start], iterable) a = next(it) # raises if iterable is empty and start not given try: b = next(it) except StopIteration: yield a return a = func(a, b) yield a for b in it: a = func(a, b) yield a Modifying this to return just the last value is easy, and in fact is simpler than the above: def foldl(func, iterable, start=_MISSING): if start is _MISSING: it = iter(iterable) else: it = itertools.chain([start], iterable) a = next(it) for b in it: a = func(a, b) return a [...] 
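(The intermediate-results fold sketched above also exists in the standard library as itertools.accumulate; the func argument arrived in Python 3.3. A self-contained comparison with a plain left fold:)

```python
import itertools
import operator

def foldl(func, iterable, start):
    """Plain left fold: collapse the iterable to a single value."""
    acc = start
    for item in iterable:
        acc = func(acc, item)
    return acc

print(foldl(operator.add, [1, 2, 3, 4], 0))   # 10

# itertools.accumulate is the fold that yields every intermediate
# result, matching the generator version above:
print(list(itertools.accumulate([1, 2, 3, 4], operator.add)))   # [1, 3, 6, 10]
```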
> Some of the pros of doing it this way is that because with/as are already > keywords in python backwards compatibility shouldn't be an issue, That's not an argument in favour of your request. That's merely the lack of one specific argument against it. There are an infinite number of things which could be done that won't break backwards compatibility, but that doesn't mean we should do them all. What positive arguments in favour of your proposal do you have? What does your proposal allow us to do that we can't already do, or at least do better? > p.s. Is there a built-in way to get the last element from a generator > (perhaps even with a default) a quick google search did not reveal one? The same as you would get the last element from any iterator, not just generators: iterate over it as quickly as possible, keeping only the last value seen. Because generator values are generated lazily as needed, there's no direct way to skip to the last value, or get random access to them. In pure Python: for x in iterator: pass This may be faster: collections.deque(iterator, maxlen=1)[0] Of course, both examples assume that the iterator or generator yields at least one value, and is not infinite. -- Steven From acarter at cs.hmc.edu Mon Jun 11 05:09:12 2012 From: acarter at cs.hmc.edu (Andrew Carter) Date: Sun, 10 Jun 2012 20:09:12 -0700 Subject: [Python-ideas] Saving state in list/generator comprehension In-Reply-To: References: <4FD55182.4040808@pearwood.info> Message-ID: On Sun, Jun 10, 2012 at 7:01 PM, Steven D'Aprano wrote: > Andrew Carter wrote: > >> Forgive me for any problems in this e-mail as I'm new to this mailing >> list. >> >> I thought it might be nice to be able to somehow save a state in >> list/generator comprehensions, >> a side effect of this (although not the intended goal) is it would make >> reduce feasible in a clean manner as the final result would just be the >> state. 
>> > > reduce already exists; in Python 2, it is a built-in available at all > times, in Python 3 it has been banished to the functools module. > > I think that it is was banished for good reason. I have found myself sometimes writing code that needs a reduce, but I feel that using the reduce function provided isn't very clear, especially if you see it and aren't familiar with the function. Admittedly writing a function that is a few short lines is possible, and what I end up doing, it just seems like there should be a more elegant way to do it than having a bunch of specialized functions. What is your use-case for this? "Saving state" is a means to an end. The > beauty of list comprehensions and generator expressions is that they are > intentionally quite simple and limited. If you need something more complex, > write a function or generator. Not everything has to be a (very-long and > unreadable) one-linear. > > reduce already exists, but if it didn't, you could write it quite easily. > Here's a version with optional starting value which yields the intermediate > results: > As I have mentioned above occasionally I want to turn a list into a single value by some repeated operation, but actually I think its more common that I want to map some operation over a list with dependencies of previous operation passed through. I think my most common use case, is I have a function that operates on a single value, and also has some state. Unfortunately leaving my example purposely vague, I was iterating over a list, and had what was effectively an environment variable (more of state) initially as a dynamic environment, so it was updated each time the function was called across the list comprehension. I then for other reasons wanted to use the environment type as a key for dictionary (which is a problem if its mutable), but that meant that the original list comprehension (which I felt was rather simple). 
Admittedly it didn't take me any time at all to write a simple function that did the list comprehension, but it still felt like a simple enough problem that it could be elegantly solved without resorting to the helper function. > import itertools > _MISSING = object() # sentinel value > > def foldl(func, iterable, start=_MISSING): > # foldr is left as an exercise :-) > if start is _MISSING: > it = iter(iterable) > else: > it = itertools.chain([start], iterable) > a = next(it) # raises if iterable is empty and start not given > try: > b = next(it) > except StopIteration: > yield a > return > a = func(a, b) > yield a > for b in it: > a = func(a, b) > yield a > > > Modifying this to return just the last value is easy, and in fact is > simpler than the above: > > def foldl(func, iterable, start=_MISSING): > if start is _MISSING: > it = iter(iterable) > else: > it = itertools.chain([start], iterable) > a = next(it) > for b in it: > a = func(a, b) > return a > > > > [...] > > Some of the pros of doing it this way is that because with/as are already >> keywords in python backwards compatibility shouldn't be an issue, >> > > That's not an argument in favour of your request. That's merely the lack > of one specific argument against it. There are an infinite number of things > which could be done that won't break backwards compatibility, but that > doesn't mean we should do them all. What positive arguments in favour of > your proposal do you have? What does your proposal allow us to do that we > can't already do, or at least do better? > > I feel like there is a need from personal experience of mapping with state in a short concise way. However it is quite possible that it is just me, and I need to think about the problem differently, or perhaps live with 4 line functions that are only used once. 
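(One such four-line helper covers the stateful-mapping case with no new syntax; a sketch -- the name `scan` is borrowed from functional languages, not an existing stdlib function:)

```python
import itertools

def scan(func, iterable, state):
    """Map func over iterable while threading state through each call."""
    for item in iterable:
        state = func(state, item)
        yield state

# Running totals: the accumulator is the threaded state.
print(list(scan(lambda acc, x: acc + x, [1, 2, 3, 4], 0)))   # [1, 3, 6, 10]

# The Fibonacci example from earlier in the thread, with state = (a, b):
pairs = scan(lambda s, _: (s[1], s[0] + s[1]), itertools.count(), (1, 0))
fibs = (b for a, b in pairs)
print(list(itertools.islice(fibs, 8)))   # [1, 1, 2, 3, 5, 8, 13, 21]
```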
As for the backwards compatibility I think was getting ahead of myself, I feel the with/as solution is quite clunky, but I couldn't come up with a more elegant solution that operated in a similar vein to how python feels as a language. > > p.s. Is there a built-in way to get the last element from a generator >> (perhaps even with a default) a quick google search did not reveal one? >> > > The same as you would get the last element from any iterator, not just > generators: iterate over it as quickly as possible, keeping only the last > value seen. Because generator values are generated lazily as needed, > there's no direct way to skip to the last value, or get random access to > them. > > In pure Python: > > for x in iterator: > pass > > This may be faster: > > collections.deque(iterator, maxlen=1)[0] > > That's a neat solution, a little bit confusing at first glance, but still very neat, thanks! > Of course, both examples assume that the iterator or generator yields at > least one value, and is not infinite. > > > > -- > Steven > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Jun 11 08:09:11 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 11 Jun 2012 16:09:11 +1000 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? 
In-Reply-To: <25BE8DE9-B5DF-48B9-B4DF-E6C82033C3E1@masklinn.net> References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> <4FD4E1A1.6030409@kozea.fr> <8C7C9869-CB87-4E98-AD4E-AACD83375676@masklinn.net> <4FD4E66B.7080802@kozea.fr> <25BE8DE9-B5DF-48B9-B4DF-E6C82033C3E1@masklinn.net> Message-ID: On Mon, Jun 11, 2012 at 4:35 AM, Masklinn wrote: > On 2012-06-10, at 20:24 , Simon Sapin wrote: > >> Le 10/06/2012 20:11, Masklinn a ?crit : >>> [...] when there's no code yet >>> I'll try to get something done first >> >> There is code, with tests. Here is the link I posted earlier in this thread: >> >> https://github.com/mitsuhiko/python-pbkdf2/blob/master/pbkdf2.py > > Yes, I've seen it, but > > 1. I'll need to talk to Armin about using that code (which is why I CC'd > ? him to the list when I responded to Nick's response to your comment), > ? or have him do it, I don't think anybody is going to take his code > ? without even asking for consent and try to push it into the stdlib > > 2. The interface is simple, but painful. Just look at the comment at the top: > > ? ? ? ?3. ?Store ``algorithm$salt:costfactor$hash`` in the database so that > ? ? ? ?you can upgrade later easily to a different algorithm if you need > ? ? ? ?one. ?For instance ``PBKDF2-256$thesalt:10000$deadbeef...``. > > ? if we know what's supposed to be done, how about just doing it and > ? returning *that*? If it goes into the stdlib, I'd like to have > ? something non-cryptographers can use easily, correctly and without > ? making mistakes. Then there's the issue of implementing the equality > ? test, extracting stuff from that storage string on subsequent auths to > ? test for matches. It should be possible to do all that in a single > ? user-facing operations, no munging about in user's code. > > 3. The test suite needs to be converted to the stdlib's format > > 4. The documentation needs to be written Right. 
Given the time frames involved, it's probably best to target this at 3.4 as a simple way to do rainbow-table-and-brute-force-resistant password hashing and comparisons, defaulting to PBKDF2, but accepting alternative key derivation functions so people can plug in bcrypt, scrypt, etc (similar to the way hmac defaults to md5, but lets you specify any hash function with the appropriate API). I think Armin's already created a good foundation for that, but there'll be quite a bit of work in getting a PEP written, etc. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ncoghlan at gmail.com Mon Jun 11 08:12:45 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 11 Jun 2012 16:12:45 +1000 Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding) In-Reply-To: References: <4FD3ABCB.5080800@gmail.com> Message-ID: On Mon, Jun 11, 2012 at 2:43 AM, Serhiy Storchaka wrote: > On 10.06.12 18:44, Nick Coghlan wrote: >> >> This approach also has the advantage of leaving >> sys.__std(in/out/err)__ in a somewhat usable state. > > > And then sys.std* and sys.__std*__ have their own inconsistent buffers. Correct, but using detach() leaves sys.__std*__ completely broken (either throwing exceptions or silently failing to emit output). Creating two independent streams that share the underlying file handle is much closer to the 2.x behaviour when replacing sys.std*. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From stephen at xemacs.org Mon Jun 11 08:16:07 2012 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Mon, 11 Jun 2012 15:16:07 +0900 Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding) In-Reply-To: <4FD4E3A7.6010506@mrabarnett.plus.com> References: <4FD3ABCB.5080800@gmail.com> <87fwa386hh.fsf@uwakimon.sk.tsukuba.ac.jp> <4FD4E3A7.6010506@mrabarnett.plus.com> Message-ID: <87ehpm8jbs.fsf@uwakimon.sk.tsukuba.ac.jp> MRAB writes: > That's actually Python 3.1. From Python 3.2 it's slightly different, > but still not quite right: > > Python 3.1: "hello\r\nhello\r\r\n" > Python 3.2: "hello\nhello\r\n" > Python 3.3.0a4: "hello\nhello\r\n" > > All on Windows. Hm. Maybe it's that port's implementation of universal newlines or something like that? What happens if you use an explicit "end=" argument? (I don't have a Python 3 to check on Windows easily available.) From ncoghlan at gmail.com Mon Jun 11 08:45:46 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 11 Jun 2012 16:45:46 +1000 Subject: [Python-ideas] Saving state in list/generator comprehension In-Reply-To: References: <4FD55182.4040808@pearwood.info> Message-ID: On Mon, Jun 11, 2012 at 1:09 PM, Andrew Carter wrote: > ?I feel like there is a need from personal experience of mapping with state > in a short concise way. However it is quite possible that it is just me, and > I need to think about the problem differently, or perhaps live with 4 line > functions that are only used once.?As for the backwards compatibility I > think was getting ahead of myself,?I feel the with/as solution is quite > clunky, but I couldn't come up with a more elegant solution that operated in > a similar vein to how python feels as a language. Part of how Python feels as a language is due to the fact that stateful operations cannot, in general, be expressed cleanly as expressions - you have to step up to a multi-statement procedural algorithm if your state can't be expressed cleanly through simple iteration. 
I and others have put forward various proposals to change this over the years, but it's a complex problem that touches on the heart of the statement/expression dichotomy that Guido deliberately introduced when creating the language. The mechanism I personally consider most promising is one that makes it easier to be explicit that a particular function is only used in the current statement (see PEP 403). It still feels like Python (i.e. no embedded assignments), but also clearly expresses when a function exists solely for code structure purposes, and has nothing to do with splitting out a component that will be used from multiple locations. The current design proposal in PEP 403 is still quite flawed, though, and needs a substantial amount of work to be brought up to a standard where it makes a compelling case for a change to Python. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From p.f.moore at gmail.com Mon Jun 11 10:06:42 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 11 Jun 2012 09:06:42 +0100 Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding) In-Reply-To: <87ehpm8jbs.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4FD3ABCB.5080800@gmail.com> <87fwa386hh.fsf@uwakimon.sk.tsukuba.ac.jp> <4FD4E3A7.6010506@mrabarnett.plus.com> <87ehpm8jbs.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 11 June 2012 07:16, Stephen J. Turnbull wrote: > MRAB writes: > > ?> That's actually Python 3.1. From Python 3.2 it's slightly different, > ?> but still not quite right: > ?> > ?> Python 3.1: ? ? "hello\r\nhello\r\r\n" > ?> Python 3.2: ? ? "hello\nhello\r\n" > ?> Python 3.3.0a4: "hello\nhello\r\n" > ?> > ?> All on Windows. > > > > Hm. ?Maybe it's that port's implementation of universal newlines or > something like that? ?What happens if you use an explicit "end=" > argument? ?(I don't have a Python 3 to check on Windows easily > available.) Explicit end= makes no difference to the behaviour. 
In fact, a minimal test suggests that universal newline mode is not enabled on Windows in Python 3. That's a regression from 2.x. See below. D:\Data>py -3 -c "print('x')" | od -c 0000000 x \n 0000002 D:\Data>py -2 -c "print('x')" | od -c 0000000 x \r \n 0000003 D:\Data>py -3 -V Python 3.2.2 D:\Data>py -2 -V Python 2.7.2 Paul. From amauryfa at gmail.com Mon Jun 11 10:11:34 2012 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Mon, 11 Jun 2012 10:11:34 +0200 Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding) In-Reply-To: References: <4FD3ABCB.5080800@gmail.com> <87fwa386hh.fsf@uwakimon.sk.tsukuba.ac.jp> <4FD4E3A7.6010506@mrabarnett.plus.com> <87ehpm8jbs.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: 2012/6/11 Paul Moore > Explicit end= makes no difference to the behaviour. In fact, a minimal > test suggests that universal newline mode is not enabled on Windows in > Python 3. That's a regression from 2.x. See below. > > D:\Data>py -3 -c "print('x')" | od -c > 0000000 x \n > 0000002 > > D:\Data>py -2 -c "print('x')" | od -c > 0000000 x \r \n > 0000003 > This is certainly related to http://bugs.python.org/issue11990 -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at cheimes.de Mon Jun 11 10:42:59 2012 From: lists at cheimes.de (Christian Heimes) Date: Mon, 11 Jun 2012 10:42:59 +0200 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? In-Reply-To: References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> <4FD4E1A1.6030409@kozea.fr> <8C7C9869-CB87-4E98-AD4E-AACD83375676@masklinn.net> <4FD4E66B.7080802@kozea.fr> <25BE8DE9-B5DF-48B9-B4DF-E6C82033C3E1@masklinn.net> Message-ID: Am 11.06.2012 08:09, schrieb Nick Coghlan: > Right. 
Given the time frames involved, it's probably best to target > this at 3.4 as a simple way to do > rainbow-table-and-brute-force-resistant password hashing and > comparisons, defaulting to PBKDF2, but accepting alternative key > derivation functions so people can plug in bcrypt, scrypt, etc > (similar to the way hmac defaults to md5, but lets you specify any > hash function with the appropriate API). > > I think Armin's already created a good foundation for that, but > there'll be quite a bit of work in getting a PEP written, etc. Python already has an excellent library for password hashing: passlib [1]. It's well written and documented, contains more than 30 password hashing algorithms and schemas used by major platforms and applications like Unix, LDAP and databases. The library even contains a policy framework for handling, recognizing and migrating passwords as well as counteractive measures against side channel attacks. IMHO it's not enough to just provide the basic algorithm for PBKDF2 and friends. There is still too much space for error. Passlib hides the complex parts and has a user friendly API, for example http://packages.python.org/passlib/lib/passlib.context-tutorial.html#deprecation-hash-migration . Christian [1] http://packages.python.org/passlib/ From ncoghlan at gmail.com Mon Jun 11 12:03:35 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 11 Jun 2012 20:03:35 +1000 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? In-Reply-To: References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> <4FD4E1A1.6030409@kozea.fr> <8C7C9869-CB87-4E98-AD4E-AACD83375676@masklinn.net> <4FD4E66B.7080802@kozea.fr> <25BE8DE9-B5DF-48B9-B4DF-E6C82033C3E1@masklinn.net> Message-ID: On Mon, Jun 11, 2012 at 6:42 PM, Christian Heimes wrote: > Am 11.06.2012 08:09, schrieb Nick Coghlan: >> Right. 
Given the time frames involved, it's probably best to target >> this at 3.4 as a simple way to do >> rainbow-table-and-brute-force-resistant password hashing and >> comparisons, defaulting to PBKDF2, but accepting alternative key >> derivation functions so people can plug in bcrypt, scrypt, etc >> (similar to the way hmac defaults to md5, but lets you specify any >> hash function with the appropriate API). >> >> I think Armin's already created a good foundation for that, but >> there'll be quite a bit of work in getting a PEP written, etc. > > Python already has an excellent library for password hashing: passlib > [1]. It's well written and documented, contains more than 30 password > hashing algorithms and schemas used by major platforms and applications > like Unix, LDAP and databases. The library even contains a policy > framework for handling, recognizing and migrating passwords as well as > counteractive measures against side channel attacks. > > IMHO it's not enough to just provide the basic algorithm for PBKDF2 and > friends. There is still too much space for error. Passlib hides the > complex parts and has a user friendly API, for example > http://packages.python.org/passlib/lib/passlib.context-tutorial.html#deprecation-hash-migration Thanks for the link Christian, it does appear this particular wheel has already been thoroughly invented. I'll be recommending passlib for use by others in the future and look into adopting it for my own projects. However, password hashing is an important and common enough problem that it would be good to have some basic level of support in the standard library, with a clear migration path to a more feature complete approach like passlib. It would be good if someone was willing to do the work of raising this discussion with the passlib authors, and looking to see if a suitably stable core could be extracted that is API compatible with passlib, and could be proposed as a standard library addition for 3.4. Regards, Nick. 
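As a rough sketch of the kind of minimal core being discussed here: PBKDF2 with a per-password random salt and a constant-time comparison. (A `pbkdf2_hmac` primitive did eventually land in `hashlib` in Python 3.4; the parameter choices below are illustrative, not a recommendation.)

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None, iterations=100_000):
    """Derive a salted PBKDF2-HMAC-SHA256 digest; returns (salt, iterations, digest)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each password
    digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'),
                                 salt, iterations)
    return salt, iterations, digest

def verify_password(password, salt, iterations, expected):
    """Recompute the digest and compare in constant time."""
    digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'),
                                 salt, iterations)
    return hmac.compare_digest(digest, expected)
```

A real stdlib API would additionally need a serialization format for the salt/iterations/digest triple and a migration story — exactly the parts that passlib's context machinery already handles.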
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From techtonik at gmail.com Mon Jun 11 12:31:50 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Mon, 11 Jun 2012 13:31:50 +0300 Subject: [Python-ideas] Isolated (?transactional) exec (?subinterpreter) calls In-Reply-To: References: Message-ID: On Fri, Jun 8, 2012 at 4:00 PM, Amaury Forgeot d'Arc wrote: > 2012/6/8 anatoly techtonik >> >> Optionally isolated from parent environment: >> - a feature to execute a user script in a snapshot of the current >> environment and have >> a choice whether to merge its modifications back to the environment or not > > > It would be a really interesting feature, but seems very difficult to > implement. > Do you have the slightest idea how this would work? > What about global state, environment variables, threads, and all kinds of > side-effects? > > Or are you thinking about a solution based on the multiprocessing module? For my original user story both approaches will suffice. I've never used multiprocessing (mostly because of 2.6-only compatibility) and it looks like it is capable of doing what I want with some tweaks. But the first approach, with fine-grained environment control (object space, state, memory), will be more beneficial for Python as it can bring a nice research methodology for interpreter improvements (and hopefully some pictures). Two things are required: 1. Execution rollback 2. Scope control Execution rollback (or transaction) can be either "save the state and restore" or "keep track of changes and discard". At the lowest possible level it is something like using memory copy-on-write while Python bytecode modifies it and discarding the copied stuff in the end. Like in the OSI model for networking, this low-level memory and code abstraction is the 1st layer.
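The "save the state and restore" variant can be sketched in-process with a deep-copied namespace — a toy illustration only, since it covers interpreter state and ignores external side effects entirely:

```python
import copy

def transactional_exec(code, env):
    """Run `code` against a deep-copied snapshot of `env`.

    Returns the modified snapshot, so the caller decides whether to
    merge it back (commit) or drop it (rollback).  Only in-interpreter
    state is covered; files, sockets, etc. are untouched.
    """
    snapshot = copy.deepcopy(env)
    exec(code, snapshot)
    snapshot.pop('__builtins__', None)  # exec() injects this into globals
    return snapshot

env = {'counter': 1}
result = transactional_exec("counter += 1", env)
# commit with env.update(result), or simply ignore `result` to roll back
```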
But you're absolutely right about global state, environment variables, threads and other stuff - when we jump to a higher layer - rolling back the execution pointer to a saved checkpoint and discarding memory will not be enough. We need to ensure that a reverted operation did not affect the state of the system outside the execution scope. "Scope control" means that every pathway by which execution can alter global state outside needs to be carefully recorded and classified. It will then be possible to detect "escaped" transactions automatically and detect if the operation is safe to revert or not. From lists at cheimes.de Mon Jun 11 15:41:18 2012 From: lists at cheimes.de (Christian Heimes) Date: Mon, 11 Jun 2012 15:41:18 +0200 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? In-Reply-To: References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> <4FD4E1A1.6030409@kozea.fr> <8C7C9869-CB87-4E98-AD4E-AACD83375676@masklinn.net> <4FD4E66B.7080802@kozea.fr> <25BE8DE9-B5DF-48B9-B4DF-E6C82033C3E1@masklinn.net> Message-ID: <4FD5F57E.2090403@cheimes.de> Am 11.06.2012 12:03, schrieb Nick Coghlan: > Thanks for the link Christian, it does appear this particular wheel > has already been thoroughly invented. I'll be recommending passlib for > use by others in the future and look into adopting it for my own > projects. You are welcome! I've been using passlib for about two years and really like its API. PyPI surprises now and then with its hidden gems. I wish we had a way to draw more attention to good solutions, something like "officially endorsed projects" or so. > However, password hashing is an important and common enough problem > that it would be good to have some basic level of support in the > standard library, with a clear migration path to a more feature > complete approach like passlib.
> > It would be good if someone was willing to do the work of raising this > discussion with the passlib authors, and looking to see if a suitably > stable core could be extracted that is API compatible with passlib, > and could be proposed as a standard library addition for 3.4. That's a nice idea, Nick! I've added one of the two core developers of passlib to the CC list. The other one doesn't have his/her email address exposed on Google Code. A stripped down and API compatible version of passlib would make a good addition for Python's standard library. IMHO the complete passlib package is too big for the core. The context API and handlers for bcrypt, pbkdf2 and sha*_crypt are sufficient. Developers can still install passlib if they need all features. We need to come up with a different name (passhash ?) for the stdlib variant. Christian From jimjjewett at gmail.com Mon Jun 11 16:21:29 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 11 Jun 2012 10:21:29 -0400 Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: References: <87fwa7a945.fsf@uwakimon.sk.tsukuba.ac.jp> <1339102104.17621.YahooMailClassic@web161503.mail.bf1.yahoo.com> Message-ID: On Thu, Jun 7, 2012 at 5:00 PM, Mike Meyer wrote: > On Thu, Jun 7, 2012 at 4:48 PM, Rurpy wrote: >> I suspect the vast majority of >> programmers are interested in a language that allows >> them to *effectively* get done what they need to, Agreed. The problem is that your use case gets hit by several special cases at once. Usually, you don't need to worry about encodings at all; the default is sufficient. Obviously not the case for you. Usually, the answer is just to open a file (or stream) the way you want to. sys.stdout is special because you don't open it. If you do want to change sys.stdout, usually the answer is to replace it with a different object. Apparently (though I missed the reason why) that doesn't work for you, and you need to keep using the same underlying stream. 
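That "wrap the same underlying stream" idea can be sketched as follows (assuming the text stream exposes a `.buffer` attribute, as the standard streams normally do; if sys.stdout has already been replaced by something bufferless, this fails):

```python
import io

def rewrap(stream, encoding, errors='strict'):
    """Return a TextIOWrapper over the same underlying byte stream,
    flushing first so text buffered under the old encoding isn't lost."""
    stream.flush()
    return io.TextIOWrapper(stream.buffer, encoding=encoding,
                            errors=errors, line_buffering=True)

# typical use:
#     sys.stdout = rewrap(sys.stdout, 'utf-8')
```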
So at that point, replacing it with a wrapped version of itself probably *is* the simplest solution. The remaining problem is how to find the least bad way of doing that. Your solution does work. Adding it as an example to the docs would probably be reasonable, but someone seems to have worked pretty hard at keeping the sys module documentation short. I could personally support a wrap function on the sys.std* streams that took care of flushing before wrapping, but ... there is a cost, in that the API gets longer, and therefore harder to learn. > or applications > outside of those built for your system that have a "--encoding" type > flag? There are plenty of applications with an encoding flag; I'm not sure how often it applies to sys.std*, as opposed to named files. -jJ From rurpy at yahoo.com Mon Jun 11 16:42:46 2012 From: rurpy at yahoo.com (Rurpy) Date: Mon, 11 Jun 2012 07:42:46 -0700 (PDT) Subject: [Python-ideas] TextIOWrapper callable encoding parameter Message-ID: <1339425766.92977.YahooMailClassic@web161502.mail.bf1.yahoo.com> Here is another issue that came up in my ongoing adventure porting to Python3... Executive summary: ================== There is no good way to read a text file when the encoding has to be determined by reading the start of the file. A long-winded version of that follows. Scroll down to the "Proposal" section to skip it. Problem: ======== When one opens a text file for reading, one must specify (explicitly or by default) an encoding which Python will use to convert the raw bytes read into Python strings. This means one must know the encoding of a file before opening it, which is usually the case, but not always. Plain text files have no meta-data giving their encoding so sometimes it may not be known and some of the file must be read and a guess made.
Other data like html pages, xml files or python source code have encoding information inside them, but that too requires reading the start of the file without knowing the encoding in advance. I see three ways in general in Python 3 currently to attack this problem, but each has some severe drawbacks: 1. The most straight-forward way to handle this is to open the file twice, first in binary mode or with latin1 encoding and again in text mode after the encoding has been determined. This of course has a performance cost since the data is read twice. Further, it can't be used if the data source is from a pipe, socket or other non-rewindable source. This includes sys.stdin when it comes from a pipe. 2. Alternatively, with a little more expertise, one can rewrap the open binary stream in a TextIOWrapper to avoid a second OS file open. The standard library's tokenize.open() function does this:

    def open(filename):
        buffer = builtins.open(filename, 'rb')
        encoding, lines = detect_encoding(buffer.readline)
        buffer.seek(0)
        text = TextIOWrapper(buffer, encoding, line_buffering=True)
        text.mode = 'r'
        return text

This too seems to read the data twice and of course the seek(0) prevents this method also from being usable with pipes, sockets and other non-seekable sources. 3. Another method is to simply leave the file open in binary mode, read bytes data, and manually decode it to text. This seems to be the only option when reading from non-rewindable sources like pipes and sockets, etc. But then one loses all the advantages of having a text stream even though one wants to be reading text! And if one tries to hide this, one ends up reimplementing a good part of TextIOWrapper! I believe these problems could be addressed with a fairly simple and clean modification of the io.TextIOWrapper class... Proposal ======== The following is a logical description; I don't mean to imply that the code must follow this outline exactly.
It is based on looking at _pyio; I hope the C code is equivalent. 1. Allow io.TextIOWrapper's encoding parameter to be a callable object in addition to a string or None. 2. In __init__(), if the encoding parameter was callable, record it as an encoding hook and leave encoding set to None. 3. The places in io.TextIOWrapper that currently read undecoded data from the internal buffer object and decode it (only the read() and read_chunk() methods, I think) would be modified to work this way: 4. Read data from the buffer object as is done now. 5. If the encoding has been set, get a decoder if necessary and continue on as usual. 6. If the encoding is None, call the encoding callable with the data just read and the buffer object. 7. The callable will examine the data, possibly using the buffer object's peek method to look further ahead in the file. It returns the name of an encoding. 8. io.TextIOWrapper will get the encoding and record it, set up the decoder the same way as if the encoding name had been received as a parameter, decode the read data and continue on as usual. 9. In other non-read paths where encoding needs to be known, raise an error if it is still None. Were io.TextIOWrapper modified this way, it would offer: * Better performance since there is no need to reread data * Read data is decoded after being examined so the stream is usable with serial data sources like pipes, sockets, etc. * User code is simplified and clearer; there is better separation of concerns. For example, the code in the "Problem" section could be written:

    stream = open(filename, encoding=detect_encoding)
    ...
    def detect_encoding(data, buffer):
        # This is still basically the same function as
        # in the code in the "Problem" section.
        ... look for Python coding declaration in
        first two lines of the 'data' bytes object.
        if not found_encoding:
            raise Error("unable to determine encoding")
        return found_encoding

I have modified a copy of the _pyio module as described and the changes required seemed unsurprising and relatively few, though I am sure there are subtleties and other considerations I am missing. Hence this post seeking feedback... From rurpy at yahoo.com Mon Jun 11 17:06:18 2012 From: rurpy at yahoo.com (Rurpy) Date: Mon, 11 Jun 2012 08:06:18 -0700 (PDT) Subject: [Python-ideas] TextIOWrapper callable encoding parameter Message-ID: <1339427178.24737.YahooMailClassic@web161502.mail.bf1.yahoo.com> As a followup, here are some timing data that seem to confirm a modest increase in speed as a result of implementing the callable encoding parameter I proposed (although that would not be the main reason for wanting to do it.) These are just for illustration. (Among many other reasons, _pyio benchmarks are not very useful.) I read four short test files using four methods for determining the test file's encoding. The test files are a simplified model of a python coding declaration (always on the first line in our case, with no BOM present [*1]) followed by mixed English and Japanese text.

Method 0 (reopen0): Use the encoding callable I am proposing.

    def reopen0(fname):
        def hook(data, buf):
            return get_encoding(data)
        t = io.open(fname, encoding=hook)

Method 1 (reopen1): Open in binary to determine encoding, then rewrap in a TextIOWrapper with the correct encoding.

    def reopen1(fname):
        b = io.open(fname, 'rb')
        line = b.readline()
        enc = get_encoding(line)
        b.seek(0)
        t = io.TextIOWrapper(b, enc, line_buffering=True)
        t.mode = 'r'

Method 2 (reopen2): Open in binary to determine encoding, then reopen in text mode with correct encoding.

    def reopen2(fname):
        b = io.open(fname, 'rb')
        line = b.readline()
        enc = get_encoding(line)
        t = io.open(fname, encoding=enc)

Method 3 (reopen3): Open in text mode (latin1) to determine encoding, then reopen in text mode with correct encoding.
    def reopen3(fname):
        f = io.open(fname, encoding='latin1')
        line = f.readline()
        enc = get_encoding(line)
        t = io.open(fname, encoding=enc)

The same get_encoding() function is used in all methods [*1]. The input test data are all small files (because we want to measure encoding detection, not how fast read() runs.) Each has a python/emacs coding declaration in the first line.

test.utf8 -- Tiny python program with coding declaration and single print statement in main() function that prints a short word (literal) in Japanese. Encoding is utf-8 (122 bytes).
test.sjis -- Identical to test.utf8 but sjis encoding (111 bytes).
test2.utf8 -- A python coding declaration followed by approximately 50 long lines with mixed English and Japanese (4274 bytes).
test2.sjis -- Identical to test2.utf8 but sjis encoding (3401 bytes).

Results:
---------------------------------------------------------
$ python3 bm.py test.utf8
test.utf8 / reopen0: total time (10000 reps) was 1.188323
test.utf8 / reopen1: total time (10000 reps) was 1.490757
test.utf8 / reopen2: total time (10000 reps) was 1.766081
test.utf8 / reopen3: total time (10000 reps) was 2.141996
$ python3 bm.py test.sjis
test.sjis / reopen0: total time (10000 reps) was 1.175914
test.sjis / reopen1: total time (10000 reps) was 1.471780
test.sjis / reopen2: total time (10000 reps) was 1.764444
test.sjis / reopen3: total time (10000 reps) was 2.122550
$ python3 bm.py test2.utf8
test2.utf8 / reopen0: total time (10000 reps) was 1.690255
test2.utf8 / reopen1: total time (10000 reps) was 1.996235
test2.utf8 / reopen2: total time (10000 reps) was 2.278798
test2.utf8 / reopen3: total time (10000 reps) was 2.727867
$ python3 bm.py test2.sjis
test2.sjis / reopen0: total time (10000 reps) was 1.841388
test2.sjis / reopen1: total time (10000 reps) was 2.147142
test2.sjis / reopen2: total time (10000 reps) was 2.426701
test2.sjis / reopen3: total time (10000 reps) was 2.873278
----------------------------------------------------------
Here
is what happens when a test data file is piped into a program using the four methods above:

$ cat test.utf8 | python3 stdin.py reopen0
read 102 characters
$ cat test.utf8 | python3 stdin.py reopen1
got exception: [Errno 29] Illegal seek
$ cat test.utf8 | python3 stdin.py reopen2
read 0 characters
$ cat test.utf8 | python3 stdin.py reopen3
read 0 characters

---- [*1] Here is the get_encoding function used above. It is a toy, simplified python source encoding line reader. Toy, in that it looks at only one line, doesn't consider a BOM, etc. Its purpose was to allow me to sanity check the benefits of having a callable encoding parameter.

    def get_encoding(line):
        if isinstance(line, bytes):
            nlpos = line.index(b'\n')
            mo = ENC_PATTERN_B.search(line, 0, nlpos)
            if not mo:
                return None
            enc = mo.group(1).decode('latin1')
        else:
            nlpos = line.index('\n')
            mo = ENC_PATTERN_S.search(line, 0, nlpos)
            if not mo:
                return None
            enc = mo.group(1)
        return enc

From ncoghlan at gmail.com Mon Jun 11 17:10:47 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 12 Jun 2012 01:10:47 +1000 Subject: [Python-ideas] TextIOWrapper callable encoding parameter In-Reply-To: <1339425766.92977.YahooMailClassic@web161502.mail.bf1.yahoo.com> References: <1339425766.92977.YahooMailClassic@web161502.mail.bf1.yahoo.com> Message-ID: Immediate thought: it seems like it would be easier to offer a way to inject data back into a buffered IO object's internal buffer. -- Sent from my phone, thus the relative brevity :) On Jun 12, 2012 12:43 AM, "Rurpy" wrote: > Here is another issue that came up in my ongoing > adventure porting to Python3... > > Executive summary: > ================== > > There is no good way to read a text file when the > encoding has to be determined by reading the start > of the file. A long-winded version of that follows. > Scroll down to the "Proposal" section to skip it.
> > Problem: > ======== > > When one opens a text file for reading, one must specify > (explicitly or by default) an encoding which Python will > use to convert the raw bytes read into Python strings. > This means one must know the encoding of a file before > opening it, which is usually the case, but not always. > > Plain text files have no meta-data giving their encoding > so sometimes it may not be known and some of the file must > be read and a guess made. Other data like html pages, xml > files or python source code have encoding information inside > them, but that too requires reading the start of the file > without knowing the encoding in advance. > > I see three ways in general in Python 3 currently to attack > this problem, but each has some severe drawbacks: > > 1. The most straight-forward way to handle this is to open > the file twice, first in binary mode or with latin1 encoding > and again in text mode after the encoding has been determined > This of course has a performance cost since the data is read > twice. Further, it can't be used if the data source is a > from a pipe, socket or other non-rewindable source. This > includes sys.stdin when it comes from a pipe. > > 2. Alternatively, with a little more expertise, one can rewrap > the open binary stream in a TextIOWrapper to avoid a second > OS file open. The standard library's tokenize.open() > function does this: > > def open(filename): > buffer = builtins.open(filename, 'rb') > encoding, lines = detect_encoding(buffer.readline) > buffer.seek(0) > text = TextIOWrapper(buffer, encoding, line_buffering=True) > text.mode = 'r' > return text > > This too seems to read the data twice and of course the > seek(0) prevents this method also from being usable with > pipes, sockets and other non-seekable sources. > > 3. Another method is to simply leave the file open in > binary mode, read bytes data, and manually decode it to > text. 
This seems to be the only option when reading from > non-rewindable sources like pipes and sockets, etc. > But then ones looses the all the advantages of having > a text stream even though one wants to be reading text! > And if one tries to hide this, one ends up reimplementing > a good part of TextIOWrapper! > > I believe these problems could be addressed with a fairly > simple and clean modification of the io.TextIOWrapper > class... > > Proposal > ======== > The following is a logical description; I don't mean to > imply that the code must follow this outline exactly. > It is based on looking at _pyio; I hope the C code is > equivalent. > > 1. Allow io.TextIOWrapper's encoding parameter to be a > callable object in addition to a string or None. > > 2. In __init__(), if the encoding parameter was callable, > record it as an encoding hook and leave encoding set to > None. > > 3. The places in Io.TextIOWrapper that currently read > undecoded data from the internal buffer object and decode > (only methods read() and read_chunk() I think) it would > be modified to do so in this way: > > 4. Read data from the buffer object as is done now. > > 5. If the encoding has been set, get a decoder if necessary > and continue on as usual. > > 6. If the encoding is None, call the encoding callable > with the data just read and the buffer object. > > 7. The callable will examine the data, possibly using the > buffer object's peek method to look further ahead in the > file. It returns the name of an encoding. > > 8. io.TextIOWrapper will get the encoding and record it, > and setup the decoder the same way as if the encoding name > had been received as a parameter, decode the read data and > continue on as usual. > > 9. In other non-read paths where encoding needs to be known, > raise an error if it is still None. 
> > Were io.TextWrapper modified this way, it would offer: > > * Better performance since there is no need to reread data > > * Read data is decoded after being examined so the stream > is usable with serial datasources like pipes, sockets, etc. > > * User code is simplified and clearer; there is better > separation of concerns. For example, the code in the > "Problem" section could be written: > > stream = open(filename, encoding=detect_encoding): > ... > def detect_encoding (data, buffer): > # This is still basically the same function as > # in the code in the "Problem" section. > ... look for Python coding declaration in > first two lines of the 'data' bytes object. > if not found_encoding: > raise Error ("unable to determine encoding") > return found_encoding > > I have modified a copy the _pyio module as described and > the changes required seemed unsurprising and relatively > few, though I am sure there are subtleties and other > considerations I am missing. Hence this post seeking > feedback... > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Mon Jun 11 17:11:50 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 11 Jun 2012 09:11:50 -0600 Subject: [Python-ideas] TextIOWrapper callable encoding parameter In-Reply-To: <1339425766.92977.YahooMailClassic@web161502.mail.bf1.yahoo.com> References: <1339425766.92977.YahooMailClassic@web161502.mail.bf1.yahoo.com> Message-ID: On Mon, Jun 11, 2012 at 8:42 AM, Rurpy wrote: > Here is another issue that came up in my ongoing > adventure porting to Python3... > > Executive summary: > ================== > > There is no good way to read a text file when the > encoding has to be determined by reading the start > of the file. A long-winded version of that follows.
> Scroll down to the "Proposal" section to skip it. FWIW, the import system does an encoding check on Python source files that is somewhat related. See http://www.python.org/dev/peps/pep-0263/. -eric From stephen at xemacs.org Mon Jun 11 18:24:20 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 12 Jun 2012 01:24:20 +0900 Subject: [Python-ideas] TextIOWrapper callable encoding parameter In-Reply-To: References: <1339425766.92977.YahooMailClassic@web161502.mail.bf1.yahoo.com> Message-ID: <871ull95qj.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > Immediate thought: it seems like it would be easier to offer a way to > inject data back into a buffered IO object's internal buffer. ungetch()? If you're only interested in the top of the file (see below), I would suggest allowing only one bufferfull, and then simply rewinding the buffer pointer once you're done. This is one strategy used by Emacsen for encoding detection (for the reason pointed out by Rurpy: not all streams are rewindable). But is that really "easier"? It might be more general, but you still need to reinitialize the encoding (ie, from the trivial "binary" to whatever is detected), with all the hair that comes with that. > > Executive summary: > > ================== > > > > There is no good way to read a text file when the > > encoding has to be determined by reading the start > > of the file. A long-winded version of that follows. > > Scroll down to the "Proposal" section to skip it. This may be insufficiently general. Specifically, both Emacsen and vi allow specification of editor configuration variables at the bottom of the file as well as the top. I don't know whether vi allows encoding specs at the bottom, but Emacsen do (but only for files). I wouldn't recommend paying much attention to what Emacsen actually *do* when initializing a stream (it's, uh, "baroque").
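For what it's worth, the detection idea can be approximated today with BufferedReader.peek(), which returns already-buffered bytes without advancing the stream position, so nothing is seeked or re-read and the source may be a pipe. A sketch, with a deliberately naive coding-declaration regex:

```python
import io
import re

# naive: matches e.g. "# -*- coding: sjis -*-" anywhere in the peeked bytes
CODING_RE = re.compile(rb'coding[=:]\s*([-\w.]+)')

def open_detected(binary, default='utf-8'):
    """Wrap a binary stream in a TextIOWrapper, sniffing the encoding
    from a coding declaration near the start of the stream."""
    if not isinstance(binary, io.BufferedReader):
        binary = io.BufferedReader(binary)
    head = binary.peek(256)  # does not consume; may return more or fewer bytes
    m = CODING_RE.search(head)
    encoding = m.group(1).decode('ascii') if m else default
    return io.TextIOWrapper(binary, encoding=encoding)
```

This is only an approximation of the proposal: peek() is limited to one buffer's worth of data, whereas the callable-encoding hook would see exactly the bytes the wrapper was about to decode.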
From guido at python.org Mon Jun 11 18:49:45 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 11 Jun 2012 09:49:45 -0700 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? In-Reply-To: References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> <4FD4E1A1.6030409@kozea.fr> <8C7C9869-CB87-4E98-AD4E-AACD83375676@masklinn.net> <4FD4E66B.7080802@kozea.fr> <25BE8DE9-B5DF-48B9-B4DF-E6C82033C3E1@masklinn.net> Message-ID: On Mon, Jun 11, 2012 at 3:03 AM, Nick Coghlan wrote: > However, password hashing is an important and common enough problem > that it would be good to have some basic level of support in the > standard library, with a clear migration path to a more feature > complete approach like passlib. I usually like this approach, but here I am hesitant, because of the cost if the basic approach is found inadequate. The stdlib support should either be state-of-the art or so poor that people are naturally driven to a state-of-the art alternative on PyPI that is maintained regularly. In this case I think our only option is the latter. I do think it is another example of a situation where the stdlib docs ought to contain some hints about where to go instead for this functionality. -- --Guido van Rossum (python.org/~guido) From masklinn at masklinn.net Mon Jun 11 22:08:11 2012 From: masklinn at masklinn.net (Masklinn) Date: Mon, 11 Jun 2012 22:08:11 +0200 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? 
In-Reply-To: References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> <4FD4E1A1.6030409@kozea.fr> <8C7C9869-CB87-4E98-AD4E-AACD83375676@masklinn.net> <4FD4E66B.7080802@kozea.fr> <25BE8DE9-B5DF-48B9-B4DF-E6C82033C3E1@masklinn.net> Message-ID: On 2012-06-11, at 18:49 , Guido van Rossum wrote: > On Mon, Jun 11, 2012 at 3:03 AM, Nick Coghlan wrote: >> However, password hashing is an important and common enough problem >> that it would be good to have some basic level of support in the >> standard library, with a clear migration path to a more feature >> complete approach like passlib. > > I usually like this approach, but here I am hesitant, because of the > cost if the basic approach is found inadequate. The stdlib support > should either be state-of-the art Well depends what you mean by "state of the art", PBKDF2 is still the "tried and true" trusted password-hashing algorithm (it's the one used in TrueCrypt, 1Password, WPA2, DPAPI and many others). bcrypt is the "old newness", working on the same principle as PBKDF2 (do lots of work) but a different underlying algorithm, and scrypt is the "new newness" as it includes being memory-hard on top of being processing-hard, but is significantly less trusted as it's only a few years old. So as far as I know, PBKDF2 is indeed "state of the art", scrypt is "bleeding edge" and bcrypt is somewhere in-between[0] (but if PBKDF2 is found to be insufficient, bcrypt will fall for similar reasons: it's only binding on CPU power and is easy to parallelize). Ulrich Drepper also built an MD5crypt-inspired crypt based on SHA2 (and fixed a few weak ideas of MD5crypt[1]) a few years ago. 
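The "adaptive load" part of the thread title — scaling the work factor to current hardware — is easy to sketch. Here is a hypothetical calibration helper (illustrative only; not an API from passlib or the stdlib, and `hashlib.pbkdf2_hmac` only exists from Python 3.4):

```python
import hashlib
import os
import time

def calibrate_iterations(target_seconds=0.25, start=10_000):
    """Double the PBKDF2 iteration count until a single derivation
    takes roughly `target_seconds` on this machine."""
    salt = os.urandom(16)
    n = start
    while True:
        t0 = time.perf_counter()
        hashlib.pbkdf2_hmac('sha256', b'calibration', salt, n)
        if time.perf_counter() - t0 >= target_seconds:
            return n
        n *= 2
```

The resulting count would be stored alongside each hash, so it can be raised for newly set passwords as hardware improves.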
As a matter of fact, passlib notes PBKDF2/SHA512 as one of its three recommendations (alongside bcrypt and sha512_crypt) and notes it is the most portable of three roughly equivalent choices[2] (and that sha512_crypt is somewhat baroque and harder to analyze for flaws than the alternatives). >> or so poor that people are naturally >> driven to a state-of-the art alternative on PyPI that is maintained >> regularly. In this case I think our only option is the latter. I do >> think it is another example of a situation where the stdlib docs ought >> to contain some hints about where to go instead for this >> functionality. The issue with this idea is that people are *not* driven to state-of-the-art alternatives because they don't understand or know the issue. And as a result, as we've seen last week, they'll use cryptographic hashes (with or without salts) even though those are insufficient, because that's available and they read on the internet that it was what they needed. And how are you going to make people understand there's a difference between a cryptographic hash and a password hash by doing nothing, giving them cryptographic hashes and leaving them to their own devices? [0] and beyond the bleeding edge lies ubiquitous 2-factor auth, probably. [1] MD5crypt cannot use adaptive load factors and injects constant data at some points; it also allows longer salts. [2] http://packages.python.org/passlib/new_app_quickstart.html#recommended-hashes From guido at python.org Mon Jun 11 22:21:07 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 11 Jun 2012 13:21:07 -0700 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions?
In-Reply-To: References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> <4FD4E1A1.6030409@kozea.fr> <8C7C9869-CB87-4E98-AD4E-AACD83375676@masklinn.net> <4FD4E66B.7080802@kozea.fr> <25BE8DE9-B5DF-48B9-B4DF-E6C82033C3E1@masklinn.net> Message-ID: On Mon, Jun 11, 2012 at 1:08 PM, Masklinn wrote: > On 2012-06-11, at 18:49 , Guido van Rossum wrote: > >> On Mon, Jun 11, 2012 at 3:03 AM, Nick Coghlan wrote: >>> However, password hashing is an important and common enough problem >>> that it would be good to have some basic level of support in the >>> standard library, with a clear migration path to a more feature >>> complete approach like passlib. >> >> I usually like this approach, but here I am hesitant, because of the >> cost if the basic approach is found inadequate. The stdlib support >> should either be state-of-the art > > Well depends what you mean by "state of the art", PBKDF2 is still the > "tried and true" trusted password-hashing algorithm (it's the one used > in TrueCrypt, 1Password, WPA2, DPAPI and many others). bcrypt is the > "old newness", working on the same principle as PBKDF2 (do lots of work) > but a different underlying algorithm, and scrypt is the "new newness" as > it includes being memory-hard on top of being processing-hard, but is > significantly less trusted as it's only a few years old. > > So as far as I know, PBKDF2 is indeed "state of the art", scrypt is > "bleeding edge" and bcrypt is somewhere in-between[0] (but if PBKDF2 is > found to be insufficient, bcrypt will fall for similar reasons: it's > only binding on CPU power and is easy to parallelize). Ulrich Drepper > also built an MD5crypt-inspired crypt based on SHA2 (and fixed a few > weak ideas of MD5crypt[1]) a few years ago. 
> > As a matter of facts, passlib notes PBKDF2/SHA512 as one of its three > recommendation (alongside bcrypt and sha512_crypt) and notes it is the > most portable of three roughly equivalent choices[2] (and that > sha512_crypt is somewhat baroque and harder to analyze for flaws than > the alternatives). > >> or so poor that people are naturally >> driven to a state-of-the art alternative on PyPI that is maintained >> regularly. In this case I think our only option is the latter. I do >> think it is another example of a situation where the stdlib docs ought >> to contain some hints about where to go instead for this >> functionality. > > The issue with this idea is that people are *not* driven to > state-of-the-art alternatives because they don't understand or know the > issue. And as a result, as we've seen last week, they'll use > cryptographic hashes (with or without salts) even though those are > insufficient, because that's available and they read on the internet > that it was what they needed. Is there any indication that Python was involved in last week's incidents? (I'm only aware of the Linkedin one -- were there others?) > And how are you going to make people understand there's a difference > between a cryptographic hash and a password hash by doing nothing, > giving them cryptographic hashes and leaving them to their own devices? Do you really think that including some API in the stdlib is going to make a difference in education? And what would we do if in 2 years time the stdlib's "basic functionality" were somehow compromised (not due to a bug in Python's implementation but simply through some advance in the crypto world) -- how would we get everyone who relied on the stdlib to switch to a different algorithm? I really think that the right approach here is to get *everyone* who needs this to use a 3rd party library. Diversity is very good here! > [0] and beyond the bleeding edge lies ubiquitous 2-factor auth, > ? ?probably. 
> [1] MD5crypt can not use adaptive load factors and injects constant > ? ?data at some points, it also allows longer salts. > [2] http://packages.python.org/passlib/new_app_quickstart.html#recommended-hashes TBH it's possible that I'm not sufficiently familiar with the issue to have a valid opinion here -- I would never dream of taking on the responsibility of password security for anything, since I don't have the right crypto hacker mindset. But I do worry about having attractive suboptimal solutions to common security problems in the stdlib. -- --Guido van Rossum (python.org/~guido) From lists at cheimes.de Mon Jun 11 22:39:32 2012 From: lists at cheimes.de (Christian Heimes) Date: Mon, 11 Jun 2012 22:39:32 +0200 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? In-Reply-To: References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> <4FD4E1A1.6030409@kozea.fr> <8C7C9869-CB87-4E98-AD4E-AACD83375676@masklinn.net> <4FD4E66B.7080802@kozea.fr> <25BE8DE9-B5DF-48B9-B4DF-E6C82033C3E1@masklinn.net> Message-ID: Am 11.06.2012 22:21, schrieb Guido van Rossum: > Is there any indication that Python was involved in last week's > incidents? (I'm only aware of the Linkedin one -- were there others?) No, zero Pythons were harmed. The other victims were last.fm and eHarmony. Surprisingly, Sony wasn't hacked last week! *scnr* > Do you really think that including some API in the stdlib is going to > make a difference in education? And what would we do if in 2 years > time the stdlib's "basic functionality" were somehow compromised (not > due to a bug in Python's implementation but simply through some > advance in the crypto world) -- how would we get everyone who relied > on the stdlib to switch to a different algorithm? I really think that > the right approach here is to get *everyone* who needs this to use a > 3rd party library. Diversity is very good here! 
+1 I'm against adding just the password hashing algorithms. Developers can easily screw up the right algorithm with an erroneous approach. It's the beauty of passlib: the framework hides all the complex and easy-to-get-wrong stuff behind a minimal API. Christian From ncoghlan at gmail.com Mon Jun 11 22:54:43 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 12 Jun 2012 06:54:43 +1000 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? In-Reply-To: References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> <4FD4E1A1.6030409@kozea.fr> <8C7C9869-CB87-4E98-AD4E-AACD83375676@masklinn.net> <4FD4E66B.7080802@kozea.fr> <25BE8DE9-B5DF-48B9-B4DF-E6C82033C3E1@masklinn.net> Message-ID: On Tue, Jun 12, 2012 at 6:21 AM, Guido van Rossum wrote: > On Mon, Jun 11, 2012 at 1:08 PM, Masklinn wrote: >> The issue with this idea is that people are *not* driven to >> state-of-the-art alternatives because they don't understand or know the >> issue. And as a result, as we've seen last week, they'll use >> cryptographic hashes (with or without salts) even though those are >> insufficient, because that's available and they read on the internet >> that it was what they needed. > > Is there any indication that Python was involved in last week's > incidents? (I'm only aware of the Linkedin one -- were there others?) eHarmony and last.fm were the other two prominent sites I saw mentioned. We're not aware of any specific Python connection; it just prompted the current discussion of whether or not there was anything CPython could do to nudge developers in the right direction. Even a native PBKDF2 would be an awful lot better than nothing. >> And how are you going to make people understand there's a difference >> between a cryptographic hash and a password hash by doing nothing, >> giving them cryptographic hashes and leaving them to their own devices?
> Do you really think that including some API in the stdlib is going to > make a difference in education? And what would we do if in 2 years > time the stdlib's "basic functionality" were somehow compromised (not > due to a bug in Python's implementation but simply through some > advance in the crypto world) -- how would we get everyone who relied > on the stdlib to switch to a different algorithm? I really think that > the right approach here is to get *everyone* who needs this to use a > 3rd party library. Diversity is very good here! I think it's similar to the situation with hmac: for backwards compatibility reasons, the default hash in hmac is still MD5. That doesn't mean hmac is useless, and using MD5 is still better than doing nothing. It's all about raising the bar for attackers, and the fact that attackers are continually inventing better ladders and grappling hooks doesn't mean the older walls become completely useless. However, I also think, with the right API design, we could allow for the key derivation algorithms to be retuned in security releases, *because* the state of the art evolves (and because computers get faster). The passlib core APIs and hash formats are designed with precisely that problem in mind. >> [0] and beyond the bleeding edge lies ubiquitous 2-factor auth, >> probably. >> [1] MD5crypt cannot use adaptive load factors and injects constant >> data at some points; it also allows longer salts. >> [2] http://packages.python.org/passlib/new_app_quickstart.html#recommended-hashes > > TBH it's possible that I'm not sufficiently familiar with the issue to > have a valid opinion here -- I would never dream of taking on the > responsibility of password security for anything, since I don't have > the right crypto hacker mindset. But I do worry about having > attractive suboptimal solutions to common security problems in the > stdlib.
The trick is that even a suboptimal solution is a whole lot better than the next-to-nothing that many people do currently. At the moment, the available approaches are: 1. store plaintext passwords (eek) 2. store hashed unsalted passwords (vulnerable to rainbow tables) 3. store hashed salted passwords (vulnerable to massively parallel brute force attacks) 4. store tunable cost hashed salted passwords (reduces vulnerability to brute force, currently requires a third party library) Option 4 *is* the state of the art; it's just a matter of tinkering with the key derivation algorithm in response to advances in the crypto world, as well as ramping up the tuning parameters over time to account for Moore's law. By making it as easy as possible for people to use Option 4 instead of one of the first 3, we increase the odds of people doing the right thing. A third party library like passlib can then focus on more dynamic things like: 1. Providing API compatible interfaces to 3rd party key derivation algorithms (e.g. bcrypt, or the accelerated PBKDF2 implementation in M2crypto), as well as to newer ones like scrypt 2. Providing convenient interfaces for reading and writing 3rd party hash storage formats (e.g. LDAP) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Jun 11 23:00:27 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 12 Jun 2012 07:00:27 +1000 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions?
In-Reply-To: References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> <4FD4E1A1.6030409@kozea.fr> <8C7C9869-CB87-4E98-AD4E-AACD83375676@masklinn.net> <4FD4E66B.7080802@kozea.fr> <25BE8DE9-B5DF-48B9-B4DF-E6C82033C3E1@masklinn.net> Message-ID: On Tue, Jun 12, 2012 at 6:39 AM, Christian Heimes wrote: > Am 11.06.2012 22:21, schrieb Guido van Rossum: >> Do you really think that including some API in the stdlib is going to >> make a difference in education? And what would we do if in 2 years >> time the stdlib's "basic functionality" were somehow compromised (not >> due to a bug in Python's implementation but simply through some >> advance in the crypto world) -- how would we get everyone who relied >> on the stdlib to switch to a different algorithm? I really think that >> the right approach here is to get *everyone* who needs this to use a >> 3rd party library. Diversity is very good here! > > +1 > > I'm against adding just the password hashing algorithms. Developers can > easily screw up the right algorithm with an erroneous approach. It's the > beauty of passlib: the framework hides all the complex and > easy-to-get-wrong stuff behind a minimal API. Right, when I suggested looking for an "API compatible stable core" that could be added for 3.4, I was specifically thinking of: 1. The core CryptContext API 2. The PBKDF2 and sha512_crypt derivation functions Based on a brief look at the module documentation, those parts seem like they're sufficiently mature to be suitable for the stdlib, whereas the rest of passlib is more suited to development as a 3rd party library with its own release schedule. However, I could be completely wrong, thus the suggestion that it be looked into, rather than "we should definitely do this". At the very least, we should be directing people towards passlib for password storage and comparison purposes. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com
| Brisbane, Australia From techtonik at gmail.com Tue Jun 12 11:04:58 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Tue, 12 Jun 2012 12:04:58 +0300 Subject: [Python-ideas] stdlib crowdsourcing In-Reply-To: References: Message-ID: On Sat, Jun 2, 2012 at 8:24 PM, Calvin Spealman wrote: > On Fri, Jun 1, 2012 at 11:08 AM, anatoly techtonik wrote: >> On Tue, May 29, 2012 at 9:02 AM, Nick Coghlan wrote: >>> Once again, you're completely ignoring all existing knowledge and >>> expertise on open collaboration and trying to reinvent the world. It's >>> *not going to happen*. >> >> It's too boring to live in a world of existing knowledge and >> expertise, > > Frankly, this one fragment is enough to stop me reading further. Who > wants to learn > from the vast and broad experience when you could simply randomize the rules of > reality through ignorance and stubbornness? If everybody thought like this, the world would never have learned about anti-patterns, and software craftsmanship would have collapsed in astonishing agony some years ago. If that doesn't make it clear - this is not randomizing, it is putting beliefs to the test by asking for their current status. > I sound fickle, because I am. It doesn't matter how you sound; what matters is that you spoiled the fun of discussing the technical part, no matter how long ago it was invented. If a lot of people ask the same question - create a FAQ. That's not vast and broad experience - that's just a time-proven practice from Usenet times. Come on, guys, what's wrong with you? It is just an idea, not a proposal or scientific paper. And I am not a scientist - I just want to discuss the idea, and I am not sending mails to python-dev anymore, because you asked me to. I've spent some time trying to make the idea interesting. It is fine if you know a scientific paper about the matter, can explain it in a few words and send a link for more details.
But replies like "you're stubborn and ignorant, and nobody should help you" don't make you a better person. I am criticizing because I lack the time, motivation and imagination to write about the good and bright sides in my life that I just don't see. I write because I see bad things that can be better, and I am still open to discussing whether they are real or not. -- anatoly t. From fetchinson at googlemail.com Tue Jun 12 12:16:06 2012 From: fetchinson at googlemail.com (Daniel Fetchinson) Date: Tue, 12 Jun 2012 12:16:06 +0200 Subject: [Python-ideas] stdlib crowdsourcing In-Reply-To: References: Message-ID: > I just want to discuss the idea, Great! You even got a perfectly good answer already: *not going to happen*! Hey, this seems to be working: you raise a point, the community discusses it and after careful deliberation comes up with an answer! This is how things are supposed to be working, aren't they? > But the replies like "you're stubborn and ignorant, and > nobody should help you" doesn't make you a better person. Hey, hey, hey, you are overlooking the other answers you got! You think they came from thin air? People read your post, thought about it, considered the pros and cons and then put in the time to answer it, write an email and hit the Send button. Now, move on, nothing to be seen here, chop, chop, carry on! Cheers, Daniel -- Psss, psss, put it down!
- http://www.cafepress.com/putitdown From oscar.j.benjamin at gmail.com Tue Jun 12 16:25:32 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 12 Jun 2012 15:25:32 +0100 Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: References: <87fwa7a945.fsf@uwakimon.sk.tsukuba.ac.jp> <1339102104.17621.YahooMailClassic@web161503.mail.bf1.yahoo.com> Message-ID: On 11 June 2012 15:21, Jim Jewett wrote: > On Thu, Jun 7, 2012 at 5:00 PM, Mike Meyer wrote: > > On Thu, Jun 7, 2012 at 4:48 PM, Rurpy wrote: > >> I suspect the vast majority of > >> programmers are interested in a language that allows > >> them to *effectively* get done what they need to, > > Agreed. > > The problem is that your use case gets hit by several special cases at > once. > > Usually, you don't need to worry about encodings at all; the default > is sufficient. Obviously not the case for you. > > Usually, the answer is just to open a file (or stream) the way you > want to. sys.stdout is special because you don't open it. > > If you do want to change sys.stdout, usually the answer is to replace > it with a different object. Apparently (though I missed the reason > why) that doesn't work for you, and you need to keep using the same > underlying stream. > I also think I missed something in this thread. At the beginning of the original thread it seemed that everyone was agreed that writer = codecs.getwriter(desired_encoding) sys.stdout = writer(sys.stdout.buffer) was a reasonable solution (with the caveat that it should happen before any output is written). Is there some reason why this is not a good approach? The only problem I know of is that under Python 2.x it becomes an error to print _already_ encoded strings (they get decoded as ascii before being encoded) but that's probably not a problem for an application that takes a disciplined approach to unicode. > > So at that point, replacing it with a wrapped version of itself > probably *is* the simplest solution. 
> > The remaining problem is how to find the least bad way of doing that. > Your solution does work. Adding it as an example to the docs would > probably be reasonable, but someone seems to have worked pretty hard > at keeping the sys module documentation short. I could personally > support a wrap function on the sys.std* streams that took care of > flushing before wrapping, but ... there is a cost, in that the API > gets longer, and therefore harder to learn. > > > or applications > > outside of those built for your system that have a "--encoding" type > > flag? > > There are plenty of applications with an encoding flag; I'm not sure > how often it applies to sys.std*, as opposed to named files. > > -jJ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimjjewett at gmail.com Tue Jun 12 17:15:11 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 12 Jun 2012 11:15:11 -0400 Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: <1339213174.4337.YahooMailClassic@web161506.mail.bf1.yahoo.com> References: <1339213174.4337.YahooMailClassic@web161506.mail.bf1.yahoo.com> Message-ID: On Fri, Jun 8, 2012 at 11:39 PM, Rurpy wrote: > On 06/07/2012 03:00 PM, Mike Meyer wrote: >> On Thu, Jun 7, 2012 at 4:48 PM, Rurpy wrote: >> how do other programming languages deal with wanting to >> change the encoding of the standard IO streams? > This is how it seems to be done in Perl: > ?binmode(STDOUT, ":encoding(sjis)"); > which seems quite a bit simpler than Python. Agreed, in isolation. But in my limited experience, and from reading http://perldoc.perl.org/functions/binmode.html ... I think you probably need to hold at least as many concepts in your head simultaneously to get it to work. > ... 
The description of binmode() > in "man perlfunc" sounds like encoding can be changed > on-the-fly but my attempt to do so had no effect, which sort of belies "simple". > TCL appears to have on-the-fly encoding changes: > | encoding system ?encoding? > | The system encoding is used whenever Tcl passes strings > | to system calls. > http://www.tcl.tk/man/tcl8.4/TclCmd/encoding.htm So if you call rename, the system encoding is used for the filename, but does that mean it is used for stdout? -jJ From jimjjewett at gmail.com Tue Jun 12 17:33:27 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 12 Jun 2012 11:33:27 -0400 Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: <1339214221.27166.YahooMailClassic@web161503.mail.bf1.yahoo.com> References: <1339214221.27166.YahooMailClassic@web161503.mail.bf1.yahoo.com> Message-ID: On Fri, Jun 8, 2012 at 11:57 PM, Rurpy wrote: > On 06/07/2012 07:01 PM, Nick Coghlan wrote: >> On Fri, Jun 8, 2012 at 10:14 AM, Rurpy wrote: >>> sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer) >>> But that code is not obvious to someone who has been able to do >>> all his encoded IO (with the exception of sys.stdout) using just >>> the encoding parameter of open(). Well, you could do it with sys.stdout too, if you did it as part of open. Unfortunately, by the time your code comes along, it is already open -- and may well have already been written to.
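For comparison with the Perl and Tcl spellings quoted above, the rewrapping approach under discussion can be sketched with io.TextIOWrapper. A minimal sketch, using an in-memory BytesIO to stand in for the underlying binary buffer (sys.stdout.buffer in the real case); the reconfigure() call at the end is a later addition to Python (3.7+), shown only to note where this thread eventually led:

```python
import io

# A BytesIO stands in for the raw binary stream; rewrapping it picks the
# text encoding, the way one would rewrap sys.stdout.buffer before any
# output has been written.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
out.write("caf\u00e9\n")
out.flush()
assert raw.getvalue() == b"caf\xc3\xa9\n"

# Python 3.7+ wrapped this "flush, then switch encodings" pattern up as
# TextIOWrapper.reconfigure().
out.reconfigure(encoding="latin-1")
out.write("caf\u00e9\n")
out.flush()
assert raw.getvalue() == b"caf\xc3\xa9\ncaf\xe9\n"
```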
?Just because I am running > my program on a Unix machine does not mean I may not need > to write files with '\n\r' line endings. So write a file, instead of stdout... stdin/stdout is more convenient for pipes, but most such programs do have -i and -o flags for cases like yours. > seems to be an implicit assumption that there is a single > encoding that needs to be determined. Which is reasonable; they aren't the only input/output, they are the *standard* input and output. If they have different encodings, they aren't really standard. (I have some sympathy for a more lenient encoding on stderr.) > That (IIUC) would not be workable for my problem. > > ?./myprog.py -e sjis,sjis [other options...] > > is acceptable. ?Something like: > > ?python -C 'sys.stdin=...; sys.stdout=...' myprog.py [other options...] > > would not be. Tastes differ; I actually prefer the second, as more explicit. > I think that being unable to easily change stream encoding > before first use is orders of magnitude more important than > being unable to change them on-the-fly. Yes, but since we're talking specifically about streams you don't start, that just makes for fragile code that breaks in the field. -jJ From ubershmekel at gmail.com Tue Jun 12 18:11:11 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Tue, 12 Jun 2012 19:11:11 +0300 Subject: [Python-ideas] stdlib crowdsourcing In-Reply-To: References: Message-ID: On Tue, May 29, 2012 at 8:05 AM, anatoly techtonik wrote: > The problem with stdlib - it is all damn subjective. There is no > process to add functions and modules if you're not well-behaved and > skilled in public debates and don't have really a lot of time to be a > champion of your module/function. In other words - it is hard (if not > impossible for 80% of Python Earth population). So, many people and > projects decide to opt-out. Take a look at Twisted - a lot of useful > stuff, but not in Python stdlib. So.. 
> > Provide a way for people to opt-out from core stuff, but still allow > to share the changes and update code if necessary. > > This will require: > - a local stdlib Python path convention > - snippet normalization function and AST hash dumper > - web site with stats > - source code crawler > > How it works: > 1. Every project maintains its own stdlib directory with functions > that they feel are good to have in standard library > 2. Functions are placed so that they are imported as if from standard > library, but this time with stdlib prefix > 3. The license for this directory is public domain to remove all legal > barriers (credits are welcome, but optional) > 4. Crawler (probably PyPI) scans this stdlib dir, finds functions, > normalizes them, calculates hash and submits to web site > 4.1 Normalization is required to find the shared function > copy/pasted across different projects with different > indentation level, docstrings, parameters/variable names etc. > 4.2 Hash is calculated upon AST. There are at least three hashes for > each entry: > 4.2.1 Full hash - all docstrings and variable names are > preserved, whitespace normalized > 4.2.2 Stripped hash - docstrings are stripped, variable names > are normalized > 4.2.3 Signature hash - a mark placed in a comment above > function name, either calculated from function > signature or generated randomly, used for manual > tracking of copy/paste e.g. pd:ac546df6b8340a92 > 5. Web site maintains usage and popularity staff, accepts votes on > inclusion of snippets > > > User stories: > 1. 
"I want to find if there is a better/updated version of my function > available" > 1.1 I enter hash into web site search form > 1.2 Site gives me a link to my snippet > 1.3 I can see what people proposed to replace this function with > 1.4 I can choose the function with most votes > 1.5 I can flag the functions I may find irrelevant or > 1.5 I can tag the functions that divert in different direction > than I need to filter them > > 2. "I want to reuse code snippets without additional dependencies on > 3rd party projects" > 1.1 Just place them into my own stdlib directory > > 3. "I want to update code snippets when there is an update for them" > 1.1 I run scanner, it extracts signature hashes, stripped hashes > and looks if web-site version of signature matches normalized hash > > 4. "I want to see what people want to include in the next Python version" > 1.1 A call for proposals is made > 1.2 People place wannabe's into their stdlib dirs > 1.3 Crawl generates new functions on a web site > 1.4 Functions are categorized > 1.5 Optionally included / declined with a short one-liner reason - why > 1.6 Optionally provided with more detailed info why > > --- feature creep cut --- > 5. "I want to see what functions are popular in other languages" > 1.1 A separate crawler for Ruby, PHP etc. stdlib converts their > AST into compatible format where possible > 1.2 Submit to site stats > > 6. "I want to download the function in Ruby format" > 1.1 AST converter tries to do the job automatically where possible > 1.2 If it fails - you are encouraged to fix the converter rules or > write the replacement for this signature manually > > > Just an idea. > -- > anatoly t. > I think having a separate site "anatloy's std-lib" which somehow implemented an easy install of the top 10-100 most useful/popular/selected packages on pypi could be nice. I considered making such a bundle myself a while ago. I don't think it really needs to be python.org sanctioned. 
Yuval PS I like how candid the replies you got were, and indeed getting a reply is better than the sound of crickets. Though some of these replies carried the scent of excrement poredom - the author's need to import niceness. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwm at mired.org Tue Jun 12 18:51:47 2012 From: mwm at mired.org (Mike Meyer) Date: Tue, 12 Jun 2012 12:51:47 -0400 Subject: [Python-ideas] stdlib crowdsourcing In-Reply-To: References: Message-ID: <20120612125147.72397dd1@bhuda.mired.org> On Tue, 12 Jun 2012 12:04:58 +0300 anatoly techtonik wrote: > On Sat, Jun 2, 2012 at 8:24 PM, Calvin Spealman wrote: > > On Fri, Jun 1, 2012 at 11:08 AM, anatoly techtonik wrote: > >> On Tue, May 29, 2012 at 9:02 AM, Nick Coghlan wrote: > >>> Once again, you're completely ignoring all existing knowledge and > >>> expertise on open collaboration and trying to reinvent the world. It's > >>> *not going to happen*. > >> It's too boring to live in a world of existing knowledge and > >> expertise, > > Frankly, this one fragment is enough to stop me reading further. Who > > wants to learn > > from the vast and broad experience when you could simply randomize the rules of > > reality through ignorance and stubbornness? > If everybody would think like this, the world will never learn about > anti-patterns, and the software craftmanship collapsed in astonishing > agony some years ago. If it doesn't make it clear - it is not > randomizing - it is putting beliefs to the test asking for the current > status. Ah, I think I see Anatoly's problem here. It's an impedance mismatch. He wants to discuss language/platform/environment ideas. This is valuable work, and he does have some interesting ideas. It definitely has a place in the world. It's just that this isn't that place. Python has a set of objectives for the language that have been around long enough to qualify as "traditions". 
As such, it's not a good place to experiment with arbitrary changes to things, because you keep running afoul of the traditions. > Common guys, what's wrong with you? It is just an idea, not a proposal > or scientific paper. Yes, but it's an idea that ignores the traditions of the environment you're proposing it for. If you're serious about discussing ideas about changing Python, you need to do the groundwork of understanding those traditions, and try and make sure your ideas don't collide with them. It doesn't matter whether or not they're good ideas, if they clash with the traditions, they aren't going to happen. You need to figure that out yourself, and not ask us to do it for you. If, on the other hand, you want to talk about language/platform/environment design ideas without that restriction, then you need a different forum. Just because you happen to be working in Python doesn't mean that a Python forum is appropriate for them, any more than discussing (say) drone control programs would be appropriate in a Python forum just because I happen to be writing it in Python. If you're somewhere in between the two, maybe a PyPy forum would be more appropriate? I dunno. I'm sorry I can't really recommend a good forum for you. The last time I was seriously interested in such things, Python hadn't been released yet. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From solipsis at pitrou.net Tue Jun 12 23:34:40 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 12 Jun 2012 23:34:40 +0200 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> Message-ID: <20120612233440.387d03b4@pitrou.net> On Sun, 10 Jun 2012 10:56:46 -0700 "Gregory P. 
Smith" wrote: > I'd just stick it in hmac myself but getpass was also a good suggestion. > Cross-reference to it from the docs of all three, as the real goal of > adding pbkdf2 is to advertise it to users so that they might use it rather > than something more naive. > > hashlib itself should be kept pure as is for standard low level hash > algorithms. It can't have a dependency on anything else. I don't really understand this requirement. Can you elaborate? Regards Antoine. From victor.stinner at gmail.com Tue Jun 12 23:44:00 2012 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 12 Jun 2012 23:44:00 +0200 Subject: [Python-ideas] Replacing the standard IO streams (was Re: changing sys.stdout encoding) In-Reply-To: <4FD3ABCB.5080800@gmail.com> References: <4FD3ABCB.5080800@gmail.com> Message-ID: >> sys.stdin = open(sys.stdin.fileno(), 'r',) >> sys.stdout = open(sys.stdout.fileno(), 'w',) >> sys.stderr = open(sys.stderr.fileno(), 'w',) > > > sys.stdin = io.TextIOWrapper(sys.stdin.detach(), ) > sys.stdout = io.TextIOWrapper(sys.stdout.detach(), ) > ... > > None of these methods is guaranteed to work if input or output has occurred before. You should set the newline option for sys.std* files.
Python 3 does something like this:

    if sys.platform == "win32":
        # translate "\r\n" to "\n" for sys.stdin on Windows
        newline = None
    else:
        newline = "\n"
    sys.stdin = io.TextIOWrapper(sys.stdin.detach(), newline=newline, )
    sys.stdout = io.TextIOWrapper(sys.stdout.detach(), newline="\n", )
    sys.stderr = io.TextIOWrapper(sys.stderr.detach(), newline="\n", )

-- Lib/test/regrtest.py uses the following code, which is not exactly correct (it creates a new buffered writer instead of reusing the sys.stdout buffered writer):

    def replace_stdout():
        """Set the stdout encoder error handler to backslashreplace (as the
        stderr error handler) to avoid UnicodeEncodeError when printing a
        traceback"""
        import atexit

        stdout = sys.stdout
        sys.stdout = open(stdout.fileno(), 'w',
                          encoding=stdout.encoding,
                          errors="backslashreplace",
                          closefd=False,
                          newline='\n')

        def restore_stdout():
            sys.stdout.close()
            sys.stdout = stdout
        atexit.register(restore_stdout)

Victor From victor.stinner at gmail.com Tue Jun 12 23:48:08 2012 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 12 Jun 2012 23:48:08 +0200 Subject: [Python-ideas] TextIOWrapper callable encoding parameter In-Reply-To: <1339425766.92977.YahooMailClassic@web161502.mail.bf1.yahoo.com> References: <1339425766.92977.YahooMailClassic@web161502.mail.bf1.yahoo.com> Message-ID: > 1. The most straight-forward way to handle this is to open > the file twice, first in binary mode or with latin1 encoding > and again in text mode after the encoding has been determined. > This of course has a performance cost since the data is read > twice. Further, it can't be used if the data source is > from a pipe, socket or other non-rewindable source. This > includes sys.stdin when it comes from a pipe. Some months ago, I proposed to automatically detect if a file contains a BOM and use it to set the encoding. Various methods were proposed but there was no real consensus. One proposition was to use a codec (e.g.
"bom") which uses the BOM if it is present, and so doesn't need to read the file twice. For the pipe issue: it depends where the encoding specification is. If the encoding is written at the end of your "file" (stream), you have to store the whole stream content (a few MB or maybe much more?) in memory. If it is in the first lines, you have to store these lines in a buffer. It's not easy to decide on the threshold. I don't like the codec approach because the codec is disconnected from the stream. For example, the codec doesn't know the current position in the stream, nor can it read a few more bytes forward or backward. If you open the file in "append" mode, you are not writing at the beginning but at the end of the file. You may also seek to an arbitrary position before the first read... There are also some special cases. For example, when a text file is opened in write mode, is seekable, and the file position is not zero, TextIOWrapper calls encoder.setstate(0) to not write the BOM in the middle of the file. (See also Lib/test/test_io.py for related tests.) > 2. Alternatively, with a little more expertise, one can rewrap > the open binary stream in a TextIOWrapper to avoid a second > OS file open. That's my favorite method because you have full control of the stream. (I wrote tokenize.open.) But yes, it does not work on non-seekable streams (e.g. pipes). > This too seems to read the data twice and of course the > seek(0) prevents this method also from being usable with > pipes, sockets and other non-seekable sources. Does it really matter? You usually need to read only a few bytes to get the encoding. > 9. In other non-read paths where encoding needs to be known, > raise an error if it is still None. Why not read data until the encoding is known instead? > I have modified a copy of the _pyio module as described and > the changes required seemed unsurprising and relatively > few, though I am sure there are subtleties and other > considerations I am missing.
Hence this post seeking feedback... Can you post the modified module somewhere so I can play with it? Victor From greg at krypto.org Tue Jun 12 23:49:35 2012 From: greg at krypto.org (Gregory P. Smith) Date: Tue, 12 Jun 2012 14:49:35 -0700 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? In-Reply-To: <20120612233440.387d03b4@pitrou.net> References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> <20120612233440.387d03b4@pitrou.net> Message-ID: On Tue, Jun 12, 2012 at 2:34 PM, Antoine Pitrou wrote: > On Sun, 10 Jun 2012 10:56:46 -0700 > "Gregory P. Smith" wrote: > > I'd just stick it in hmac myself but getpass was also a good suggestion. > > Cross reference to it from the docs of all three as the real goal of > > adding pbkdf2 is to advertise it to users so that they might use it > rather > > than something more naive. > > > > hashlib itself should be kept pure as is for standard low level hash > > algorithms. It can't have a dependency on anything else. > > I don't really understand this requirement. Can you elaborate? > I wrote that quickly. I don't want a circular dependency, or things that aren't well-established standards, in hashlib. I see hashlib as being for low-level algorithms only (FIPS standards, etc.) where fast implementations are available in most VM runtimes. hmac depends on hashlib, therefore nothing in hashlib should ever depend on hmac. That doesn't prevent someone from deciding hmac shouldn't be a module of its own and moving it to live within hashlib some day, but that would seem like needless API churn outside of a major language version change. -gps -------------- next part -------------- An HTML attachment was scrubbed...
URL: From victor.stinner at gmail.com Wed Jun 13 00:13:50 2012 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 13 Jun 2012 00:13:50 +0200 Subject: [Python-ideas] TextIOWrapper callable encoding parameter In-Reply-To: References: <1339425766.92977.YahooMailClassic@web161502.mail.bf1.yahoo.com> Message-ID: 2012/6/11 Nick Coghlan : > Immediate thought: it seems like it would be easier to offer a way to inject > data back into a buffered IO object's internal buffer. BufferedReader already has a useful peek() method to read data without changing the position. http://docs.python.org/library/io.html#io.BufferedReader.peek It's not perfect ("The number of bytes returned may be less or more than requested.") but better than nothing. Victor From stephen at xemacs.org Wed Jun 13 06:58:28 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 13 Jun 2012 13:58:28 +0900 Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: References: <87fwa7a945.fsf@uwakimon.sk.tsukuba.ac.jp> <1339102104.17621.YahooMailClassic@web161503.mail.bf1.yahoo.com> Message-ID: <87k3zb7qq3.fsf@uwakimon.sk.tsukuba.ac.jp> Oscar Benjamin writes: > I also think I missed something in this thread. At the beginning of the > original thread it seemed that everyone was agreed that > > writer = codecs.getwriter(desired_encoding) > sys.stdout = writer(sys.stdout.buffer) > > was a reasonable solution (with the caveat that it should happen before any > output is written). Is there some reason why this is not a good > approach? It's undocumented and unobvious, but it's needed for standard stream filtering in some environments -- where a lot of coding is done by people who otherwise never need to understand streams at anything but a superficial level -- and the analogous case of a newly opened file, pipe, or socket is documented and obvious, and usable by novices.
It's a damn shame that we can't say the same about the stdin, stdout, and stderr streams (even if I too have been at pains to explain why that's hard to fix). From stephen at xemacs.org Wed Jun 13 07:09:04 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 13 Jun 2012 14:09:04 +0900 Subject: [Python-ideas] stdlib crowdsourcing In-Reply-To: References: Message-ID: <87ipev7q8f.fsf@uwakimon.sk.tsukuba.ac.jp> > I write because I see bad things that can be better, and I am still > open to discuss if it is real or not. There's nothing wrong with that in its place. But python-ideas is a place for ideas where the poster is pretty sure it's real *and* has a concrete proposal (the "idea" in python-ideas) to make it better *and* has the will to follow up themselves if nobody else grabs the ball. There's some room for blue-sky ideas (lacking concrete proposals or personal commitment), but if all you ever offer is blue-sky ideas that get no uptake, you're just wasting time, yours as well as everybody else's. From guido at python.org Wed Jun 13 07:21:45 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 12 Jun 2012 22:21:45 -0700 Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: <87k3zb7qq3.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87fwa7a945.fsf@uwakimon.sk.tsukuba.ac.jp> <1339102104.17621.YahooMailClassic@web161503.mail.bf1.yahoo.com> <87k3zb7qq3.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Tue, Jun 12, 2012 at 9:58 PM, Stephen J. Turnbull wrote: > Oscar Benjamin writes: > > > I also think I missed something in this thread. At the beginning of the > > original thread it seemed that everyone was agreed that > > > > writer = codecs.getwriter(desired_encoding) > > sys.stdout = writer(sys.stdout.buffer) > > > > was a reasonable solution (with the caveat that it should happen before any > > output is written). Is there some reason why this is not a good > > approach?
> > It's undocumented and unobvious, but it's needed for standard stream > filtering in some environments -- where a lot of coding is done by > people who otherwise never need to understand streams at anything but > a superficial level -- and the analogous case of a newly opened file, > pipe, or socket is documented and obvious, and usable by novices.
(C's setbuf() wasn't called set_buffer_on_virgin_stream() either. :-) I don't care about the integrity of the underlying binary stream. It's a binary stream, you can write whatever bytes you want to it. But if a TextIOWrapper is used properly, it won't write a mixture of encodings to the underlying binary stream, since you can only set the encoding before reading/writing a single byte. (And the TextIOWrapper is careful not to use the binary stream before the first actual read() or write() call -- it just tries to call tell(), if it's seekable, which should be safe.) -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Wed Jun 13 07:42:24 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 13 Jun 2012 15:42:24 +1000 Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: References: <87fwa7a945.fsf@uwakimon.sk.tsukuba.ac.jp> <1339102104.17621.YahooMailClassic@web161503.mail.bf1.yahoo.com> <87k3zb7qq3.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Wed, Jun 13, 2012 at 3:21 PM, Guido van Rossum wrote: > On Tue, Jun 12, 2012 at 9:58 PM, Stephen J. Turnbull wrote: >> Oscar Benjamin writes: >> >> > I also think I missed something in this thread. At the beginning of the >> > original thread it seemed that everyone was agreed that >> > >> > writer = codecs.getwriter(desired_encoding) >> > sys.stdout = writer(sys.stdout.buffer) >> > >> > was a reasonable solution (with the caveat that it should happen before any >> > output is written). Is there some reason why this is not a good >> > approach?
>> >> It's a damn shame that we can't say the same about the stdin, stdout, >> and stderr streams (even if I too have been at pains to explain why >> that's hard to fix). > > I'm probably missing something, but in all my naivete I have what > feels like a simple solution, and I can't seem to see what's wrong > with it. I think you're right, and such a method in combination with stream.buffer.peek() should actually handle a lot of encoding detection cases, too. The alternative approaches (calling TextIOWrapper on stream.detach(), or open on stream.fileno()) either break any references to the old stream or else create two independent IO stacks on top of a single underlying file descriptor, which may create some odd behaviour. Being able to set the encoding on a previously unused stream would also interact better with the existing subprocess PIPE API. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Wed Jun 13 10:25:21 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 13 Jun 2012 10:25:21 +0200 Subject: [Python-ideas] TextIOWrapper callable encoding parameter References: <1339425766.92977.YahooMailClassic@web161502.mail.bf1.yahoo.com> Message-ID: <20120613102521.6f67a930@pitrou.net> On Tue, 12 Jun 2012 01:10:47 +1000 Nick Coghlan wrote: > Immediate thought: it seems like it would be easier to offer a way to > inject data back into a buffered IO object's internal buffer. Except that it would be limited by buffer size, which is not necessarily something you have control over. Regards Antoine. From stephen at xemacs.org Wed Jun 13 10:35:59 2012 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Wed, 13 Jun 2012 17:35:59 +0900 Subject: [Python-ideas] changing sys.stdout encoding In-Reply-To: References: <87fwa7a945.fsf@uwakimon.sk.tsukuba.ac.jp> <1339102104.17621.YahooMailClassic@web161503.mail.bf1.yahoo.com> <87k3zb7qq3.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87bokn7gnk.fsf@uwakimon.sk.tsukuba.ac.jp> Guido van Rossum writes: > I'm not sure about a name, but it might well be called set_encoding(). I would still prefer "initialize_encoding" or something like that, but the main thing I was worried about was a "consenting adults" function that shouldn't be called after I/O, but *could* be. From rurpy at yahoo.com Wed Jun 13 17:46:01 2012 From: rurpy at yahoo.com (Rurpy) Date: Wed, 13 Jun 2012 08:46:01 -0700 (PDT) Subject: [Python-ideas] TextIOWrapper callable encoding parameter Message-ID: <1339602361.77455.YahooMailClassic@web161503.mail.bf1.yahoo.com> On 06/11/2012 10:24 AM, Stephen J. Turnbull wrote: > > Nick Coghlan writes: > > > > > Immediate thought: it seems like it would be easier to offer a way to > > > inject data back into a buffered IO object's internal buffer. > > > > ungetch()? What would be the TextIOWrapper api for that? > > If you're only interested in the top of the file (see below), I would > > suggest allowing only one bufferfull, and then simply rewinding the > > buffer pointer once you're done. This is one strategy used by Emacsen > > for encoding detection (for the reason pointed out by Rurpy: not all > > streams are rewindable). > > > > But is that really "easier"? It might be more general, but you still > > need to reinitialize the encoding (ie, from the trivial "binary" to > > whatever is detected), with all the hair that comes with that. I don't think there is any hair involved. In at least the _pyio version of TextIOWrapper, initializing the encoding (in the read path) consists of calling self._get_decoder(). 
One needs to move the few places where that is called now to nearby places that are after the raw buffer has been read but before it is decoded. Some consideration may need to be given to raising errors at the old locations in the case the callable encoding hook is not being used (to maintain complete backwards compatibility; not sure that is necessary), but I wouldn't call that hairy. Of course there may be other factors I am missing... > > > > Executive summary: > > > > ================== > > > > > > > > There is no good way to read a text file when the > > > > encoding has to be determined by reading the start > > > > of the file. A long-winded version of that follows. > > > > Scroll down to the "Proposal" section to skip it. > > This may be insufficiently general. Specifically, both Emacsen and vi > > allow specification of editor configuration variables at the bottom of > > the file as well as the top. I don't know whether vi allows encoding > > specs at the bottom, but Emacsen do (but only for files). > > > > I wouldn't recommend paying much attention to what Emacsen actually > > *do* when initializing a stream (it's, uh, "baroque"). Looking only at the beginning of an input stream is general enough for a large class of problems, including tokenizing Python source code. From rurpy at yahoo.com Wed Jun 13 17:56:02 2012 From: rurpy at yahoo.com (Rurpy) Date: Wed, 13 Jun 2012 08:56:02 -0700 (PDT) Subject: [Python-ideas] TextIOWrapper callable encoding parameter Message-ID: <1339602962.62586.YahooMailClassic@web161504.mail.bf1.yahoo.com> On 06/12/2012 03:48 PM, Victor Stinner wrote: >> >> 1. The most straight-forward way to handle this is to open >> >> the file twice, first in binary mode or with latin1 encoding >> >> and again in text mode after the encoding has been determined. >> >> This of course has a performance cost since the data is read
Further, it can't be used if the data source is a >> >> from a pipe, socket or other non-rewindable source. This >> >> includes sys.stdin when it comes from a pipe. > > > > Some months ago, I proposed to automatically detect if a file contains > > a BOM and uses it to set the encoding. Various methods were proposed > > but there was no real consensus. One proposition was to use a codec > > (e.g. "bom") which uses the BOM if it is present, and so don't need to > > reread the file twice. > > > > For the pipe issue: it depends where the encoding specification is. If > > the encoding is written at the end of your "file" (stream), you have > > to store the whole stream content (few MB or maybe much more?) into > > memory. If it is in the first lines, you have to store these lines in > > a buffer. It's not easy to decide for the threshold. That's always a problem. When trying to determine a character encoding one may have to read the entire file because it could consist of all ascii characters except the very last one. (And of course there is no guarantee one can determine *the* encoding at all). Nevertheless, I think thee is a very large class of problems that can be usefully handled by looking at a limited amount of data at the start of a file (or stream). The Python coding declaration in one example (obviously picked hoping it would have some resonance here.) The buffer object used by TextIOWrapper already reads the start of the stream and buffers the first few lines, so why not take advantage of that rather than repeating the work? One of the things I am not sure about is if there are cases when the buffered read returns, say, only one line, as might happen with tty input. > > I don't like the codec approach because the codec is disconnected from > > the stream. For example, the codec doesn't know the current position > > in stream nor can read a few more bytes forward or backward. 
If you > > open the file in "append" mode, you are not writing at the beginning > > but at the end of the file. You may also seek to an arbitrary position > > before the first read... > > > > There are also some special cases. For example, when a text file is > > opened in write mode, the file is seekable and the file position is > > not zero, TextIOWrapper calls encoder.setstate(0) to not write the BOM > > in the middle of the file. (See also Lib/test/test_io.py for related > > tests.) A callable encoding parameter would not be terribly useful with a file opened in write or append mode, but its behavior would be predictable: a write would result in an error because the encoding hadn't been set. A read in the middle of the file would work the same way as at the beginning. This is probably not very useful, but it is consistent. Of course, one could choose to implement a callable encoding parameter such that some or all of these paths are detected at open time and declared illegal. One could prohibit the encoding call after a seek, though I'm not sure there is any point to that. >> >> 2. Alternatively, with a little more expertise, one can rewrap >> >> the open binary stream in a TextIOWrapper to avoid a second >> >> OS file open. > > > > That's my favorite method because you have full control of the > > stream. (I wrote tokenize.open.) But yes, it does not work on > > non-seekable streams (e.g. pipes). > > >> >> This too seems to read the data twice and of course the >> >> seek(0) prevents this method also from being usable with >> >> pipes, sockets and other non-seekable sources. > > > > Does it really matter? You usually need to read only a few bytes to get the encoding. It certainly matters if input is from a pipe.
Quoting from my other message:

    $ cat test.utf8 | python3 stdin.py
    reopen1 got exception: [Errno 29] Illegal seek

The whole point of my suggestion was that you've already read those few bytes -- but by the time you have access to them, you've already been forced to choose an encoding. My suggestion simply defers that encoding setting until after you've had a chance to look at the bytes. >> >> 9. In other non-read paths where encoding needs to be known, >> >> raise an error if it is still None. > > Why not read data until the encoding is known instead? That's how I do it now -- open the file in binary mode and read it, buffer it, determine the encoding, and henceforth decode the bytes data "by hand" to text. But that's an awful lot like what TextIOWrapper does, yes? Why can't I use TextIOWrapper instead of rewriting it myself? (Yes, I know I can reopen or rewrap the binary stream, but as I said, that loses the one-pass processing, which breaks pipes.) >> >> I have modified a copy of the _pyio module as described and >> >> the changes required seemed unsurprising and relatively >> >> few, though I am sure there are subtleties and other >> >> considerations I am missing. Hence this post seeking >> >> feedback... > > > > Can you post the modified module somewhere so I can play with it? I put a diff against the Python-3.2.3 _pyio.py file at: http://pastebin.com/kZHmcBdm Much of the diff is just moving existing stuff around. The note at the bottom says: | It is in no way supposed to be a serious patch. | | It was the minimal changes I could make in order to | see if my suggestion to allow a callable encoding parameter | in TextIOWrapper was feasible, and allow some timing tests. | | I am quite sure it will not pass Python's tests. | | It does, I hope, give some idea of the nature and scale of the | code changes needed to implement a callable encoding parameter.
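The peek-then-rewrap approach discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the function name and the 512-byte peek size are mine, and it relies only on the fact (noted above) that BufferedReader.peek() does not advance the stream position, so it works on pipes and other non-seekable streams.

```python
import io
import re


def wrap_with_detected_encoding(buffered, default="utf-8"):
    """Wrap a binary buffered stream in a TextIOWrapper, choosing the
    encoding by peeking at the first bytes without consuming them."""
    # peek() may return fewer (or more) bytes than requested.
    head = buffered.peek(512)
    if head.startswith(b"\xef\xbb\xbf"):
        encoding = "utf-8-sig"          # UTF-8 BOM
    elif head.startswith((b"\xff\xfe", b"\xfe\xff")):
        encoding = "utf-16"             # UTF-16 BOM, either byte order
    else:
        # Fall back to a PEP 263-style coding cookie in the peeked bytes.
        match = re.search(rb"coding[:=]\s*([-\w.]+)", head)
        encoding = match.group(1).decode("ascii") if match else default
    return io.TextIOWrapper(buffered, encoding=encoding)
```

For a piped stdin this would be called as `wrap_with_detected_encoding(sys.stdin.buffer)`. As the thread points out, a short peek like this only covers declarations near the start of the stream, not an encoding recorded at its end, and the amount peek() returns is not guaranteed.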
From barry at python.org Wed Jun 13 22:38:19 2012 From: barry at python.org (Barry Warsaw) Date: Wed, 13 Jun 2012 16:38:19 -0400 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> Message-ID: <20120613163819.52645944@resist.wooz.org> I'd love to have a PBKDF2 implementation in the stdlib. My flufl.password module has an implementation donated by security expert Bob Fleck. Any insecure implementation bugs are solely blamed on me though. ;) http://bazaar.launchpad.net/~barry/flufl.password/trunk/view/head:/flufl/password/schemes.py#L171 The API is a little odd because it fits into the larger API for flufl.password, but if it's useful, I'd happily clean up and donate the code for the stdlib. OTOH, I'd be just as happy (maybe more) to get rid of it in favor of a stdlib implementation. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From lists at cheimes.de Thu Jun 14 00:33:51 2012 From: lists at cheimes.de (Christian Heimes) Date: Thu, 14 Jun 2012 00:33:51 +0200 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? In-Reply-To: <20120613163819.52645944@resist.wooz.org> References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <20120613163819.52645944@resist.wooz.org> Message-ID: On 13.06.2012 22:38, Barry Warsaw wrote: > I'd love to have a PBKDF2 implementation in the stdlib. My flufl.password > module has an implementation donated by security expert Bob Fleck. Any > insecure implementation bugs are solely blamed on me though. ;) > > http://bazaar.launchpad.net/~barry/flufl.password/trunk/view/head:/flufl/password/schemes.py#L171 > > The API is a little odd because it fits into the larger API for > flufl.password, but if it's useful, I'd happily clean up and donate the code > for the stdlib.
OTOH, I'd be just as happy (maybe more) to get rid of it in > favor of a stdlib implementation. At first glance your implementation is vulnerable to side channel attacks because you aren't using a constant time equality function. Also you are using the least secure variant of PBKDF2 (SHA-1 instead of SHA-256 or SHA-512). At least you are using os.urandom() as source for the salt, which is usually fine. Passlib supports the LDAP variants, too. [1] Outside of LDAP the established notation is $pbkdf2-digest$rounds$salt$checksum. Christian [1] http://packages.python.org/passlib/lib/passlib.hash.ldap_pbkdf2_digest.html From gatesda at gmail.com Fri Jun 15 10:49:18 2012 From: gatesda at gmail.com (David Gates) Date: Fri, 15 Jun 2012 02:49:18 -0600 Subject: [Python-ideas] Multi-line comment blocks. Message-ID: Multi-line strings as comments don't nest, don't play well with docstrings, and are counter-intuitive when there's special language support for single-line comments. Python should only have one obvious way to do things, and Python has two ways to comment, only one of which is obvious. My suggestion is to add language support for comment blocks, using Python's existing comment delimiter: # Single-line comment #: Multi-line comment #: Nested multi-line comments work perfectly Of course they do, they're just nested blocks def foo(): """Docstrings work perfectly. Why wouldn't they?""" pass # No need for an end-delimiter like """ or */ -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Fri Jun 15 11:50:40 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 15 Jun 2012 10:50:40 +0100 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: References: Message-ID: On 6/15/12 9:49 AM, David Gates wrote: > Multi-line strings as comments don't nest, don't play well with docstrings, and > are counter-intuitive when there's special language support for single-line > comments. 
Python should only have one obvious way to do things, and Python has > two ways to comment, only one of which is obvious. Multi-line string literals aren't comments. They are multi-line string literals. Unlike a comment, which does not show up in the compiled bytecode, the Python interpreter actually does something with those string literals. Sometimes people abuse them as ways to poorly emulate block comments, but this is an abuse, not a feature of the language. > My suggestion is to add > language support for comment blocks, using Python's existing comment delimiter: > > # Single-line comment > #: > Multi-line comment > #: > Nested multi-line comments work perfectly > Of course they do, they're just nested blocks > def foo(): > """Docstrings work perfectly. Why wouldn't they?""" > pass > # No need for an end-delimiter like """ or */ The main problem is that #: currently has a meaning as a line comment. This could break existing code. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From solipsis at pitrou.net Fri Jun 15 12:33:54 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 15 Jun 2012 12:33:54 +0200 Subject: [Python-ideas] Multi-line comment blocks. References: Message-ID: <20120615123354.6e8622af@pitrou.net> On Fri, 15 Jun 2012 02:49:18 -0600 David Gates wrote: > Multi-line strings as comments don't nest, don't play well with docstrings, > and are counter-intuitive when there's special language support for > single-line comments. Python should only have one obvious way to do things, > and Python has two ways to comment, only one of which is obvious. My > suggestion is to add language support for comment blocks, using Python's > existing comment delimiter: Any decent text editor has a way to comment and uncomment whole blocks of text (in Kate, it is Ctrl+D IIRC). Regards Antoine. 
From sven at marnach.net Fri Jun 15 12:49:47 2012 From: sven at marnach.net (Sven Marnach) Date: Fri, 15 Jun 2012 11:49:47 +0100 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: References: Message-ID: <20120615104947.GM4256@bagheera> Robert Kern wrote on Fri, 15 Jun 2012, at 10:50:40 +0100: > Multi-line string literals aren't comments. They are multi-line > string literals. Unlike a comment, which does not show up in the > compiled bytecode, the Python interpreter actually does something > with those string literals. Sometimes people abuse them as ways to > poorly emulate block comments, but this is an abuse, not a feature > of the language. Multi-line string literals do not generate code in CPython, and their use as comments has BDFL approval: https://twitter.com/gvanrossum/status/112670605505077248 (I don't use them as comments either, and rather rely on my editor for commenting blocks.) Cheers, Sven From gatesda at gmail.com Fri Jun 15 13:47:57 2012 From: gatesda at gmail.com (David Gates) Date: Fri, 15 Jun 2012 05:47:57 -0600 Subject: [Python-ideas] Python-ideas Digest, Vol 67, Issue 51 In-Reply-To: References: Message-ID: @Robert Kern: "Multi-line string literals aren't comments. They are multi-line string literals. Unlike a comment, which does not show up in the compiled bytecode, the Python interpreter actually does something with those string literals." They have Guido's stamp of approval, and apparently the interpreter ignores them: https://twitter.com/gvanrossum/status/112670605505077248 They feel like an ugly hack to me too, though. @Robert Kern: "The main problem is that #: currently has a meaning as a line comment. This could break existing code." It could, but the only case I can see is when the comment isn't following indentation convention:

    #: Valid either way; next line's not indented,
    #: so it's not counted as part of the block.
    print('a')

    # Causes an IndentationError in existing code.
#: print('b') def foo(): #: This one would break. On Fri, Jun 15, 2012 at 4:00 AM, wrote: > [snip: quoted digest copy of today's two messages, both of which appear in full earlier in this thread]
> -- Umberto Eco > > > > ------------------------------ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > > End of Python-ideas Digest, Vol 67, Issue 51 > ******************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From taleinat at gmail.com Fri Jun 15 14:41:10 2012 From: taleinat at gmail.com (Tal Einat) Date: Fri, 15 Jun 2012 15:41:10 +0300 Subject: [Python-ideas] Weak-referencing/weak-proxying of (bound) methods In-Reply-To: <20120610231629.GA1792@chopin.edu.pl> References: <20120610231629.GA1792@chopin.edu.pl> Message-ID: On Mon, Jun 11, 2012 at 2:16 AM, Jan Kaliszewski wrote: > Hello, > > Today, I encountered a surprising bug in my code which creates > some weakref.proxies to instance methods... The actual Python > behaviour related to the issue can be illustrated with the > following example: > > >>> import weakref > >>> class A: > ... def method(self): print(self) > ... > >>> A.method > > >>> a = A() > >>> a.method > > > >>> r = weakref.ref(a.method) # creating a weak reference > >>> r # ...but it appears to be dead > > >>> w = weakref.proxy(a.method) # the same with a weak proxy > >>> w > > >>> w() > Traceback (most recent call last): > File "", line 1, in > ReferenceError: weakly-referenced object no longer exists > > This behaviour is perfectly correct -- but still surprising, > especially for people who know little about method creation > machinery, descriptors etc. > > I think it would be nice to make this 'trap' less painful -- > for example, by doing one or both of the following: > > 1. Describe and explain this behaviour in the weakref > module documentation. > > 2. Provide (in functools?) a type-and-decorator that do the > same what func_descr_get() does (transforms a function into > a method) *plus* caches the created method (e.g. at the > instance object).
> > A prototype implementation: > > class InstanceCachedMethod(object): > > def __init__(self, func): > self.func = func > (self.instance_attr_name > ) = '__{0}_method_ref'.format(func.__name__) > > def __get__(self, instance, owner): > if instance is None: > return self.func > try: > return getattr(instance, self.instance_attr_name) > except AttributeError: > method = types.MethodType(self.func, instance) > setattr(instance, self.instance_attr_name, method) > return method > > A simplified version that reuses the func.__name__ (works well > as long as func.__name__ is the actual instance attribute name...): > > class InstanceCachedMethod(object): > > def __init__(self, func): > self.func = func > > def __get__(self, instance, owner): > if instance is None: > return self.func > method = types.MethodType(self.func, instance) > setattr(instance, self.func.__name__, method) > return method > > Both versions work well with weakref.proxy()/ref() objects: > > >>> class B: > ... @InstanceCachedMethod > ... def method(self): print(self) > ... > >>> B.method > > >>> b = B() > >>> b.method > > > >>> r = weakref.ref(b.method) > >>> r > > >>> w = weakref.proxy(b.method) > >>> w > > >>> w() > <__main__.B object at 0xb7206ccc> > > What do you think about it? > I was bitten by this issue a while ago as well. It made working with weakref proxies much more involved than I expected it would be. Wouldn't it be better to approach the issue from the opposite end, and improve/wrap/replace weakref.proxy with something that can handle bound methods? - Tal -------------- next part -------------- An HTML attachment was scrubbed... 
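Tal's suggestion is essentially what later shipped in the standard library as weakref.WeakMethod (Python 3.4). A minimal sketch of the trap and that fix — the class name here is illustrative, and the immediate collection relies on CPython's reference counting:

```python
import weakref

class Greeter:
    def method(self):
        return "called"

obj = Greeter()

# The trap: each access to obj.method builds a brand-new bound-method
# object, so a plain weak reference to it is dead on arrival (CPython
# frees the temporary immediately via reference counting).
dead = weakref.ref(obj.method)
assert dead() is None

# weakref.WeakMethod (added in Python 3.4) holds separate weak references
# to the instance and the function, recreating the bound method on call.
wm = weakref.WeakMethod(obj.method)
rebound = wm()
assert rebound() == "called"

del obj, rebound  # drop the instance and our rebuilt bound method
assert wm() is None  # now genuinely dead
```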
URL: From shibturn at gmail.com Fri Jun 15 15:01:23 2012 From: shibturn at gmail.com (shibturn) Date: Fri, 15 Jun 2012 14:01:23 +0100 Subject: [Python-ideas] Weak-referencing/weak-proxying of (bound) methods In-Reply-To: References: <20120610231629.GA1792@chopin.edu.pl> Message-ID: On 15/06/2012 1:41pm, Tal Einat wrote: > I was bitten by this issue a while ago as well. It made working with > weakref proxies much more involved than I expected it would be. > > Wouldn't it be better to approach the issue from the opposite end, and > improve/wrap/replace weakref.proxy with something that can handle bound > methods? Maybe just add something like the following to weakref: def weakboundmethod(m): return m.__func__.__get__(weakref.proxy(m.__self__), type(m.__self__)) Cheers, Richard From robert.kern at gmail.com Fri Jun 15 16:04:30 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 15 Jun 2012 15:04:30 +0100 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: <20120615104947.GM4256@bagheera> References: <20120615104947.GM4256@bagheera> Message-ID: On 6/15/12 11:49 AM, Sven Marnach wrote: > Robert Kern schrieb am Fri, 15. Jun 2012, um 10:50:40 +0100: >> Multi-line string literals aren't comments. They are multi-line >> string literals. Unlike a comment, which does not show up in the >> compiled bytecode, the Python interpreter actually does something >> with those string literals. Sometimes people abuse them as ways to >> poorly emulate block comments, but this is an abuse, not a feature >> of the language. > > Multi-line string literals do not generate code in CPython, and their > use as comments has BDFL approval: > > https://twitter.com/gvanrossum/status/112670605505077248 Well fancy that. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From gatesda at gmail.com Fri Jun 15 18:23:39 2012 From: gatesda at gmail.com (David Gates) Date: Fri, 15 Jun 2012 10:23:39 -0600 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: References: <20120615104947.GM4256@bagheera> Message-ID: I agree that using multi-line strings as literals comes across as an ugly hack, even if it is BDFL-approved. Your other point is valid, though as far as I can tell it's only an issue when the comment is indented less than it ought to be (and starts with "#:", of course): #: Valid either way. The next line has the #: same level of indentation, so it's not #: counted as part of the block. print('a') # Causes an IndentationError in existing code. #: print('b') def foo(): #: This one would break. print('c') On Fri, Jun 15, 2012 at 8:04 AM, Robert Kern wrote: > On 6/15/12 11:49 AM, Sven Marnach wrote: > >> Robert Kern schrieb am Fri, 15. Jun 2012, um 10:50:40 +0100: >> >>> Multi-line string literals aren't comments. They are multi-line >>> string literals. Unlike a comment, which does not show up in the >>> compiled bytecode, the Python interpreter actually does something >>> with those string literals. Sometimes people abuse them as ways to >>> poorly emulate block comments, but this is an abuse, not a feature >>> of the language. >>> >> >> Multi-line string literals do not generate code in CPython, and their >> use as comments has BDFL approval: >> >> https://twitter.com/gvanrossum/status/112670605505077248 >> > > Well fancy that. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma > that is made terrible by our own mad attempt to interpret it as though it > had > an underlying truth." > -- Umberto Eco > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... 
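David's indentation-based rule is concrete enough to prototype as a toy source filter — purely illustrative code for the *proposed* `#:` semantics, not anything Python implements; all names here are made up:

```python
def strip_comment_blocks(src: str) -> str:
    """Toy filter for the proposed '#:' block-comment syntax."""
    out = []
    block_indent = None  # indent of the '#:' that opened the current block
    for line in src.splitlines():
        stripped = line.strip()
        indent = len(line) - len(line.lstrip())
        if block_indent is not None:
            if stripped and indent <= block_indent:
                block_indent = None  # a dedented line ends the block
            else:
                continue  # still inside the comment block
        if stripped == "#:":
            block_indent = indent  # a lone '#:' opens a block
        elif not stripped.startswith("#:"):
            out.append(line)  # '#: ...' with text is a single-line comment
    return "\n".join(out)

source = "\n".join([
    "print('a')",
    "#:",
    "    Everything indented under the '#:'",
    "    is treated as comment text.",
    "print('b')",
])
print(strip_comment_blocks(source))
```

Nesting falls out for free: a `#:` indented inside an open block is simply swallowed as comment text, which is the "nested blocks" behaviour the proposal claims.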
URL: From guido at python.org Fri Jun 15 18:43:35 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 15 Jun 2012 09:43:35 -0700 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: References: <20120615104947.GM4256@bagheera> Message-ID: Let's not try to design a syntax for multi-line comments. There are already enough ways to emulate them. Designing a new syntax based on # plus some special character is doomed for backwards compatibility (never mind the clever tricks proposed). --Guido On Fri, Jun 15, 2012 at 9:23 AM, David Gates wrote: > I agree that using multi-line strings as literals comes across as an ugly > hack, even if it is BDFL-approved. > > Your other point is valid, though as far as I can tell it's only an issue > when the comment is indented less than it ought to be (and starts with "#:", > of course): > > #: Valid either way. The next line has the > #: same level of indentation, so it's not > #: counted as part of the block. > print('a') > > # Causes an IndentationError in existing code. > #: >     print('b') > > def foo(): > #: This one would break. >     print('c') > > On Fri, Jun 15, 2012 at 8:04 AM, Robert Kern wrote: >> >> On 6/15/12 11:49 AM, Sven Marnach wrote: >>> >>> Robert Kern schrieb am Fri, 15. Jun 2012, um 10:50:40 +0100: >>>> >>>> Multi-line string literals aren't comments. They are multi-line >>>> string literals. Unlike a comment, which does not show up in the >>>> compiled bytecode, the Python interpreter actually does something >>>> with those string literals. Sometimes people >>>> abuse them as ways to >>>> poorly emulate block comments, but this is an abuse, not a >>>> feature of the language. >>> >>> >>> Multi-line string literals do not generate code in CPython, and their >>> use as comments has BDFL approval: >>> >>>     https://twitter.com/gvanrossum/status/112670605505077248 >> >> >> Well fancy that.
>> >> -- >> Robert Kern >> >> "I have come to believe that the whole world is an enigma, a harmless >> enigma >> that is made terrible by our own mad attempt to interpret it as though it >> had >> an underlying truth." >> -- Umberto Eco >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (python.org/~guido) From elic at astllc.org Fri Jun 15 21:07:00 2012 From: elic at astllc.org (Eli Collins) Date: Fri, 15 Jun 2012 19:07:00 +0000 (UTC) Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> <4FD4E1A1.6030409@kozea.fr> <8C7C9869-CB87-4E98-AD4E-AACD83375676@masklinn.net> <4FD4E66B.7080802@kozea.fr> <25BE8DE9-B5DF-48B9-B4DF-E6C82033C3E1@masklinn.net> Message-ID: Christian Heimes writes: > On 11.06.2012 22:21, Guido van Rossum wrote: > > Do you really think that including some API in the stdlib is going to > > make a difference in education? And what would we do if in 2 years > > time the stdlib's "basic functionality" were somehow compromised (not > > due to a bug in Python's implementation but simply through some > > advance in the crypto world) -- how would we get everyone who relied > > on the stdlib to switch to a different algorithm? I really think that > > the right approach here is to get *everyone* who needs this to use a > > 3rd party library. Diversity is very good here! > > +1 > > I'm against adding just the password hashing algorithms. Developers can > easily screw up the right algorithm with an erroneous approach.
It's the > beauty of passlib: The framework hides all the complex and > easy-to-get-wrong stuff behind a minimal API. > > Christian > I know I'm a little late to this thread, but as the primary Passlib author, I wanted to throw in my two cents. I wholeheartedly agree with the idea of not having a high-level password hashing library in stdlib. I'd be honored and happy to help in extracting a subset of passlib for inclusion in the standard library. However, for all the reasons GvR pointed out, I'm scared at the thought of how slowly end deployments would get needed security updates (for one thing, I update the adaptive cost of the hashes in passlib about once a year just as a matter of course). I'm reminded of how the Debian project has had to create a "security" repository to supplement the "stable" repository, just so the slow-moving "stable" release gets timely security updates. All that said, I wouldn't mind seeing a pbkdf2() primitive added to stdlib, along the lines of M2Crypto's pbkdf2 function [1]. I agree such a function might mislead developers to roll their own password hashing routines, but a word of warning and redirection in the documentation might help with that. The reason I see a need for such a function is that all existing password hashing libraries (passlib, cryptacular, flufl.password, django.contrib.auth.hashers, etc) have had to roll their own pure-python pbkdf2 implementations, to varying degrees of speed. And speed is paramount for pbkdf2 usage, since security depends on squeezing as many rounds / second out of the implementation as possible. Having a single C-accelerated primitive would be great for all of the above libraries, and all the other uses pbkdf2 has. Furthermore, it wouldn't need frequent security updates, since the hash storage format, default cost, default digest, etc, would all be handled by the higher-level libraries. 
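For reference, the primitive Eli asks for did later land in the standard library as hashlib.pbkdf2_hmac() in Python 3.4. A sketch of the kind of usage he describes — the parameters here are illustrative, not recommendations; cost policy and storage format would live in the higher-level library:

```python
import hashlib
import hmac
import os

password = b"correct horse battery staple"
salt = os.urandom(16)     # per-password random salt, stored alongside the hash
iterations = 100_000      # the adaptive cost knob a higher layer would manage

# The C-accelerated primitive itself: derive a 32-byte key.
key = hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)

# Verification re-derives with the stored salt and cost and compares in
# constant time; everything else (format, default cost, digest choice)
# stays in the higher-level library, exactly as Eli suggests.
candidate = hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)
assert hmac.compare_digest(key, candidate)
```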
Not that I'm advocating such a thing is *needed*, but that's what I'd love to see, were anything to be added in this direction. Hope all that helps in your decision making. Thanks, Eli [1] http://www.heikkitoivonen.net/m2crypto/api/M2Crypto.EVP-module.html#pbkdf2 - Eli Collins From steve at pearwood.info Sat Jun 16 00:12:51 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 16 Jun 2012 08:12:51 +1000 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: References: Message-ID: <4FDBB363.5080603@pearwood.info> David Gates wrote: > Multi-line strings as comments don't nest, don't play well with docstrings, > and are counter-intuitive when there's special language support for > single-line comments. Python should only have one obvious way to do things, That's not what the Zen says. The zen says: There should be one-- and preferably only one --obvious way to do it. which is a positive statement that there should be an obvious way to solve problems, NOT a negative statement that there shouldn't be non-obvious ways. > and Python has two ways to comment, only one of which is obvious. My > suggestion is to add language support for comment blocks, using Python's > existing comment delimiter: There is already support for nested multi-line comments: the humble # symbol can be nested arbitrarily deep. All you need is a modern editor that understands Python syntax, and with a single command you can comment or uncomment a block: # This is a commented line. # def fun(a, b, c): # """Docstrings are fine when commented""" # pass # # This is a nested comment. # And no need for an end-delimiter either. If your editor is old or too basic, you can do it by hand, which is a pain, but doable. Python doesn't need dedicated syntax to make up for the limitations of your editor. Don't complicate the language for the sake of those poor fools stuck using Notepad. 
-- Steven From gatesda at gmail.com Sat Jun 16 00:47:12 2012 From: gatesda at gmail.com (David Gates) Date: Fri, 15 Jun 2012 16:47:12 -0600 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: <4FDBB363.5080603@pearwood.info> References: <4FDBB363.5080603@pearwood.info> Message-ID: My proposal wasn't for people who hand-code the single-line comment syntax but for those that use multi-line string comments. Since the multi-line string hack's BDFL-approved, people will use it and other people will have to deal with it. The best alternative would be official discouragement of multi-line string comments. It's fine if Python doesn't have an officially-sanctioned multi-line comment syntax, but if it's going to have one, it should have one that makes sense. On Fri, Jun 15, 2012 at 4:12 PM, Steven D'Aprano wrote: > David Gates wrote: > >> Multi-line strings as comments don't nest, don't play well with >> docstrings, >> and are counter-intuitive when there's special language support for >> single-line comments. Python should only have one obvious way to do >> things, >> > > That's not what the Zen says. The zen says: > > There should be one-- and preferably only one --obvious way to do it. > > which is a positive statement that there should be an obvious way to solve > problems, NOT a negative statement that there shouldn't be non-obvious ways. > > and Python has two ways to comment, only one of which is obvious. My >> suggestion is to add language support for comment blocks, using Python's >> existing comment delimiter: >> > > There is already support for nested multi-line comments: the humble # > symbol can be nested arbitrarily deep. All you need is a modern editor that > understands Python syntax, and with a single command you can comment or > uncomment a block: > > # This is a commented line. > > # def fun(a, b, c): > # """Docstrings are fine when commented""" > # pass > # # This is a nested comment. > # And no need for an end-delimiter either. 
> > If your editor is old or too basic, you can do it by hand, which is a > pain, but doable. > > Python doesn't need dedicated syntax to make up for the limitations of > your editor. Don't complicate the language for the sake of those poor fools > stuck using Notepad. > > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sat Jun 16 00:51:09 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 15 Jun 2012 15:51:09 -0700 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: References: <4FDBB363.5080603@pearwood.info> Message-ID: On Fri, Jun 15, 2012 at 3:47 PM, David Gates wrote: > My proposal wasn't for people who hand-code the single-line comment syntax > but for those that use multi-line string comments. ?Since the multi-line > string hack's BDFL-approved, people will use it and other people will have > to deal with it. What's wrong with it? > The best alternative would be official discouragement of multi-line string > comments. ?It's fine if Python doesn't have an officially-sanctioned > multi-line comment syntax, but if it's going to have one, it should have one > that makes sense. What doesn't make sense about it? --Guido > On Fri, Jun 15, 2012 at 4:12 PM, Steven D'Aprano > wrote: >> >> David Gates wrote: >>> >>> Multi-line strings as comments don't nest, don't play well with >>> docstrings, >>> and are counter-intuitive when there's special language support for >>> single-line comments. Python should only have one obvious way to do >>> things, >> >> >> That's not what the Zen says. The zen says: >> >> There should be one-- and preferably only one --obvious way to do it. 
>> which is a positive statement that there should be an obvious way to solve >> problems, NOT a negative statement that there shouldn't be non-obvious ways. >> >>> and Python has two ways to comment, only one of which is obvious. My >>> suggestion is to add language support for comment blocks, using Python's >>> existing comment delimiter: >> >> >> There is already support for nested multi-line comments: the humble # >> symbol can be nested arbitrarily deep. All you need is a modern editor that >> understands Python syntax, and with a single command you can comment or >> uncomment a block: >> >> # This is a commented line. >> >> # def fun(a, b, c): >> #     """Docstrings are fine when commented""" >> #     pass >> #     # This is a nested comment. >> # And no need for an end-delimiter either. >> >> If your editor is old or too basic, you can do it by hand, which is a >> pain, but doable. >> >> Python doesn't need dedicated syntax to make up for the limitations of >> your editor. Don't complicate the language for the sake of those poor fools >> stuck using Notepad. >> >> >> >> -- >> Steven >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (python.org/~guido) From gatesda at gmail.com Sat Jun 16 01:33:15 2012 From: gatesda at gmail.com (David Gates) Date: Fri, 15 Jun 2012 17:33:15 -0600 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: References: <4FDBB363.5080603@pearwood.info> Message-ID: On discussions I've seen, including in this very thread ( http://mail.python.org/pipermail/python-ideas/2012-June/015544.html ), there are inevitably people that think the multi-line string comment syntax is non-Pythonic, confusing, and/or a bad practice.
While they can adapt to it, the initial impression is often that it's an overly-clever hack. String literals *work* as comments in other languages, but the idiomatic usage is always the dedicated comment syntax (even if there isn't a multi-line syntax) because people assume that uncommented code is active and significant. The same goes for other tricks that use dead code as comments, such as the "if false:" block I've seen suggested as an alternative. Comments do more than just delimit non-code: they signal developer intent. On Fri, Jun 15, 2012 at 4:51 PM, Guido van Rossum wrote: > On Fri, Jun 15, 2012 at 3:47 PM, David Gates wrote: > > My proposal wasn't for people who hand-code the single-line comment > syntax > > but for those that use multi-line string comments. Since the multi-line > > string hack's BDFL-approved, people will use it and other people will > have > > to deal with it. > > What's wrong with it? > > > The best alternative would be official discouragement of multi-line > string > > comments. It's fine if Python doesn't have an officially-sanctioned > > multi-line comment syntax, but if it's going to have one, it should have > one > > that makes sense. > > What doesn't make sense about it? > > --Guido > > > On Fri, Jun 15, 2012 at 4:12 PM, Steven D'Aprano > > wrote: > >> > >> David Gates wrote: > >>> > >>> Multi-line strings as comments don't nest, don't play well with > >>> docstrings, > >>> and are counter-intuitive when there's special language support for > >>> single-line comments. Python should only have one obvious way to do > >>> things, > >> > >> > >> That's not what the Zen says. The zen says: > >> > >> There should be one-- and preferably only one --obvious way to do it. > >> > >> which is a positive statement that there should be an obvious way to > solve > >> problems, NOT a negative statement that there shouldn't be non-obvious > ways. > >> > >>> and Python has two ways to comment, only one of which is obvious.
My > >>> suggestion is to add language support for comment blocks, using > Python's > >>> existing comment delimiter: > >> > >> > >> There is already support for nested multi-line comments: the humble # > >> symbol can be nested arbitrarily deep. All you need is a modern editor > that > >> understands Python syntax, and with a single command you can comment or > >> uncomment a block: > >> > >> # This is a commented line. > >> > >> # def fun(a, b, c): > >> # """Docstrings are fine when commented""" > >> # pass > >> # # This is a nested comment. > >> # And no need for an end-delimiter either. > >> > >> If your editor is old or too basic, you can do it by hand, which is a > >> pain, but doable. > >> > >> Python doesn't need dedicated syntax to make up for the limitations of > >> your editor. Don't complicate the language for the sake of those poor > fools > >> stuck using Notepad. > >> > >> > >> > >> -- > >> Steven > >> _______________________________________________ > >> Python-ideas mailing list > >> Python-ideas at python.org > >> http://mail.python.org/mailman/listinfo/python-ideas > > > > > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > http://mail.python.org/mailman/listinfo/python-ideas > > > > > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sat Jun 16 01:37:05 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 15 Jun 2012 16:37:05 -0700 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: References: <4FDBB363.5080603@pearwood.info> Message-ID: You can never get agreement on what is Pythonic or not, that's why we have a BDFL. Feel free not to use strings as comments; as noted the multi-line # form is fine. Do note that Python has docstrings, so strings used as comments aren't completely alien like they would be in most languages. 
On Fri, Jun 15, 2012 at 4:33 PM, David Gates wrote: > On discussions I've seen, including in this very thread ( > http://mail.python.org/pipermail/python-ideas/2012-June/015544.html ), there > are inevitably people that think the multi-line string comment syntax is > non-Pythonic, confusing, and/or a bad practice. While they can adapt to it, > the initial impression is often that it's an overly-clever hack. > > String literals work as comments in other languages, but the idiomatic usage > is always the dedicated comment syntax (even if there isn't a multi-line > syntax) because people assume that uncommented code is active and > significant. The same goes for other tricks that use dead code as comments, > such as the "if false:" block I've seen suggested as an alternative. > Comments do more than just delimit non-code: they signal developer intent. > > On Fri, Jun 15, 2012 at 4:51 PM, Guido van Rossum wrote: >> >> On Fri, Jun 15, 2012 at 3:47 PM, David Gates wrote: >> > My proposal wasn't for people who hand-code the single-line comment >> > syntax >> > but for those that use multi-line string comments. Since the multi-line >> > string hack's BDFL-approved, people will use it and other people will >> > have >> > to deal with it. >> >> What's wrong with it? >> >> > The best alternative would be official discouragement of multi-line >> > string >> > comments. It's fine if Python doesn't have an officially-sanctioned >> > multi-line comment syntax, but if it's going to have one, it should have >> > one >> > that makes sense. >> >> What doesn't make sense about it? >> >> --Guido >> >> > On Fri, Jun 15, 2012 at 4:12 PM, Steven D'Aprano >> > wrote: >> >> >> >> David Gates wrote: >> >>> >> >>> Multi-line strings as comments don't nest, don't play well with >> >>> docstrings, >> >>> and are counter-intuitive when there's special language support for >> >>> single-line comments.
Python should only have one obvious way to do >> >>> things, >> >> >> >> >> >> That's not what the Zen says. The zen says: >> >> >> >> There should be one-- and preferably only one --obvious way to do it. >> >> >> >> which is a positive statement that there should be an obvious way to >> >> solve >> >> problems, NOT a negative statement that there shouldn't be non-obvious >> >> ways. >> >> >> >>> and Python has two ways to comment, only one of which is obvious. My >> >>> suggestion is to add language support for comment blocks, using >> >>> Python's >> >>> existing comment delimiter: >> >> >> >> >> >> There is already support for nested multi-line comments: the humble # >> >> symbol can be nested arbitrarily deep. All you need is a modern editor >> >> that >> >> understands Python syntax, and with a single command you can comment or >> >> uncomment a block: >> >> >> >> # This is a commented line. >> >> >> >> # def fun(a, b, c): >> >> #     """Docstrings are fine when commented""" >> >> #     pass >> >> #     # This is a nested comment. >> >> # And no need for an end-delimiter either. >> >> >> >> If your editor is old or too basic, you can do it by hand, which is a >> >> pain, but doable. >> >> >> >> Python doesn't need dedicated syntax to make up for the limitations of >> >> your editor. Don't complicate the language for the sake of those poor >> >> fools >> >> stuck using Notepad.
>> >> >> >> >> >> >> >> -- >> >> Steven >> >> _______________________________________________ >> >> Python-ideas mailing list >> >> Python-ideas at python.org >> >> http://mail.python.org/mailman/listinfo/python-ideas >> > >> > >> > >> > _______________________________________________ >> > Python-ideas mailing list >> > Python-ideas at python.org >> > http://mail.python.org/mailman/listinfo/python-ideas >> > >> >> >> >> -- >> --Guido van Rossum (python.org/~guido) > > -- --Guido van Rossum (python.org/~guido) From carl at oddbird.net Sat Jun 16 01:07:55 2012 From: carl at oddbird.net (Carl Meyer) Date: Fri, 15 Jun 2012 17:07:55 -0600 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: References: <4FDBB363.5080603@pearwood.info> Message-ID: <4FDBC04B.50307@oddbird.net> On 06/15/2012 04:51 PM, Guido van Rossum wrote: > On Fri, Jun 15, 2012 at 3:47 PM, David Gates wrote: >> My proposal wasn't for people who hand-code the single-line comment syntax >> but for those that use multi-line string comments. Since the multi-line >> string hack's BDFL-approved, people will use it and other people will have >> to deal with it. > > What's wrong with it? The reason I discourage using multi-line strings as comments is that they don't nest (which I think David mentioned earlier). If you've got a short multi-line-string-as-comment in the middle of a function, and then you try to use multi-line-string technique to comment out that entire function, you don't get what you want, you get a syntax error as your short comment is now parsed as code. (FWIW, I don't think this means Python needs a dedicated syntax for multi-line comments, I think multiple lines beginning with # works just fine.) Carl From guido at python.org Sat Jun 16 01:47:59 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 15 Jun 2012 16:47:59 -0700 Subject: [Python-ideas] Multi-line comment blocks. 
In-Reply-To: <4FDBC04B.50307@oddbird.net> References: <4FDBB363.5080603@pearwood.info> <4FDBC04B.50307@oddbird.net> Message-ID: On Fri, Jun 15, 2012 at 4:07 PM, Carl Meyer wrote: > On 06/15/2012 04:51 PM, Guido van Rossum wrote: >> On Fri, Jun 15, 2012 at 3:47 PM, David Gates wrote: >>> My proposal wasn't for people who hand-code the single-line comment syntax >>> but for those that use multi-line string comments. ?Since the multi-line >>> string hack's BDFL-approved, people will use it and other people will have >>> to deal with it. >> >> What's wrong with it? > > The reason I discourage using multi-line strings as comments is that > they don't nest (which I think David mentioned earlier). If you've got a > short multi-line-string-as-comment in the middle of a function, and then > you try to use multi-line-string technique to comment out that entire > function, you don't get what you want, you get a syntax error as your > short comment is now parsed as code. > > (FWIW, I don't think this means Python needs a dedicated syntax for > multi-line comments, I think multiple lines beginning with # works just > fine.) In which languages do multi-line comments nest? AFAIK not in the Java/C/C++/JavaScript family. -- --Guido van Rossum (python.org/~guido) From zuo at chopin.edu.pl Sat Jun 16 01:41:50 2012 From: zuo at chopin.edu.pl (Jan Kaliszewski) Date: Sat, 16 Jun 2012 01:41:50 +0200 Subject: [Python-ideas] Weak-referencing/weak-proxying of (bound) methods In-Reply-To: References: <20120610231629.GA1792@chopin.edu.pl> Message-ID: <20120615234150.GA1757@chopin.edu.pl> Tal Einat dixit (2012-06-15, 15:41): > On Mon, Jun 11, 2012 at 2:16 AM, Jan Kaliszewski wrote: [snip] > > >>> import weakref > > >>> class A: > > ... def method(self): print(self) > > ... 
> > >>> A.method
> >
> > >>> a = A()
> > >>> a.method
> >
> >
> > >>> r = weakref.ref(a.method)  # creating a weak reference
> > >>> r  # ...but it appears to be dead
> >
> > >>> w = weakref.proxy(a.method)  # the same with a weak proxy
> > >>> w
> >
> > >>> w()
> > Traceback (most recent call last):
> >   File "", line 1, in
> > ReferenceError: weakly-referenced object no longer exists
> >
> > This behaviour is perfectly correct -- but still surprising,
> > especially for people who know little about method creation
> > machinery, descriptors etc.
> >
> > I think it would be nice to make this 'trap' less painful --
[snip]
> > A prototype implementation:
> >
> >     class InstanceCachedMethod(object):
> >
> >         def __init__(self, func):
> >             self.func = func
> >             self.instance_attr_name = (
> >                 '__{0}_method_ref'.format(func.__name__))
> >
> >         def __get__(self, instance, owner):
> >             if instance is None:
> >                 return self.func
> >             try:
> >                 return getattr(instance, self.instance_attr_name)
> >             except AttributeError:
> >                 method = types.MethodType(self.func, instance)
> >                 setattr(instance, self.instance_attr_name, method)
> >                 return method
[snip]
> I was bitten by this issue a while ago as well. It made working with
> weakref proxies much more involved than I expected it would be.
>
> Wouldn't it be better to approach the issue from the opposite end, and
> improve/wrap/replace weakref.proxy with something that can handle bound
> methods?

Indeed, it could probably be done by wrapping weakref.ref()/proxy()
with something like the following:

    # here `obj` is the object that is being weak-referenced...
    if isinstance(obj, types.MethodType):
        try:
            cache = obj.__self__.__method_cache__
        except AttributeError:
            cache = obj.__self__.__method_cache__ = WeakKeyDictionary()
        cache.setdefault(obj.__func__, set()).add(obj)

(Using WeakKeyDictionary with corresponding function objects as weak
keys -- to provide automagic cleanup when a function is deleted, e.g.
replaced with another one.
In other words: the actual weak ref/proxy to a method lives as long as the corresponding function does). Any thoughts? Cheers. *j From gatesda at gmail.com Sat Jun 16 02:28:39 2012 From: gatesda at gmail.com (David Gates) Date: Fri, 15 Jun 2012 18:28:39 -0600 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: References: <4FDBB363.5080603@pearwood.info> <4FDBC04B.50307@oddbird.net> Message-ID: Perl, Ruby, Lisps, OCaml, F#, Haskell, and I believe Pascal. There are probably others. None of these use significant indentation; they're just smart enough to not ignore beginning delimiters within multi-line comments. On Fri, Jun 15, 2012 at 5:47 PM, Guido van Rossum wrote: > On Fri, Jun 15, 2012 at 4:07 PM, Carl Meyer wrote: > > On 06/15/2012 04:51 PM, Guido van Rossum wrote: > >> On Fri, Jun 15, 2012 at 3:47 PM, David Gates wrote: > >>> My proposal wasn't for people who hand-code the single-line comment > syntax > >>> but for those that use multi-line string comments. Since the > multi-line > >>> string hack's BDFL-approved, people will use it and other people will > have > >>> to deal with it. > >> > >> What's wrong with it? > > > > The reason I discourage using multi-line strings as comments is that > > they don't nest (which I think David mentioned earlier). If you've got a > > short multi-line-string-as-comment in the middle of a function, and then > > you try to use multi-line-string technique to comment out that entire > > function, you don't get what you want, you get a syntax error as your > > short comment is now parsed as code. > > > > (FWIW, I don't think this means Python needs a dedicated syntax for > > multi-line comments, I think multiple lines beginning with # works just > > fine.) > > In which languages do multi-line comments nest? AFAIK not in the > Java/C/C++/JavaScript family. 
> > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Sat Jun 16 02:47:09 2012 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 16 Jun 2012 01:47:09 +0100 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: References: <4FDBB363.5080603@pearwood.info> <4FDBC04B.50307@oddbird.net> Message-ID: <4FDBD78D.6010506@mrabarnett.plus.com> On 16/06/2012 01:28, David Gates wrote: > Perl, Ruby, Lisps, OCaml, F#, Haskell, and I believe Pascal. There are > probably others. None of these use significant indentation; they're > just smart enough to not ignore beginning delimiters within multi-line > comments. > In Pascal, comments start with "{" or "(*" and end with "}" or "*)". How do you write a nested comment in Perl? As far as I'm aware, it doesn't have nested comments either. From gatesda at gmail.com Sat Jun 16 03:00:10 2012 From: gatesda at gmail.com (David Gates) Date: Fri, 15 Jun 2012 19:00:10 -0600 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: <4FDBD78D.6010506@mrabarnett.plus.com> References: <4FDBB363.5080603@pearwood.info> <4FDBC04B.50307@oddbird.net> <4FDBD78D.6010506@mrabarnett.plus.com> Message-ID: A Perl nested comment: =for Comment =for Nested comment =cut =cut On Fri, Jun 15, 2012 at 6:47 PM, MRAB wrote: > On 16/06/2012 01:28, David Gates wrote: > >> Perl, Ruby, Lisps, OCaml, F#, Haskell, and I believe Pascal. There are >> probably others. None of these use significant indentation; they're >> just smart enough to not ignore beginning delimiters within multi-line >> comments. >> >> In Pascal, comments start with "{" or "(*" and end with "}" or "*)". > > How do you write a nested comment in Perl? As far as I'm aware, it > doesn't have nested comments either. 
> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeanpierreda at gmail.com Sat Jun 16 04:27:17 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Fri, 15 Jun 2012 22:27:17 -0400 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: References: <4FDBB363.5080603@pearwood.info> Message-ID: On Fri, Jun 15, 2012 at 6:51 PM, Guido van Rossum wrote: > On Fri, Jun 15, 2012 at 3:47 PM, David Gates wrote: >> My proposal wasn't for people who hand-code the single-line comment syntax >> but for those that use multi-line string comments. ?Since the multi-line >> string hack's BDFL-approved, people will use it and other people will have >> to deal with it. > > What's wrong with it? It behaves "badly" in a lot of circumstances. If you put it in an expression, it's treated as a string (not as an invisible thing) (IMHO this is its worst failing, the rest is just fluff). And to add on some more reasons, if you put it at the top of a file, class statement, or def statement, it's treated as a docstring, which may accidentally be included in autogenerated docs. In fact, for some common tools (epydoc), even if you put it in some other places it may be grabbed as a docstring (e.g. if you put it after a variable definition inside a class statement), and be included in the documentation. Basically the Python tool world seems to think that strings that aren't inside an expression are "docstrings", not comments, and you have to be careful to avoid being misinterpreted by your tools, which is unfortunate. In contrast, the reason that multiline comments are so great is that they can go virtually anywhere without too much concern. For example: def foo(a, b(*=None*)): ... 
In this hypothetical code, I commented out the =None in order to run the test suite and see if any of my code omitted that argument, maybe to judge how reasonable it is to remove the default. Here, neither "#" comments nor docstrings really make this easy. The closest equivalent is: def foo(a, b): #=None): ... And that has to be done entirely by hand, and might be especially painful (involving copy-paste) if it isn't the last argument that's being changed. I have done this sort of thing (commenting out stuff inside def statements) many times, I don't even remember why. It crops up. Of course, multiline comments go anywhere, not just in def statements. And they span multiple lines! In practice, most of the time that's just as easy with the editor key that inserts "#", I just wanted to point out a case where no existing solution makes it so easy. -- Devin From bruce at leapyear.org Sat Jun 16 05:26:33 2012 From: bruce at leapyear.org (Bruce Leban) Date: Fri, 15 Jun 2012 20:26:33 -0700 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: <4FDBC04B.50307@oddbird.net> References: <4FDBB363.5080603@pearwood.info> <4FDBC04B.50307@oddbird.net> Message-ID: On Jun 15, 2012 4:46 PM, "Carl Meyer" oddbird.net> wrote: > The reason I discourage using multi-line strings as comments is that > they don't nest (which I think David mentioned earlier). If you've got a > short multi-line-string-as-comment in the middle of a function, and then > you try to use multi-line-string technique to comment out that entire > function, you don't get what you want, you get a syntax error as your > short comment is now parsed as code. I think "commenting out" code and writing true comments are different functions. I would not advocate multi line strings for commenting out but they work very well for long text comments. Nested #s and a modern editor work well enough for commenting out IMHO and are much harder to not notice. 
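The nesting point being argued here can be checked mechanically. An editor's sketch (not from the thread), using only the built-in `compile()`: hash-prefixed lines nest to any depth, while a triple-quoted "comment" breaks as soon as it encloses another triple-quoted string.

```python
# '#' comments nest trivially: commenting out code that already
# contains comments and docstrings still parses.
hash_commented = (
    "# def fun(a, b, c):\n"
    '#     """Docstrings are fine when commented"""\n'
    "#     pass\n"
    "#     # a nested comment\n"
)
compile(hash_commented, "<demo>", "exec")  # no complaint

# A triple-quoted "comment" around a function with a docstring fails:
# the inner """ terminates the outer string, leaving stray tokens.
string_commented = (
    '"""\n'
    "def fun(a, b, c):\n"
    '    """This docstring ends the outer comment"""\n'
    "    pass\n"
    '"""\n'
)
try:
    compile(string_commented, "<demo>", "exec")
    nested_ok = True
except SyntaxError:
    nested_ok = False
print(nested_ok)  # False
```

This is exactly Carl's syntax-error scenario reduced to a few lines.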
--- Bruce (from my phone) On Jun 15, 2012 7:28 PM, "Devin Jeanpierre" @ gmail.com > wrote: > the reason that multiline comments are so great is that > they can go virtually anywhere without too much concern. For example: > > def foo(a, b(*=None*)): > ... > > In this hypothetical code, I commented out the =None in order to run > the test suite and see if any of my code omitted that argument, maybe > to judge how reasonable it is to remove the default. Here, neither "#" > comments nor docstrings really make this easy. The closest equivalent > is: > > def foo(a, b): #=None): > ... > > And that has to be done entirely by hand, and might be especially > painful (involving copy-paste) if it isn't the last argument that's > being changed. For commenting out part of a line I think best practice is duplicating the entire line as a comment and editing it directly. That handles scenarios that inline comments don't and more importantly ensures reverting is error free. # def foo(a, b=None): def foo(a, b=[]): > Python tool world seems to think that strings that aren't inside an > expression are "docstrings", not comments, and you have to be careful > to avoid being misinterpreted by your tools, which is unfortunate. Agreed. But even if multiline/inline comments were added you'd still have that problem, right? --- Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeanpierreda at gmail.com Sat Jun 16 05:48:29 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Fri, 15 Jun 2012 23:48:29 -0400 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: References: <4FDBB363.5080603@pearwood.info> <4FDBC04B.50307@oddbird.net> Message-ID: On Fri, Jun 15, 2012 at 11:26 PM, Bruce Leban wrote: > For commenting out part of a line I think best practice is duplicating the > entire line as a comment and editing it directly. That handles scenarios > that inline comments don't and more importantly ensures reverting is error > free. 
I suppose so. So far I've done pretty much exactly what I wrote, and used the undo buffer for safety. There are also things like commenting out values inside lists and such, but these are much less common for me. Like, definitely inline comments are more flexible than EOL comments, but finding compelling use-cases is kinda hard. There's only a bunch of minor special cases and annoyances, as far as I can see. >> Python tool world seems to think that strings that aren't inside an >> expression are "docstrings", not comments, and you have to be careful >> to avoid being misinterpreted by your tools, which is unfortunate. > > Agreed. But even if multiline/inline comments were added you'd still have > that problem, right? I don't see why this problem would exist for comments. Comments do not have a (common) culture or behaviour of meaning anything else other than comments, whereas triple-quoted strings have three purposes: - Actual string objects - Docstrings - Comments Multiline comments would need a lot of time to accumulate that many orthogonal uses, and one would hope that they never do. Aside from that, most of these sorts of tools manipulate code either after parsing or after executing, and by then all comments have been discarded. They wouldn't even see multiline comments. Although, it's worth mentioning that doctest is an interesting exception to all this: it uses regexps to parse out comments, which are used as directives for the test runner. However, doctest only touches code that is explicitly meant to be touched by doctest, and that code generally doesn't need comments at all. -- Devin From bruce at leapyear.org Sat Jun 16 06:28:33 2012 From: bruce at leapyear.org (Bruce Leban) Date: Fri, 15 Jun 2012 21:28:33 -0700 Subject: [Python-ideas] Multi-line comment blocks. 
In-Reply-To: References: <4FDBB363.5080603@pearwood.info> <4FDBC04B.50307@oddbird.net> Message-ID: On Fri, Jun 15, 2012 at 8:48 PM, Devin Jeanpierre wrote: > On Fri, Jun 15, 2012 at 11:26 PM, Bruce Leban wrote: > > For commenting out part of a line I think best practice is duplicating > the > > entire line as a comment and editing it directly. That handles scenarios > > that inline comments don't and more importantly ensures reverting is > error > > free. > > I suppose so. So far I've done pretty much exactly what I wrote, and > used the undo buffer for safety. > > Undo is dangerous because in most editors it will undo other intervening changes to other parts of the program. You make a change like this to find a bug, then find and fix the bug. Undo will remove the fix. > Agreed. But even if multiline/inline comments were added you'd still have > > that problem, right? > > I don't see why this problem would exist for comments. Comments do not > have a (common) culture or behaviour of meaning anything else other > than comments, whereas triple-quoted strings have three purposes: > I meant that even if new comment syntax were added, string-style comments wouldn't be going away anytime soon. There's a high bar for adding features and an even higher bar for removing them. So tools will need to handle the current string comments for quite a while as well as being modified to parse any new comment syntax. --- Bruce Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Sat Jun 16 08:34:58 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 16 Jun 2012 15:34:58 +0900 Subject: [Python-ideas] Multi-line comment blocks.
In-Reply-To: <4FDBC04B.50307@oddbird.net> References: <4FDBB363.5080603@pearwood.info> <4FDBC04B.50307@oddbird.net> Message-ID: <87zk83hii5.fsf@uwakimon.sk.tsukuba.ac.jp> Carl Meyer writes: > The reason I discourage using multi-line strings as comments is that > they don't nest (which I think David mentioned earlier). I don't see that as a problem. While I don't use multiline strings as comments myself, I wouldn't object to others using them for commentary, especially given the syntactic analogy to docstrings. But for commenting out code, a nice heavy line in the left margin is an appropriate marker, and would certainly "discuss" the matter with colleagues who disabled large chunks of code with paired delimiters, whether primarily string or comment delimiters. I'm a big non-fan of preprocessor conditional compilation directives, for that matter. (Sure, your editor can mark or hide them, and it's not like you can avoid them in languages like C, but that doesn't mean I have to *like* them.) So IMO the current syntax encourages good style. From stephen at xemacs.org Sat Jun 16 08:45:23 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 16 Jun 2012 15:45:23 +0900 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: References: <4FDBB363.5080603@pearwood.info> <4FDBC04B.50307@oddbird.net> Message-ID: <87y5nnhi0s.fsf@uwakimon.sk.tsukuba.ac.jp> Devin Jeanpierre writes: > [T]riple-quoted strings have three purposes: > > - Actual string objects > - Docstrings Docstrings are a subset of "actual string objects," of course. They just have a special syntax, and their primary use is "meta" (eg, introspection). > - Comments And so are strings-as-comments. Strings could be used as comments in C: void foo () { "This comment would be optimized away, most likely."; "Not to mention compilers may bitch about lack of effect."; return 42; } It's just a side effect of expression statements. You don't have to like it, of course. 
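Devin's "three purposes" split (and Stephen's expression-statement point) can be observed directly in the interpreter. An editor's sketch, not from the thread: only a string in docstring position is kept as `__doc__`; a later bare string is evaluated as an expression statement and simply discarded.

```python
# Compile a module body containing two bare strings: the first is the
# docstring, the second is a "comment-like" string with no effect.
src = (
    '"""module docstring"""\n'
    '"""comment-like: evaluated and discarded"""\n'
    "x = 1\n"
)
namespace = {}
exec(compile(src, "<demo>", "exec"), namespace)
print(namespace["__doc__"])  # module docstring
print(namespace["x"])        # 1
```

Only the first string survives as metadata, which is why tools that scan for docstrings can misattribute strings used as comments.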
From steve at pearwood.info Sat Jun 16 08:53:10 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 16 Jun 2012 16:53:10 +1000 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: References: <4FDBB363.5080603@pearwood.info> <4FDBC04B.50307@oddbird.net> Message-ID: <4FDC2D56.9080302@pearwood.info> Devin Jeanpierre wrote: > Although, it's worth mentioning that doctest is an interesting > exception to all this: it uses regexps to parse out comments, which > are used as directives for the test runner. However, doctest only > touches code that is explicitly meant to be touched by doctest, The normal way of running doctest is to use implicit test discovery: you point doctest at a module, and it will discover your doctests without you needing to explicitly list them. That's why there is a doctest directive to *disable* tests, but no directive to enable them: you only need to explicitly turn tests off, not turn them on. > and that code generally doesn't need comments at all. I write many functions or classes that include both documentation in the docstring, including doctests, and implementation comments in the body of the function. Docstrings and comments in the body of a function have very different purposes, just because a function has one doesn't mean that it won't have the other. -- Steven From steve at pearwood.info Sat Jun 16 08:55:51 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 16 Jun 2012 16:55:51 +1000 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: <4FDBC04B.50307@oddbird.net> References: <4FDBB363.5080603@pearwood.info> <4FDBC04B.50307@oddbird.net> Message-ID: <4FDC2DF7.5030505@pearwood.info> Carl Meyer wrote: > The reason I discourage using multi-line strings as comments is that > they don't nest (which I think David mentioned earlier). 
If you've got a > short multi-line-string-as-comment in the middle of a function, and then > you try to use multi-line-string technique to comment out that entire > function, you don't get what you want, you get a syntax error as your > short comment is now parsed as code. You can nest two such string-comments, by using different string delimiters: '''Outermost comment def func(x, y): """Innermost comment or docstring goes here """ pass If you regularly need to do this, you're doing it wrong. You should be deleting unused code, not commenting it out. Nested comments as change tracking is *worse* than no change tracking, in my opinion. ''' > (FWIW, I don't think this means Python needs a dedicated syntax for > multi-line comments, I think multiple lines beginning with # works just > fine.) Agreed. -- Steven From steve at pearwood.info Sat Jun 16 09:17:35 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 16 Jun 2012 17:17:35 +1000 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: References: <4FDBB363.5080603@pearwood.info> Message-ID: <4FDC330F.2000604@pearwood.info> Devin Jeanpierre wrote: > In contrast, the reason that multiline comments are so great is that > they can go virtually anywhere without too much concern. For example: > > def foo(a, b(*=None*)): > ... So now you're changing the semantics from *multiline* to *embedded* comments. Being able to embed a comment within an expression is a very different thing from just having comments extend across multiple lines. > In this hypothetical code, I commented out the =None in order to run > the test suite and see if any of my code omitted that argument, maybe > to judge how reasonable it is to remove the default. Here, neither "#" > comments nor docstrings really make this easy. The closest equivalent > is: > > def foo(a, b): #=None): > ... The simplest change here would be to just delete the "=None", run your tests, then put it back if the tests fail. 
Of course, alternatives are the comment above, or perhaps even better: #def foo(a, b=None): def foo(a, b): ... which avoids the risk of forgetting what change needs to be undone. In any case, all these alternatives are so trivial that they are hardly an argument for adding new comment syntax. > And that has to be done entirely by hand, and might be especially > painful (involving copy-paste) if it isn't the last argument that's > being changed. "Especially painful"? I fear you exaggerate somewhat. -- Steven From zuo at chopin.edu.pl Sat Jun 16 10:42:57 2012 From: zuo at chopin.edu.pl (Jan Kaliszewski) Date: Sat, 16 Jun 2012 10:42:57 +0200 Subject: [Python-ideas] Weak-referencing/weak-proxying of (bound) methods In-Reply-To: <20120615234150.GA1757@chopin.edu.pl> References: <20120610231629.GA1792@chopin.edu.pl> <20120615234150.GA1757@chopin.edu.pl> Message-ID: <20120616084257.GA1843@chopin.edu.pl> Jan Kaliszewski dixit (2012-06-16, 01:41): > Tal Einat dixit (2012-06-15, 15:41): > > > On Mon, Jun 11, 2012 at 2:16 AM, Jan Kaliszewski wrote: > [snip] > > > >>> import weakref > > > >>> class A: > > > ... def method(self): print(self) > > > ... > > > >>> A.method > > > > > > >>> a = A() > > > >>> a.method > > > > > > > >>> r = weakref.ref(a.method) # creating a weak reference > > > >>> r # ...but it appears to be dead > > > > > > >>> w = weakref.proxy(a.method) # the same with a weak proxy > > > >>> w > > > > > > >>> w() > > > Traceback (most recent call last): > > > File "", line 1, in > > > ReferenceError: weakly-referenced object no longer exists > > > > > > This behaviour is perfectly correct -- but still surprising, > > > especially for people who know little about method creation > > > machinery, descriptors etc. 
> > > > > > I think it would be nice to make this 'trap' less painful -- > [snip] > > > A prototype implementation: > > > > > > class InstanceCachedMethod(object): > > > > > > def __init__(self, func): > > > self.func = func > > > (self.instance_attr_name > > > ) = '__{0}_method_ref'.format(func.__name__) > > > > > > def __get__(self, instance, owner): > > > if instance is None: > > > return self.func > > > try: > > > return getattr(instance, self.instance_attr_name) > > > except AttributeError: > > > method = types.MethodType(self.func, instance) > > > setattr(instance, self.instance_attr_name, method) > > > return method > [snip] > > I was bitten by this issue a while ago as well. It made working with > > weakref proxies much more involved than I expected it would be. > > > > Wouldn't it be better to approach the issue from the opposite end, and > > improve/wrap/replace weakref.proxy with something that can handle bound > > methods? > > Indeed, probably could it be done by wrapping weakref.ref()/proxy() > with something like the following: > > # here `obj` is the object that is being weak-referenced... > if isinstance(obj, types.MethodType): > try: > cache = obj.__self__.__method_cache__ > except AttributeError: > cache = obj.__self__.__method_cache__ = WeakKeyDictionary() > method_cache.setdefault(obj.__func__, set()).add(obj) > > (Using WeakKeyDictionary with corresponding function objects as weak > keys -- to provide automagic cleanup when a function is deleted, e.g. > replaced with another one. In other words: the actual weak ref/proxy > to a method lives as long as the corresponding function does). On second thought -- no, it shouldn't be done on the side of weakref.ref()/proxy(). Why? My last idea described just above has such a bug: each time you create a new weak reference to the method another method object is cached (added to __method_cache__[func] set). 
You could think that caching only one object (just in
__method_cache__[func]) would be a better idea, but it wouldn't: such
a behaviour would be strange and unstable: after creating a new
weakref to the method, the old weakref would become invalid... And
yes, we can prevent it by ensuring that each time you take the method
from a class instance you get the same object (per class instance) --
but then we come back to my previous idea of a descriptor-decorator.
And IMHO such a decorator should not be applied on the class
dictionary implicitly by weakref.ref()/proxy() but explicitly in the
class body with the decorator syntax (applying such a decorator, i.e.
replacing a function with a caching descriptor in a class dict, is
too invasive an operation to be done silently).

So I renew (and update) my previous descriptor-decorator that could be
added to functools (or to weakref as a helper?) and applied explicitly
by programmers, when needed:

    import types
    from weakref import WeakKeyDictionary

    class CachedMethod(object):

        def __init__(self, func):
            self.func = func

        def __get__(self, instance, owner):
            if instance is None:
                return self.func
            try:
                cache = instance.__method_cache__
            except AttributeError:  # not thread-safe :-(
                cache = instance.__method_cache__ = WeakKeyDictionary()
            return cache.setdefault(
                self.func, types.MethodType(self.func, instance))

Usage:

    import weakref

    class MyClass(object):

        @CachedMethod
        def my_method(self):
            ...

    instance = MyClass()
    method_weak_proxy = weakref.proxy(instance.my_method)
    method_weak_proxy()  # works!

It should be noted that caching a reference to a method in an instance
causes circular referencing (class <-> instance). However, often it is
not a problem and can help avoid circular references involving other
objects which we want to have circular-ref-free (typical use case:
passing a bound method as a callback).

Cheers.
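For readers today: this thread predates `weakref.WeakMethod`, which was added to the standard library in Python 3.4 and addresses the trap discussed here without any per-class decorator. A minimal sketch (the immediate collection of the dead reference assumes CPython's reference counting):

```python
import weakref

class A:
    def method(self):
        return "hi"

a = A()

# The trap from the start of the thread: the bound method object is
# created on attribute access and collected right after weakref.ref()
# returns, so the reference is dead immediately.
r = weakref.ref(a.method)
print(r())  # None

# weakref.WeakMethod (Python 3.4+) keeps weak refs to the instance and
# the function separately, re-creating the bound method on demand.
wm = weakref.WeakMethod(a.method)
print(wm()())  # hi -- alive while `a` is alive

del a
print(wm())  # None once the instance is gone
```

This is essentially the stdlib's answer to the use case (callbacks bound to short-lived instances) that motivated the thread.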
*j From zuo at chopin.edu.pl Sat Jun 16 10:46:48 2012 From: zuo at chopin.edu.pl (Jan Kaliszewski) Date: Sat, 16 Jun 2012 10:46:48 +0200 Subject: [Python-ideas] Weak-referencing/weak-proxying of (bound) methods [erratum, sorry] In-Reply-To: <20120616084257.GA1843@chopin.edu.pl> References: <20120610231629.GA1792@chopin.edu.pl> <20120615234150.GA1757@chopin.edu.pl> <20120616084257.GA1843@chopin.edu.pl> Message-ID: <20120616084648.GB1843@chopin.edu.pl> Jan Kaliszewski dixit (2012-06-16, 10:42): > instance causes circular referencing (class <-> instance). s/class/method/ From p.f.moore at gmail.com Sat Jun 16 10:56:49 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 16 Jun 2012 09:56:49 +0100 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: References: <4FDBB363.5080603@pearwood.info> <4FDBC04B.50307@oddbird.net> <4FDBD78D.6010506@mrabarnett.plus.com> Message-ID: On 16 June 2012 02:00, David Gates wrote: > A Perl nested comment: > > =for > ? Comment > ? =for > ? ? Nested comment > ? =cut > =cut And the irony is that, as far as I recall, this is a form of Perl's embedded documentation syntax (and hence very similar in spirit to using multiline strings as comments). See http://www.perl6.org/archive//rfc/5.html (and note that perl 6 does *not*, apparently, include multiline comments). Paul. From jeanpierreda at gmail.com Sat Jun 16 10:57:30 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Sat, 16 Jun 2012 04:57:30 -0400 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: <4FDC2D56.9080302@pearwood.info> References: <4FDBB363.5080603@pearwood.info> <4FDBC04B.50307@oddbird.net> <4FDC2D56.9080302@pearwood.info> Message-ID: On Sat, Jun 16, 2012 at 2:53 AM, Steven D'Aprano wrote: Steven, the code I was talking about was the code inside the doctests, not the code surrounding the doctests. So, for example, whether or not the body of the function has comments doesn't matter. They could never be confused with doctest directives. 
-- Devin From jeanpierreda at gmail.com Sat Jun 16 11:09:51 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Sat, 16 Jun 2012 05:09:51 -0400 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: References: <4FDBB363.5080603@pearwood.info> <4FDBC04B.50307@oddbird.net> Message-ID: On Sat, Jun 16, 2012 at 12:28 AM, Bruce Leban wrote: > I meant that even if new comment syntax were added, string-style comments > wouldn't be going away anytime soon. There's a high bar for adding features > and an even higher bar for removing them. So tools will need?handle the > current string comments for quite a while as well as being?modified to parse > any new comment syntax. Sorry, I didn't understand your point at first. That's a concern. Although I'm not sure it pans out -- do any tools handle string comments? I only know of tools ignoring them or mistreating them, not handling them specially. -- Devin From greg.ewing at canterbury.ac.nz Sat Jun 16 09:59:21 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 16 Jun 2012 19:59:21 +1200 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: References: <4FDBB363.5080603@pearwood.info> <4FDBC04B.50307@oddbird.net> Message-ID: <4FDC3CD9.6000402@canterbury.ac.nz> Guido van Rossum wrote: > In which languages do multi-line comments nest? AFAIK not in the > Java/C/C++/JavaScript family. Modula-2. -- Greg From gatesda at gmail.com Sat Jun 16 15:37:10 2012 From: gatesda at gmail.com (David Gates) Date: Sat, 16 Jun 2012 07:37:10 -0600 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: References: <4FDBB363.5080603@pearwood.info> <4FDBC04B.50307@oddbird.net> <4FDBD78D.6010506@mrabarnett.plus.com> Message-ID: I was throwing together a quick language list, so I pulled some of them from hyperpolyglot, including Perl. So, guess it's not a dedicated comment syntax, but it does nest (the document you linked says it doesn't, but it's outdated). 
Found out that Lua also uses dead-code strings as comments. It supports nested strings, but the delimiters in each layer must be distinct. Trying to nest them otherwise is a syntax error, so you can't accidentally end a string early like you can with quote delimiters. On Sat, Jun 16, 2012 at 2:56 AM, Paul Moore wrote: > On 16 June 2012 02:00, David Gates wrote: > > A Perl nested comment: > > > > =for > > Comment > > =for > > Nested comment > > =cut > > =cut > > And the irony is that, as far as I recall, this is a form of Perl's > embedded documentation syntax (and hence very similar in spirit to > using multiline strings as comments). See > http://www.perl6.org/archive//rfc/5.html (and note that perl 6 does > *not*, apparently, include multiline comments). > > Paul. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sat Jun 16 16:56:27 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 16 Jun 2012 07:56:27 -0700 Subject: [Python-ideas] Multi-line comment blocks. In-Reply-To: References: <4FDBB363.5080603@pearwood.info> <4FDBC04B.50307@oddbird.net> <4FDBD78D.6010506@mrabarnett.plus.com> Message-ID: Please stop this discussion. Python is not going to change this. -- --Guido van Rossum (python.org/~guido) From ironfroggy at gmail.com Sat Jun 16 19:05:48 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Sat, 16 Jun 2012 13:05:48 -0400 Subject: [Python-ideas] for/else statements considered harmful In-Reply-To: References: Message-ID: On Wed, Jun 6, 2012 at 7:20 PM, Alice Bevan-McGregor wrote: > Howdy! > > Was teaching a new user to Python the ropes a short while ago and ran into > an interesting headspace problem: the for/else syntax fails the obviousness > and consistency tests. When used in an if/else block the conditional code > is executed if the conditional passes, and the else block is executed if the > conditional fails.
Compared to for loops where the for code is repeated and > the else code executed if we "naturally fall off the loop". (The new user's > reaction was "why the hoek would I ever use for/else?") I read it not as for/else and while/else, but as break/else, and this has been a much more natural framing for myself and those I've used the framing to explain the behavior to. > I forked Python 3.3 to experiment with an alternate implementation that > follows the logic of pass/fail implied by if/else: (and to refactor the > stdlib, but that's a different issue ;) > > for x in range(20): > if x > 10: break > else: > pass # we had no values to iterate > finally: > pass # we naturally fell off the loop > > It abuses finally (to avoid tying up a potentially common word as a reserved > word like "done") but makes possible an important distinction without having > to perform potentially expensive length calculations (which may not even be > possible!) on the value being iterated: that is, handling the case where > there were no values in the collection or returned by the generator. > > Templating engines generally implement this type of structure. Of course > this type of breaking change in semantics puts this idea firmly into Python > 4 land. > > I'll isolate the for/else/finally code from my fork and post a patch this > week-end, hopefully. > > - Alice. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Read my blog! I depend on your acceptance of my opinion! I am interesting!
http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From lclarkmichalek at gmail.com Sun Jun 17 00:28:50 2012 From: lclarkmichalek at gmail.com (Laurie Clark-Michalek) Date: Sat, 16 Jun 2012 23:28:50 +0100 Subject: [Python-ideas] Make dict customisation easier Message-ID: Hi, A few weeks ago, a guy was on #python, looking to customise a dictionary to be case insensitive (he was assuming string keys). His naive implementation looked something like this: class CaseInsensitiveDict(dict): def __getitem__(self, key): return dict.__getitem__(self, key.lower()) def __setitem__(self, key, item): dict.__setitem__(self, key.lower(), item) However, he was dismayed to find that this didn't work with other methods that dict uses: >>> d = CaseInsensitiveDict() >>> d['a'] = 3 >>> d {'a': 3} >>> d['A'] 3 >>> d.get('A', "No key found") 'No key found' Eventually he was directed to dir(dict), and he seemed to accept that he would have to wrap most of the methods of the dict builtin. This seemed like the worst solution to me, and I couldn't see any real reason why python couldn't either defer to user-implemented __getitem__ and __setitem__, or provide an alternative dict implementation that did allow easy customisation. I realise that python dicts are fairly high-performance structures, and that checking for a custom implementation might have an unacceptable impact for a solution to what might be seen as a minor problem. Still, I think it is worth the effort to clean up what seems to me to be a slight wart on a very fundamental type in python. Thanks, Laurie -------------- next part -------------- An HTML attachment was scrubbed...
URL: From simon.sapin at kozea.fr Sun Jun 17 00:41:27 2012 From: simon.sapin at kozea.fr (Simon Sapin) Date: Sun, 17 Jun 2012 00:41:27 +0200 Subject: [Python-ideas] Make dict customisation easier In-Reply-To: References: Message-ID: <4FDD0B97.7070208@kozea.fr> Le 17/06/2012 00:28, Laurie Clark-Michalek a écrit : > Eventually he was directed to dir(dict), and he seemed to accept that he > would have to wrap most of the methods of the dict builtin. This seemed > like the worst solution to me, and I couldn't see any real reason why > python couldn't either defer to user-implemented __getitem__ and > __setitem__, or provide an alternative dict implementation that did > allow easy customisation. > > I realise that python dicts are fairly high-performance structures, and > that checking for a custom implementation might have an unacceptable > impact for a solution to what might be seen as a minor problem. Still, I > think it is worth the effort to clean up what seems to me to be a slight > wart on a very fundamental type in python. Hi, The MutableMapping class in the collections module has default implementations for many methods, based on a few basic methods. I think that inheriting from it and adding __len__, __iter__, __getitem__, __setitem__ and __delitem__ should be enough. Then you can override more methods for performance, but the defaults should be correct and consistent. Regards, -- Simon Sapin From lists at cheimes.de Sun Jun 17 02:13:49 2012 From: lists at cheimes.de (Christian Heimes) Date: Sun, 17 Jun 2012 02:13:49 +0200 Subject: [Python-ideas] Context helper for new os.*at functions Message-ID: Hello, Python 3.3 has got new wrappers for the 'at' variants of low level functions, for example os.openat(). The 'at' variants work like their brothers and sisters with one exception. The first argument must be a file descriptor of a directory. The fd is used to calculate the absolute path instead of the current working directory.
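The dirfd-relative lookup described above can be sketched with the dir_fd keyword argument that the os functions grew in Python 3.3 (a self-contained sketch using a throwaway temporary directory instead of /etc; it assumes a platform, such as Linux, where os.open supports dir_fd):

```python
import os
import tempfile

# A throwaway directory with one file in it, standing in for "/etc".
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "fstab"), "w") as f:
    f.write("# example contents\n")

# Open the directory itself; its fd anchors the relative lookup below.
dirfd = os.open(tmpdir, os.O_RDONLY)
try:
    # "fstab" is resolved relative to dirfd, not the current directory.
    fd = os.open("fstab", os.O_RDONLY, dir_fd=dirfd)
    try:
        data = os.read(fd, 50)
    finally:
        os.close(fd)
finally:
    os.close(dirfd)

print(data)
```

On platforms that cannot honour dir_fd, os.open raises NotImplementedError; os.supports_dir_fd can be consulted first.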
File descriptors are harder to manage than files because a fd isn't automatically closed when it goes out of scope. I've written a small wrapper that takes care of the details. It also ensures that only directories are opened. Example: with atcontext("/etc") as at: print(at.open) # functools.partial(<built-in function openat>, 3) f = at.open("fstab", os.O_RDONLY) print(os.read(f, 50)) os.close(f) Code: http://pastebin.com/J4SLjB6k The code calculates the name and creates dynamic wrappers with functools.partial. This may not be desired if the wrapper is added to the os module. I could add explicit methods and generate the doc strings from the methods' doc strings. def docfix(func): name = func.__name__ nameat = name + "at" doc = getattr(os, nameat).__doc__ func.__doc__ = doc.replace("{}(dirfd, ".format(nameat), "{}(".format(name)) return func class atcontext: ... @docfix def open(self, *args): return self.openat(self.dirfd, *args) How do you like my proposal? Christian From guido at python.org Sun Jun 17 02:46:15 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 16 Jun 2012 17:46:15 -0700 Subject: [Python-ideas] Context helper for new os.*at functions In-Reply-To: References: Message-ID: Hmm... Isn't Larry Hastings working on replacing the separate functions with an api where you pass an 'fd=...' argument to the non-at function? On Sat, Jun 16, 2012 at 5:13 PM, Christian Heimes wrote: > Hello, > > Python 3.3 has got new wrappers for the 'at' variants of low level > functions, for example os.openat(). The 'at' variants work like their > brothers and sisters with one exception. The first argument must be a > file descriptor of a directory. The fd is used to calculate the absolute > path instead of the current working directory. > > File descriptors are harder to manage than files because a fd isn't > automatically closed when it goes out of scope. I've written a small > wrapper that takes care of the details. It also ensures that only > directories are opened.
> > Example: > > with atcontext("/etc") as at: > print(at.open) > # functools.partial(<built-in function openat>, 3) > f = at.open("fstab", os.O_RDONLY) > print(os.read(f, 50)) > os.close(f) > > Code: > http://pastebin.com/J4SLjB6k > > The code calculates the name and creates dynamic wrappers with > functools.partial. This may not be desired if the wrapper is added to the > os module. I could add explicit methods and generate the doc strings > from the methods' doc strings. > > def docfix(func): > name = func.__name__ > nameat = name + "at" > doc = getattr(os, nameat).__doc__ > func.__doc__ = doc.replace("{}(dirfd, ".format(nameat), > "{}(".format(name)) > return func > > class atcontext: > ... > > @docfix > def open(self, *args): > return self.openat(self.dirfd, *args) > > > How do you like my proposal? > > Christian > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- --Guido van Rossum (python.org/~guido) From lists at cheimes.de Sun Jun 17 03:34:13 2012 From: lists at cheimes.de (Christian Heimes) Date: Sun, 17 Jun 2012 03:34:13 +0200 Subject: [Python-ideas] Context helper for new os.*at functions In-Reply-To: References: Message-ID: <4FDD3415.70100@cheimes.de> Am 17.06.2012 02:46, schrieb Guido van Rossum: > Hmm... Isn't Larry Hastings working on replacing the separate > functions with an api where you pass an 'fd=...' argument to the > non-at function? Oh, is he? I didn't know that. Indeed, it sounds like a good approach. Users must still handle the fd correctly and make sure they open a directory. Linux's man(2) open warns about the possibility of denial-of-service attempts for wrong fds. Linux has O_DIRECTORY for this purpose. On other OSes users should do a stat() call in front, which is open for race conditions but still better than getting stuck in a FIFO.
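The O_DIRECTORY precaution can be wrapped in a small helper; open_dirfd is an invented name here, and the fstat() fallback checks the fd only after opening it, so unlike O_DIRECTORY it cannot stop the open itself from blocking on a FIFO:

```python
import os
import stat
import tempfile

def open_dirfd(path):
    """Open *path* as a directory fd, refusing anything else."""
    # O_DIRECTORY makes the kernel reject non-directories outright;
    # fall back to 0 on platforms that lack the flag.
    flags = os.O_RDONLY | getattr(os, "O_DIRECTORY", 0)
    fd = os.open(path, flags)
    if not stat.S_ISDIR(os.fstat(fd).st_mode):
        # Without O_DIRECTORY we still verify the already-opened fd,
        # which avoids a separate stat()-then-open() race.
        os.close(fd)
        raise OSError("not a directory: {!r}".format(path))
    return fd

dirfd = open_dirfd(tempfile.gettempdir())
os.close(dirfd)
```

On platforms without O_DIRECTORY this is best-effort only: the fstat() check still rejects non-directories, but the open call itself could block on a FIFO first, which is exactly the hazard the Linux flag avoids.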
I could modify the wrapper a bit to make it useful for the new API: class atcontext: def fileno(self): # for PyObject_AsFileDescriptor() return self.dirfd with atcontext("/etc") as at: os.open("fstab", os.O_RDONLY, fd=at) Christian From techtonik at gmail.com Mon Jun 18 17:26:59 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Mon, 18 Jun 2012 18:26:59 +0300 Subject: [Python-ideas] Just __main__ Message-ID: How about global __main__ as a boolean? __name__ == '__main__' as a mark of entrypoint module is coherent and logical, but awkward to type and requires explicit explanation for newcomers even with prior background in other languages. From matt at whoosh.ca Mon Jun 18 18:07:04 2012 From: matt at whoosh.ca (Matt Chaput) Date: Mon, 18 Jun 2012 12:07:04 -0400 Subject: [Python-ideas] Just __main__ In-Reply-To: References: Message-ID: <4FDF5228.4060706@whoosh.ca> On 18/06/2012 11:26 AM, anatoly techtonik wrote: > How about global __main__ as a boolean? Love it. From jkbbwr at gmail.com Mon Jun 18 18:09:11 2012 From: jkbbwr at gmail.com (Jakob Bowyer) Date: Mon, 18 Jun 2012 17:09:11 +0100 Subject: [Python-ideas] Just __main__ In-Reply-To: <4FDF5228.4060706@whoosh.ca> References: <4FDF5228.4060706@whoosh.ca> Message-ID: +1 On Mon, Jun 18, 2012 at 5:07 PM, Matt Chaput wrote: > On 18/06/2012 11:26 AM, anatoly techtonik wrote: >> >> How about global __main__ as a boolean? > > > Love it. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From ethan at stoneleaf.us Mon Jun 18 18:17:05 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 18 Jun 2012 09:17:05 -0700 Subject: [Python-ideas] Just __main__ In-Reply-To: References: Message-ID: <4FDF5481.7050403@stoneleaf.us> anatoly techtonik wrote: > How about global __main__ as a boolean?
> > __name__ == '__main__' as a mark of entrypoint module is coherent and > logical, but awkward to type and requires explicit explanation for > newcomers even with prior background in other languages. So instead of: if __name__ == '__main__': ... you would have: if __main__: ... ? ~Ethan~ From ubershmekel at gmail.com Mon Jun 18 18:49:41 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Mon, 18 Jun 2012 19:49:41 +0300 Subject: [Python-ideas] Just __main__ In-Reply-To: <4FDF5481.7050403@stoneleaf.us> References: <4FDF5481.7050403@stoneleaf.us> Message-ID: On Mon, Jun 18, 2012 at 7:17 PM, Ethan Furman wrote: > anatoly techtonik wrote: > >> How about global __main__ as a boolean? >> >> __name__ == '__main__' as a mark of entrypoint module is coherent and >> logical, but awkward to type and requires explicit explanation for >> newcomers even with prior background in other languages. >> > > So instead of: > > if __name__ == '__main__': > ... > > you would have: > > if __main__: > ... > > ? > > ~Ethan~ > > > +1 Makes sense.... if __main__: sys.exit(main()) http://www.artima.com/weblogs/viewpost.jsp?thread=4829 -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Mon Jun 18 18:57:15 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 19 Jun 2012 01:57:15 +0900 Subject: [Python-ideas] Just __main__ In-Reply-To: <4FDF5481.7050403@stoneleaf.us> References: <4FDF5481.7050403@stoneleaf.us> Message-ID: <878vfkblsk.fsf@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > anatoly techtonik wrote: > > How about global __main__ as a boolean? -1 Saves typing, yes, but otherwise there's no point. It would need just as much explanation, for one thing. > you would have: > > if __main__: > ... Would it be writable?
__main__ = False if __main__: print("Oh, I didn't want to run these tests anyway...") From storchaka at gmail.com Mon Jun 18 19:12:39 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 18 Jun 2012 20:12:39 +0300 Subject: [Python-ideas] Just __main__ In-Reply-To: <4FDF5481.7050403@stoneleaf.us> References: <4FDF5481.7050403@stoneleaf.us> Message-ID: On 18.06.12 19:17, Ethan Furman wrote: > anatoly techtonik wrote: >> How about global __main__ as a boolean? >> >> __name__ == '__main__' as a mark of entrypoint module is coherent and >> logical, but awkward to type and requires explicit explanation for >> newcomers even with prior background in other languages. > > So instead of: > > if __name__ == '__main__': > ... > > you would have: > > if __main__: > ... > > ? No, it is much easier. import sys if __main__ if sys.version_info >= (3, 9) else __name__ == '__main__': ... or try: __main__ except NameError: __main__ = __name__ == '__main__' if __main__: ... From mikegraham at gmail.com Mon Jun 18 19:25:11 2012 From: mikegraham at gmail.com (Mike Graham) Date: Mon, 18 Jun 2012 13:25:11 -0400 Subject: [Python-ideas] Just __main__ In-Reply-To: References: <4FDF5481.7050403@stoneleaf.us> Message-ID: On Mon, Jun 18, 2012 at 1:12 PM, Serhiy Storchaka wrote: > No, it is much easier. > > import sys > if __main__ if sys.version_info >= (3, 9) else __name__ == '__main__': > ... > > or > > try: > __main__ > except NameError: > __main__ = __name__ == '__main__' > if __main__: > ... That's nonsense. If you wanted to support old Python versions, you'd write `if __name__ == '__main__'` (there's no reason __name__ would change its behavior). If the oldest version you wanted to support had this feature, you'd write `if __main__`. This is the way every other new feature works. (It even has the advantage of failing loudly if you try to do it on an older version of Python.) That being said, I'm -0 on the feature.
I don't think it's really much easier to explain or worth any effort. Mike From amcnabb at mcnabbs.org Mon Jun 18 20:05:26 2012 From: amcnabb at mcnabbs.org (Andrew McNabb) Date: Mon, 18 Jun 2012 12:05:26 -0600 Subject: [Python-ideas] Just __main__ In-Reply-To: References: <4FDF5481.7050403@stoneleaf.us> Message-ID: <20120618180526.GE21902@mcnabbs.org> On Mon, Jun 18, 2012 at 01:25:11PM -0400, Mike Graham wrote: > > That being said, I'm -0 on the feature. I don't think it's really much > easier to explain or worth any effort. I agree that having a boolean called "__main__" wouldn't add much value, but I believe that recognizing a function called "__main__" could potentially add a bit more value. After executing the body of a script, the interpreter would automatically call the "__main__" function if it exists, and exit with its return value. Thus: def __main__(): return 42 would be roughly equivalent to: if __name__ == '__main__': sys.exit(42) It might make sense to have "python -i" not call the "__main__" function, making it easier to interact with a script after the time that its methods and global variables are all defined but before the time that it enters __main__. I'm not sure if a "__main__" function would add enough value, but I think it would add more value than a "__main__" boolean. 
-- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868 From jeremiah.dodds at gmail.com Mon Jun 18 20:59:27 2012 From: jeremiah.dodds at gmail.com (Jeremiah Dodds) Date: Mon, 18 Jun 2012 14:59:27 -0400 Subject: [Python-ideas] Just __main__ In-Reply-To: <20120618180526.GE21902@mcnabbs.org> (Andrew McNabb's message of "Mon, 18 Jun 2012 12:05:26 -0600") References: <4FDF5481.7050403@stoneleaf.us> <20120618180526.GE21902@mcnabbs.org> Message-ID: <87txy8cupc.fsf@destructor.i-did-not-set--mail-host-address--so-tickle-me> Andrew McNabb writes: > I'm not sure if a "__main__" function would add enough value, but I > think it would add more value than a "__main__" boolean. +1 . From bruce at leapyear.org Mon Jun 18 21:39:05 2012 From: bruce at leapyear.org (Bruce Leban) Date: Mon, 18 Jun 2012 12:39:05 -0700 Subject: [Python-ideas] Just __main__ In-Reply-To: <20120618180526.GE21902@mcnabbs.org> References: <4FDF5481.7050403@stoneleaf.us> <20120618180526.GE21902@mcnabbs.org> Message-ID: On Mon, Jun 18, 2012 at 11:05 AM, Andrew McNabb wrote: > > I agree that having a boolean called "__main__" wouldn't add much value, > but I believe that recognizing a function called "__main__" could > potentially add a bit more value. > > After executing the body of a script, the interpreter would > automatically call the "__main__" function if it exists, and exit with > its return value. > The special value of __name__ and the proposed __main__() function are both a bit magic. However, when I write if __name__ == '__main__' it's at least clear that that if statement *will* be executed. It's just a question of when the condition is true and if I don't know I can find out fairly easily. (As I did the first time I saw it and probably other people on this list did too.) On the other hand, it's not at all obvious that a function named __main__ will be executed automagically. 
This will increase the python learning curve, because people will need to learn both the old method and the new method, especially since code that is compatible with multiple python versions will need to continue to use the old method. It saves one or two lines: if __name__ == '__main__': main() A __main__ boolean, that saves even less typing, and does not seem worth adding either. -1 for both --- Bruce Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From massimo.dipierro at gmail.com Mon Jun 18 21:58:38 2012 From: massimo.dipierro at gmail.com (Massimo DiPierro) Date: Mon, 18 Jun 2012 14:58:38 -0500 Subject: [Python-ideas] Just __main__ In-Reply-To: References: <4FDF5481.7050403@stoneleaf.us> <20120618180526.GE21902@mcnabbs.org> Message-ID: <9283F87E-DD4B-4750-96A2-B3DC1E39C167@gmail.com> how about a decorator that has the same effect as calling the function if __name__=='__main__'? On Jun 18, 2012, at 2:39 PM, Bruce Leban wrote: > > On Mon, Jun 18, 2012 at 11:05 AM, Andrew McNabb wrote: > > I agree that having a boolean called "__main__" wouldn't add much value, > but I believe that recognizing a function called "__main__" could > potentially add a bit more value. > > After executing the body of a script, the interpreter would > automatically call the "__main__" function if it exists, and exit with > its return value. > > The special value of __name__ and the proposed __main__() function are both a bit magic. However, when I write if __name__ == '__main__' it's at least clear that that if statement *will* be executed. It's just a question of when the condition is true and if I don't know I can find out fairly easily. (As I did the first time I saw it and probably other people on this list did too.) On the other hand, it's not at all obvious that a function named __main__ will be executed automagically.
> > This will increase the python learning curve, because people will need to learn both the old method and the new method, especially since code that is compatible with multiple python versions will need to continue to use the old method. It saves one or two lines: > > if __name__ == '__main__': main() > > A __main__ boolean, that saves even less typing, and does not seem worth adding either. > > -1 for both > > --- Bruce > Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From amcnabb at mcnabbs.org Mon Jun 18 22:09:33 2012 From: amcnabb at mcnabbs.org (Andrew McNabb) Date: Mon, 18 Jun 2012 14:09:33 -0600 Subject: [Python-ideas] Just __main__ In-Reply-To: References: <4FDF5481.7050403@stoneleaf.us> <20120618180526.GE21902@mcnabbs.org> Message-ID: <20120618200932.GH21902@mcnabbs.org> On Mon, Jun 18, 2012 at 12:39:05PM -0700, Bruce Leban wrote: > > The special value of __name__ and the proposed __main__() function are both > a bit magic. However, when I write if __name__ == '__main__' it's at least > clear that that if statement *will* be executed. It's just a question of > when the condition is true and if I don't know I can find out fairly > easily. (As I did the first time I saw it and probably other people on this > list did too.) On the other hand, it's not at all obvious that a function > named __main__ will be executed automagically. Given that C, Java, and numerous other languages automagically execute a function called "main", I would argue that a "__main__" function would actually be _less_ surprising than "if __name__ == '__main__'" for most new Python users. 
> This will increase the python learning curve, because people will need to > learn both the old method and the new method, especially since code that is > compatible with multiple python versions will need to continue to use the > old method. It saves one or two lines: > > if __name__ == '__main__': main() If the only difference is saving a few lines, I agree that it probably isn't worth it. However, it also allows for a richer interactive mode as I mentioned previously, so the benefit may not be limited to the neglible number of lines saved. -- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868 From storchaka at gmail.com Mon Jun 18 22:14:04 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 18 Jun 2012 23:14:04 +0300 Subject: [Python-ideas] Just __main__ In-Reply-To: References: <4FDF5481.7050403@stoneleaf.us> Message-ID: On 18.06.12 20:25, Mike Graham wrote: > That's nonsense. Of cause. This is a reductio ad absurdum. From solipsis at pitrou.net Mon Jun 18 22:13:50 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 18 Jun 2012 22:13:50 +0200 Subject: [Python-ideas] Just __main__ References: <4FDF5481.7050403@stoneleaf.us> <20120618180526.GE21902@mcnabbs.org> <20120618200932.GH21902@mcnabbs.org> Message-ID: <20120618221350.39e7992c@pitrou.net> On Mon, 18 Jun 2012 14:09:33 -0600 Andrew McNabb wrote: > On Mon, Jun 18, 2012 at 12:39:05PM -0700, Bruce Leban wrote: > > > > The special value of __name__ and the proposed __main__() function are both > > a bit magic. However, when I write if __name__ == '__main__' it's at least > > clear that that if statement *will* be executed. It's just a question of > > when the condition is true and if I don't know I can find out fairly > > easily. (As I did the first time I saw it and probably other people on this > > list did too.) On the other hand, it's not at all obvious that a function > > named __main__ will be executed automagically. 
> > Given that C, Java, and numerous other languages automagically execute a > function called "main", I would argue that a "__main__" function would > actually be _less_ surprising than "if __name__ == '__main__'" for most > new Python users. Yes, a __main__ function would be reasonable, especially now that we have __main__.py files in packages. Massimo's suggestion of a decorator, OTOH, sounds useless: how would it help in any way? Regards Antoine. From ethan at stoneleaf.us Mon Jun 18 22:24:28 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 18 Jun 2012 13:24:28 -0700 Subject: [Python-ideas] Just __main__ In-Reply-To: <9283F87E-DD4B-4750-96A2-B3DC1E39C167@gmail.com> References: <4FDF5481.7050403@stoneleaf.us> <20120618180526.GE21902@mcnabbs.org> <9283F87E-DD4B-4750-96A2-B3DC1E39C167@gmail.com> Message-ID: <4FDF8E7C.5090600@stoneleaf.us> Massimo DiPierro wrote: > how about a decorator that has the same effect as calling the function > if __name__=='__main__'? I believe several have been written... something like (untested): def main(automagically_run): if __name__ == '__main__': automagically_run() return automagically_run # assuming SystemExit wasn't raised ;) ~Ethan~ From ethan at stoneleaf.us Mon Jun 18 22:38:31 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 18 Jun 2012 13:38:31 -0700 Subject: [Python-ideas] Just __main__ In-Reply-To: <20120618221350.39e7992c@pitrou.net> References: <4FDF5481.7050403@stoneleaf.us> <20120618180526.GE21902@mcnabbs.org> <20120618200932.GH21902@mcnabbs.org> <20120618221350.39e7992c@pitrou.net> Message-ID: <4FDF91C7.1010103@stoneleaf.us> Antoine Pitrou wrote: > On Mon, 18 Jun 2012 14:09:33 -0600 > Andrew McNabb wrote: >> On Mon, Jun 18, 2012 at 12:39:05PM -0700, Bruce Leban wrote: >>> The special value of __name__ and the proposed __main__() function are both >>> a bit magic. However, when I write if __name__ == '__main__' it's at least >>> clear that that if statement *will* be executed.
It's just a question of >>> when the condition is true and if I don't know I can find out fairly >>> easily. (As I did the first time I saw it and probably other people on this >>> list did too.) On the other hand, it's not at all obvious that a function >>> named __main__ will be executed automagically. >> Given that C, Java, and numerous other languages automagically execute a >> function called "main", I would argue that a "__main__" function would >> actually be _less_ surprising than "if __name__ == '__main__'" for most >> new Python users. > > Yes, a __main__ function would be reasonable, especially now that we > have __main__.py files in packages. > > Massimo's suggestion of a decorator, OTOH, sounds useless: how would it > help in any way? I've actually tried the @main decorator approach, and found it not worth the trouble -- I went back to 'if __name__ == "__main__"'. ~Ethan~ From steve at pearwood.info Mon Jun 18 23:26:16 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 19 Jun 2012 07:26:16 +1000 Subject: [Python-ideas] Just __main__ In-Reply-To: <20120618200932.GH21902@mcnabbs.org> References: <4FDF5481.7050403@stoneleaf.us> <20120618180526.GE21902@mcnabbs.org> <20120618200932.GH21902@mcnabbs.org> Message-ID: <4FDF9CF8.40105@pearwood.info> Andrew McNabb wrote: > On Mon, Jun 18, 2012 at 12:39:05PM -0700, Bruce Leban wrote: >> The special value of __name__ and the proposed __main__() function are both >> a bit magic. However, when I write if __name__ == '__main__' it's at least >> clear that that if statement *will* be executed. It's just a question of >> when the condition is true and if I don't know I can find out fairly >> easily. (As I did the first time I saw it and probably other people on this >> list did too.) On the other hand, it's not at all obvious that a function >> named __main__ will be executed automagically. 
> > Given that C, Java, and numerous other languages automagically execute a > function called "main", I would argue that a "__main__" function would > actually be _less_ surprising than "if __name__ == '__main__'" for most > new Python users. What makes you think that "most" new users will be experienced in C or Java? I think it is more likely that the majority of new users will have no experience in programming at all, or that their primary experience will be in PHP or Javascript. But we're all just guessing really. I don't think any of us know what languages most current Python users came from, let alone what future ones will come from. But as a matter of principle, I would prefer to assume that new users come in with as few preconceived ideas as possible, rather than assuming that they expect Python to be just like . -- Steven From ned at nedbatchelder.com Mon Jun 18 23:35:03 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Mon, 18 Jun 2012 17:35:03 -0400 Subject: [Python-ideas] Just __main__ In-Reply-To: <20120618200932.GH21902@mcnabbs.org> References: <4FDF5481.7050403@stoneleaf.us> <20120618180526.GE21902@mcnabbs.org> <20120618200932.GH21902@mcnabbs.org> Message-ID: <4FDF9F07.9080707@nedbatchelder.com> On 6/18/2012 4:09 PM, Andrew McNabb wrote: > On Mon, Jun 18, 2012 at 12:39:05PM -0700, Bruce Leban wrote: >> The special value of __name__ and the proposed __main__() function are both >> a bit magic. However, when I write if __name__ == '__main__' it's at least >> clear that that if statement *will* be executed. It's just a question of >> when the condition is true and if I don't know I can find out fairly >> easily. (As I did the first time I saw it and probably other people on this >> list did too.) On the other hand, it's not at all obvious that a function >> named __main__ will be executed automagically. 
> Given that C, Java, and numerous other languages automagically execute a > function called "main", I would argue that a "__main__" function would > actually be _less_ surprising than "if __name__ == '__main__'" for most > new Python users. But a __main__ function misses the whole point: that a module can be importable and runnable, and the if statement detects the difference. If you simply want a function that is always invoked as the main, then just invoke it: def main(): blah blah main() No need for special names at all. --Ned. -------------- next part -------------- An HTML attachment was scrubbed... URL: From matt at whoosh.ca Tue Jun 19 00:39:14 2012 From: matt at whoosh.ca (Matt Chaput) Date: Mon, 18 Jun 2012 18:39:14 -0400 Subject: [Python-ideas] Just __main__ In-Reply-To: <4FDF9F07.9080707@nedbatchelder.com> References: <4FDF5481.7050403@stoneleaf.us> <20120618180526.GE21902@mcnabbs.org> <20120618200932.GH21902@mcnabbs.org> <4FDF9F07.9080707@nedbatchelder.com> Message-ID: <4FDFAE12.1020301@whoosh.ca> > But a __main__ function misses the whole point: that a module can be > importable and runnable, and the if statement detects the difference. If > you simply want a function that is always invoked as the main, then just > invoke it: > > def main(): > blah blah > > main() > > No need for special names at all. I'm afraid you're the one who's missed the point... 
the interpreter would only call __main__() if __name__ == "__main__" Some people will cry "magic", but to me this is about what makes sense when you explain it to someone, and I think __main__() makes more sense (especially to someone with experience in other languages) than "if __name__ == "__main__"" Matt From ned at nedbatchelder.com Tue Jun 19 02:37:56 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Mon, 18 Jun 2012 20:37:56 -0400 Subject: [Python-ideas] Just __main__ In-Reply-To: <4FDFAE12.1020301@whoosh.ca> References: <4FDF5481.7050403@stoneleaf.us> <20120618180526.GE21902@mcnabbs.org> <20120618200932.GH21902@mcnabbs.org> <4FDF9F07.9080707@nedbatchelder.com> <4FDFAE12.1020301@whoosh.ca> Message-ID: <4FDFC9E4.502@nedbatchelder.com> On 6/18/2012 6:39 PM, Matt Chaput wrote: >> But a __main__ function misses the whole point: that a module can be >> importable and runnable, and the if statement detects the difference. If >> you simply want a function that is always invoked as the main, then just >> invoke it: >> >> def main(): >> blah blah >> >> main() >> >> No need for special names at all. > > I'm afraid you're the one who's missed the point... the interpreter > would only call __main__() if __name__ == "__main__" > > Some people will cry "magic", but to me this is about what makes sense > when you explain it to someone, and I think __main__() makes more > sense (especially to someone with experience in other languages) than > "if __name__ == "__main__"" > I understand the proposal now, and yes, it is "magic". Explicit is better than implicit. I like this explanation: "When you run a Python program, all the statements are run, from top to bottom." better than, "When you run a Python program, all the statements are run, from top to bottom, and then if there is a __main__ function (which there need not be), then it is invoked." 
Python is full of constructs that are simpler than other languages,
which when used in conventional ways, act similar to other languages.
No need to complicate things to make it easier for C programmers to
understand. There's a lot they need to get used to in Python, and
"if __name__ == '__main__':" is not difficult.

--Ned.

> Matt
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>

From techtonik at gmail.com  Tue Jun 19 09:01:25 2012
From: techtonik at gmail.com (anatoly techtonik)
Date: Tue, 19 Jun 2012 10:01:25 +0300
Subject: [Python-ideas] Just __main__
In-Reply-To: <878vfkblsk.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <4FDF5481.7050403@stoneleaf.us>
	<878vfkblsk.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: 

On Mon, Jun 18, 2012 at 7:57 PM, Stephen J. Turnbull wrote:
> Ethan Furman writes:
>  > anatoly techtonik wrote:
>  > > How about global __main__ as a boolean?
>
> -1
>
> Saves typing, yes, but otherwise there's no point.  It would need just
> as much explanation, for one thing.

It would be more convincing to have a solid counter argument for -1,
or else I am inclined to count 'no point' arguments as -0.

>  > you would have:
>  >
>  >     if __main__:
>  >       ...
>
> Would it be writable?

The same way as __name__. Yes.
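For readers skimming the thread, the explicit idiom being defended here is short enough to spell out in full; nothing in it is special-cased by the interpreter except the value of __name__:

```python
# The module body always runs top to bottom. Only the guarded call at the
# end distinguishes "python module.py" from "import module": when run as a
# script, __name__ is the string "__main__"; when imported, it is the
# module's own name, so main() is not called.
def main():
    return "running as a script"

if __name__ == "__main__":
    main()
```

The proposed auto-invoked __main__() would hide exactly that last if-statement, which is the part the thread disagrees about.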
From techtonik at gmail.com  Tue Jun 19 09:52:36 2012
From: techtonik at gmail.com (anatoly techtonik)
Date: Tue, 19 Jun 2012 10:52:36 +0300
Subject: [Python-ideas] Just __main__
In-Reply-To: <87txy8cupc.fsf@destructor.i-did-not-set--mail-host-address--so-tickle-me>
References: <4FDF5481.7050403@stoneleaf.us>
	<20120618180526.GE21902@mcnabbs.org>
	<87txy8cupc.fsf@destructor.i-did-not-set--mail-host-address--so-tickle-me>
Message-ID: 

On Mon, Jun 18, 2012 at 9:59 PM, Jeremiah Dodds wrote:
> Andrew McNabb writes:
>
>> I'm not sure if a "__main__" function would add enough value, but I
>> think it would add more value than a "__main__" boolean.
>
> +1 .

My first thought is that __main__() as a function is bad for Python,
and here is why.

In C, Java and other compiled languages the so-called main() function
is the primary execution entrypoint. With no main() there was no way to
instruct the compiler what should be run first, so logically it would
just start with the first function at the top (and if you remember
early compilers - you could only call functions that were already
defined, that is, written above yours). Code execution always started
with main(). It was the first application byte to execute when the
program was loaded into memory by the OS.

In Python, execution of program code starts before the
if __name__ == '__main__' check is encountered (and it's an awesome
feature of a scripting language to start execution immediately). With
an automagical __main__() function it will also start before. That's
why __main__() can never be a substitute for the classical C-style
entrypoint. In Python the entrypoint is a module (an entrypoint
namespace).

__name__ is equal to '__main__' not only in a script, but also in the
console. And __main__ as a flag in this namespace correctly reflects
this semantic - "Is this a main namespace? True". A value of __name__
in the console doesn't. So a __main__() function is not equivalent to
the C/Java entrypoint.
However, a function like this may play an important role to mark the end of the "import phase" or "initialization phase". A high level concept that is extremely useful for web applications/servers/frameworks, who need to know when an application processes can be more effectively forked. Here is one more problem - when module is executed as a script, it loses its __name__, which becomes equal to '__main__'. I don't know if it ever caused any problems with imports or consistency in object space, or with static imports - it will be interesting to know any outcomes. What if module __name__ always meant module name? But that's another thread. As for __main__ - in this case instead of boolean it could be the name of the entrypoint module, and the check would be if __name__ == __main__ without quotes. From stephen at xemacs.org Tue Jun 19 09:58:30 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 19 Jun 2012 16:58:30 +0900 Subject: [Python-ideas] Just __main__ In-Reply-To: References: <4FDF5481.7050403@stoneleaf.us> <878vfkblsk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <874nq7bumx.fsf@uwakimon.sk.tsukuba.ac.jp> anatoly techtonik writes: > It would be more convincing to have a solid counter argument for > -1, "Not every three-line function needs to be a builtin." "Explicit is better than implicit." "Simple is better than complex." "There should be one (and preferably only one) obvious way to do it." > or else I am inclined to count 'no point' arguments as -0. Feel free; it doesn't matter to me, and I don't much matter to the decision, either. Not to mention that you don't do the counting. The people who will actually make a decision on this don't need it spelled out, though, and your ideas would get better reception from Those Whose Opinions Really Count if you would filter them through the Zen before posting. 
From techtonik at gmail.com  Tue Jun 19 09:56:30 2012
From: techtonik at gmail.com (anatoly techtonik)
Date: Tue, 19 Jun 2012 10:56:30 +0300
Subject: [Python-ideas] Just __main__
In-Reply-To: 
References: <4FDF5481.7050403@stoneleaf.us>
	<20120618180526.GE21902@mcnabbs.org>
	<87txy8cupc.fsf@destructor.i-did-not-set--mail-host-address--so-tickle-me>
Message-ID: 

On Tue, Jun 19, 2012 at 10:52 AM, anatoly techtonik wrote:
> I don't know if it ever caused any problems with imports or consistency in object
> space, or with static imports - it will be interesting to know any
> outcomes.

s/static imports/static analysis/

From storchaka at gmail.com  Tue Jun 19 10:14:47 2012
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Tue, 19 Jun 2012 11:14:47 +0300
Subject: [Python-ideas] Just __main__
In-Reply-To: <20120618180526.GE21902@mcnabbs.org>
References: <4FDF5481.7050403@stoneleaf.us>
	<20120618180526.GE21902@mcnabbs.org>
Message-ID: 

On 18.06.12 21:05, Andrew McNabb wrote:
> It might make sense to have "python -i" not call the "__main__"
> function, making it easier to interact with a script after the time that
> its methods and global variables are all defined but before the time
> that it enters __main__.

python -i -c "from SCRIPT import *"

From simon.sapin at kozea.fr  Tue Jun 19 11:41:03 2012
From: simon.sapin at kozea.fr (Simon Sapin)
Date: Tue, 19 Jun 2012 11:41:03 +0200
Subject: [Python-ideas] Just __main__
In-Reply-To: 
References: <4FDF5481.7050403@stoneleaf.us>
	<20120618180526.GE21902@mcnabbs.org>
	<87txy8cupc.fsf@destructor.i-did-not-set--mail-host-address--so-tickle-me>
Message-ID: <4FE0492F.9040908@kozea.fr>

Le 19/06/2012 09:52, anatoly techtonik a écrit :
> Here is one more problem - when module is executed as a script, it
> loses its __name__, which becomes equal to '__main__'.

PEP 395 "Qualified Names for Modules" tries to address this.

http://www.python.org/dev/peps/pep-0395/

Regards,
-- 
Simon Sapin

From ubershmekel at gmail.com  Tue Jun 19 13:27:59 2012
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Tue, 19 Jun 2012 14:27:59 +0300
Subject: [Python-ideas] Just __main__
In-Reply-To: <4FE0492F.9040908@kozea.fr>
References: <4FDF5481.7050403@stoneleaf.us>
	<20120618180526.GE21902@mcnabbs.org>
	<87txy8cupc.fsf@destructor.i-did-not-set--mail-host-address--so-tickle-me>
	<4FE0492F.9040908@kozea.fr>
Message-ID: 

On Tue, Jun 19, 2012 at 12:41 PM, Simon Sapin wrote:

> Le 19/06/2012 09:52, anatoly techtonik a écrit :
>
>> Here is one more problem - when module is executed as a script, it
>> loses its __name__, which becomes equal to '__main__'.
>
> PEP 395 "Qualified Names for Modules" tries to address this.
>
> http://www.python.org/dev/peps/pep-0395/
>

I agree that python does not need any magic __main__ function. The __main__
boolean is streets ahead in readability though.

Yuval
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From alexandre.zani at gmail.com  Tue Jun 19 17:02:37 2012
From: alexandre.zani at gmail.com (Alexandre Zani)
Date: Tue, 19 Jun 2012 08:02:37 -0700
Subject: [Python-ideas] Just __main__
In-Reply-To: 
References: <4FDF5481.7050403@stoneleaf.us>
	<20120618180526.GE21902@mcnabbs.org>
	<87txy8cupc.fsf@destructor.i-did-not-set--mail-host-address--so-tickle-me>
	<4FE0492F.9040908@kozea.fr>
Message-ID: 

-1 on a __main__ function. Seems like unnecessarily confusing magic.

-0 on a __main__ boolean. It just doesn't seem to add much value on top
of __name__ == '__main__'.

On Tue, Jun 19, 2012 at 4:27 AM, Yuval Greenfield wrote:
> On Tue, Jun 19, 2012 at 12:41 PM, Simon Sapin wrote:
>>
>> Le 19/06/2012 09:52, anatoly techtonik a écrit :
>>
>>> Here is one more problem - when module is executed as a script, it
>>> loses its __name__, which becomes equal to '__main__'.
>>
>>
>> PEP 395 "Qualified Names for Modules" tries to address this.
>> >> http://www.python.org/dev/peps/pep-0395/ >> > > I agree that python does not need any magic __main__ function. The __main__ > boolean is streets ahead in readability though. > > Yuval > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From barry at python.org Wed Jun 20 21:06:08 2012 From: barry at python.org (Barry Warsaw) Date: Wed, 20 Jun 2012 15:06:08 -0400 Subject: [Python-ideas] Add adaptive-load salt-mandatory hashing functions? References: <43D1A0EB-A850-4F67-9B6F-D852DE23B0F8@masklinn.net> <4FD4AC72.3070206@kozea.fr> <5E974769-E9F0-4F0D-BE45-CC2BD55E8902@masklinn.net> <4FD4E1A1.6030409@kozea.fr> <8C7C9869-CB87-4E98-AD4E-AACD83375676@masklinn.net> <4FD4E66B.7080802@kozea.fr> <25BE8DE9-B5DF-48B9-B4DF-E6C82033C3E1@masklinn.net> Message-ID: <20120620150608.7f2ffbdb@resist.wooz.org> On Jun 15, 2012, at 07:07 PM, Eli Collins wrote: >The reason I see a need for such a function is that all existing password >hashing libraries (passlib, cryptacular, flufl.password, >django.contrib.auth.hashers, etc) have had to roll their own pure-python >pbkdf2 implementations, to varying degrees of speed. And speed is paramount >for pbkdf2 usage, since security depends on squeezing as many rounds / second >out of the implementation as possible. To be honest, if I'd known about passlib I probably would never have written flufl.password. Extra +1 goodness for passlib's Python 3 support! I'm going to migrate my own applications to passlib and if that goes well, I'll start the process of deprecating flufl.password. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From sven at marnach.net Thu Jun 21 19:04:37 2012 From: sven at marnach.net (Sven Marnach) Date: Thu, 21 Jun 2012 18:04:37 +0100 Subject: [Python-ideas] Just __main__ In-Reply-To: References: Message-ID: <20120621170437.GB4153@bagheera> anatoly techtonik schrieb am Mon, 18. Jun 2012, um 18:26:59 +0300: > How about global __main__ as a boolean? Currently, __main__ is the name of a module. You can do import __main__ to import this module. After this import, __main__ evaluates to True as a Boolean expression. I don't think it's a good idea to overload the meaning of the __main__. Cheers, Sven From steve at pearwood.info Fri Jun 22 03:20:16 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 22 Jun 2012 11:20:16 +1000 Subject: [Python-ideas] Just __main__ In-Reply-To: <20120621170437.GB4153@bagheera> References: <20120621170437.GB4153@bagheera> Message-ID: <4FE3C850.9060508@pearwood.info> Sven Marnach wrote: > anatoly techtonik schrieb am Mon, 18. Jun 2012, um 18:26:59 +0300: >> How about global __main__ as a boolean? > > Currently, __main__ is the name of a module. You can do > > import __main__ > > to import this module. After this import, __main__ evaluates to True > as a Boolean expression. > > I don't think it's a good idea to overload the meaning of the > __main__. Well caught! I think that kills this proposal dead. -- Steven From kim at mvps.org Mon Jun 25 14:17:29 2012 From: kim at mvps.org (=?ISO-8859-1?Q?Kim_Gr=E4sman?=) Date: Mon, 25 Jun 2012 14:17:29 +0200 Subject: [Python-ideas] BackupFile Message-ID: Hello, I'm new here, so forgive me if this has been discussed before or is off-topic. I came up with a mechanism that I thought might be useful in the Python standard library -- a scope-bound self-restoring backup file. 
I came to this naïve implementation;

--
import os
import shutil
import tempfile


class BackupError(Exception):
    pass

class Backup:
    def __init__(self, path):
        if not os.path.exists(path) or os.path.isdir(path):
            raise BackupError("%s must be a valid file path" % path)

        self.path = path
        self.backup_path = None

    def __enter__(self):
        self.backup()

    def __exit__(self, type, value, traceback):
        self.restore()

    def _generate_backup_path(self):
        tempdir = tempfile.mkdtemp()
        basename = os.path.basename(self.path)
        return os.path.join(tempdir, basename)

    def backup(self):
        backup_path = self._generate_backup_path()
        shutil.copy(self.path, backup_path)
        self.backup_path = backup_path

    def restore(self):
        if self.backup_path:
            # Write backup back onto original
            shutil.copy(self.backup_path, self.path)
            shutil.rmtree(os.path.dirname(self.backup_path))
            self.backup_path = None
--

Backups are intended to be scope-bound like so:

  with Backup(settings_file):
    rewrite_settings(settings_file)
    do_something_else()

I even managed to use it with the @contextmanager decorator, to allow this:

  with rewrite_settings(settings_file):
    do_something_else()

So, open questions;

- Would something like this be useful outside of my office?
- Any suggestions for better names?
- This feels like it belongs in the tempfile module, would you agree?
- What's lacking in the implementation? Have I done something
decidedly non-Pythonic?

Thanks,
- Kim

From masklinn at masklinn.net  Mon Jun 25 14:33:36 2012
From: masklinn at masklinn.net (Masklinn)
Date: Mon, 25 Jun 2012 14:33:36 +0200
Subject: [Python-ideas] BackupFile
In-Reply-To: 
References: 
Message-ID: <6EE0385E-F940-4FE1-B604-718B0979B925@masklinn.net>

On 2012-06-25, at 14:17 , Kim Gräsman wrote:
>
> - Would something like this be useful outside of my office?
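The @contextmanager variant Kim alludes to can be sketched without the Backup class at all. This is a hypothetical, self-contained version; the name `temporarily_modified` is invented here for illustration, and it assumes the "modification" is simply writing new contents:

```python
import contextlib
import os
import shutil
import tempfile

@contextlib.contextmanager
def temporarily_modified(path, new_contents):
    # Copy the original aside, overwrite it for the duration of the
    # with-block, then restore it unconditionally in the finally clause.
    backup_dir = tempfile.mkdtemp()
    backup_path = os.path.join(backup_dir, os.path.basename(path))
    shutil.copy(path, backup_path)
    try:
        with open(path, "w") as f:
            f.write(new_contents)
        yield path
    finally:
        shutil.copy(backup_path, path)   # restore even on exceptions
        shutil.rmtree(backup_dir)
```

Usage mirrors Kim's example: `with temporarily_modified(settings_file, "...") : do_something_else()`. The same caveats raised later in the thread apply, as a SIGKILL inside the block still leaves the modified file in place.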
I'm not sure I correctly understand the purpose of this, and if I do it
seems to be kind-of a hack for "fixing" kind-of crummy code: is it
correct that the goal is to temporarily edit a file (and restore it
later) to change the behavior of *other* pieces of code reading the same
file?

So essentially dynamically scoping the content of a file?

I find the idea rather troublesome/problematic, as it's completely
blind to (and unsafe under) concurrent access, and will be tricky to
handle cleanly wrt filesystem caches and commits.

The initial mail hinted at atomic file replacement *or* backing up a
file and restoring the backup on error, something along the lines of:

    with Backup(settings_file):
        alter_file()
        alter_file_2()
        alter_file_3()
    # altered file

    with Backup(settings_file):
        alter_file()
        alter_file_2()
        raise Exception("boom")
        alter_file_3()
    # old file is back

in the same way e.g. Emacs will keep "~" files around during editing.
That could have been a ~+1 for me, but the behavior as I understood it
(understanding which may be incorrect, again) I'd be -1 on, it seems too
dangerous and too tied to other issues in the code.

From mikegraham at gmail.com  Mon Jun 25 15:42:22 2012
From: mikegraham at gmail.com (Mike Graham)
Date: Mon, 25 Jun 2012 09:42:22 -0400
Subject: [Python-ideas] BackupFile
In-Reply-To: 
References: 
Message-ID: 

On Mon, Jun 25, 2012 at 8:17 AM, Kim Gräsman wrote:
> Hello,
>
> I'm new here, so forgive me if this has been discussed before or is off-topic.
>
> I came up with a mechanism that I thought might be useful in the
> Python standard library -- a scope-bound self-restoring backup file. I
> came to this naïve implementation;
>
> --
> class BackupError(Exception):
>     pass
>
> class Backup:
>     def __init__(self, path):
>         if not os.path.exists(path) or os.path.isdir(path):
>             raise BackupError("%s must be a valid file path" % path)
>
>         self.path = path
>         self.backup_path = None
>
>     def __enter__(self):
>         self.backup()
>
>     def __exit__(self, type, value, traceback):
>         self.restore()
>
>     def _generate_backup_path(self):
>         tempdir = tempfile.mkdtemp()
>         basename = os.path.basename(self.path)
>         return os.path.join(tempdir, basename)
>
>     def backup(self):
>         backup_path = self._generate_backup_path()
>         shutil.copy(self.path, backup_path)
>         self.backup_path = backup_path
>
>     def restore(self):
>         if self.backup_path:
>             # Write backup back onto original
>             shutil.copy(self.backup_path, self.path)
>             shutil.rmtree(os.path.dirname(self.backup_path))
>             self.backup_path = None
> --
>
> Backups are intended to be scope-bound like so:
>
>   with Backup(settings_file):
>     rewrite_settings(settings_file)
>     do_something_else()
>
> I even managed to use it with the @contextmanager decorator, to allow this:
>
>   with rewrite_settings(settings_file):
>     do_something_else()
>
> So, open questions;
>
> - Would something like this be useful outside of my office?
> - Any suggestions for better names?
> - This feels like it belongs in the tempfile module, would you agree?
> - What's lacking in the implementation? Have I done something
> decidedly non-Pythonic?
>
> Thanks,
> - Kim

I like the basic idea, but if we do something like this, it would be
useful to have read access to the old version of the file while you
are writing out the new version that might become permanent.

If I were to implement something like this, I'd use a "write a
temporary file, then copy it over the old one when I'm done" approach
rather than a "back up the file" approach, so that if the process dies
for a reason Python can't clean up after (like due to SIGKILL), the
half-written file doesn't remain.

I don't really like the name Backup but I can't think of a better name
at the moment.
Mike

From lists at cheimes.de  Mon Jun 25 15:59:40 2012
From: lists at cheimes.de (Christian Heimes)
Date: Mon, 25 Jun 2012 15:59:40 +0200
Subject: [Python-ideas] BackupFile
In-Reply-To: 
References: 
Message-ID: 

Am 25.06.2012 14:17, schrieb Kim Gräsman:
> Hello,
>
> I'm new here, so forgive me if this has been discussed before or is off-topic.
>
> I came up with a mechanism that I thought might be useful in the
> Python standard library -- a scope-bound self-restoring backup file. I
> came to this naïve implementation;

Are you aiming for atomic file rollover backed by a temporary file?
That's the common way to safely overwrite an existing file. It works
differently than your code:

* Create a temporary file with O_CREAT | O_EXCL in the same directory
  as the file you'd like to replace
* Write the data to the new file
* Flush the file, and call fdatasync() or fsync() on the file
  descriptor
* Close the file
* Use an atomic rename to replace the old file with the new file (IIRC
  this won't work atomically on Windows)

I've some code lying around somewhere that implements a RolloverFile
similar to tempfile.NamedTemporaryFile.

Christian

From masklinn at masklinn.net  Mon Jun 25 16:23:00 2012
From: masklinn at masklinn.net (Masklinn)
Date: Mon, 25 Jun 2012 16:23:00 +0200
Subject: [Python-ideas] BackupFile
In-Reply-To: 
References: 
Message-ID: <1841F93D-1936-4C4D-8319-CC0A2618654E@masklinn.net>

On 2012-06-25, at 15:59 , Christian Heimes wrote:
> Am 25.06.2012 14:17, schrieb Kim Gräsman:
>> Hello,
>>
>> I'm new here, so forgive me if this has been discussed before or is off-topic.
>>
>> I came up with a mechanism that I thought might be useful in the
>> Python standard library -- a scope-bound self-restoring backup file. I
>> came to this naïve implementation;
>
> Are you aiming for atomic file rollover backed by a temporary file?
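Christian's rollover recipe translates to roughly the following sketch. The helper name `atomic_replace` is made up here, `tempfile.mkstemp` supplies the O_CREAT | O_EXCL step, and the rename is only atomic on POSIX within a single filesystem:

```python
import os
import tempfile

def atomic_replace(path, data):
    # Create the temp file in the same directory as the target so the
    # final rename stays on one filesystem (cross-device renames fail).
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dirname)  # O_CREAT | O_EXCL under the hood
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        # Atomic on POSIX; plain rename over an existing file fails on
        # Windows (Python 3.3 later added os.replace for that case).
        os.rename(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # don't leave the half-written temp file behind
        raise
```

With this scheme a crash at any point leaves either the complete old file or the complete new file, never a half-written one, which is the property Mike asked for earlier in the thread.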
No, see my mail and his confirmation, it's a shim to dynamically (scope-wise) rewrite sections of a configuration file (and undo the rewrites thereafter) because that's the sole way to configure a third-party library. From kim at mvps.org Mon Jun 25 16:39:33 2012 From: kim at mvps.org (=?ISO-8859-1?Q?Kim_Gr=E4sman?=) Date: Mon, 25 Jun 2012 16:39:33 +0200 Subject: [Python-ideas] BackupFile In-Reply-To: References: <6EE0385E-F940-4FE1-B604-718B0979B925@masklinn.net> Message-ID: +python-ideas ---------- Forwarded message ---------- From: Kim Gr?sman Date: Mon, Jun 25, 2012 at 3:21 PM Subject: Re: [Python-ideas] BackupFile To: Masklinn Hi Masklinn, Thanks for your response! On Mon, Jun 25, 2012 at 2:33 PM, Masklinn wrote: > On 2012-06-25, at 14:17 , Kim Gr?sman wrote: >> >> - Would something like this be useful outside of my office? > > I'm not sure I correctly understand the purpose of this, and if I do it > seems to be kind-of a hack for "fixing" kind-of crummy code: is it > correct that the goal is to temporarily edit a file (and restore it > later) to change the behavior of *other* pieces of code reading the same > file? > > So essentially dynamically scoping the content of a file? Yes, that's it. I use it to adapt the behavior of third-party code I can only affect through configuration files. > I find the idea rather troublesome/problematic, as it's completely > blind to (and unsafe under) concurrent access, and will be tricky to > handle cleanly wrt filesystem caches and commits. Good point. I use this in a controlled environment, where I know nobody else is using the file. Multiple concurrent users would break this completely... > The initial mail hinted at atomic file replacement *or* backuping a file > and restoring the backup on error, something along the lines of: > > ? ?with Backup(settings_file): > ? ? ? ?alter_file() > ? ? ? ?alter_file_2() > ? ? ? ?alter_file_3() > ? ?# altered file Nope, not this. > ? ?with Backup(settings_file): > ? ? ? 
?alter_file() > ? ? ? ?alter_file_2() > ? ? ? ?raise Exception("boom") > ? ? ? ?alter_file_3() > ? ?# old file is back This is what I was aiming for, except old file would be unconditionally restored. > in the same way e.g. Emacs will keep "~" files around during edition. That > could have been a ~+1 for me, but the behavior as I understood it > (understanding which may be incorrect, again) I'd be -1 on, it seems too > dangerous and too tied to other issues in the code. Yeah, I think the concurrency aspect of it makes it easy to misuse, so it's probably not a good fit for the standard library. - Kim From kim at mvps.org Mon Jun 25 17:03:12 2012 From: kim at mvps.org (=?ISO-8859-1?Q?Kim_Gr=E4sman?=) Date: Mon, 25 Jun 2012 17:03:12 +0200 Subject: [Python-ideas] BackupFile In-Reply-To: References: Message-ID: Hi Mike, On Mon, Jun 25, 2012 at 3:41 PM, Mike Graham wrote: > > I like the basic idea, but if we do something like this, it would be > useful to have read access to the old version of the file while you > are writing out the new version that might become permanent. Thanks, though this sounds like another mechanism than the one I'm aiming for :-) I want to replace an existing file temporarily, and then restore it no matter what. > If I was to implement something like this, I'd use a "right a > temporary file then copy it overwriting the old one when I'm done" > approach rather than a "back up the file" approach so that if the > process dies for a reason Python can't clean up after (like due to > SIGKILL), the half-written file doesn't remain. This is a very valid concern -- if the process dies unexpectedly I'd leave the file replaced and the original in some temporary directory. Not sure if there's a way around that, probably not. > I don't really like the name Backup but I can't think of a better name > at the moment. Me neither. 
Thanks,
- Kim

From kim at mvps.org  Mon Jun 25 17:04:36 2012
From: kim at mvps.org (=?ISO-8859-1?Q?Kim_Gr=E4sman?=)
Date: Mon, 25 Jun 2012 17:04:36 +0200
Subject: [Python-ideas] BackupFile
In-Reply-To: 
References: 
Message-ID: 

Hi Christian,

On Mon, Jun 25, 2012 at 3:59 PM, Christian Heimes wrote:
>
> Are you aiming for atomic file rollover backed by a temporary file?
> That's the common way to safely overwrite an existing file. It works
> differently than your code.

Oops, I need to be clearer. This is not what I wanted to do. See other
responses.

Thanks!
- Kim

From lists at cheimes.de  Mon Jun 25 17:21:57 2012
From: lists at cheimes.de (Christian Heimes)
Date: Mon, 25 Jun 2012 17:21:57 +0200
Subject: [Python-ideas] BackupFile
In-Reply-To: 
References: 
Message-ID: 

Am 25.06.2012 17:03, schrieb Kim Gräsman:
> This is a very valid concern -- if the process dies unexpectedly I'd
> leave the file replaced and the original in some temporary directory.
> Not sure if there's a way around that, probably not.

Your algorithm doesn't take SIGKILL, SIGSEGV or a server crash into
account. I don't see a chance to compensate for these problems. How
about you fix the 3rd party code instead?

-1 for addition of broken code. Sorry ;)

Christian

From ethan at stoneleaf.us  Mon Jun 25 17:21:11 2012
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 25 Jun 2012 08:21:11 -0700
Subject: [Python-ideas] BackupFile
In-Reply-To: 
References: 
Message-ID: <4FE881E7.8030506@stoneleaf.us>

Kim Gräsman wrote:
> On Mon, Jun 25, 2012 at 3:41 PM, Mike Graham wrote:
>> I don't really like the name Backup but I can't think of a better name
>> at the moment.
>
> Me neither.

How about FileRollback, ModifyThenRestore, NowYouSeeItNowYouDont, or
StupidThirdPartyProgramThatOnlyAllowsConfigThroughFiles ?
Tongue-partly-in-cheek'ly yours, ~Ethan~ From kim at mvps.org Mon Jun 25 20:46:57 2012 From: kim at mvps.org (=?ISO-8859-1?Q?Kim_Gr=E4sman?=) Date: Mon, 25 Jun 2012 20:46:57 +0200 Subject: [Python-ideas] BackupFile In-Reply-To: References: Message-ID: On Mon, Jun 25, 2012 at 5:21 PM, Christian Heimes wrote: > Am 25.06.2012 17:03, schrieb Kim Gr?sman: >> This is a very valid concern -- if the process dies unexpectedly I'd >> leave the file replaced and the original in some temporary directory. >> Not sure if there's a way around that, probably not. > > Your algorithm doesn't take SIGKILL, SIGSEV or server crash into > account. I don't see a chance to compensate for these problems. How > about you fix the 3rd party code instead? > > -1 for addition of broken code. Duly noted :-) It's simple enough and works well in my narrow context, so I'll just keep it to myself. Cheers, - Kim From lists at cheimes.de Mon Jun 25 20:50:08 2012 From: lists at cheimes.de (Christian Heimes) Date: Mon, 25 Jun 2012 20:50:08 +0200 Subject: [Python-ideas] BackupFile In-Reply-To: References: Message-ID: <4FE8B2E0.5020807@cheimes.de> Am 25.06.2012 20:46, schrieb Kim Gr?sman: > Duly noted :-) > > It's simple enough and works well in my narrow context, so I'll just > keep it to myself. I'd use a similar approach in your place. Practicality beats purity. Or beat the author of the broken lib with a big stick. :) Christian From tjreedy at udel.edu Mon Jun 25 22:33:07 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 25 Jun 2012 16:33:07 -0400 Subject: [Python-ideas] BackupFile In-Reply-To: References: Message-ID: On 6/25/2012 8:17 AM, Kim Gr?sman wrote: > Hello, > > I'm new here, so forgive me if this has been discussed before or is off-topic. > > I came up with a mechanism that I thought might be useful in the > Python standard library -- a scope-bound self-restoring backup file. 
I > came to this na?ve implementation; > > -- > class BackupError(Exception): > pass > > class Backup: > def __init__(self, path): > if not os.path.exists(path) or os.path.isdir(path): > raise BackupError("%s must be a valid file path" % path) > > self.path = path > self.backup_path = None > > def __enter__(self): > self.backup() > > def __exit__(self, type, value, traceback): > self.restore() > > def _generate_backup_path(self): > tempdir = tempfile.mkdtemp() > basename = os.path.basename(self.path) > return os.path.join(tempdir, basename) > > def backup(self): > backup_path = self._generate_backup_path() > shutil.copy(self.path, backup_path) > self.backup_path = backup_path > > def restore(self): > if self.backup_path: > # Write backup back onto original > shutil.copy(self.backup_path, self.path) > shutil.rmtree(os.path.dirname(self.backup_path)) > self.backup_path = None > -- > > Backups are intended to be scope-bound like so: > > with Backup(settings_file): > rewrite_settings(settings_file) > do_something_else() > > I even managed to use it with the @contextmanager attribute, to allow this: > > with rewrite_settings(settings_file): > do_something_else() > > So, open questions; > > - Would something like this be useful outside of my office? > - Any suggestions for better names? > - This feels like it belongs in the tempfile module, would you agree? > - What's lacking in the implementation? Have I done something > decidedly non-Pythonic? It seems to me that what you actually *want* to do, given your other responses, is to make a temporary altered copy of the settings file and get the programs to use the *copy*. That way, other users would see the original undistrubed and a crash would at worst leave the copy undeleted. (Whether you want to copy alterations back is a different matter.) I presume the problem is that the program has the name of the settings file hard-coded. One possibility might be to run the program in a virtual environment with its temporary copy. 
(But I have 0 experience with that. I only know that venv has been added to 3.3.) -- Terry Jan Reedy From christopherreay at gmail.com Tue Jun 26 01:19:35 2012 From: christopherreay at gmail.com (Christopher Reay) Date: Tue, 26 Jun 2012 01:19:35 +0200 Subject: [Python-ideas] BackupFile In-Reply-To: References: Message-ID: It seems to me it would be easier to patch the 3rd party library code and submit the patch to them, than to do this. There are other ways to manipulate the file system to achieve what you are attempting.. but somewhere along the line you would have to interact with another program. If you taught the shell to clean up after your act, then this could be achieved in the event of a power failure. You could even write a wrapper shell for Python. I think perhaps the case is too niche for that kind of solution -- Be prepared to have your predictions come true -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Tue Jun 26 01:40:46 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 25 Jun 2012 16:40:46 -0700 Subject: [Python-ideas] BackupFile In-Reply-To: References: Message-ID: <4FE8F6FE.8040308@stoneleaf.us> Christopher Reay wrote: > It seems to me it would be easier to patch the 3rd party library code > and submit the patch to them, than to do this. Not every third-party library is patchable. ~Ethan~ From kim at mvps.org Tue Jun 26 07:21:01 2012 From: kim at mvps.org (=?ISO-8859-1?Q?Kim_Gr=E4sman?=) Date: Tue, 26 Jun 2012 07:21:01 +0200 Subject: [Python-ideas] BackupFile In-Reply-To: References: Message-ID: Hi Terry, and all, On Mon, Jun 25, 2012 at 10:33 PM, Terry Reedy wrote: > > It seems to me that what you actually *want* to do, given your other > responses, is to make a temporary altered copy of the settings file and get > the programs to use the *copy*. That way, other users would see the original > undistrubed and a crash would at worst leave the copy undeleted. 
(Whether > you want to copy alterations back is a different matter.) I presume the > problem is that the program has the name of the settings file hard-coded. > One possibility might be to run the program in a virtual environment with > its temporary copy. (But I have 0 experience with that. I only know that > venv has been added to 3.3.) Thanks for all your alternative strategies! In this case, the third party is a combination of Python, shell script, and executable binaries in at least three different processes, and I'm pretty happy with the modify-do-work-restore model for this batch script. I appreciate the input on the suggested idea, it gave me some new error modes to worry about, even if most of them don't apply for this specific case. - Kim From techtonik at gmail.com Tue Jun 26 10:03:06 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Tue, 26 Jun 2012 11:03:06 +0300 Subject: [Python-ideas] itertools.chunks(iterable, size, fill=None) Message-ID: Now that Python 3 is all about iterators (which is a killer feature for Python users according to StackOverflow - http://stackoverflow.com/questions/tagged/python) wouldn't it be nice to introduce more first-class functions to work with them? One function, to be exact, to split an iterable into chunks. itertools.chunks(iterable, size, fill=None) Which is the 33rd most voted Python question on SO - http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python/312464 P.S. CC'ing to python-dev@ to notify about the thread in python-ideas.
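For reference, a sketch of what such a function could look like, following the grouper() recipe from the itertools documentation; the name and signature here simply mirror the proposal, not an accepted API:

```python
from itertools import zip_longest

def chunks(iterable, size, fill=None):
    # Collect data into fixed-length tuples; the last chunk is padded
    # with `fill`. This mirrors the grouper() recipe from the
    # itertools documentation recipes section.
    args = [iter(iterable)] * size
    return zip_longest(*args, fillvalue=fill)

list(chunks('ABCDEFG', 3, 'x'))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```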
From g.brandl at gmx.net Tue Jun 26 10:39:04 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 26 Jun 2012 10:39:04 +0200 Subject: [Python-ideas] itertools.chunks(iterable, size, fill=None) In-Reply-To: References: Message-ID: On 26.06.2012 10:03, anatoly techtonik wrote: > Now that Python 3 is all about iterators (which is a killer > feature for Python users according to StackOverflow - > http://stackoverflow.com/questions/tagged/python) wouldn't it be nice to > introduce more first-class functions to work with them? One function, > to be exact, to split an iterable into chunks. > > itertools.chunks(iterable, size, fill=None) > > Which is the 33rd most voted Python question on SO - > http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python/312464 +1. This is already a recipe in the itertools docs (see grouper() on http://docs.python.org/library/itertools#recipes), but it is so often requested (and used) that it is a very good candidate for a stdlib function. Georg From taleinat at gmail.com Tue Jun 26 12:34:54 2012 From: taleinat at gmail.com (Tal Einat) Date: Tue, 26 Jun 2012 13:34:54 +0300 Subject: [Python-ideas] itertools.chunks(iterable, size, fill=None) In-Reply-To: References: Message-ID: On Tue, Jun 26, 2012 at 11:03 AM, anatoly techtonik wrote: > Now that Python 3 is all about iterators (which is a killer > feature for Python users according to StackOverflow - > http://stackoverflow.com/questions/tagged/python) wouldn't it be nice to > introduce more first-class functions to work with them? One function, > to be exact, to split an iterable into chunks. > > itertools.chunks(iterable, size, fill=None) > > Which is the 33rd most voted Python question on SO - > http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python/312464 +1 When working with iterators I have needed this often, and have implemented a similar utility function in many projects.
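Another common shape for such a utility -- shown here only as a sketch, not anyone's proposed implementation -- yields lists and leaves the last chunk short instead of padding it:

```python
from itertools import islice

def chunked(iterable, size):
    # Yield successive lists of up to `size` items from any iterable;
    # the final list may be shorter than `size`, and no fill value is used.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

list(chunked(range(7), 3))
# [[0, 1, 2], [3, 4, 5], [6]]
```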
As an example, this is a basic building block in my RunningCalcs[1] module. - Tal Einat [1] http://bitbucket.org/taleinat/runningcalcs/src/5bf8816d944b/RunningCalcs.py#cl-38 From jsbueno at python.org.br Tue Jun 26 14:42:01 2012 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Tue, 26 Jun 2012 09:42:01 -0300 Subject: [Python-ideas] itertools.chunks(iterable, size, fill=None) In-Reply-To: References: Message-ID: On 26 June 2012 07:34, Tal Einat wrote: > On Tue, Jun 26, 2012 at 11:03 AM, anatoly techtonik wrote: >> >> itertools.chunks(iterable, size, fill=None) >> What about itertools.chunks(iterable, size=None, separator=None, fill=None) Requiring at least one of size or separator to be set? This would also work for the "for x in text.split('\n')" case. js -><- From simon.sapin at kozea.fr Tue Jun 26 14:58:35 2012 From: simon.sapin at kozea.fr (Simon Sapin) Date: Tue, 26 Jun 2012 14:58:35 +0200 Subject: [Python-ideas] itertools.chunks(iterable, size, fill=None) In-Reply-To: References: Message-ID: <4FE9B1FB.6080401@kozea.fr> On 26/06/2012 14:42, Joao S. O. Bueno wrote: > itertools.chunks(iterable, size=None, separator=None, fill=None) > > Requiring at least one of size or separator to be set? > > This would also work for the "for x in text.split('\n')" case. I think that splitting an iterable on some separators or on a chunk size are two completely different functions. Having the same function do either is a bit confusing and I don't see the benefit. Or is there a use case in passing both parameters? What would it do then, end the chunk after `size` elements or at `separator`, whichever comes first? Regards, -- Simon Sapin From jsbueno at python.org.br Fri Jun 29 13:29:05 2012 From: jsbueno at python.org.br (Joao S. O.
Bueno) Date: Fri, 29 Jun 2012 08:29:05 -0300 Subject: [Python-ideas] itertools.chunks(iterable, size, fill=None) In-Reply-To: <4d42453e-9e79-4d26-a550-504fae1a72d6@googlegroups.com> References: <4FE9B1FB.6080401@kozea.fr> <4d42453e-9e79-4d26-a550-504fae1a72d6@googlegroups.com> Message-ID: On 29 June 2012 05:55, Michele Lacchia wrote: > +1 for the original proposal! I don't think splitting belongs to potential > itertools.chunks > > On Tuesday 26 June 2012 14:58:35 UTC+2, Simon Sapin wrote: >> >> On 26/06/2012 14:42, Joao S. O. Bueno wrote: >> > itertools.chunks(iterable, size=None, separator=None, fill=None) >> > >> > Requiring at least one of size or separator to be set? >> > >> > This would also work for the "for x in text.split('\n')" case. >> >> I think that splitting an iterable on some separators or on a chunk >> size are two completely different functions. Having the same function do >> either is a bit confusing and I don't see the benefit. >> >> Or is there a use case in passing both parameters? What would it do >> then, end the chunk after `size` elements or at `separator`, whichever >> comes first? Indeed - these are orthogonal features - but I think the ability to split on a separator as an iterator, if not as important as chunks, is missing as well. Maybe add both? js -><- >> >> Regards, >> -- >> Simon Sapin >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas From sturla at molden.no Fri Jun 29 18:59:23 2012 From: sturla at molden.no (Sturla Molden) Date: Fri, 29 Jun 2012 18:59:23 +0200 Subject: [Python-ideas] BackupFile In-Reply-To: References: Message-ID: <4FEDDEEB.201@molden.no> On 25.06.2012 14:17, Kim Gräsman wrote: > I came up with a mechanism that I thought might be useful > in the Python standard library -- a scope-bound self-restoring > backup file.
> with Backup(settings_file): > rewrite_settings(settings_file) > do_something_else() Are you reinventing the transactional database? If you need atomic commit and rollback, I am sure you can find a database that will take care of that (even SQLite if you look in Python's standard library). Sturla From g.brandl at gmx.net Fri Jun 29 22:32:49 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 29 Jun 2012 22:32:49 +0200 Subject: [Python-ideas] itertools.chunks(iterable, size, fill=None) In-Reply-To: References: Message-ID: On 26.06.2012 10:03, anatoly techtonik wrote: > Now that Python 3 is all about iterators (which is a killer > feature for Python users according to StackOverflow - > http://stackoverflow.com/questions/tagged/python) wouldn't it be nice to > introduce more first-class functions to work with them? One function, > to be exact, to split an iterable into chunks. > > itertools.chunks(iterable, size, fill=None) > > Which is the 33rd most voted Python question on SO - > http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python/312464 > > P.S. CC'ing to python-dev@ to notify about the thread in python-ideas. > Anatoly, so far there were no negative votes -- would you care to go another step and propose a patch? Georg From christopherreay at gmail.com Fri Jun 29 22:56:06 2012 From: christopherreay at gmail.com (Christopher Reay) Date: Fri, 29 Jun 2012 21:56:06 +0100 Subject: [Python-ideas] BackupFile In-Reply-To: <4FEDDEEB.201@molden.no> References: <4FEDDEEB.201@molden.no> Message-ID: zope -> webdavfs ftw -- Be prepared to have your predictions come true -------------- next part -------------- An HTML attachment was scrubbed...
URL: From christopherreay at gmail.com Fri Jun 29 23:01:31 2012 From: christopherreay at gmail.com (Christopher Reay) Date: Fri, 29 Jun 2012 22:01:31 +0100 Subject: [Python-ideas] BackupFile In-Reply-To: References: <4FEDDEEB.201@molden.no> Message-ID: or ftpfs ftm On 29 June 2012 21:56, Christopher Reay wrote: > zope -> webdavfs ftw > > > > -- > > Be prepared to have your predictions come true > > > -- Be prepared to have your predictions come true -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikegraham at gmail.com Fri Jun 29 23:36:48 2012 From: mikegraham at gmail.com (Mike Graham) Date: Fri, 29 Jun 2012 17:36:48 -0400 Subject: [Python-ideas] itertools.chunks(iterable, size, fill=None) In-Reply-To: References: Message-ID: On Fri, Jun 29, 2012 at 4:32 PM, Georg Brandl wrote: > so far there were no negative votes As far as I know, Raymond Hettinger is the itertools maintainer and he has repeatedly objected to this idea in the past (e.g. http://bugs.python.org/issue6021 ). Hopefully we can get his input again. Mike From fiatjaf at yahoo.com.br Sat Jun 30 15:59:54 2012 From: fiatjaf at yahoo.com.br (fiatjaf at yahoo.com.br) Date: Sat, 30 Jun 2012 10:59:54 -0300 Subject: [Python-ideas] the optional "as" statement inside "if" statements Message-ID: the idea is to make a variable assignment at the same time that the existence check of that variable -- which is being returned by a function -- is made.
suppose we are returning a variable from the method 'get' from the 'request' object and then doing some stuff with it, but that stuff we will only do if it exists, if not, we'll just pass, instead of writing: variable = self.request.get('variable') if variable: print variable we could write if self.request.get('variable') as variable: print variable seems stupid (or not?), but with lots of variables to process, this pre-assignment could be very unpleasant -- especially if, as in the example case, very little use will be made of the tested variable. also, the "as" expression already exists and is very pythonic. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ironfroggy at gmail.com Sat Jun 30 16:16:47 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Sat, 30 Jun 2012 10:16:47 -0400 Subject: [Python-ideas] the optional "as" statement inside "if" statements In-Reply-To: References: Message-ID: On Sat, Jun 30, 2012 at 9:59 AM, wrote: > the idea is to make a variable assignment at the same time that the > existence check of that variable -- which is being returned by a function -- is > made. > > suppose we are returning a variable from the method 'get' from the 'request' > object and then doing some stuff with it, but that stuff we will only do if > it exists, if not, we'll just pass, instead of writing: > > variable = self.request.get('variable') > if variable: > print variable > > we could write > > if self.request.get('variable') as variable: > print variable > > seems stupid (or not?), but with lots of variables to process, this > pre-assignment could be very unpleasant -- especially if, as in the > example case, very little use will be made of the tested variable. > > also, the "as" expression already exists and is very pythonic. This is probably the best solution to the problem that would fit in the language, but I'm not convinced doing it at all fits very well.
+0 > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From zachary.ware+pyideas at gmail.com Sat Jun 30 16:20:14 2012 From: zachary.ware+pyideas at gmail.com (Zachary Ware) Date: Sat, 30 Jun 2012 09:20:14 -0500 Subject: [Python-ideas] the optional "as" statement inside "if" statements In-Reply-To: References: Message-ID: On Jun 30, 2012 9:00 AM, wrote: > > the idea is to make a variable assignment at the same time that the existence check of that variable -- which is being returned by a function -- is made. > > suppose we are returning a variable from the method 'get' from the 'request' object and then doing some stuff with it, but that stuff we will only do if it exists, if not, we'll just pass, instead of writing: > > variable = self.request.get('variable') > if variable: > print variable > > we could write > > if self.request.get('variable') as variable: > print variable > > seems stupid (or not?), but with lots of variables to process, this pre-assignment could be very unpleasant -- especially if, as in the example case, very little use will be made of the tested variable. > > also, the "as" expression already exists and is very pythonic. > I like it! I've found myself annoyed by writing an if statement using the result of a function call, then realizing "oh wait, I need a reference to that value" and having to go back and rewrite. This would eliminate that and, to my mind, it flows very nicely. +1 from me. -------------- next part -------------- An HTML attachment was scrubbed...
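Without new syntax, the effect can be approximated today. A sketch, where the `Capture` helper is purely illustrative and not an existing API:

```python
class Capture:
    """Illustrative helper: records the last value passed through it."""
    def __call__(self, value):
        self.value = value
        return value

grab = Capture()
settings = {'variable': 'hello'}  # stand-in for self.request

# The call both tests the value and remembers it, so no separate
# pre-assignment line is needed.
if grab(settings.get('variable')):
    print(grab.value)  # runs only when the value is truthy
```

The same caveat Nick raises later in the thread applies here too: this only helps when the retained value and the interesting condition coincide.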
URL: From alexander.belopolsky at gmail.com Sat Jun 30 16:23:45 2012 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 30 Jun 2012 10:23:45 -0400 Subject: [Python-ideas] Happy leap second Message-ID: Even though many have hoped that the authorities would stop fiddling with our clocks, today a leap second will be inserted in UTC. Systems using the Olson/IANA timezone database have a way to deal with this without adjusting their clocks, but few systems are configured that way: $ TZ=right/UTC date -d @1341100824 Sat Jun 30 23:59:60 UTC 2012 (1341100824 is the number of seconds since epoch including the leap seconds.) Python's time module works fine with the "right" timezones: >>> import time >>> print(time.strftime('%T', time.localtime(1341100824))) 23:59:60 but the datetime module clips the leap second down to the previous second: >>> from datetime import datetime >>> print(datetime.fromtimestamp(1341100824).strftime('%T')) 23:59:59 >>> print(datetime.fromtimestamp(1341100823).strftime('%T')) 23:59:59 BDFL has been resisting adding support for leap seconds to the datetime module [1], but as the clocks become more accurate and synchronization requirements become stricter, we may want to revisit this issue. [1] http://mail.python.org/pipermail/python-ideas/2010-June/007307.html From guido at python.org Sat Jun 30 16:57:50 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 30 Jun 2012 07:57:50 -0700 Subject: [Python-ideas] Happy leap second In-Reply-To: References: Message-ID: POSIX timestamps don't have leap seconds. Convince POSIX to change that and Python will follow suit. On Sat, Jun 30, 2012 at 7:23 AM, Alexander Belopolsky wrote: > Even though many have hoped that the authorities would stop fiddling > with our clocks, today a leap second will be inserted in UTC.
> Systems using the Olson/IANA timezone database have a way to deal with > this without adjusting their clocks, but few systems are configured > that way: > > $ TZ=right/UTC date -d @1341100824 > Sat Jun 30 23:59:60 UTC 2012 > > (1341100824 is the number of seconds since epoch including the leap seconds.) > > Python's time module works fine with the "right" timezones: > >>>> import time >>>> print(time.strftime('%T', time.localtime(1341100824))) > 23:59:60 > > but the datetime module clips the leap second down to the previous second: > >>>> from datetime import datetime >>>> print(datetime.fromtimestamp(1341100824).strftime('%T')) > 23:59:59 >>>> print(datetime.fromtimestamp(1341100823).strftime('%T')) > 23:59:59 > > BDFL has been resisting adding support for leap seconds to the > datetime module [1], but as the clocks become more accurate and > synchronization requirements become stricter, we may want to revisit > this issue. > > [1] http://mail.python.org/pipermail/python-ideas/2010-June/007307.html > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Sat Jun 30 17:06:39 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 1 Jul 2012 01:06:39 +1000 Subject: [Python-ideas] the optional "as" statement inside "if" statements In-Reply-To: References: Message-ID: On Sat, Jun 30, 2012 at 11:59 PM, wrote: > the idea is to make a variable assignment at the same time that the > existence check of that variable -- which is being returned by a function -- is > made. > > suppose we are returning a variable from the method 'get' from the 'request' > object and then doing some stuff with it, but that stuff we will only do if > it exists, if not, we'll just pass, instead of writing: > > variable = self.request.get('variable') > if variable: >
print variable > > we could write > > if self.request.get('variable') as variable: > print variable > > seems stupid (or not?), but with lots of variables to process, this > pre-assignment could be very unpleasant -- especially if, as in the > example case, very little use will be made of the tested variable. > > also, the "as" expression already exists and is very pythonic. This proposal has been considered and rejected many times. It's not general enough - it *only* works for those cases where the value to be retained *and* the interesting condition are the same. Consider the simple case of a value that may be either None (not interesting) or a number (interesting). Since the interesting values include "0", which evaluates as False along with None, this limited form of embedded assignment syntax would not help. Embedded assignment in C isn't that limited, but nobody has yet volunteered to take the radical step of proposing "(X as Y)" as a general embedded assignment syntax. I suggest anyone considering such an idea do a *lot* of research in the python-ideas archives first, though (as the idea has seen plenty of discussion). It is not as obviously flawed as the if-and-while statement only variant, but it would still involve being rather persuasive to make such a significant change to the language. You're also unlikely to get much in the way of core developer feedback until after the 3.3 release in August. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From alexander.belopolsky at gmail.com Sat Jun 30 17:18:07 2012 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 30 Jun 2012 11:18:07 -0400 Subject: [Python-ideas] Happy leap second In-Reply-To: References: Message-ID: On Sat, Jun 30, 2012 at 10:57 AM, Guido van Rossum wrote: > POSIX timestamps don't have leap seconds. Convince POSIX to change > that and Python will follow suit.
POSIX (time_t) timestamps are mostly irrelevant for the users of the datetime module. The POSIX type that is closest to datetime.datetime is struct tm and it does have leap seconds: """ The <time.h> header shall declare the structure tm, which shall include at least the following members: int tm_sec Seconds [0,60]. ... """ - http://pubs.opengroup.org/onlinepubs/009696699/basedefs/time.h.html Note that POSIX does require that a round-trip through time_t (localtime(mktime(x))) converts hh:59:60 to (hh+1):00:00, but datetime.timestamp() can still do the same if we make second=60 valid. From fiatjaf at yahoo.com.br Sat Jun 30 17:46:08 2012 From: fiatjaf at yahoo.com.br (fiatjaf at yahoo.com.br) Date: Sat, 30 Jun 2012 12:46:08 -0300 Subject: [Python-ideas] the optional "as" statement inside "if" statements In-Reply-To: References: Message-ID: thank you two for the responses. I'm a newbie here and I didn't find the archives (yes, I'm stupid, and I didn't search well). I have no hope of being persuasive, I only thought that if I introduced the idea other people would like it instantaneously, but if the idea is good, obviously someone had already thought of it, so it makes me happy. I'll look for the archives and keep watching the development and see what I get from this. On Sat, Jun 30, 2012 at 12:06 PM, Nick Coghlan wrote: > On Sat, Jun 30, 2012 at 11:59 PM, wrote: > > the idea is to make a variable assignment at the same time that the > > existence check of that variable -- which is being returned by a function -- is > > made.
> > > > suppose we are returning a variable from the method > 'get' from the > 'request' > object and then doing some stuff with it, but that stuff we will only > do if > it exists, if not, we'll just pass, instead of writing: > > > > variable = self.request.get('variable') > > if variable: > > print variable > > > > we could write > > > > if self.request.get('variable') as variable: > > print variable > > > > seems stupid (or not?), but with lots of variables to process, this > > pre-assignment could be very unpleasant -- especially if, as in the > > example case, very little use will be made of the tested variable. > > > > also, the "as" expression already exists and is very pythonic. > > This proposal has been considered and rejected many times. It's not > general enough - it *only* works for those cases where the value to be > retained *and* the interesting condition are the same. > > Consider the simple case of a value that may be either None (not > interesting) or a number (interesting). Since the interesting values > include "0", which evaluates as False along with None, this limited > form of embedded assignment syntax would not help. > > Embedded assignment in C isn't that limited, but nobody has yet > volunteered to take the radical step of proposing "(X as Y)" as a > general embedded assignment syntax. I suggest anyone considering such an > idea do a *lot* of research in the python-ideas archives first, though > (as the idea has seen plenty of discussion). It is not as obviously > flawed as the if-and-while statement only variant, but it would still > involve being rather persuasive to make such a significant change to > the language. > > You're also unlikely to get much in the way of core developer feedback > until after the 3.3 release in August. > > Cheers, > Nick. > > > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From christopherreay at gmail.com Sat Jun 30 17:54:55 2012 From: christopherreay at gmail.com (Christopher Reay) Date: Sat, 30 Jun 2012 17:54:55 +0200 Subject: [Python-ideas] the optional "as" statement inside "if" statements In-Reply-To: References: Message-ID: The only hope for a large archive like this one is to wait long enough to make sure you don't rehash the really regular ideas. ... ponders... Do I have time to read the archives? Do people mind admining the repetitive ideas? -- Be prepared to have your predictions come true -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Sat Jun 30 18:03:10 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 30 Jun 2012 19:03:10 +0300 Subject: [Python-ideas] isascii()/islatin1()/isbmp() Message-ID: As shown in issue #15016 [1], there are use cases where it is useful to determine that a string can be encoded in ASCII or Latin1. When working with Tk or Windows console applications it can be useful to determine that a string can be encoded in UCS2. The C API provides an interface for this, but at the Python level it is not available. I propose to add to the string class new methods: isascii(), islatin1() and isbmp() (in addition to such methods as isalpha() or isdigit()). The implementation will be trivial. Pro: The current trick with trying to encode has O(n) complexity and has the overhead of exception raising/catching. Contra: In most cases after determining the character range we still need to encode the string with the appropriate encoding. New methods will complicate the already overloaded string class. Objections?
[1] http://bugs.python.org/issue15016 From ncoghlan at gmail.com Sat Jun 30 18:05:26 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 1 Jul 2012 02:05:26 +1000 Subject: [Python-ideas] the optional "as" statement inside "if" statements In-Reply-To: References: Message-ID: On Sun, Jul 1, 2012 at 1:54 AM, Christopher Reay wrote: > The only hope for a large archive like this one is to wait long enough to > make sure you don't rehash the really regular ideas. > > ... ponders... Do I have time to read the archives? Do people mind admining > the repetitive ideas? It's more a matter of working out how to point Google (or the search engine of your choice) at the archives in a useful way. In this case: https://www.google.com/search?q=inurl%3Apython-ideas%20site%3Amail.python.org%20embedded%20assignment Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Jun 30 18:14:23 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 1 Jul 2012 02:14:23 +1000 Subject: [Python-ideas] isascii()/islatin1()/isbmp() In-Reply-To: References: Message-ID: On Sun, Jul 1, 2012 at 2:03 AM, Serhiy Storchaka wrote: > As shown in issue #15016 [1], there are use cases where it is useful to > determine that a string can be encoded in ASCII or Latin1. When working with Tk > or Windows console applications it can be useful to determine that a string can > be encoded in UCS2. The C API provides an interface for this, but at the Python level > it is not available. > > I propose to add to the string class new methods: isascii(), islatin1() and > isbmp() (in addition to such methods as isalpha() or isdigit()). The > implementation will be trivial. Why not just expose max_code_point directly instead of adding three new methods? Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From guido at python.org Sat Jun 30 18:29:58 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 30 Jun 2012 09:29:58 -0700 Subject: [Python-ideas] Happy leap second In-Reply-To: References: Message-ID: On Sat, Jun 30, 2012 at 8:18 AM, Alexander Belopolsky wrote: > On Sat, Jun 30, 2012 at 10:57 AM, Guido van Rossum wrote: >> POSIX timestamps don't have leap seconds. Convince POSIX to change >> that and Python will follow suit. > > POSIX (time_t) timestamps are mostly irrelevant for the users of the > datetime module. The POSIX type that is closest to datetime.datetime is > struct tm and it does have leap seconds: > > """ > The <time.h> header shall declare the structure tm, which shall > include at least the following members: > > int tm_sec Seconds [0,60]. > ... > """ - http://pubs.opengroup.org/onlinepubs/009696699/basedefs/time.h.html > > Note that POSIX does require that a round-trip through time_t > (localtime(mktime(x))) converts hh:59:60 to (hh+1):00:00, but > datetime.timestamp() can still do the same if we make second=60 valid. The roundtrip requirement is telling though -- they have no way to actually represent a leap second in the underlying clock (which is a POSIX timestamp). -- --Guido van Rossum (python.org/~guido) From matt at whoosh.ca Sat Jun 30 18:34:02 2012 From: matt at whoosh.ca (Matt Chaput) Date: Sat, 30 Jun 2012 12:34:02 -0400 Subject: [Python-ideas] isascii()/islatin1()/isbmp() In-Reply-To: References: Message-ID: <6188BD20-3B39-4F80-9ED4-B9CBFFEFB837@whoosh.ca> > Why not just expose max_code_point directly instead of adding three new methods? +1 I accidentally sent my reply directly to Serhiy, but basically I said that I could really use this in my search library when I'm trying to write efficient compressed indexes, but all I need is to know the maximum char code (or the number of bytes per char). I've been meaning to ask about this for a while.
Matt From storchaka at gmail.com Sat Jun 30 18:41:59 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 30 Jun 2012 19:41:59 +0300 Subject: [Python-ideas] isascii()/islatin1()/isbmp() In-Reply-To: References: Message-ID: On 30.06.12 19:14, Nick Coghlan wrote: > Why not just expose max_code_point directly instead of adding three new methods? I think it will be easier to use. You do not have to remember that the maximum ASCII code is 127. This is similar to the old is*() methods. From solipsis at pitrou.net Sat Jun 30 18:43:16 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 30 Jun 2012 18:43:16 +0200 Subject: [Python-ideas] isascii()/islatin1()/isbmp() References: Message-ID: <20120630184316.5de2ce5e@pitrou.net> On Sun, 1 Jul 2012 02:14:23 +1000 Nick Coghlan wrote: > On Sun, Jul 1, 2012 at 2:03 AM, Serhiy Storchaka wrote: > > As shown in issue #15016 [1], there is a use cases when it is useful to > > determine that string can be encoded in ASCII or Latin1. In working with Tk > > or Windows console applications can be useful to determine that string can > > be encoded in UCS2. C API provides interface for this, but at Python level > > it is not available. > > > > I propose to add to strings class new methods: isascii(), islatin1() and > > isbmp() (in addition to such methods as isalpha() or isdigit()). The > > implementation will be trivial. > > Why not just expose max_code_point directly instead of adding three new methods? Because it's really an implementation detail. We don't want to carry around such a legacy. Besides, we don't know the max code point for sure, only an upper bound of it (and, implicitly, also a lower bound). So while I'm -0 on the methods (calling encode() is as simple), I'm -1 on max_code_point. Regards Antoine. 
From christopherreay at gmail.com Sat Jun 30 18:44:28 2012 From: christopherreay at gmail.com (Christopher Reay) Date: Sat, 30 Jun 2012 18:44:28 +0200 Subject: [Python-ideas] isascii()/islatin1()/isbmp() In-Reply-To: References: Message-ID: Well, there would be constants. What about both the methods and the max_code_point, and use it as an excuse to explain again that encodings exist, and point to the encodings docs? -- Be prepared to have your predictions come true -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Sat Jun 30 19:02:47 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 30 Jun 2012 20:02:47 +0300 Subject: [Python-ideas] isascii()/islatin1()/isbmp() In-Reply-To: <20120630184316.5de2ce5e@pitrou.net> References: <20120630184316.5de2ce5e@pitrou.net> Message-ID: On 30.06.12 19:43, Antoine Pitrou wrote: > Because it's really an implementation detail. We don't want to carry > around such a legacy. > Besides, we don't know the max code point for sure, only an upper bound > of it (and, implicitly, also a lower bound). > > So while I'm -0 on the methods (calling encode() is as simple), I'm -1 > on max_code_point. Thanks, Antoine. This objection also just occurred to me. We cannot guarantee that isascii() will always be O(1). Several enhancements have already been rejected for this reason. If an extension author wants to take advantage of CPython, he should use CPython's C API. From alexander.belopolsky at gmail.com Sat Jun 30 19:17:38 2012 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 30 Jun 2012 13:17:38 -0400 Subject: [Python-ideas] Happy leap second In-Reply-To: References: Message-ID: On Sat, Jun 30, 2012 at 12:29 PM, Guido van Rossum wrote: .. > The roundtrip requirement is telling though -- they have no way to > actually represent a leap second in the underlying clock (which is a > POSIX timestamp).
This is correct: POSIX gettimeofday() cannot produce accurate UTC time during the leap second, but this does not mean that a Python program should not be able to keep UTC time as accurately as the underlying hardware allows. Systems synchronized with official time using NTP get notifications about leap seconds up to a day in advance and can prepare for a second during which NTP time stops. (As far as I understand, few systems actually stop their clocks or roll them back on a leap second - most slow the clocks down in various incompatible ways.) For example, during the leap second a software clock can use clock_gettime() (or Python's new time.monotonic()) function to get actual time.

For better or worse, legal time throughout the world is based on UTC, and once every couple of years there is a second that has to be communicated as hh:mm:60. Today we are fortunate that it is inserted during the time when most of the world markets are closed, but next time we may see a lot of lawsuits between traders arguing over whose orders should have been filled first. While few systems report accurate UTC time during a leap second, there is no technological limitation that would prevent most systems from implementing it. One can even implement such a UTC clock in Python, but valid times produced by such a clock cannot be stored in datetime objects.

From benjamin at python.org Sat Jun 30 19:20:21 2012 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 30 Jun 2012 17:20:21 +0000 (UTC) Subject: [Python-ideas] isascii()/islatin1()/isbmp() References: Message-ID:

Nick Coghlan writes:
> Why not just expose max_code_point directly instead of adding
> three new methods?

All of these proposals rely on the *current* implementation of CPython unicode (at least for their efficiency). Let's not pollute the language with features that will be bad on other implementations or even ours in the future.
Regards,
Benjamin

From guido at python.org Sat Jun 30 20:04:51 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 30 Jun 2012 11:04:51 -0700 Subject: [Python-ideas] Happy leap second In-Reply-To: References: Message-ID:

There's no reason why you need to use datetime objects for such extreme use cases.

--Guido van Rossum (sent from Android phone)

On Jun 30, 2012 10:17 AM, "Alexander Belopolsky" < alexander.belopolsky at gmail.com> wrote:
> This is correct: POSIX gettimeofday() cannot produce accurate UTC time
> during the leap second, but this does not mean that a Python program
> should not be able to keep UTC time as accurately as the underlying
> hardware allows. [...]
> One can even implement such a UTC clock in Python, but valid times produced
> by such a clock cannot be stored in datetime objects.

From christopherreay at gmail.com Sat Jun 30 20:38:17 2012 From: christopherreay at gmail.com (Christopher Reay) Date: Sat, 30 Jun 2012 19:38:17 +0100 Subject: [Python-ideas] the optional "as" statement inside "if" statements In-Reply-To: References: Message-ID:

How many times have you told people that?

-- Be prepared to have your predictions come true

From tjreedy at udel.edu Sat Jun 30 23:09:36 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 30 Jun 2012 17:09:36 -0400 Subject: [Python-ideas] itertools.chunks(iterable, size, fill=None) In-Reply-To: References: Message-ID: <4FEF6B10.9040409@udel.edu>

On 6/29/2012 4:32 PM, Georg Brandl wrote:
> On 26.06.2012 10:03, anatoly techtonik wrote:
>> Now that Python 3 is all about iterators (which is a user killer
>> feature for Python according to StackOverflow -
>> http://stackoverflow.com/questions/tagged/python) would it be nice to
>> introduce more first class functions to work with them? One function
>> to be exact to split string into chunks. Nothing special about strings.
>> itertools.chunks(iterable, size, fill=None)

This is a renaming of itertools.grouper in 9.1.2. Itertools Recipes. You should have mentioned this. I think of 'blocks' rather than 'chunks', but I notice several SO questions with 'chunk(s)' in the title.

>> Which is the 33rd most voted Python question on SO -
>> http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python/312464

I am curious how you get that number. I do note that there are about 15 other Python SO questions that seem to be variations on the theme. There might be more if 'blocks' and 'groups' were searched for.
> Anatoly, so far there were no negative votes -- would you care to go
> another step and propose a patch?

That is because Raymond H. is not reading either list right now ;-) Hence the Cc:. Also because I did not yet respond to a vague, very incomplete idea.

From Raymond's first message on http://bugs.python.org/issue6021 , add grouper: "This has been rejected before.

* It is not a fundamental itertool primitive. The recipes section in the docs shows a clean, fast implementation derived from zip_longest().

* There is some debate on a correct API for odd lengths. Some people want an exception, some want fill-in values, some want truncation, and some want a partially filled-in tuple. That alone is reason enough not to set one behavior in stone.

* There is an issue with having too many itertools. The module taken as a whole becomes more difficult to use as new tools are added."

---
This is not to say that the question should not be re-considered. Given the StackOverflow experience in addition to that of the tracker and python-list (and maybe python-ideas), a special exception might be made in relation to points 1 and 3.
---

In regard to point 2: many 'proposals', including Anatoly's, neglect this detail. But the function has to do *something* when seqlen % grouplen != 0. So an 'idea' is not really a concrete programmable proposal until 'something' is specified.

Exception -- not possible for an itertool until the end of the iteration (see below). To raise immediately for sequences, one could wrap grouper.

def exactgrouper(sequence, k):  # untested
    if len(sequence) % k:
        raise ValueError('Sequence length {} must be a multiple of group '
                         'length {}'.format(len(sequence), k))
    else:
        # grouper as defined in the itertools recipes section
        return itertools.grouper(sequence, k)

Of course, sequences can also be directly sequentially sliced (but should the result be an iterable or sequence of blocks?). But we do not have a seqtools module and I do not think there should be another method added to the seq protocol.
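[Editorial note: the direct sequential slicing just mentioned is a one-liner for real sequences. The name slice_blocks below is made up for illustration; it implements the "partially filled-in" variant of point 2, returning a list whose final block may be short.]

```python
def slice_blocks(seq, n):
    # Slice a sequence into consecutive blocks of length n;
    # the final block is shorter when len(seq) % n != 0.
    return [seq[i:i + n] for i in range(0, len(seq), n)]
```

This works for strings, lists, and tuples alike (each block has the input's own type), but not for arbitrary iterables, which is the case the grouper-style recipes address.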
Fill -- grouper always does this, with a default of None.

Truncate, Remainder -- grouper (zip_longest) cannot directly do this and no recipes are given in the itertools docs. (More could be, see below.) Discussions on python-list give various implementations either for sequences or iterables. For the latter, one approach is "it = iter(iterable)" followed by repeated islice of the first n items. Another is to use a sentinel for the 'fill' to detect a final incomplete block (tuple for grouper).

def grouper_x(n, iterable):  # untested
    sentinel = object()
    for g in grouper(n, iterable, sentinel):
        if g[-1] is not sentinel:
            yield g
        else:
            # pass                         to truncate
            # yield g[:g.index(sentinel)]  for remainder
            # raise ValueError             for delayed exception
            pass

---
The above discussion of point 2 touches on point 4, which Raymond neglected in the particular message above but which has come up before: What are the allowed input and output types? An idea is not a programmable proposal until the domain, range, and mapping are specified.

Possible inputs are a specific sequence (string, for instance), any sequence, any iterable. Possible outputs are a sequence or iterator of sequence or iterator. The various python-list and StackOverflow questions ask for various combinations.

zip_longest and hence grouper take any iterable and return an iterator of tuples. (An iterator of maps might be more useful as a building block.) This is not what one usually wants with string input, for instance, nor with range input. To illustrate:

import itertools as it

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return it.zip_longest(*args, fillvalue=fillvalue)

print(*(grouper(3, 'ABCDEFG', 'x')))  # probably not wanted
print(*(''.join(g) for g in grouper(3, 'ABCDEFG', 'x')))
# ('A', 'B', 'C') ('D', 'E', 'F') ('G', 'x', 'x')
# ABC DEF Gxx

--
What to do? One could easily write 20 different functions. So more thought is needed before adding anything.
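[Editorial note: the repeated-islice approach mentioned above can be sketched as follows. The name chunks is hypothetical; unlike grouper it neither pads nor truncates, instead yielding a short final tuple -- i.e. the "partially filled-in tuple" behavior from point 2.]

```python
from itertools import islice

def chunks(iterable, n):
    # Repeatedly slice n items off a single shared iterator;
    # the final tuple may hold fewer than n items.
    it = iter(iterable)
    while True:
        block = tuple(islice(it, n))
        if not block:
            return
        yield block
```

Because it consumes one shared iterator, this works for any iterable, including generators and files, at the cost of always producing tuples rather than blocks of the input's own type.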
-1 on the idea as is. For the doc, I think it would be helpful here and in most module subchapters if there were a subchapter table of contents at the top (under 9.1 in this case). Even though just 2 lines here (currently, but see below), it would let people know that there *is* a recipes section. After the appropriate tables, mention that there are example uses in the recipe section. Possibly add similar tables in the recipe section. Another addition could be a new subsection on grouping (chunking) that would discuss post-processing of grouper (as discussed above), as well as other recipes, including ones specific to strings and sequences. It would essentially be a short how-to. Call it 9.1.3 "Grouping, Blocking, or Chunking Sequences and Iterables". The synonyms will help external searching. A toc would let people who have found this doc know to look for this at the bottom. -- Terry Jan Reedy