From jab at math.brown.edu Mon Jan 2 00:10:19 2017 From: jab at math.brown.edu (jab at math.brown.edu) Date: Mon, 2 Jan 2017 00:10:19 -0500 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: <5863F223.3040906@stoneleaf.us> <20161229081959.GA3887@ando.pearwood.info> Message-ID: On Sat, Dec 31, 2016 at 5:39 PM, wrote: > a set hashing algorithm is exposed as collections.Set._hash() in > _collections_abc.py, which can be passed an iterable (By which I meant to say, "which can be passed a set-like iterable such as a Keys- or ItemsView of an existing mapping, so that the hash can be computed from the existing data rather than copying it all into a new frozenset." Should have been more precise.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jab at math.brown.edu Wed Jan 4 16:38:05 2017 From: jab at math.brown.edu (jab at math.brown.edu) Date: Wed, 4 Jan 2017 16:38:05 -0500 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: <5863F223.3040906@stoneleaf.us> <20161229081959.GA3887@ando.pearwood.info> Message-ID: Instead of the proposals like "hash.from_iterable()", would it make sense to allow tuple.__hash__() to accept any iterable, when called as a classmethod? (And similarly with frozenset.__hash__(), so that the fast C implementation of that algorithm could be used, rather than the slow collections.Set._hash() implementation. Then the duplicated implementation in _collections_abc.py's Set._hash() could be removed completely, delegating to frozenset.__hash__() instead.) Would this API more cleanly communicate the algorithm being used and the implementation, while making a smaller increase in API surface area compared to introducing a new function? -------------- next part -------------- An HTML attachment was scrubbed... URL: From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Jan 4 18:39:52 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Thu, 5 Jan 2017 08:39:52 +0900 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: <5863F223.3040906@stoneleaf.us> <20161229081959.GA3887@ando.pearwood.info> Message-ID: <22637.34760.62367.414268@turnbull.sk.tsukuba.ac.jp> jab at math.brown.edu writes: > Instead of the proposals like "hash.from_iterable()", would it make sense > to allow tuple.__hash__() to accept any iterable, when called as a > classmethod? [...] > Would this API more cleanly communicate the algorithm being used and the > implementation, while making a smaller increase in API surface area > compared to introducing a new function? I don't understand what you're proposing. There are three problems with the API. First, the "obvious" meaning of "tuple.__hash__(iterable)" is "hash(tuple(iterable))", and that's the obvious spelling. Second, there's no reason for a dunder method to be polymorphic; this is hardly discoverable. Third, dunders are normally not called from user code (including class implementations, although they're less ugly there), suggesting that there should be a helper for this. I'm unclear on how the function is supposed to be implemented, since presumably tuple.__hash__ knows about the tuple data structure's implementation (memory layout), but it can't know about an arbitrary iterable. 
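For concreteness, here is a rough pure-Python sketch of the kind of helper that seems to be wanted (the name hash_from_iterable is made up, the mixing constants are borrowed from CPython's tuple hash, and the result is *not* guaranteed to equal hash(tuple(iterable)); the only point is that such a helper consumes a stream of item hashes rather than a tuple's memory layout):

import sys

def hash_from_iterable(iterable):
    # Illustrative sketch only: an order-sensitive hash computed
    # incrementally, so no intermediate tuple is built in memory.
    mask = (1 << sys.hash_info.width) - 1   # emulate fixed-width unsigned arithmetic
    x = 0x345678
    mult = 1000003
    for item in iterable:
        y = hash(item)                      # raises TypeError for unhashable items
        x = ((x ^ y) * mult) & mask
        mult = (mult + 82520) & mask        # simplified; the real tuple hash also mixes in the remaining length
    x = (x + 97531) & mask
    return x if x != mask else mask - 1     # mirror the C code, which avoids returning (unsigned) -1

A class could then write something like

    def __hash__(self):
        return hash_from_iterable(self._data)    # hypothetical attribute
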
Steve From steve at pearwood.info Wed Jan 4 19:31:08 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 5 Jan 2017 11:31:08 +1100 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: <20161229081959.GA3887@ando.pearwood.info> Message-ID: <20170105003107.GJ3887@ando.pearwood.info> On Wed, Jan 04, 2017 at 04:38:05PM -0500, jab at math.brown.edu wrote: > Instead of the proposals like "hash.from_iterable()", would it make sense > to allow tuple.__hash__() to accept any iterable, when called as a > classmethod? The public API for calculating the hash of something is to call the hash() builtin function on some object, e.g. to call tuple.__hash__ you write hash((a, b, c)). The __hash__ dunder method is implementation, not interface, and normally shouldn't be called directly. Unless I'm missing something obvious, your proposal would require the caller to call the dunder methods directly: class X: def __hash__(self): return tuple.__hash__(iter(self)) I consider that a poor interface design. But even if we decide to make an exception in this case, tuple.__hash__ is currently an ordinary instance method right now. There's probably code that relies on that fact and expects that: tuple.__hash__((a, b, c)) is currently the same as (a, b, c).__hash__() (Starting with the hash() builtin itself, I expect, although that is easy enough to fix if needed.) Your proposal will break backwards compatibility, as it requires a change in semantics: (1) (a, b, c).__hash__() must keep the current behaviour, which means behaving like a bound instance method; (2) But tuple.__hash__ will no longer return an unbound method (actually a function object, but the difference is unimportant) and instead will return something that behaves like a bound class method. Here's an implementation which does this: http://code.activestate.com/recipes/577030-dualmethod-descriptor/ so such a thing is possible. But it breaks backwards-compatability and introduces something which I consider to be an unclean API (calling a dunder method directly). Unless there's a *really* strong advantage to tuple.__hash__(...) over hash.from_iterable(...) (or equivalent), I would be against this change. > (And similarly with frozenset.__hash__(), so that the fast C > implementation of that algorithm could be used, rather than the slow > collections.Set._hash() implementation. Then the duplicated implementation > in _collections_abc.py's Set._hash() could be removed completely, > delegating to frozenset.__hash__() instead.) This is a good point. Until now, I've been assuming that hash.from_iterable should consider order. But frozenset shows us that sometimes the hash should *not* consider order. This hints that perhaps the hash.from_iterable() should have its own optional dunder method. Or maybe we need two functions: an ordered version and an unordered version. Hmmm... just tossing out a wild idea here... let's get rid of the dunder method part of your suggestion, and add new public class methods to tuple and frozenset: tuple.hash_from_iter(iterable) frozenset.hash_from_iter(iterable) That gets rid of all the objections about backwards compatibility, since these are new methods. They're not dunder names, so there are no objections to being used as part of the public API. A possible objection is the question, is this functionality *actually* important enough to bother? Another possible objection: are these methods part of the sequence/set API? If not, do they really belong on the tuple/frozenset? 
Maybe they belong elsewhere? > Would this API more cleanly communicate the algorithm being used and the > implementation, No. If you want to communicate the algorithm being used, write some documentation. Seriously, the public API doesn't communicate the algorithm used for the implementation. How can it? We can keep the same interface and change the implementation, or change the interface and keep the implementation. The two are (mostly) independent. > while making a smaller increase in API surface area > compared to introducing a new function? It's difficult to quantify "API surface area". On the one hand, we have the addition of one or two new functions or methods. Contrast with: * introducing a new kind of method into the built-ins (one which behaves like a classmethod when called from the class, and like an instance method when called from an instance); * changing tuple.__hash__ from an ordinary method to one of the above special methods; * and likewise for frozenset.__hash__; * change __hash__ from "only used as implementation, not as interface" to "sometimes used as interface". To me, adding one or two new methods/functions is the smaller, or at least less disruptive, change. -- Steve From mistersheik at gmail.com Wed Jan 4 20:04:04 2017 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 4 Jan 2017 17:04:04 -0800 (PST) Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: <20170105003107.GJ3887@ando.pearwood.info> References: <20161229081959.GA3887@ando.pearwood.info> <20170105003107.GJ3887@ando.pearwood.info> Message-ID: <573aa206-348c-43e1-b42d-13665062cf76@googlegroups.com> Couldn't you add __hash__ to collections.abc.Iterable ? Essentially, expose __hash__ there; then all iterables automatically have a default hash that hashes their ordered contents. On Wednesday, January 4, 2017 at 7:37:26 PM UTC-5, Steven D'Aprano wrote: > > On Wed, Jan 04, 2017 at 04:38:05PM -0500, j... at math.brown.edu > wrote: > > Instead of the proposals like "hash.from_iterable()", would it make > sense > > to allow tuple.__hash__() to accept any iterable, when called as a > > classmethod? > > The public API for calculating the hash of something is to call the > hash() builtin function on some object, e.g. to call tuple.__hash__ you > write hash((a, b, c)). The __hash__ dunder method is implementation, not > interface, and normally shouldn't be called directly. > > Unless I'm missing something obvious, your proposal would require the > caller to call the dunder methods directly: > > class X: > def __hash__(self): > return tuple.__hash__(iter(self)) > > I consider that a poor interface design. > > But even if we decide to make an exception in this case, tuple.__hash__ > is currently an ordinary instance method right now. There's probably > code that relies on that fact and expects that: > > tuple.__hash__((a, b, c)) > > is currently the same as > > (a, b, c).__hash__() > > > (Starting with the hash() builtin itself, I expect, although that is > easy enough to fix if needed.) Your proposal will break backwards > compatibility, as it requires a change in semantics: > > (1) (a, b, c).__hash__() must keep the current behaviour, which > means behaving like a bound instance method; > > (2) But tuple.__hash__ will no longer return an unbound method (actually > a function object, but the difference is unimportant) and instead will > return something that behaves like a bound class method. 
> > Here's an implementation which does this: > > http://code.activestate.com/recipes/577030-dualmethod-descriptor/ > > so such a thing is possible. But it breaks backwards-compatability and > introduces something which I consider to be an unclean API (calling a > dunder method directly). Unless there's a *really* strong advantage to > > tuple.__hash__(...) > > over > > hash.from_iterable(...) > > (or equivalent), I would be against this change. > > > > > (And similarly with frozenset.__hash__(), so that the fast C > > implementation of that algorithm could be used, rather than the slow > > collections.Set._hash() implementation. Then the duplicated > implementation > > in _collections_abc.py's Set._hash() could be removed completely, > > delegating to frozenset.__hash__() instead.) > > This is a good point. Until now, I've been assuming that > hash.from_iterable should consider order. But frozenset shows us that > sometimes the hash should *not* consider order. > > This hints that perhaps the hash.from_iterable() should have its own > optional dunder method. Or maybe we need two functions: an ordered > version and an unordered version. > > Hmmm... just tossing out a wild idea here... let's get rid of the dunder > method part of your suggestion, and add new public class methods to > tuple and frozenset: > > tuple.hash_from_iter(iterable) > frozenset.hash_from_iter(iterable) > > > That gets rid of all the objections about backwards compatibility, since > these are new methods. They're not dunder names, so there are no > objections to being used as part of the public API. > > A possible objection is the question, is this functionality *actually* > important enough to bother? > > Another possible objection: are these methods part of the sequence/set > API? If not, do they really belong on the tuple/frozenset? Maybe they > belong elsewhere? > > > > > Would this API more cleanly communicate the algorithm being used and the > > implementation, > > No. If you want to communicate the algorithm being used, write some > documentation. > > Seriously, the public API doesn't communicate the algorithm used for the > implementation. How can it? We can keep the same interface and change > the implementation, or change the interface and keep the implementation. > The two are (mostly) independent. > > > > > while making a smaller increase in API surface area > > compared to introducing a new function? > > It's difficult to quantify "API surface area". On the one hand, we have > the addition of one or two new functions or methods. Contrast with: > > * introducing a new kind of method into the built-ins (one which > behaves like a classmethod when called from the class, and like > an instance method when called from an instance); > > * changing tuple.__hash__ from an ordinary method to one of the > above special methods; > > * and likewise for frozenset.__hash__; > > * change __hash__ from "only used as implementation, not as > interface" to "sometimes used as interface". > > > To me, adding one or two new methods/functions is the smaller, or at > least less disruptive, change. > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python... at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From p.f.moore at gmail.com Thu Jan 5 03:57:33 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 5 Jan 2017 08:57:33 +0000 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: <20170105003107.GJ3887@ando.pearwood.info> References: <20161229081959.GA3887@ando.pearwood.info> <20170105003107.GJ3887@ando.pearwood.info> Message-ID: On 5 January 2017 at 00:31, Steven D'Aprano wrote: > This is a good point. Until now, I've been assuming that > hash.from_iterable should consider order. But frozenset shows us that > sometimes the hash should *not* consider order. > > This hints that perhaps the hash.from_iterable() should have its own > optional dunder method. Or maybe we need two functions: an ordered > version and an unordered version. > > Hmmm... just tossing out a wild idea here... let's get rid of the dunder > method part of your suggestion, and add new public class methods to > tuple and frozenset: > > tuple.hash_from_iter(iterable) > frozenset.hash_from_iter(iterable) > > > That gets rid of all the objections about backwards compatibility, since > these are new methods. They're not dunder names, so there are no > objections to being used as part of the public API. > > A possible objection is the question, is this functionality *actually* > important enough to bother? > > Another possible objection: are these methods part of the sequence/set > API? If not, do they really belong on the tuple/frozenset? Maybe they > belong elsewhere? At this point I'd be inclined to say that a 3rd party hashing_utils module would be a reasonable place to thrash out these design decisions before committing to a permanent design in the stdlib. The popularity of such a module would also give a level of indication as to whether this is an important optimisation in practice. Paul From matt at getpattern.com Thu Jan 5 04:00:37 2017 From: matt at getpattern.com (Matt Gilson) Date: Thu, 5 Jan 2017 01:00:37 -0800 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: <573aa206-348c-43e1-b42d-13665062cf76@googlegroups.com> References: <20161229081959.GA3887@ando.pearwood.info> <20170105003107.GJ3887@ando.pearwood.info> <573aa206-348c-43e1-b42d-13665062cf76@googlegroups.com> Message-ID: But, I think that the problem with adding `__hash__` to collections.abc.Iterable is that not all iterables are immutable -- And if they aren't immutable, then allowing them to be hashed is likely to be a pretty bad idea... I'm still having a hard time being convinced that this is very much of an optimization at all ... If you start hashing tuples that are large enough that memory is a concern, then that's going to also take a *really* long time and probably be prohibitive anyway. 
Just for kicks, I decided to throw together a simple script to time how much penalty you pay for hashing a tuple: class F(object): def __init__(self, arg): self.arg = arg def __hash__(self): return hash(tuple(self.arg)) class T(object): def __init__(self, arg): self.arg = tuple(arg) def __hash__(self): return hash(self.arg) class C(object): def __init__(self, arg): self.arg = tuple(arg) self._hash = None def __hash__(self): if self._hash is None: self._hash = hash(tuple(self.arg)) return self._hash import timeit print(timeit.timeit('hash(f)', 'from __main__ import F; f = F(list(range(500)))')) print(timeit.timeit('hash(t)', 'from __main__ import T; t = T(list(range(500)))')) print(timeit.timeit('hash(c)', 'from __main__ import C; c = C(list(range(500)))')) results = [] for i in range(1, 11): n = i * 100 t1 = timeit.timeit('hash(f)', 'from __main__ import F; f = F(list(range(%d)))' % i) t2 = timeit.timeit('hash(t)', 'from __main__ import T; t = T(list(range(%d)))' % i) results.append(t1/t2) print(results) F is going to create a new tuple each time and then hash it. T already has a tuple, so we'll only pay the cost of hashing a tuple, not the cost of constructing a tuple and C caches the hash value and re-uses it once it is known. C is the winner by a factor of 10 or more (no surprise there). But the real interesting thing is that the the ratio of the timing results from hashing `F` vs. `T` is relatively constant in the range of my test (up to 1000 elements) and that ratio's value is approximately 1.3. For most applications, that seems reasonable. If you really need a speed-up, then I suppose you could recode the thing in Cython and see what happens, but I doubt that will be frequently necessary. If you _do_ code it up in Cython, put it up on Pypi and see if people use it... On Wed, Jan 4, 2017 at 5:04 PM, Neil Girdhar wrote: > Couldn't you add __hash__ to collections.abc.Iterable ? Essentially, > expose __hash__ there; then all iterables automatically have a default hash > that hashes their ordered contents. > > On Wednesday, January 4, 2017 at 7:37:26 PM UTC-5, Steven D'Aprano wrote: >> >> On Wed, Jan 04, 2017 at 04:38:05PM -0500, j... at math.brown.edu wrote: >> > Instead of the proposals like "hash.from_iterable()", would it make >> sense >> > to allow tuple.__hash__() to accept any iterable, when called as a >> > classmethod? >> >> The public API for calculating the hash of something is to call the >> hash() builtin function on some object, e.g. to call tuple.__hash__ you >> write hash((a, b, c)). The __hash__ dunder method is implementation, not >> interface, and normally shouldn't be called directly. >> >> Unless I'm missing something obvious, your proposal would require the >> caller to call the dunder methods directly: >> >> class X: >> def __hash__(self): >> return tuple.__hash__(iter(self)) >> >> I consider that a poor interface design. >> >> But even if we decide to make an exception in this case, tuple.__hash__ >> is currently an ordinary instance method right now. There's probably >> code that relies on that fact and expects that: >> >> tuple.__hash__((a, b, c)) >> >> is currently the same as >> >> (a, b, c).__hash__() >> >> >> (Starting with the hash() builtin itself, I expect, although that is >> easy enough to fix if needed.) 
Your proposal will break backwards >> compatibility, as it requires a change in semantics: >> >> (1) (a, b, c).__hash__() must keep the current behaviour, which >> means behaving like a bound instance method; >> >> (2) But tuple.__hash__ will no longer return an unbound method (actually >> a function object, but the difference is unimportant) and instead will >> return something that behaves like a bound class method. >> >> Here's an implementation which does this: >> >> http://code.activestate.com/recipes/577030-dualmethod-descriptor/ >> >> so such a thing is possible. But it breaks backwards-compatability and >> introduces something which I consider to be an unclean API (calling a >> dunder method directly). Unless there's a *really* strong advantage to >> >> tuple.__hash__(...) >> >> over >> >> hash.from_iterable(...) >> >> (or equivalent), I would be against this change. >> >> >> >> > (And similarly with frozenset.__hash__(), so that the fast C >> > implementation of that algorithm could be used, rather than the slow >> > collections.Set._hash() implementation. Then the duplicated >> implementation >> > in _collections_abc.py's Set._hash() could be removed completely, >> > delegating to frozenset.__hash__() instead.) >> >> This is a good point. Until now, I've been assuming that >> hash.from_iterable should consider order. But frozenset shows us that >> sometimes the hash should *not* consider order. >> >> This hints that perhaps the hash.from_iterable() should have its own >> optional dunder method. Or maybe we need two functions: an ordered >> version and an unordered version. >> >> Hmmm... just tossing out a wild idea here... let's get rid of the dunder >> method part of your suggestion, and add new public class methods to >> tuple and frozenset: >> >> tuple.hash_from_iter(iterable) >> frozenset.hash_from_iter(iterable) >> >> >> That gets rid of all the objections about backwards compatibility, since >> these are new methods. They're not dunder names, so there are no >> objections to being used as part of the public API. >> >> A possible objection is the question, is this functionality *actually* >> important enough to bother? >> >> Another possible objection: are these methods part of the sequence/set >> API? If not, do they really belong on the tuple/frozenset? Maybe they >> belong elsewhere? >> >> >> >> > Would this API more cleanly communicate the algorithm being used and >> the >> > implementation, >> >> No. If you want to communicate the algorithm being used, write some >> documentation. >> >> Seriously, the public API doesn't communicate the algorithm used for the >> implementation. How can it? We can keep the same interface and change >> the implementation, or change the interface and keep the implementation. >> The two are (mostly) independent. >> >> >> >> > while making a smaller increase in API surface area >> > compared to introducing a new function? >> >> It's difficult to quantify "API surface area". On the one hand, we have >> the addition of one or two new functions or methods. Contrast with: >> >> * introducing a new kind of method into the built-ins (one which >> behaves like a classmethod when called from the class, and like >> an instance method when called from an instance); >> >> * changing tuple.__hash__ from an ordinary method to one of the >> above special methods; >> >> * and likewise for frozenset.__hash__; >> >> * change __hash__ from "only used as implementation, not as >> interface" to "sometimes used as interface". 
>> >> >> To me, adding one or two new methods/functions is the smaller, or at >> least less disruptive, change. >> >> >> >> -- >> Steve >> _______________________________________________ >> Python-ideas mailing list >> Python... at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- [image: pattern-sig.png] Matt Gilson // SOFTWARE ENGINEER E: matt at getpattern.com // P: 603.892.7736 We?re looking for beta testers. Go here to sign up! -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Thu Jan 5 04:29:33 2017 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 5 Jan 2017 10:29:33 +0100 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: Message-ID: <9c43ef05-12e1-b590-7040-efd2238b716e@egenix.com> On 28.12.2016 04:13, jab at math.brown.edu wrote: > Suppose you have implemented an immutable Position type to represent > the state of a game played on an MxN board, where the board size can > grow quite large. > ... > > According to https://docs.python.org/3/reference/datamodel.html#object.__hash__ > : > > > """ > it is advised to mix together the hash values of the components of the > object that also play a part in comparison of objects by packing them > into a tuple and hashing the tuple. Example: > > def __hash__(self): > return hash((self.name, self.nick, self.color)) > > """ > > > Applying this advice to the use cases above would require creating an > arbitrarily large tuple in memory before passing it to hash(), which > is then just thrown away. It would be preferable if there were a way > to pass multiple values to hash() in a streaming fashion, such that > the overall hash were computed incrementally, without building up a > large object in memory first. I think there's a misunderstanding here: the hash(obj) built-in merely interfaces to the obj.__hash__() method (or the tp_hash slot for C types) and returns whatever these methods give. It doesn't implement any logic by itself. If you would like to implement a more efficient hash algorithm for your types, just go ahead and write them as .__hash__() method or tp_hash slot method and you're done. The example from the docs is just to showcase an example of how such a hash function should work, i.e. to mix in all relevant data attributes. In your case, you'd probably use a simple for loop to calculate the hash without creating tuples or any other temporary structures. Here's the hash implementation tuples use as an example /* The addend 82520, was selected from the range(0, 1000000) for generating the greatest number of prime multipliers for tuples upto length eight: 1082527, 1165049, 1082531, 1165057, 1247581, 1330103, 1082533, 1330111, 1412633, 1165069, 1247599, 1495177, 1577699 Tests have shown that it's not worth to cache the hash value, see issue #9685. */ static Py_hash_t tuplehash(PyTupleObject *v) { Py_uhash_t x; /* Unsigned for defined overflow behavior. 
*/ Py_hash_t y; Py_ssize_t len = Py_SIZE(v); PyObject **p; Py_uhash_t mult = _PyHASH_MULTIPLIER; x = 0x345678UL; p = v->ob_item; while (--len >= 0) { y = PyObject_Hash(*p++); if (y == -1) return -1; x = (x ^ y) * mult; /* the cast might truncate len; that doesn't change hash stability */ mult += (Py_hash_t)(82520UL + len + len); } x += 97531UL; if (x == (Py_uhash_t)-1) x = -2; return x; } As you can see, there's some magic going on there to make sure that the hash values behave well when used as "keys" for the dictionary implementation (which is their main purpose in Python). You are free to create your own hash implementation. The only characteristic to pay attention to is to have objects which compare equal give the same hash value. This is needed to be able to map such objects to the same dictionary slots. There should be no need to have a special hash function which works on iterables. As long as those iterable objects define their own .__hash__() method or tp_slot, the hash() built-in (and Python's dict implementation) will use these and, if needed, those methods can then use an approach to build hash values using iterators on the object's internal data along similar lines as the above tuple implementation. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jan 05 2017) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From mistersheik at gmail.com Thu Jan 5 08:26:58 2017 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 05 Jan 2017 13:26:58 +0000 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: <20161229081959.GA3887@ando.pearwood.info> <20170105003107.GJ3887@ando.pearwood.info> <573aa206-348c-43e1-b42d-13665062cf76@googlegroups.com> Message-ID: On Thu, Jan 5, 2017 at 4:00 AM Matt Gilson wrote: > But, I think that the problem with adding `__hash__` to > collections.abc.Iterable is that not all iterables are immutable -- And if > they aren't immutable, then allowing them to be hashed is likely to be a > pretty bad idea... > Good point. A better option is to add collections.abc.ImmutableIterable that derives from Iterable and provides __hash__. Since tuple inherits from it, it can choose to delegate up. Then I think everyone is happy. > > I'm still having a hard time being convinced that this is very much of an > optimization at all ... > > If you start hashing tuples that are large enough that memory is a > concern, then that's going to also take a *really* long time and probably > be prohibitive anyway. 
Just for kicks, I decided to throw together a > simple script to time how much penalty you pay for hashing a tuple: > > class F(object): > def __init__(self, arg): > self.arg = arg > > def __hash__(self): > return hash(tuple(self.arg)) > > > class T(object): > def __init__(self, arg): > self.arg = tuple(arg) > > def __hash__(self): > return hash(self.arg) > > > class C(object): > def __init__(self, arg): > self.arg = tuple(arg) > self._hash = None > > def __hash__(self): > if self._hash is None: > self._hash = hash(tuple(self.arg)) > return self._hash > > import timeit > > print(timeit.timeit('hash(f)', 'from __main__ import F; f = > F(list(range(500)))')) > print(timeit.timeit('hash(t)', 'from __main__ import T; t = > T(list(range(500)))')) > print(timeit.timeit('hash(c)', 'from __main__ import C; c = > C(list(range(500)))')) > > results = [] > for i in range(1, 11): > n = i * 100 > t1 = timeit.timeit('hash(f)', 'from __main__ import F; f = > F(list(range(%d)))' % i) > t2 = timeit.timeit('hash(t)', 'from __main__ import T; t = > T(list(range(%d)))' % i) > results.append(t1/t2) > print(results) > > > F is going to create a new tuple each time and then hash it. T already > has a tuple, so we'll only pay the cost of hashing a tuple, not the cost of > constructing a tuple and C caches the hash value and re-uses it once it is > known. C is the winner by a factor of 10 or more (no surprise there). But > the real interesting thing is that the the ratio of the timing results from > hashing `F` vs. `T` is relatively constant in the range of my test (up to > 1000 elements) and that ratio's value is approximately 1.3. For most > applications, that seems reasonable. If you really need a speed-up, then I > suppose you could recode the thing in Cython and see what happens, but I > doubt that will be frequently necessary. If you _do_ code it up in Cython, > put it up on Pypi and see if people use it... > > > On Wed, Jan 4, 2017 at 5:04 PM, Neil Girdhar > wrote: > > Couldn't you add __hash__ to collections.abc.Iterable ? Essentially, > expose __hash__ there; then all iterables automatically have a default hash > that hashes their ordered contents. > > On Wednesday, January 4, 2017 at 7:37:26 PM UTC-5, Steven D'Aprano wrote: > > On Wed, Jan 04, 2017 at 04:38:05PM -0500, j... at math.brown.edu wrote: > > Instead of the proposals like "hash.from_iterable()", would it make > sense > > to allow tuple.__hash__() to accept any iterable, when called as a > > classmethod? > > The public API for calculating the hash of something is to call the > hash() builtin function on some object, e.g. to call tuple.__hash__ you > write hash((a, b, c)). The __hash__ dunder method is implementation, not > interface, and normally shouldn't be called directly. > > Unless I'm missing something obvious, your proposal would require the > caller to call the dunder methods directly: > > class X: > def __hash__(self): > return tuple.__hash__(iter(self)) > > I consider that a poor interface design. > > But even if we decide to make an exception in this case, tuple.__hash__ > is currently an ordinary instance method right now. There's probably > code that relies on that fact and expects that: > > tuple.__hash__((a, b, c)) > > is currently the same as > > (a, b, c).__hash__() > > > (Starting with the hash() builtin itself, I expect, although that is > easy enough to fix if needed.) 
Your proposal will break backwards > compatibility, as it requires a change in semantics: > > (1) (a, b, c).__hash__() must keep the current behaviour, which > means behaving like a bound instance method; > > (2) But tuple.__hash__ will no longer return an unbound method (actually > a function object, but the difference is unimportant) and instead will > return something that behaves like a bound class method. > > Here's an implementation which does this: > > http://code.activestate.com/recipes/577030-dualmethod-descriptor/ > > so such a thing is possible. But it breaks backwards-compatability and > introduces something which I consider to be an unclean API (calling a > dunder method directly). Unless there's a *really* strong advantage to > > tuple.__hash__(...) > > over > > hash.from_iterable(...) > > (or equivalent), I would be against this change. > > > > > (And similarly with frozenset.__hash__(), so that the fast C > > implementation of that algorithm could be used, rather than the slow > > collections.Set._hash() implementation. Then the duplicated > implementation > > in _collections_abc.py's Set._hash() could be removed completely, > > delegating to frozenset.__hash__() instead.) > > This is a good point. Until now, I've been assuming that > hash.from_iterable should consider order. But frozenset shows us that > sometimes the hash should *not* consider order. > > This hints that perhaps the hash.from_iterable() should have its own > optional dunder method. Or maybe we need two functions: an ordered > version and an unordered version. > > Hmmm... just tossing out a wild idea here... let's get rid of the dunder > method part of your suggestion, and add new public class methods to > tuple and frozenset: > > tuple.hash_from_iter(iterable) > frozenset.hash_from_iter(iterable) > > > That gets rid of all the objections about backwards compatibility, since > these are new methods. They're not dunder names, so there are no > objections to being used as part of the public API. > > A possible objection is the question, is this functionality *actually* > important enough to bother? > > Another possible objection: are these methods part of the sequence/set > API? If not, do they really belong on the tuple/frozenset? Maybe they > belong elsewhere? > > > > > Would this API more cleanly communicate the algorithm being used and the > > implementation, > > No. If you want to communicate the algorithm being used, write some > documentation. > > Seriously, the public API doesn't communicate the algorithm used for the > implementation. How can it? We can keep the same interface and change > the implementation, or change the interface and keep the implementation. > The two are (mostly) independent. > > > > > while making a smaller increase in API surface area > > compared to introducing a new function? > > It's difficult to quantify "API surface area". On the one hand, we have > the addition of one or two new functions or methods. Contrast with: > > * introducing a new kind of method into the built-ins (one which > behaves like a classmethod when called from the class, and like > an instance method when called from an instance); > > * changing tuple.__hash__ from an ordinary method to one of the > above special methods; > > * and likewise for frozenset.__hash__; > > * change __hash__ from "only used as implementation, not as > interface" to "sometimes used as interface". > > > To me, adding one or two new methods/functions is the smaller, or at > least less disruptive, change. 
> > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python... at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > -- > > [image: pattern-sig.png] > > Matt Gilson // SOFTWARE ENGINEER > > E: matt at getpattern.com // P: 603.892.7736 <(603)%20892-7736> > > We?re looking for beta testers. Go here > to sign up! > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Thu Jan 5 08:28:30 2017 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 05 Jan 2017 13:28:30 +0000 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: <9c43ef05-12e1-b590-7040-efd2238b716e@egenix.com> References: <9c43ef05-12e1-b590-7040-efd2238b716e@egenix.com> Message-ID: The point is that the OP doesn't want to write his own hash function, but wants Python to provide a standard way of hashing an iterable. Today, the standard way is to convert to tuple and call hash on that. That may not be efficient. FWIW from a style perspective, I agree with OP. On Thu, Jan 5, 2017 at 4:30 AM M.-A. Lemburg wrote: > On 28.12.2016 04:13, jab at math.brown.edu wrote: > > Suppose you have implemented an immutable Position type to represent > > the state of a game played on an MxN board, where the board size can > > grow quite large. > > ... > > > > According to > https://docs.python.org/3/reference/datamodel.html#object.__hash__ > > : > > > > > > """ > > it is advised to mix together the hash values of the components of the > > object that also play a part in comparison of objects by packing them > > into a tuple and hashing the tuple. Example: > > > > def __hash__(self): > > return hash((self.name, self.nick, self.color)) > > > > """ > > > > > > Applying this advice to the use cases above would require creating an > > arbitrarily large tuple in memory before passing it to hash(), which > > is then just thrown away. It would be preferable if there were a way > > to pass multiple values to hash() in a streaming fashion, such that > > the overall hash were computed incrementally, without building up a > > large object in memory first. > > I think there's a misunderstanding here: the hash(obj) built-in > merely interfaces to the obj.__hash__() method (or the tp_hash slot > for C types) and returns whatever these methods give. > > It doesn't implement any logic by itself. > > If you would like to implement a more efficient hash algorithm > for your types, just go ahead and write them as .__hash__() > method or tp_hash slot method and you're done. > > The example from the docs is just to showcase an example of > how such a hash function should work, i.e. to mix in all > relevant data attributes. > > In your case, you'd probably use a simple for loop to calculate > the hash without creating tuples or any other temporary > structures. 
> > Here's the hash implementation tuples use as an example > > /* The addend 82520, was selected from the range(0, 1000000) for > generating the greatest number of prime multipliers for tuples > upto length eight: > > 1082527, 1165049, 1082531, 1165057, 1247581, 1330103, 1082533, > 1330111, 1412633, 1165069, 1247599, 1495177, 1577699 > > Tests have shown that it's not worth to cache the hash value, see > issue #9685. > */ > > static Py_hash_t > tuplehash(PyTupleObject *v) > { > Py_uhash_t x; /* Unsigned for defined overflow behavior. */ > Py_hash_t y; > Py_ssize_t len = Py_SIZE(v); > PyObject **p; > Py_uhash_t mult = _PyHASH_MULTIPLIER; > x = 0x345678UL; > p = v->ob_item; > while (--len >= 0) { > y = PyObject_Hash(*p++); > if (y == -1) > return -1; > x = (x ^ y) * mult; > /* the cast might truncate len; that doesn't change hash > stability */ > mult += (Py_hash_t)(82520UL + len + len); > } > x += 97531UL; > if (x == (Py_uhash_t)-1) > x = -2; > return x; > } > > As you can see, there's some magic going on there to make > sure that the hash values behave well when used as "keys" > for the dictionary implementation (which is their main > purpose in Python). > > You are free to create your own hash implementation. > The only characteristic to pay attention to is to have > objects which compare equal give the same hash value. > This is needed to be able to map such objects to the same > dictionary slots. > > There should be no need to have a special hash function which > works on iterables. As long as those iterable objects define > their own .__hash__() method or tp_slot, the hash() built-in > (and Python's dict implementation) will use these and, if needed, > those methods can then use an approach to build hash values > using iterators on the object's internal data along similar > lines as the above tuple implementation. > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Experts (#1, Jan 05 2017) > >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ > >>> Python Database Interfaces ... http://products.egenix.com/ > >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ > ________________________________________________________________________ > > ::: We implement business ideas - efficiently in both time and costs ::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > http://www.malemburg.com/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/XcuC01a8SYs/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From random832 at fastmail.com Thu Jan 5 09:01:17 2017 From: random832 at fastmail.com (Random832) Date: Thu, 05 Jan 2017 09:01:17 -0500 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: <20161229081959.GA3887@ando.pearwood.info> <20170105003107.GJ3887@ando.pearwood.info> <573aa206-348c-43e1-b42d-13665062cf76@googlegroups.com> Message-ID: <1483624877.1889993.838209577.47C64B41@webmail.messagingengine.com> On Thu, Jan 5, 2017, at 04:00, Matt Gilson wrote: > But, I think that the problem with adding `__hash__` to > collections.abc.Iterable is that not all iterables are immutable -- And > if > they aren't immutable, then allowing them to be hashed is likely to be a > pretty bad idea... Why? This should never cause an interpreter-crashing bug, because user-defined types can have bad hash methods anyway. And without that, the reason for not applying the "consenting adults" principle and allowing people to add mutable objects to a *short-lived* dict without intending to change them while the dict is in use has never been clear to me. I think mutable types not having a hash method was a mistake in the first place. From victor.stinner at gmail.com Thu Jan 5 10:38:22 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 5 Jan 2017 16:38:22 +0100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode Message-ID: Hi, Nick Coghlan asked me to review his PEP 538 "Coercing the legacy C locale to C.UTF-8": https://www.python.org/dev/peps/pep-0538/ Nick wants to change the default behaviour. I'm not sure that I'm brave enough to follow this direction, so I proposed my old "-X utf8" command line idea as a new PEP: add a new UTF-8 mode, *disabled by default*. These 2 PEPs are the follow-up of the Windows PEP 529 (Change Windows filesystem encoding to UTF-8) and the issue #19977 (Use "surrogateescape" error handler for sys.stdin and sys.stdout on UNIX for the C locale). The topic (switching to UTF-8 on UNIX) is actively discussed on: http://bugs.python.org/issue28180 Read the PEP online (HTML): https://www.python.org/dev/peps/pep-0540/ Victor PEP: 540 Title: Add a new UTF-8 mode Version: $Revision$ Last-Modified: $Date$ Author: Victor Stinner Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 5-January-2016 Python-Version: 3.7 Abstract ======== Add a new UTF-8 mode, opt-in option to use UTF-8 for operating system data instead of the locale encoding. Add ``-X utf8`` command line option and ``PYTHONUTF8`` environment variable. Context ======= Locale and operating system data -------------------------------- Python uses the ``LC_CTYPE`` locale to decide how to encode and decode data from/to the operating system: * file content * command line arguments: ``sys.argv`` * standard streams: ``sys.stdin``, ``sys.stdout``, ``sys.stderr`` * environment variables: ``os.environ`` * filenames: ``os.listdir(str)`` for example * pipes: ``subprocess.Popen`` using ``subprocess.PIPE`` for example * error messages * name of a timezone * user name, terminal name: ``os``, ``grp`` and ``pwd`` modules * host name, UNIX socket path: see the ``socket`` module * etc. At startup, Python calls ``setlocale(LC_CTYPE, "")`` to use the user ``LC_CTYPE`` locale and then store the locale encoding, ``sys.getfilesystemencoding()``. In the whole lifetime of a Python process, the same encoding and error handler are used to encode and decode data from/to the operating system. .. 
note:: In some corner case, the *current* ``LC_CTYPE`` locale must be used instead of ``sys.getfilesystemencoding()``. For example, the ``time`` module uses the *current* ``LC_CTYPE`` locale to decode timezone names. The POSIX locale and its encoding --------------------------------- The following environment variables are used to configure the locale, in this preference order: * ``LC_ALL``, most important variable * ``LC_CTYPE`` * ``LANG`` The POSIX locale,also known as "the C locale", is used: * if the first set variable is set to ``"C"`` * if all these variables are unset, for example when a program is started in an empty environment. The encoding of the POSIX locale must be ASCII or a superset of ASCII. On Linux, the POSIX locale uses the ASCII encoding. On FreeBSD and Solaris, ``nl_langinfo(CODESET)`` announces an alias of the ASCII encoding, whereas ``mbstowcs()`` and ``wcstombs()`` functions use the ISO 8859-1 encoding (Latin1) in practice. The problem is that ``os.fsencode()`` and ``os.fsdecode()`` use ``locale.getpreferredencoding()`` codec. For example, if command line arguments are decoded by ``mbstowcs()`` and encoded back by ``os.fsencode()``, an ``UnicodeEncodeError`` exception is raised instead of retrieving the original byte string. To fix this issue, Python now checks since Python 3.4 if ``mbstowcs()`` really uses the ASCII encoding if the the ``LC_CTYPE`` uses the the POSIX locale and ``nl_langinfo(CODESET)`` returns ``"ASCII"`` (or an alias to ASCII). If not (the effective encoding is not ASCII), Python uses its own ASCII codec instead of using ``mbstowcs()`` and ``wcstombs()`` functions for operating system data. See the `POSIX locale (2016 Edition) `_. C.UTF-8 and C.utf8 locales -------------------------- Some operating systems provide a variant of the POSIX locale using the UTF-8 encoding: * Fedora 25: ``"C.utf8"`` or ``"C.UTF-8"`` * Debian (eglibc 2.13-1, 2011): ``"C.UTF-8"`` * HP-UX: ``"C.utf8"`` It was proposed to add a ``C.UTF-8`` locale to glibc: `glibc C.UTF-8 proposal `_. Popularity of the UTF-8 encoding -------------------------------- Python 3 uses UTF-8 by default for Python source files. On Mac OS X, Windows and Android, Python always use UTF-8 for operating system data instead of the locale encoding. For Windows, see the `PEP 529: Change Windows filesystem encoding to UTF-8 `_. On Linux, UTF-8 became the defacto standard encoding by default, replacing legacy encodings like ISO 8859-1 or ShiftJIS. For example, using different encodings for filenames and standard streams is likely to create mojibake, so UTF-8 is now used *everywhere*. The UTF-8 encoding is the default encoding of XML and JSON file format. In January 2017, UTF-8 was used in `more than 88% of web pages `_ (HTML, Javascript, CSS, etc.). See `utf8everywhere.org `_ for more general information on the UTF-8 codec. .. note:: Some applications and operating systems (especially Windows) use Byte Order Markers (BOM) to indicate the used Unicode encoding: UTF-7, UTF-8, UTF-16-LE, etc. BOM are not well supported and rarely used in Python. Old data stored in different encodings and surrogateescape ---------------------------------------------------------- Even if UTF-8 became the defacto standard, there are still systems in the wild which don't use UTF-8. And there are a lot of data stored in different encodings. For example, an old USB key using the ext3 filesystem with filenames encoded to ISO 8859-1. The Linux kernel and the libc don't decode filenames: a filename is used as a raw array of bytes. 
The common solution to support any filename is to store filenames as bytes and don't try to decode them. When displayed to stdout, mojibake is displayed if the filename and the terminal don't use the same encoding. Python 3 promotes Unicode everywhere including filenames. A solution to support filenames not decodable from the locale encoding was found: the ``surrogateescape`` error handler (`PEP 393 `_), store undecodable bytes as surrogate characters. This error handler is used by default for operating system data, by ``os.fsdecode()`` and ``os.fsencode()`` for example (except on Windows which uses the ``strict`` error handler). Standard streams ---------------- Python uses the locale encoding for standard streams: stdin, stdout and stderr. The ``strict`` error handler is used by stdin and stdout to prevent mojibake. The ``backslashreplace`` error handler is used by stderr to avoid Unicode encode error when displaying non-ASCII text. It is especially useful when the POSIX locale is used, because this locale usually uses the ASCII encoding. The problem is that operating system data like filenames are decoded using the ``surrogateescape`` error handler (PEP 393). Displaying a filename to stdout raises an Unicode encode error if the filename contains an undecoded byte stored as a surrogate character. Python 3.6 now uses ``surrogateescape`` for stdin and stdout if the POSIX locale is used: `issue #19977 `_. The idea is to passthrough operating system data even if it means mojibake, because most UNIX applications work like that. Most UNIX applications store filenames as bytes, usually simply because bytes are first-citizen class in the used programming language, whereas Unicode is badly supported. .. note:: The encoding and/or the error handler of standard streams can be overriden with the ``PYTHONIOENCODING`` environment variable. Proposal ======== Add a new UTF-8 mode, opt-in option to use UTF-8 for operating system data instead of the locale encoding: * Add ``-X utf8`` command line option * Add ``PYTHONUTF8=1`` environment variable Add also a strict UTF-8 mode, enabled by ``-X utf8=strict`` or ``PYTHONUTF8=strict``. The UTF-8 mode changes the default encoding and error handler used by open(), os.fsdecode(), os.fsencode(), sys.stdin, sys.stdout and sys.stderr: ============================ ======================= ======================= ====================== ====================== Function Default, other locales Default, POSIX locale UTF-8 UTF-8 Strict ============================ ======================= ======================= ====================== ====================== open() locale/strict locale/strict UTF-8/surrogateescape UTF-8/strict os.fsdecode(), os.fsencode() locale/surrogateescape locale/surrogateescape UTF-8/surrogateescape UTF-8/strict sys.stdin locale/strict locale/surrogateescape UTF-8/surrogateescape UTF-8/strict sys.stdout locale/strict locale/surrogateescape UTF-8/surrogateescape UTF-8/strict sys.stderr locale/backslashreplace locale/backslashreplace UTF-8/backslashreplace UTF-8/backslashreplace ============================ ======================= ======================= ====================== ====================== The UTF-8 mode is disabled by default to keep hard Unicode errors when encoding or decoding operating system data failed, and to keep the backward compatibility. The user is responsible to enable explicitly the UTF-8 mode, and so is better prepared for mojibake than if the UTF-8 mode would be enabled *by default*. 
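As an illustration of the error handlers named in the table, the following session shows how a byte string that is not valid UTF-8 (an arbitrary example) behaves with each of them; ``surrogateescape`` is what allows undecodable bytes to be round-tripped unchanged::

    >>> data = b'nonutf8:\xe9'
    >>> data.decode('utf-8', 'strict')
    Traceback (most recent call last):
      ...
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 ...
    >>> text = data.decode('utf-8', 'surrogateescape')
    >>> text
    'nonutf8:\udce9'
    >>> text.encode('utf-8', 'surrogateescape')   # the original bytes are recovered
    b'nonutf8:\xe9'
    >>> text.encode('utf-8', 'backslashreplace')  # used for sys.stderr
    b'nonutf8:\\udce9'
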
The UTF-8 mode should be used on systems known to be configured with UTF-8 where most applications speak UTF-8. It prevents Unicode errors if the user overrides a locale *by mistake* or if a Python program is started with no locale configured (and so with the POSIX locale). Most UNIX applications handle operating system data as bytes, so ``LC_ALL``, ``LC_CTYPE`` and ``LANG`` environment variables have a limited impact on how these data are handled by the application. The Python UTF-8 mode should help to make Python more interoperable with the other UNIX applications in the system assuming that *UTF-8* is used everywhere and that users *expect* UTF-8. Ignoring ``LC_ALL``, ``LC_CTYPE`` and ``LANG`` environment variables in Python is more convenient, since they are more commonly misconfigured *by mistake* (configured to use an encoding different than UTF-8, whereas the system uses UTF-8), rather than being misconfigured by intent. Backward Compatibility ====================== Since the UTF-8 mode is disabled by default, it has no impact on the backward compatibility. The new UTF-8 mode must be enabled explicitly. Alternatives ============ Always use UTF-8 ---------------- Python already always use the UTF-8 encoding on Mac OS X, Android and Windows. Since UTF-8 became the defacto encoding, it makes sense to always use it on all platforms with any locale. The risk is to introduce mojibake if the locale uses a different encoding, especially for locales other than the POSIX locale. Force UTF-8 for the POSIX locale -------------------------------- An alternative to always using UTF-8 in any case is to only use UTF-8 when the ``LC_CTYPE`` locale is the POSIX locale. The `PEP 538: Coercing the legacy C locale to C.UTF-8 `_ of Nick Coghlan proposes to implement that using the ``C.UTF-8`` locale. Related Work ============ Perl has a ``-C`` command line option and a ``PERLUNICODE`` environment varaible to force UTF-8: see `perlrun `_. It is possible to configure UTF-8 per standard stream, on input and output streams, etc. Copyright ========= This document has been placed in the public domain. From p.f.moore at gmail.com Thu Jan 5 10:58:42 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 5 Jan 2017 15:58:42 +0000 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: <9c43ef05-12e1-b590-7040-efd2238b716e@egenix.com> Message-ID: On 5 January 2017 at 13:28, Neil Girdhar wrote: > The point is that the OP doesn't want to write his own hash function, but > wants Python to provide a standard way of hashing an iterable. Today, the > standard way is to convert to tuple and call hash on that. That may not be > efficient. FWIW from a style perspective, I agree with OP. The debate here regarding tuple/frozenset indicates that there may not be a "standard way" of hashing an iterable (should order matter?). Although I agree that assuming order matters is a reasonable assumption to make in the absence of any better information. Hashing is low enough level that providing helpers in the stdlib is not unreasonable. It's not obvious (to me, at least) that it's a common enough need to warrant it, though. Do we have any information on how often people implement their own __hash__, or how often hash(tuple(my_iterable)) would be an acceptable hash, except for the cost of creating the tuple? The OP's request is the only time this has come up as a requirement, to my knowledge. 
Hence my suggestion to copy the tuple implementation, modify it to work with
general iterables, and publish it as a 3rd party module - its usage might
give us an idea of how often this need arises. (The other option would be
for someone to do some analysis of published code).

Assuming it is a sufficiently useful primitive to add, then we can debate
naming. But I'd prefer it to be named in such a way that it makes it clear
that it's a low-level helper for people writing their own __hash__ function,
and not some sort of variant of hashing (which hash.from_iterable implies
to me).

Paul

From victor.stinner at gmail.com Thu Jan 5 11:50:37 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 5 Jan 2017 17:50:37 +0100
Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode
In-Reply-To: 
References: 
Message-ID: 

> https://www.python.org/dev/peps/pep-0540/

I read PEP 538, PEP 540, and the issues related to switching to UTF-8. At
least, I can say one thing: people have different points of view :-)

To understand why people disagree, I tried to categorize the different
points of view and Python expectations:

"UNIX mode": Python 2 developers and long-time UNIX users expect that their
code "just works". They like Python 3 features, but Python 3 annoys them
with various encoding errors. The expectation is to be able to read data
encoded to various incompatible encodings and write it into stdout or a
text file. In short, mojibake is not a bug but a feature!

"Strict Unicode mode" for real Unicode fans: Python 3 is strict and it's a
good thing! Strict codecs help to detect bugs very early in the code. These
developers understand Unicode very well and are able to fix complex encoding
issues. Mojibake is a no-no for them.

Python 3.6 is not exactly in the first or the latter category: "it depends".

To read data from the operating system, Python 3.6 behaves in "UNIX mode":
os.listdir() *does* return invalid filenames, using a funny encoding based
on surrogates.

To write data back to the operating system, Python 3.6 wears its "Strict
Unicode" hat and becomes strict. It's no longer possible to write data from
the operating system back to the operating system. Writing a filename read
from os.listdir() into stdout or into a text file fails with an encode
error.

Subtle behaviour: since Python 3.6, with the POSIX locale, Python 3.6 uses
the "UNIX mode" but only to write into stdout. It's possible to write a
filename into stdout, but not into a text file.

In its current shape, my PEP 540 leaves Python's default unchanged, but adds
two modes: UTF-8 and UTF-8 strict. The UTF-8 mode is more or less the UNIX
mode generalized for all inputs and outputs: mojibake is a feature, just
pass bytes unchanged. The UTF-8 strict mode is more extreme than the current
"Strict Unicode mode" since it fails on *decoding* data from the operating
system.

Now that I have a better view of what we have and what we want, the question
is whether the default behaviour should be changed and if yes, how.

Nick's PEP 538 doesn't exactly move to the "UNIX mode" (open() doesn't use
surrogateescape) nor to the "Strict Unicode mode" (fsdecode() still uses
surrogateescape); it's still in a grey area. Maybe Nick can elaborate on
the use case or update his PEP?

I guess that all users and most developers are more in the "UNIX mode" camp.
*If* we want to change the default, I suggest using the "UNIX mode" by
default.

The question is whether someone relies on, or likes, the current Python 3.6
behaviour: reading "just works", writing is strict.
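To make that asymmetry concrete, here is a small illustration (assuming a
UTF-8 filesystem encoding, as with a typical UTF-8 locale); it only restates
the behaviour described above:

    # os.listdir() decodes bytes with surrogateescape, so undecodable bytes survive:
    name = b"nonascii\xff".decode("utf-8", "surrogateescape")
    print(repr(name))                    # 'nonascii\udcff' -- reading "just works"

    # but writing the same string to a text stream uses the strict error handler:
    try:
        name.encode("utf-8")             # what print(name) or a text file would do
    except UnicodeEncodeError as exc:
        print("writing is strict:", exc)

    # surrogateescape round-trips the original bytes:
    print(name.encode("utf-8", "surrogateescape"))   # b'nonascii\xff'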
If you like this behaviour, what do you think of the tiny Python 3.6 change:
use surrogateescape for stdout when the locale is POSIX.

Victor

From victor.stinner at gmail.com Thu Jan 5 12:10:04 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 5 Jan 2017 18:10:04 +0100
Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode
In-Reply-To: 
References: 
Message-ID: 

2017-01-05 17:50 GMT+01:00 Victor Stinner :
> In its current shape, my PEP 540 leaves Python's default unchanged, but
> adds two modes: UTF-8 and UTF-8 strict. The UTF-8 mode is more or less
> the UNIX mode generalized for all inputs and outputs: mojibake is a
> feature, just pass bytes unchanged. The UTF-8 strict mode is more
> extreme than the current "Strict Unicode mode" since it fails on
> *decoding* data from the operating system.
> (...)
> Now that I have a better view of what we have and what we want, the
> question is if the default behaviour should be changed and if yes,
> how.

A common request is that "Python just works" without having to pass a
command line option or set an environment variable. Maybe the default
behaviour should be left unchanged, but the behaviour with the POSIX locale
should change.

Maybe we can enable the UTF-8 mode (or "UNIX mode") of the PEP 540 when the
POSIX locale is used?

Said differently, the UTF-8 mode would not only be enabled by -X utf8 and
PYTHONUTF8=1, but also enabled by the common LANG=C and when Python is
started in an empty environment (no env var).

Victor

From matt at getpattern.com Thu Jan 5 12:32:12 2017
From: matt at getpattern.com (Matt Gilson)
Date: Thu, 5 Jan 2017 09:32:12 -0800
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To: 
References: <9c43ef05-12e1-b590-7040-efd2238b716e@egenix.com>
Message-ID: 

I agree with Paul -- I'm not convinced that this is common enough or that
the benefits are big enough to warrant something builtin.

However, I did decide to dust off some of my old skills and I threw
together a simple gist to see how hard it would be to create something
using Cython based on the CPython tuple hash algorithm. I don't know how
well it works for arbitrary iterables without a `__length_hint__`, but it
seems to work as intended for iterables that have the length hint.

https://gist.github.com/mgilson/129859a79487a483163980db25b709bf

If you're interested, or want to pick this up and actually do something
with it, feel free... Also, I haven't written anything using Cython for
ages, so if this could be further optimized, feel free to let me know.

On Thu, Jan 5, 2017 at 7:58 AM, Paul Moore wrote:
> On 5 January 2017 at 13:28, Neil Girdhar wrote:
> > The point is that the OP doesn't want to write his own hash function, but
> > wants Python to provide a standard way of hashing an iterable. Today, the
> > standard way is to convert to tuple and call hash on that. That may not be
> > efficient. FWIW from a style perspective, I agree with OP.
>
> The debate here regarding tuple/frozenset indicates that there may not
> be a "standard way" of hashing an iterable (should order matter?).
> Although I agree that assuming order matters is a reasonable
> assumption to make in the absence of any better information.
>
> Hashing is low enough level that providing helpers in the stdlib is
> not unreasonable. It's not obvious (to me, at least) that it's a
> common enough need to warrant it, though.
Do we have any information > on how often people implement their own __hash__, or how often > hash(tuple(my_iterable)) would be an acceptable hash, except for the > cost of creating the tuple? The OP's request is the only time this has > come up as a requirement, to my knowledge. Hence my suggestion to copy > the tuple implementation, modify it to work with general iterables, > and publish it as a 3rd party module - its usage might give us an idea > of how often this need arises. (The other option would be for someone > to do some analysis of published code). > > Assuming it is a sufficiently useful primitive to add, then we can > debate naming. But I'd prefer it to be named in such a way that it > makes it clear that it's a low-level helper for people writing their > own __hash__ function, and not some sort of variant of hashing (which > hash.from_iterable implies to me). > > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- [image: pattern-sig.png] Matt Gilson // SOFTWARE ENGINEER E: matt at getpattern.com // P: 603.892.7736 We?re looking for beta testers. Go here to sign up! -------------- next part -------------- An HTML attachment was scrubbed... URL: From phd at phdru.name Thu Jan 5 12:16:38 2017 From: phd at phdru.name (Oleg Broytman) Date: Thu, 5 Jan 2017 18:16:38 +0100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: Message-ID: <20170105171638.GA4217@phdru.name> Hi! On Thu, Jan 05, 2017 at 04:38:22PM +0100, Victor Stinner wrote: > Always use UTF-8 > ---------------- > > Python already always use the UTF-8 encoding on Mac OS X, Android and Windows. > Since UTF-8 became the defacto encoding, it makes sense to always use it on all > platforms with any locale. Please don't! I use different locales and encodings, sometimes it's utf-8, sometimes not - but I have properly configured LC_* settings and I prefer Python to follow my command. It'd be disgusting if Python starts to bend me to its preferences. > The risk is to introduce mojibake if the locale uses a different encoding, > especially for locales other than the POSIX locale. There is no such risk for me as I already have mojibake in my systems. Two most notable sources of mojibake are: 1) FTP servers - people create files (both names and content) in different encodings; w32 FTP clients usually send file names and content in cp1251 (Russian Windows encoding), sometimes in cp866 (Russian Windows OEM encoding). 2) MP3 tags and play lists - almost always cp1251. So whatever my personal encoding is - koi8-r or utf-8 - I have to deal with file names and content in different encodings. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From mistersheik at gmail.com Thu Jan 5 15:31:12 2017 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 05 Jan 2017 20:31:12 +0000 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: <9c43ef05-12e1-b590-7040-efd2238b716e@egenix.com> Message-ID: On Thu, Jan 5, 2017 at 10:58 AM Paul Moore wrote: > On 5 January 2017 at 13:28, Neil Girdhar wrote: > > The point is that the OP doesn't want to write his own hash function, but > > wants Python to provide a standard way of hashing an iterable. Today, > the > > standard way is to convert to tuple and call hash on that. That may not > be > > efficient. 
FWIW from a style perspective, I agree with OP.
>
> The debate here regarding tuple/frozenset indicates that there may not
> be a "standard way" of hashing an iterable (should order matter?).
> Although I agree that assuming order matters is a reasonable
> assumption to make in the absence of any better information.
>

That's another good point. In keeping with my abc proposal, why not add
abstract base classes with __hash__:

* ImmutableIterable, and
* ImmutableSet.

ImmutableSet inherits from ImmutableIterable, and overrides __hash__ in such
a way that order is ignored. This presumably involves very little new code:
it's just propagating up the code that's already in set and tuple. The
advantage is that instead of implementing __hash__ for your type, you
declare your intention by inheriting from an abc and get an
automatically-provided hash function.

> Hashing is low enough level that providing helpers in the stdlib is
> not unreasonable. It's not obvious (to me, at least) that it's a
> common enough need to warrant it, though. Do we have any information
> on how often people implement their own __hash__, or how often
> hash(tuple(my_iterable)) would be an acceptable hash, except for the
> cost of creating the tuple? The OP's request is the only time this has
> come up as a requirement, to my knowledge. Hence my suggestion to copy
> the tuple implementation, modify it to work with general iterables,
> and publish it as a 3rd party module - its usage might give us an idea
> of how often this need arises. (The other option would be for someone
> to do some analysis of published code).
>
> Assuming it is a sufficiently useful primitive to add, then we can
> debate naming. But I'd prefer it to be named in such a way that it
> makes it clear that it's a low-level helper for people writing their
> own __hash__ function, and not some sort of variant of hashing (which
> hash.from_iterable implies to me).
>
> Paul

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From steve at pearwood.info Thu Jan 5 18:35:09 2017
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 6 Jan 2017 10:35:09 +1100
Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode
In-Reply-To: 
References: 
Message-ID: <20170105233509.GL3887@ando.pearwood.info>

On Thu, Jan 05, 2017 at 04:38:22PM +0100, Victor Stinner wrote:
[...]
> Python 3 promotes Unicode everywhere including filenames. A solution to
> support filenames not decodable from the locale encoding was found: the
> ``surrogateescape`` error handler (`PEP 393
> `_), store undecodable bytes
> as surrogate characters.

PEP 393 is the Flexible String Representation.

I think you want PEP 383, Non-decodable Bytes in System Character
Interfaces.

https://www.python.org/dev/peps/pep-0383/

> The problem is that operating system data like filenames are decoded
> using the ``surrogateescape`` error handler (PEP 393).

/s/393/383/

--
Steve

From songofacandy at gmail.com Thu Jan 5 20:15:52 2017
From: songofacandy at gmail.com (INADA Naoki)
Date: Fri, 6 Jan 2017 10:15:52 +0900
Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode
In-Reply-To: <20170105171638.GA4217@phdru.name>
References: <20170105171638.GA4217@phdru.name>
Message-ID: 

>> Always use UTF-8
>> ----------------
>>
>> Python already always use the UTF-8 encoding on Mac OS X, Android and Windows.
>> Since UTF-8 became the defacto encoding, it makes sense to always use it on all
>> platforms with any locale.
>
> Please don't!
I use different locales and encodings, sometimes it's > utf-8, sometimes not - but I have properly configured LC_* settings and > I prefer Python to follow my command. It'd be disgusting if Python > starts to bend me to its preferences. For stdio (including console), PYTHONIOENCODING can be used for supporting legacy system. e.g. `export PYTHONIOENCODING=$(locale charmap)` For commandline argument and filepath, UTF-8/surrogateescape can round trip. But mojibake may happens when pass the path to GUI. If we chose "Always use UTF-8 for fs encoding", I think PYTHONFSENCODING envvar should be added again. (It should be used from startup: decoding command line argument). > >> The risk is to introduce mojibake if the locale uses a different encoding, >> especially for locales other than the POSIX locale. > > There is no such risk for me as I already have mojibake in my > systems. Two most notable sources of mojibake are: > > 1) FTP servers - people create files (both names and content) in > different encodings; w32 FTP clients usually send file names and > content in cp1251 (Russian Windows encoding), sometimes in cp866 > (Russian Windows OEM encoding). > > 2) MP3 tags and play lists - almost always cp1251. > > So whatever my personal encoding is - koi8-r or utf-8 - I have to > deal with file names and content in different encodings. 3) unzip zip file sent by Windows. Windows user use no-ASCII filenames, and create legacy (no UTF-8) zip file very often. I think people using non UTF-8 should solve encoding issue by themselves. People should use ASCII or UTF-8 always if they don't want to see mojibake. > > Oleg. > -- > Oleg Broytman http://phdru.name/ phd at phdru.name > Programmers don't die, they just GOSUB without RETURN. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From victor.stinner at gmail.com Thu Jan 5 20:30:09 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 6 Jan 2017 02:30:09 +0100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: <20170105233509.GL3887@ando.pearwood.info> References: <20170105233509.GL3887@ando.pearwood.info> Message-ID: 2017-01-06 0:35 GMT+01:00 Steven D'Aprano : >> Python 3 promotes Unicode everywhere including filenames. A solution to >> support filenames not decodable from the locale encoding was found: the >> ``surrogateescape`` error handler (`PEP 393 >> `_), store undecodable bytes >> as surrogate characters. > > PEP 393 is the Flexible String Respresentation. > > I think you want PEP 383, Non-decodable Bytes in System Character > Interfaces. Oops, fixed, thanks :-) Victor From victor.stinner at gmail.com Thu Jan 5 20:43:42 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 6 Jan 2017 02:43:42 +0100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: Message-ID: Ok, I modified my PEP: the POSIX locale now enables the UTF-8 mode. 2017-01-05 18:10 GMT+01:00 Victor Stinner : > A common request is that "Python just works" without having to pass a > command line option or set an environment variable. Maybe the default > behaviour should be left unchanged, but the behaviour with the POSIX > locale should change. http://bugs.python.org/issue28180 asks to "change the default" to get a Python which "just works" without any kind of configuration, in the context of a Docker image (I don't any detail about the image yet). 
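For reference, a rough way to see what such an unconfigured environment
looks like from the Python side (the exact codeset name below is
glibc-specific and is only an illustration):

    # e.g. run with no LANG/LC_* variables set, as in many minimal Docker images
    import locale
    import sys

    locale.setlocale(locale.LC_CTYPE, "")       # apply the environment's locale, if any
    print(locale.setlocale(locale.LC_CTYPE))    # 'C' (or 'POSIX') when nothing is configured
    print(locale.getpreferredencoding(False))   # typically 'ANSI_X3.4-1968', i.e. ASCII, on glibc
    print(sys.getfilesystemencoding())          # 'ascii' on Python 3.6 with the POSIX locale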
> Maybe we can enable the UTF-8 mode (or "UNIX mode") of the PEP 540 > when the POSIX locale is used? I read again other issues and I confirm that users are looking for a Python 3 which behaves like Python 2: simply don't bother them with encodings. I see the UTF-8 mode as an opportunity to answer to this request. Moreover, the most common cause of encoding issues is a program run with no locale variable set and so using the POSIX locale. So I modified my PEP 540: the POSIX locale now enables the UTF-8 mode. I had to update the "Backward Compatibility" section since the PEP now introduces a backward incompatible change (POSIX locale), but my bet is that the new behaviour is the one expected by users and that it cannot break applications. I moved my initial proposition as an alternative. I added a "Use Cases" section to explain in depth the "always work" behaviour, which I called the "UNIX mode" in my previous email. Latest version of the PEP: https://github.com/python/peps/blob/master/pep-0540.txt https://www.python.org/dev/peps/pep-0540/ will be updated shortly. Victor From victor.stinner at gmail.com Thu Jan 5 20:54:49 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 6 Jan 2017 02:54:49 +0100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> Message-ID: 2017-01-06 2:15 GMT+01:00 INADA Naoki : >>> Always use UTF-8 (...) >> Please don't! (...) > > For stdio (including console), PYTHONIOENCODING can be used for > supporting legacy system. > e.g. `export PYTHONIOENCODING=$(locale charmap)` The problem with ignoring the locale by default and forcing UTF-8 is that Python works with many libraries which use the locale, not UTF-8. The PEP 538 also describes mojibake issues if Python is embedded in an application. > For commandline argument and filepath, UTF-8/surrogateescape can round trip. > But mojibake may happens when pass the path to GUI. Let's say that you have the filename b'nonascii\xff': it's decoded as 'nonascii\xdcff' by the UTF-8 mode. How do GUIs handle such filename? (I don't know the answer, it's a real question ;-)) > If we chose "Always use UTF-8 for fs encoding", I think > PYTHONFSENCODING envvar should be > added again. (It should be used from startup: decoding command line argument). Last time I implemented PYTHONFSENCODING, I had many major issues: https://mail.python.org/pipermail/python-dev/2010-October/104509.html Do you mean that these issues are now outdated and that you have an idea how to fix them? > 3) unzip zip file sent by Windows. Windows user use no-ASCII filenames, and > create legacy (no UTF-8) zip file very often. > > I think people using non UTF-8 should solve encoding issue by themselves. > People should use ASCII or UTF-8 always if they don't want to see mojibake. ZIP files are out the scope of the PEPs 538 and 540. Python cannot guess the encoding, so it was proposed to add an option to give to user the ability to specify an encoding: see https://bugs.python.org/issue10614 for example. But yeah, data encoded to encodings different than UTF-8 are still common, and it's not going to change shortly. Since many Windows applications use the ANSI code page, I easily imagine that many documents are encoded to various incompatible code pages... What I understood is that many users don't want Python to complain on data encoded to different incompatible encodings: process data as a stream of bytes or characters, it depends. Something closer to Python 2 (stream of bytes). 
That's what I try to describe in this section: https://www.python.org/dev/peps/pep-0540/#old-data-stored-in-different-encodings-and-surrogateescape Victor From turnbull.stephen.fw at u.tsukuba.ac.jp Thu Jan 5 21:10:31 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Fri, 6 Jan 2017 11:10:31 +0900 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: <9c43ef05-12e1-b590-7040-efd2238b716e@egenix.com> Message-ID: <22638.64663.95883.494736@turnbull.sk.tsukuba.ac.jp> Paul Moore writes: > The debate here regarding tuple/frozenset indicates that there may not > be a "standard way" of hashing an iterable (should order matter?). If part of the data structure, yes, if an implementation accident, no. > Although I agree that assuming order matters is a reasonable > assumption to make in the absence of any better information. I don't think so. Eg, with dicts now ordered by insertion, an order-dependent default hash for collections means a = {} b = {} a['1'] = 1 a['2'] = 2 b['2'] = 2 b['1'] = 1 hash(a) != hash(b) # modulo usual probability of collision (and modulo normally not hashing mutables). For the same reason I expect I'd disagree with Neil's proposal for an ImmutableWhatever default __hash__ although the hash comparison is "cheap", it's still a pessimization. Haven't thought that through, though. BTW, it occurs to me that now that dictionaries are versioned, in some cases it *may* make sense to hash dictionaries even though they are mutable, although the "hash" would need to somehow account for the version changing. Seems messy but maybe someone has an idea? Steve From turnbull.stephen.fw at u.tsukuba.ac.jp Thu Jan 5 21:10:36 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Fri, 6 Jan 2017 11:10:36 +0900 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: <20170105171638.GA4217@phdru.name> References: <20170105171638.GA4217@phdru.name> Message-ID: <22638.64668.809353.547149@turnbull.sk.tsukuba.ac.jp> Oleg Broytman writes: > On Thu, Jan 05, 2017 at 04:38:22PM +0100, Victor Stinner wrote: > > Since UTF-8 became the defacto encoding, it makes sense to always > > use it on all platforms with any locale. > > Please don't! I've quoted Victor out of context, and his other posts make me very doubtful that he considers this a serious alternative. That said, I'm +1 on "don't!" From turnbull.stephen.fw at u.tsukuba.ac.jp Thu Jan 5 21:10:45 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Fri, 6 Jan 2017 11:10:45 +0900 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: Message-ID: <22638.64677.113909.735049@turnbull.sk.tsukuba.ac.jp> Victor Stinner writes: > Python 3.6 is not exactly in the first or the later category: "it > depends". > > To read data from the operating system, Python 3.6 behaves in "UNIX > mode": os.listdir() *does* return invalid filenames, it uses a funny > encoding using surrogates. > > To write data back to the operating system, Python 3.6 wears its > "Unicode nazi" hat and becomes strict. It's no more possible to write > data from from the operating system back to the operating system. > Writing a filename read from os.listdir() into stdout or into a text > file fails with an encode error. > > Subtle behaviour: since Python 3.6, with the POSIX locale, Python 3.6 > uses the "UNIX mode" but only to write into stdout. It's possible to > write a filename into stdout, but not into a text file. 
The point of this, I suppose, is that piping to xargs works by default. I haven't read the PEPs (don't have time, mea culpa), but my ideal would be three options: --transparent -> errors=surrogateescape on input and output --postel -> errors=surrogateescape on input, =strict on output --unicode-me-harder -> errors=strict on input and output with --postel being default. Unix afficianados with lots of xargs use can use --transparent. Since people have different preferences, I guess there should be an envvar for this. Others probably should configure open() by open(). I'll try to get to the PEPs over the weekend but can't promise. Steve From victor.stinner at gmail.com Thu Jan 5 21:37:30 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 6 Jan 2017 03:37:30 +0100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: <22638.64668.809353.547149@turnbull.sk.tsukuba.ac.jp> References: <20170105171638.GA4217@phdru.name> <22638.64668.809353.547149@turnbull.sk.tsukuba.ac.jp> Message-ID: 2017-01-06 3:10 GMT+01:00 Stephen J. Turnbull : > I've quoted Victor out of context, and his other posts make me very > doubtful that he considers this a serious alternative. That said, I'm > +1 on "don't!" The "always ignore locale and force UTF-8" option has supporters. For example, Nick Coghlan wrote a whole PEP, PEP 538, to support this. I want that my PEP is complete and so lists all famous alternatives. Victor From victor.stinner at gmail.com Thu Jan 5 21:42:26 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 6 Jan 2017 03:42:26 +0100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: <22638.64677.113909.735049@turnbull.sk.tsukuba.ac.jp> References: <22638.64677.113909.735049@turnbull.sk.tsukuba.ac.jp> Message-ID: 2017-01-06 3:10 GMT+01:00 Stephen J. Turnbull : > The point of this, I suppose, is that piping to xargs works by > default. Please read the second version (latest) version of my PEP 540 which contains a new "Use Cases" section which helps to define issues and the behaviour of the different modes. > I haven't read the PEPs (don't have time, mea culpa), but my ideal > would be three options: > > --transparent -> errors=surrogateescape on input and output > --postel -> errors=surrogateescape on input, =strict on output > --unicode-me-harder -> errors=strict on input and output PEP 540: --postel is the default --transparent is the UTF-8 mode --unicode-me-harder is the UTF-8 configured to strict The POSIX locale enables --transparent. > with --postel being default. Unix afficianados with lots of xargs use > can use --transparent. Since people have different preferences, I > guess there should be an envvar for this. The PEP adds new -X utf8 command line option and PYTHONUTF8 environment variable to configure the UTF-8 mode. > Others probably should configure open() by open(). My PEP 540 does change the encoding used by open() by default: https://www.python.org/dev/peps/pep-0540/#encoding-and-error-handler Obviously, you can still explicitly set the encoding when calling open(). > I'll try to get to the PEPs over the weekend but can't promise. 
Please read at least the abstract of my PEP 540 ;-) Victor From ncoghlan at gmail.com Thu Jan 5 22:32:54 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 6 Jan 2017 13:32:54 +1000 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> <22638.64668.809353.547149@turnbull.sk.tsukuba.ac.jp> Message-ID: On 6 January 2017 at 12:37, Victor Stinner wrote: > 2017-01-06 3:10 GMT+01:00 Stephen J. Turnbull > : >> I've quoted Victor out of context, and his other posts make me very >> doubtful that he considers this a serious alternative. That said, I'm >> +1 on "don't!" > > The "always ignore locale and force UTF-8" option has supporters. For > example, Nick Coghlan wrote a whole PEP, PEP 538, to support this. Err, no, that's not what PEP 538 does. PEP 538 doesn't do *anything* if a locale is already properly configured - it only changes the locale if the current locale is "C". It's actually very similar to your PEP, except that instead of adding the ability to make CPython ignore the C level locale settings (which I think is a bad idea based on your own previous work in that area and on the way that CPython interacts with other C/C++ components in the same process and in subprocesses), it just *changes* those settings when we're pretty sure they're wrong. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mistersheik at gmail.com Thu Jan 5 23:36:51 2017 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 06 Jan 2017 04:36:51 +0000 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: <22638.64663.95883.494736@turnbull.sk.tsukuba.ac.jp> References: <9c43ef05-12e1-b590-7040-efd2238b716e@egenix.com> <22638.64663.95883.494736@turnbull.sk.tsukuba.ac.jp> Message-ID: On Thu, Jan 5, 2017 at 9:10 PM Stephen J. Turnbull < turnbull.stephen.fw at u.tsukuba.ac.jp> wrote: > Paul Moore writes: > > > The debate here regarding tuple/frozenset indicates that there may not > > be a "standard way" of hashing an iterable (should order matter?). > > If part of the data structure, yes, if an implementation accident, no. > > > Although I agree that assuming order matters is a reasonable > > assumption to make in the absence of any better information. > > I don't think so. Eg, with dicts now ordered by insertion, an > order-dependent default hash for collections means > > a = {} > b = {} > a['1'] = 1 > a['2'] = 2 > b['2'] = 2 > b['1'] = 1 > hash(a) != hash(b) # modulo usual probability of collision > > (and modulo normally not hashing mutables). For the same reason I > expect I'd disagree with Neil's proposal for an ImmutableWhatever > default __hash__ although the hash comparison is "cheap", it's still a > pessimization. Haven't thought that through, though. > I don't understand this? How is providing a default method in an abstract base class a pessimization? If it happens to be slower than the code in the current methods, it can still be overridden. > > BTW, it occurs to me that now that dictionaries are versioned, in some > cases it *may* make sense to hash dictionaries even though they are > mutable, although the "hash" would need to somehow account for the > version changing. Seems messy but maybe someone has an idea? > I think it's important to keep in mind that dictionaries are not versioned in Python. They happen to be versioned in CPython as an unexposed implementation detail. I don't think that such details should have any bearing on potential changes to Python. 
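To make the order question concrete with today's built-ins (this is only an
illustration of the semantics being discussed, not a proposed API):

    # Two equal dicts built in different insertion orders.
    a = {}
    a['1'] = 1
    a['2'] = 2

    b = {}
    b['2'] = 2
    b['1'] = 1
    assert a == b

    # An order-sensitive hash of the items (via tuple) almost certainly differs...
    print(hash(tuple(a.items())) == hash(tuple(b.items())))          # False in practice

    # ...while an order-insensitive hash (via frozenset) is guaranteed to match,
    # because equal frozensets must hash equal.
    print(hash(frozenset(a.items())) == hash(frozenset(b.items())))  # True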
> Steve > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu Jan 5 23:49:22 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 6 Jan 2017 15:49:22 +1100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> Message-ID: <20170106044922.GM3887@ando.pearwood.info> On Fri, Jan 06, 2017 at 02:54:49AM +0100, Victor Stinner wrote: > Let's say that you have the filename b'nonascii\xff': it's decoded as > 'nonascii\xdcff' by the UTF-8 mode. How do GUIs handle such filename? > (I don't know the answer, it's a real question ;-)) I ran this in Python 2.7 to create the file: open(b'/tmp/nonascii\xff-', 'w') and then confirmed the filename: [steve at ando tmp]$ ls -b nonascii* nonascii\377- Konquorer in KDE 3 displays it with *two* "missing character" glyphs (small hollow boxes) before the hyphen. The KDE "Open File" dialog box shows the file with two blank spaces before the hyphen. My interpretation of this is that the difference is due to using different fonts: the file name is shown the same way, but in one font the missing character is a small box and in the other it is a blank space. I cannot tell what KDE is using for the invalid character, if I copy it as text and paste it into a file I just get the original \xFF. The Geany text editor, which I think uses the same GUI toolkit as Gnome, shows the file with a single "missing glyph" character, this time a black diamond with a question mark in it. It looks like Geany (Gnome?) is displaying the invalid byte as U+FFFD, the Unicode "REPLACEMENT CHARACTER". So at least two Linux GUI environments are capable of dealing with filenames that are invalid UTF-8, in two different ways. Does this answer your question about GUIs? -- Steve From stephanh42 at gmail.com Fri Jan 6 01:22:52 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Fri, 6 Jan 2017 07:22:52 +0100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: <20170106044922.GM3887@ando.pearwood.info> References: <20170105171638.GA4217@phdru.name> <20170106044922.GM3887@ando.pearwood.info> Message-ID: Hi all, One meta-question I have which may already have been discussed much earlier in this whole proposal series, is: How common is this problem? Because I have the impression that nowadays all Linux distributions are UTF-8 by default and you have to show some bloody-mindedness to end up with a POSIX locale. Docker was mentioned, is this not really an issue which should be solved at the Docker level? Since it would affect *all* applications which are doing something non-trivial with encodings? I realise there is some attractiveness in solving the issue "for Python", since that will reduce the amount of bug reports and get people off the chests of the maintainers, but to get this fixed in the wider Linux ecosystem it might be preferable to "Let them eat mojibake", to paraphrase what Marie-Antoinette never said. Stephan 2017-01-06 5:49 GMT+01:00 Steven D'Aprano : > On Fri, Jan 06, 2017 at 02:54:49AM +0100, Victor Stinner wrote: > > > Let's say that you have the filename b'nonascii\xff': it's decoded as > > 'nonascii\xdcff' by the UTF-8 mode. How do GUIs handle such filename? 
> > (I don't know the answer, it's a real question ;-)) > > I ran this in Python 2.7 to create the file: > > open(b'/tmp/nonascii\xff-', 'w') > > and then confirmed the filename: > > [steve at ando tmp]$ ls -b nonascii* > nonascii\377- > > Konquorer in KDE 3 displays it with *two* "missing character" glyphs > (small hollow boxes) before the hyphen. The KDE "Open File" dialog box > shows the file with two blank spaces before the hyphen. > > My interpretation of this is that the difference is due to using > different fonts: the file name is shown the same way, but in one font > the missing character is a small box and in the other it is a blank > space. > > I cannot tell what KDE is using for the invalid character, if I copy it > as text and paste it into a file I just get the original \xFF. > > The Geany text editor, which I think uses the same GUI toolkit as Gnome, > shows the file with a single "missing glyph" character, this time a > black diamond with a question mark in it. > > It looks like Geany (Gnome?) is displaying the invalid byte as U+FFFD, > the Unicode "REPLACEMENT CHARACTER". > > So at least two Linux GUI environments are capable of dealing with > filenames that are invalid UTF-8, in two different ways. > > Does this answer your question about GUIs? > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From turnbull.stephen.fw at u.tsukuba.ac.jp Fri Jan 6 02:07:06 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Fri, 6 Jan 2017 16:07:06 +0900 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: <9c43ef05-12e1-b590-7040-efd2238b716e@egenix.com> <22638.64663.95883.494736@turnbull.sk.tsukuba.ac.jp> Message-ID: <22639.16922.294573.514463@turnbull.sk.tsukuba.ac.jp> Neil Girdhar writes: > I don't understand this? How is providing a default method in an > abstract base class a pessimization? If it happens to be slower > than the code in the current methods, it can still be overridden. How often will people override until it's bitten them? How many people will not even notice until they've lost business due to slow response? If you don't have a default, that's much more obvious. Note that if there is a default, the collections are "large", and equality comparisons are "rare", this could be a substantial overhead. > > BTW, it occurs to me that now that dictionaries are versioned, in some > > cases it *may* make sense to hash dictionaries even though they are > > mutable, although the "hash" would need to somehow account for the > > version changing. Seems messy but maybe someone has an idea? > I think it's important to keep in mind that dictionaries are not versioned > in Python. They happen to be versioned in CPython as an unexposed > implementation detail. I don't think that such details should have any > bearing on potential changes to Python. AFAIK the use of the hash member for equality checking is an implementation detail too, although the language reference does mention that set, frozenset and dict are "hashed collections". 
The basic requirements on hashes are that (1) objects that compare equal must hash to the same value, and (2) the hash bucket must not change over an object's lifetime (this is the "messy" aspect that probably kills the idea -- you'd need to fix up all hashed collections that contain the object as a key). From mistersheik at gmail.com Fri Jan 6 02:26:56 2017 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 06 Jan 2017 07:26:56 +0000 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: <22639.16922.294573.514463@turnbull.sk.tsukuba.ac.jp> References: <9c43ef05-12e1-b590-7040-efd2238b716e@egenix.com> <22638.64663.95883.494736@turnbull.sk.tsukuba.ac.jp> <22639.16922.294573.514463@turnbull.sk.tsukuba.ac.jp> Message-ID: On Fri, Jan 6, 2017 at 2:07 AM Stephen J. Turnbull < turnbull.stephen.fw at u.tsukuba.ac.jp> wrote: > Neil Girdhar writes: > > > I don't understand this? How is providing a default method in an > > abstract base class a pessimization? If it happens to be slower > > than the code in the current methods, it can still be overridden. > > How often will people override until it's bitten them? How many > people will not even notice until they've lost business due to slow > response? If you don't have a default, that's much more obvious. > Note that if there is a default, the collections are "large", and > equality comparisons are "rare", this could be a substantial overhead. > I still don't understand what you're talking about here. You're saying that we shouldn't provide a __hash__ in case the default hash happens to be slower than what the user wants and so by not providing it, we force the user to write a fast one? Doesn't that argument apply to all methods provided by abcs? > > > BTW, it occurs to me that now that dictionaries are versioned, in some > > > cases it *may* make sense to hash dictionaries even though they are > > > mutable, although the "hash" would need to somehow account for the > > > version changing. Seems messy but maybe someone has an idea? > > > I think it's important to keep in mind that dictionaries are not > versioned > > in Python. They happen to be versioned in CPython as an unexposed > > implementation detail. I don't think that such details should have any > > bearing on potential changes to Python. > > AFAIK the use of the hash member for equality checking is an > implementation detail too, although the language reference does > mention that set, frozenset and dict are "hashed collections". The > basic requirements on hashes are that (1) objects that compare equal > must hash to the same value, and (2) the hash bucket must not change > over an object's lifetime (this is the "messy" aspect that probably > kills the idea -- you'd need to fix up all hashed collections that contain the object as a key). > Are you saying that __hash__ is called by __eq__? -------------- next part -------------- An HTML attachment was scrubbed... URL: From songofacandy at gmail.com Fri Jan 6 02:21:21 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Fri, 6 Jan 2017 16:21:21 +0900 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: Message-ID: LGTM. Some comments: I want UTF-8 mode is enabled by default (opt-out option) even if locale is not POSIX, like `PYTHONLEGACYWINDOWSFSENCODING`. Users depends on locale know what locale is and how to configure it. They can understand difference between locale mode and UTF-8 mode and they can opt-out UTF-8 mode. 
But many people lives in "UTF-8 everywhere" world, and don't know about locale. `-X utf8` option should be parsed before converting commandline arguments to wchar_t*. How about adding Py_UnixMain(int argc, char** argv) which is available only on Unix? I dislike wchar_t type and mbstowcs functions on Unix. (I love wchar_t on Windows, off course). I hope we can remove `wchar_t *wstr` from PyASCIIObject and deprecate all wchar_t APIs on Unix in the future. On Fri, Jan 6, 2017 at 10:43 AM, Victor Stinner wrote: > Ok, I modified my PEP: the POSIX locale now enables the UTF-8 mode. > > 2017-01-05 18:10 GMT+01:00 Victor Stinner : >> A common request is that "Python just works" without having to pass a >> command line option or set an environment variable. Maybe the default >> behaviour should be left unchanged, but the behaviour with the POSIX >> locale should change. > > http://bugs.python.org/issue28180 asks to "change the default" to get > a Python which "just works" without any kind of configuration, in the > context of a Docker image (I don't any detail about the image yet). > > >> Maybe we can enable the UTF-8 mode (or "UNIX mode") of the PEP 540 >> when the POSIX locale is used? > > I read again other issues and I confirm that users are looking for a > Python 3 which behaves like Python 2: simply don't bother them with > encodings. I see the UTF-8 mode as an opportunity to answer to this > request. > > Moreover, the most common cause of encoding issues is a program run > with no locale variable set and so using the POSIX locale. > > So I modified my PEP 540: the POSIX locale now enables the UTF-8 mode. > I had to update the "Backward Compatibility" section since the PEP now > introduces a backward incompatible change (POSIX locale), but my bet > is that the new behaviour is the one expected by users and that it > cannot break applications. > > I moved my initial proposition as an alternative. > > I added a "Use Cases" section to explain in depth the "always work" > behaviour, which I called the "UNIX mode" in my previous email. > > Latest version of the PEP: > https://github.com/python/peps/blob/master/pep-0540.txt > > https://www.python.org/dev/peps/pep-0540/ will be updated shortly. > > Victor > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From p.f.moore at gmail.com Fri Jan 6 03:59:39 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 6 Jan 2017 08:59:39 +0000 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: <9c43ef05-12e1-b590-7040-efd2238b716e@egenix.com> <22638.64663.95883.494736@turnbull.sk.tsukuba.ac.jp> <22639.16922.294573.514463@turnbull.sk.tsukuba.ac.jp> Message-ID: On 6 January 2017 at 07:26, Neil Girdhar wrote: > On Fri, Jan 6, 2017 at 2:07 AM Stephen J. Turnbull > wrote: >> >> Neil Girdhar writes: >> >> > I don't understand this? How is providing a default method in an >> > abstract base class a pessimization? If it happens to be slower >> > than the code in the current methods, it can still be overridden. >> >> How often will people override until it's bitten them? How many >> people will not even notice until they've lost business due to slow >> response? If you don't have a default, that's much more obvious. >> Note that if there is a default, the collections are "large", and >> equality comparisons are "rare", this could be a substantial overhead. 
> > > I still don't understand what you're talking about here. You're saying that > we shouldn't provide a __hash__ in case the default hash happens to be > slower than what the user wants and so by not providing it, we force the > user to write a fast one? Doesn't that argument apply to all methods > provided by abcs? The point here is that ABCs should provide defaults for methods where there is an *obvious* default. It's not at all clear that there's an obvious default for __hash__. Unless I missed a revision of your proposal, what you suggested was: > A better option is to add collections.abc.ImmutableIterable that derives from Iterable and provides __hash__. So what classes would derive from ImmutableIterable? Frozenset? A user-defined frozendict? There's no "obvious" default that would work for both those cases. And that's before we even get to the question of whether the default has the right performance characteristics, which is highly application-dependent. It's not clear to me if you expect ImmutableIterable to provide anything other than a default implementation of hash. If not, then how is making it an ABC any better than simply providing a helper function that people can use in their own __hash__ implementation? That would make it explicit what people are doing, and avoid any tendency towards people thinking they "should" inherit from ImmutableIterable and yet needing to override the only method it provides. Paul From mistersheik at gmail.com Fri Jan 6 04:02:23 2017 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 06 Jan 2017 09:02:23 +0000 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: <9c43ef05-12e1-b590-7040-efd2238b716e@egenix.com> <22638.64663.95883.494736@turnbull.sk.tsukuba.ac.jp> <22639.16922.294573.514463@turnbull.sk.tsukuba.ac.jp> Message-ID: On Fri, Jan 6, 2017 at 3:59 AM Paul Moore wrote: > On 6 January 2017 at 07:26, Neil Girdhar wrote: > > On Fri, Jan 6, 2017 at 2:07 AM Stephen J. Turnbull > > wrote: > >> > >> Neil Girdhar writes: > >> > >> > I don't understand this? How is providing a default method in an > >> > abstract base class a pessimization? If it happens to be slower > >> > than the code in the current methods, it can still be overridden. > >> > >> How often will people override until it's bitten them? How many > >> people will not even notice until they've lost business due to slow > >> response? If you don't have a default, that's much more obvious. > >> Note that if there is a default, the collections are "large", and > >> equality comparisons are "rare", this could be a substantial overhead. > > > > > > I still don't understand what you're talking about here. You're saying > that > > we shouldn't provide a __hash__ in case the default hash happens to be > > slower than what the user wants and so by not providing it, we force the > > user to write a fast one? Doesn't that argument apply to all methods > > provided by abcs? > > The point here is that ABCs should provide defaults for methods where > there is an *obvious* default. It's not at all clear that there's an > obvious default for __hash__. > > Unless I missed a revision of your proposal, what you suggested was: > > Yeah, looks like you missed a revision. There were two emails. I suggested adding ImmutableIterable and ImmutableSet, and so there is an obvious implementation of __hash__ for both. > > A better option is to add collections.abc.ImmutableIterable that derives > from Iterable and provides __hash__. 
> > So what classes would derive from ImmutableIterable? Frozenset? A > user-defined frozendict? There's no "obvious" default that would work > for both those cases. And that's before we even get to the question of > whether the default has the right performance characteristics, which > is highly application-dependent. > > It's not clear to me if you expect ImmutableIterable to provide > anything other than a default implementation of hash. If not, then how > is making it an ABC any better than simply providing a helper function > that people can use in their own __hash__ implementation? That would > make it explicit what people are doing, and avoid any tendency towards > people thinking they "should" inherit from ImmutableIterable and yet > needing to override the only method it provides. > > Paul > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Fri Jan 6 04:50:11 2017 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 6 Jan 2017 10:50:11 +0100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> <22638.64668.809353.547149@turnbull.sk.tsukuba.ac.jp> Message-ID: On 06.01.2017 04:32, Nick Coghlan wrote: > On 6 January 2017 at 12:37, Victor Stinner wrote: >> 2017-01-06 3:10 GMT+01:00 Stephen J. Turnbull >> : >>> I've quoted Victor out of context, and his other posts make me very >>> doubtful that he considers this a serious alternative. That said, I'm >>> +1 on "don't!" >> >> The "always ignore locale and force UTF-8" option has supporters. For >> example, Nick Coghlan wrote a whole PEP, PEP 538, to support this. > > Err, no, that's not what PEP 538 does. PEP 538 doesn't do *anything* > if a locale is already properly configured - it only changes the > locale if the current locale is "C". Victor: I think you are taking the UTF-8 idea a bit too far. Nick was trying to address the situation where the locale is set to "C", or rather not set at all (in which case the lib C defaults to the "C" locale). The latter is a fairly standard situation when piping data on Unix or when spawning processes which don't inherit the current OS environment. The problem with the "C" locale is that the encoding defaults to "ASCII" and thus does not allow Python to show its built-in Unicode support. Nick's PEP and the discussion on the ticket http://bugs.python.org/issue28180 are trying to address this particular situation, not enforce any particular encoding overriding the user's configured environment. So I think it would be better if you'd focus your PEP on the same situation: locale set to "C" or not set at all. BTW: You mention a locale "POSIX" in a few places. I have never seen this used in practice and wonder why we should even consider this in Python as possible work-around for a particular set of features. 
The locale setting in your environment does have a lot of influence on your user experience, so forcing people to set a "POSIX" locale doesn't sound like a good idea - if they have to go through the trouble of correctly setting up their environment for Python to correctly run, they would much more likely use the correct setting rather than a generic one like "POSIX", which is defined as alias for the "C" locale and not as a separate locale: http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html > It's actually very similar to your PEP, except that instead of adding > the ability to make CPython ignore the C level locale settings (which > I think is a bad idea based on your own previous work in that area and > on the way that CPython interacts with other C/C++ components in the > same process and in subprocesses), it just *changes* those settings > when we're pretty sure they're wrong. ... and this is taking the original intent of the ticket a little too far as well :-) The original request was to have the FS encoding default to UTF-8, in case the locale is not set or set to "C", with the reasoning being that this makes it easier to use Python in situations where you have exactly this situations (see above). Your PEP takes this approach further by fixing the locale setting to "C.UTF-8" in those two cases - intentionally, with all the implications this has on other parts of the C lib. The latter only has an effect on the C lib, if the "C.UTF-8" locale is available on the system, which it isn't on many systems, since C locales have to be explicitly generated. Without the "C.UTF-8" locale available, your PEP only affects the FS encoding, AFAICT, unless other parts of the application try to interpret the locale env settings as well and use their own logic for the interpretation. For the purpose of experimentation, I would find it better to start with just fixing the FS encoding in 3.7 and leaving the option to adjust the locale setting turned off per default. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jan 06 2017) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From p.f.moore at gmail.com Fri Jan 6 04:57:04 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 6 Jan 2017 09:57:04 +0000 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: <9c43ef05-12e1-b590-7040-efd2238b716e@egenix.com> <22638.64663.95883.494736@turnbull.sk.tsukuba.ac.jp> <22639.16922.294573.514463@turnbull.sk.tsukuba.ac.jp> Message-ID: On 6 January 2017 at 09:02, Neil Girdhar wrote: > > Yeah, looks like you missed a revision. There were two emails. I suggested > adding ImmutableIterable and ImmutableSet, and so there is an obvious > implementation of __hash__ for both. OK, sorry. The proposal is still getting more complicated, though, and I really don't see how it's better than having some low-level helper functions for people who need to build custom __hash__ implementations. 
The "one obvious way" to customise hashing is to implement __hash__, not to derive from a base class that says my class is an Immutable{Iterable,Set}. Paul From victor.stinner at gmail.com Fri Jan 6 06:24:49 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 6 Jan 2017 12:24:49 +0100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: Message-ID: 2017-01-06 8:21 GMT+01:00 INADA Naoki : > I want UTF-8 mode is enabled by default (opt-out option) even if > locale is not POSIX, > like `PYTHONLEGACYWINDOWSFSENCODING`. You do, I don't :-) It shouldn't be hard to find very concrete issues from the mojibake issues described at: https://www.python.org/dev/peps/pep-0540/#expected-mojibake-issues IMHO there are 3 steps before being able to reach your dream: 1) add opt-in support for UTF-8 2) use UTF-8 if the locale is POSIX 3) UTF-8 is enabled by default I would prefer to begin with a first Python release at stage (1) or (2), wait for user complains, and later decide if we can move to (3). Right now, I didn't implement the PEP 540, so I wasn't able to experiment anything in practice yet. Well, at least it means that I have to elaborate the "Always use UTF-8" alternative of my PEP to explain why I consider that we are not ready to switch directly to his "obvious" option. > Users depends on locale know what locale is and how to configure it. It's not a matter of users, but a matter of code in the wild which uses directly C functions like mbstowcs() or wsctombs(). These functions use the current locale encoding, they are not aware of the new Python UTF-8 mode. > But many people lives in "UTF-8 everywhere" world, and don't know about locale. The PEP 540 was written to help users for very concrete cases. I'm repeating since Python 3.0 that users must learn how to configure their locale. Well, 8 years later, I keep getting exactly the same user complains: "Python doesn't work, it must just work!". It's really hard to decode bytes and later encode the text and prevenet any kind of encoding error. That's why no solution was proposed before. > `-X utf8` option should be parsed before converting commandline (...) Yeah, that's a though technical issue. I'm not sure right know how to implement this with a clean design. Maybe I will just try with a hack? :-) Victor From victor.stinner at gmail.com Fri Jan 6 06:52:51 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 6 Jan 2017 12:52:51 +0100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> <22638.64668.809353.547149@turnbull.sk.tsukuba.ac.jp> Message-ID: 2017-01-06 10:50 GMT+01:00 M.-A. Lemburg : > Victor: I think you are taking the UTF-8 idea a bit too far. Hum, sorry, the PEP is still a draft, the rationale is far from perfect yet. Let me try to simplify the issue: users are unable to configure a locale for various reasons and expect that Python 3 must "just works", so never fail on encoding or decoding. Do you mean that you must try to fix this issue? Or that my approach is not the good one? > Nick was trying to address the situation where the locale is > set to "C", or rather not set at all (in which case the lib C > defaults to the "C" locale). The latter is a fairly standard > situation when piping data on Unix or when spawning processes > which don't inherit the current OS environment. In the second version of my PEP, Python 3.7 will basically "just work" with the POSIX locale (or C locale if you prefer). 
This locale enables the UTF-8 mode, which forces UTF-8/surrogateescape, and this error handler prevents the most common encode/decode errors (but not all of them!).

When I read the different issues on the bug tracker, I understood that people have different opinions because they have different use cases and so different expectations. I tried to describe a few use cases to help understand why we don't all share the same expectations:
https://www.python.org/dev/peps/pep-0540/#replace-a-word-in-a-text

I guess that "piping data on Unix" is represented by my "Replace a word in a text" example, right? It implements the "sed -e s/apple/orange/g" command using Python 3. Classical usage:

   cat input_file | sed -e s/apple/orange/g > output

"UNIX users" don't want Unicode errors here.

> The problem with the "C" locale is that the encoding defaults to
> "ASCII" and thus does not allow Python to show its built-in
> Unicode support.

I don't think that this is the main annoyance for users. Users complain because basic functions like (1) "List a directory into stdout" or (2) "List a directory into a text file" fail badly:

(1) https://www.python.org/dev/peps/pep-0540/#list-a-directory-into-stdout
(2) https://www.python.org/dev/peps/pep-0540/#list-a-directory-into-a-text-file

They don't really care about powerful Unicode features, but are bitten early just when writing data back to the disk, into a pipe, or somewhere else. Python 3.6 tries to be nice to users when *getting* data, but it is very pedantic when you try to put the data somewhere. The only exception is that stdout now uses the surrogateescape error handler, but only with the POSIX locale.

> Nick's PEP and the discussion on the ticket
> http://bugs.python.org/issue28180 are trying to address this
> particular situation, not enforce any particular encoding
> overriding the user's configured environment.
>
> So I think it would be better if you'd focus your PEP on the
> same situation: locale set to "C" or not set at all.

I'm not sure that I understood: do you suggest only modifying the behaviour when the POSIX locale is used, without adding any option to ignore the locale and force UTF-8? At least, I would like to get a UTF-8/strict mode, which would require an option to enable it.

About -X utf8, the idea is to state explicitly that you are sure that all inputs are encoded in UTF-8 and that you want outputs encoded in UTF-8. I guess that you are concerned about locales using encodings other than ASCII or UTF-8, like Latin-1, Shift JIS or something else?

> BTW: You mention a locale "POSIX" in a few places. I have
> never seen this used in practice and wonder why we should
> even consider this in Python as possible work-around for
> a particular set of features. The locale setting in your
> environment does have a lot of influence on your user
> experience, so forcing people to set a "POSIX" locale doesn't
> sound like a good idea - if they have to go through the
> trouble of correctly setting up their environment for Python
> to correctly run, they would much more likely use the correct
> setting rather than a generic one like "POSIX", which is
> defined as alias for the "C" locale and not as a separate
> locale: (...)

Hum, the POSIX locale is the "C" locale in my PEP. I don't request users to force the POSIX locale.
I propose to make Python nicer when users already *get* the POSIX locale, which happens for various reasons:

* OS not correctly configured
* SSH connection failing to set the locale
* user using LANG=C to get messages in English
* LANG=C used for a bad reason
* program run in an empty environment
* user locale set to a non-existent locale => the libc falls back on POSIX
* etc.

"LANG=C": "LC_ALL=C" is more correct, but it seems like LANG=C is more common than LC_ALL=C or LC_CTYPE=C in the wild.

>> It's actually very similar to your PEP, except that instead of adding
>> the ability to make CPython ignore the C level locale settings (which
>> I think is a bad idea based on your own previous work in that area and
>> on the way that CPython interacts with other C/C++ components in the
>> same process and in subprocesses), it just *changes* those settings
>> when we're pretty sure they're wrong.
>
> ... and this is taking the original intent of the ticket
> a little too far as well :-)

By ticket, do you mean a Python issue? By the way, I'm aware of these two issues:

http://bugs.python.org/issue19846
http://bugs.python.org/issue28180

I'm sure that other issues were opened to request something similar, but they probably got less feedback, and I was too lazy to find them all yet.

> Without the "C.UTF-8" locale available, your PEP [538] only affects
> the FS encoding, AFAICT, unless other parts of the application
> try to interpret the locale env settings as well and use their
> own logic for the interpretation.

I decided to write PEP 540 because only a few operating systems provide C.UTF-8 or C.utf8. I'm trying to find a solution that works on all UNIX and BSD systems. Maybe I'm wrong, and my approach (ignoring the locale rather than really "fixing" it) is plain wrong. Again, it's a very hard issue; I don't think that any perfect solution exists. Otherwise, we would already have fixed this issue 8 years ago! It's a matter of compromises and finding a practical design which works for most users.

> For the purpose of experimentation, I would find it better
> to start with just fixing the FS encoding in 3.7 and
> leaving the option to adjust the locale setting turned off
> per default.

Sorry, what do you mean by "fixing the FS encoding"? Do I understand correctly that it's basically my PEP 540 without -X utf8 and PYTHONUTF8, only with the UTF-8 mode enabled for the POSIX locale?

By the way, Nick's PEP 538 doesn't mention surrogateescape. IMHO if we override or ignore the locale, it's safer to use surrogateescape. The Use Cases section of my PEP 540 should help to understand why.

Victor

From victor.stinner at gmail.com Fri Jan 6 07:01:21 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 6 Jan 2017 13:01:21 +0100
Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode
In-Reply-To:
References: <20170105171638.GA4217@phdru.name> <20170106044922.GM3887@ando.pearwood.info>
Message-ID:

2017-01-06 7:22 GMT+01:00 Stephan Houben :
> How common is this problem?

In the last 2 or 3 years, I don't recall having been bitten by such an issue. On the bug tracker, new issues are opened infrequently:

* http://bugs.python.org/issue19977 opened at 2013-12-13, closed at 2014-04-27
* http://bugs.python.org/issue19846 opened at 2013-11-30, closed as NOTABUG at 2015-05-17, but got new comments after it was closed
* http://bugs.python.org/issue19847 opened at 2013-11-30, closed as NOTABUG at 2013-12-13
* http://bugs.python.org/issue28180 opened at 2016-09-16, still open

Again, I don't think that this list is complete; I recall other similar issues.
> I realise there is some attractiveness in solving the issue "for Python", > since that will reduce the amount of bug reports > and get people off the chests of the maintainers, but to get this fixed in > the wider Linux ecosystem it might be preferable to > "Let them eat mojibake", to paraphrase what Marie-Antoinette never said. What do you mean by "eating mojibake"? Users complain because their application is stopped by a Python exception. Currently, most Python 3 applications doesn't produce or display mojibake, since Python is strict on outputs. (One exception: stdout with the POSIX locale since Python 3.5). Victor From stephanh42 at gmail.com Fri Jan 6 09:28:58 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Fri, 6 Jan 2017 15:28:58 +0100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> <20170106044922.GM3887@ando.pearwood.info> Message-ID: Hi Victor, 2017-01-06 13:01 GMT+01:00 Victor Stinner : > > What do you mean by "eating mojibake"? OK, I erroneously understood that the failure mode was that mojibake was produced. > Users complain because their > application is stopped by a Python exception. Got it. > Currently, most Python 3 > applications doesn't produce or display mojibake, since Python is > strict on outputs. (One exception: stdout with the POSIX locale since > Python 3.5). OK, I now tried it myself and indeed it produces the following error: UnicodeEncodeError: 'ascii' codec can't encode character '\xfe' in position 0: ordinal not in range(128) My suggestion would be to make this error message more specific. In particular, if we have LC_TYPE/LANG=C or unset, we could print something like the following information (on Linux only): """ You are attempting to use non-ASCII Unicode characters while your system has been configured (possibly erroneously) to operate in the legacy "C" locale, which is pure ASCII. It is strongly recommended that you configure your system to allow arbitrary non-ASCII Unicode characters This can be done by configuring a UTF-8 locale, for example: export LANG=en_US.UTF-8 Use: locale -a | grep UTF-8 to get a list of all valid UTF-8 locales on your system. """ Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From phd at phdru.name Fri Jan 6 14:12:16 2017 From: phd at phdru.name (Oleg Broytman) Date: Fri, 6 Jan 2017 20:12:16 +0100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> Message-ID: <20170106191216.GA6916@phdru.name> On Fri, Jan 06, 2017 at 10:15:52AM +0900, INADA Naoki wrote: > >> Always use UTF-8 > >> ---------------- > >> > >> Python already always use the UTF-8 encoding on Mac OS X, Android and Windows. > >> Since UTF-8 became the defacto encoding, it makes sense to always use it on all > >> platforms with any locale. > > > > Please don't! I use different locales and encodings, sometimes it's > > utf-8, sometimes not - but I have properly configured LC_* settings and > > I prefer Python to follow my command. It'd be disgusting if Python > > starts to bend me to its preferences. > > For stdio (including console), PYTHONIOENCODING can be used for > supporting legacy system. > e.g. `export PYTHONIOENCODING=$(locale charmap)` This means one more thing to reconfigure when I switch locales instead of Python to catches up automatically. > For commandline argument and filepath, UTF-8/surrogateescape can round trip. > But mojibake may happens when pass the path to GUI. 
> > If we chose "Always use UTF-8 for fs encoding", I think > PYTHONFSENCODING envvar should be > added again. (It should be used from startup: decoding command line argument). > > > > >> The risk is to introduce mojibake if the locale uses a different encoding, > >> especially for locales other than the POSIX locale. > > > > There is no such risk for me as I already have mojibake in my > > systems. Two most notable sources of mojibake are: > > > > 1) FTP servers - people create files (both names and content) in > > different encodings; w32 FTP clients usually send file names and > > content in cp1251 (Russian Windows encoding), sometimes in cp866 > > (Russian Windows OEM encoding). > > > > 2) MP3 tags and play lists - almost always cp1251. > > > > So whatever my personal encoding is - koi8-r or utf-8 - I have to > > deal with file names and content in different encodings. > > 3) unzip zip file sent by Windows. Windows user use no-ASCII filenames, and > create legacy (no UTF-8) zip file very often. Good example, thank you! I forgot about it because I have wrote my own zip.py and unzip.py that encode/decode filenames. > I think people using non UTF-8 should solve encoding issue by themselves. > People should use ASCII or UTF-8 always if they don't want to see mojibake. Impossible. Even if I'd always use UTF-8 I still will receive a lot of cp1251/cp866. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From barry at python.org Fri Jan 6 16:20:26 2017 From: barry at python.org (Barry Warsaw) Date: Fri, 6 Jan 2017 16:20:26 -0500 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode References: <20170105171638.GA4217@phdru.name> <20170106044922.GM3887@ando.pearwood.info> Message-ID: <20170106162026.7edc10a0@subdivisions.wooz.org> On Jan 06, 2017, at 07:22 AM, Stephan Houben wrote: >Because I have the impression that nowadays all Linux distributions are UTF-8 >by default and you have to show some bloody-mindedness to end up with a POSIX >locale. It can still happen in some corner cases, even on Debian and Ubuntu where C.UTF-8 is available and e.g. my desktop defaults to en_US.UTF-8. For example, in an sbuild/schroot environment[*], the default locale is C and I've seen package build failures because of this. There may be other such "corner case" environments where this happens too. Cheers, -Barry [*] Where sbuild/schroot is a very common suite of package building tools. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 801 bytes Desc: OpenPGP digital signature URL: From barry at python.org Fri Jan 6 16:24:32 2017 From: barry at python.org (Barry Warsaw) Date: Fri, 6 Jan 2017 16:24:32 -0500 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode References: Message-ID: <20170106162432.5e6968c9@subdivisions.wooz.org> On Jan 05, 2017, at 05:50 PM, Victor Stinner wrote: >I guess that all users and most developers are more in the "UNIX mode" >camp. *If* we want to change the default, I suggest to use the "UNIX >mode" by default. FWIW, it seems to be a general and widespread recommendation to always pass universal_newlines=True to Popen and friends when you only want to deal with unicode from subprocesses: If encoding or errors are specified, or universal_newlines is true, the file objects stdin, stdout and stderr will be opened in text mode using the encoding and errors specified in the call or the defaults for io.TextIOWrapper. 
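In practice that recommendation looks something like the following minimal sketch (the command here is just for illustration):

    import subprocess

    # universal_newlines=True gives str output, decoded with the locale's
    # preferred encoding (locale.getpreferredencoding(False)).
    result = subprocess.run(["ls"], stdout=subprocess.PIPE,
                            universal_newlines=True)
    assert isinstance(result.stdout, str)

    # Python 3.6+ also allows requesting an explicit encoding instead of
    # relying on the locale settings.
    result = subprocess.run(["ls"], stdout=subprocess.PIPE, encoding="utf-8")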
Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 801 bytes Desc: OpenPGP digital signature URL: From rosuav at gmail.com Fri Jan 6 17:10:50 2017 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 7 Jan 2017 09:10:50 +1100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: <20170106162026.7edc10a0@subdivisions.wooz.org> References: <20170105171638.GA4217@phdru.name> <20170106044922.GM3887@ando.pearwood.info> <20170106162026.7edc10a0@subdivisions.wooz.org> Message-ID: On Sat, Jan 7, 2017 at 8:20 AM, Barry Warsaw wrote: > On Jan 06, 2017, at 07:22 AM, Stephan Houben wrote: > >>Because I have the impression that nowadays all Linux distributions are UTF-8 >>by default and you have to show some bloody-mindedness to end up with a POSIX >>locale. > > It can still happen in some corner cases, even on Debian and Ubuntu where > C.UTF-8 is available and e.g. my desktop defaults to en_US.UTF-8. For > example, in an sbuild/schroot environment[*], the default locale is C and I've > seen package build failures because of this. There may be other such "corner > case" environments where this happens too. A lot of background jobs get run in a purged environment, too. I don't remember exactly which ones land in the C locale and which don't, but check cron jobs, systemd background processes, inetd, etc, etc, etc. Having Python DTRT in those situations would be a Good Thing. ChrisA From victor.stinner at gmail.com Fri Jan 6 17:33:16 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 6 Jan 2017 23:33:16 +0100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: <20170106162026.7edc10a0@subdivisions.wooz.org> References: <20170105171638.GA4217@phdru.name> <20170106044922.GM3887@ando.pearwood.info> <20170106162026.7edc10a0@subdivisions.wooz.org> Message-ID: 2017-01-06 22:20 GMT+01:00 Barry Warsaw : >>Because I have the impression that nowadays all Linux distributions are UTF-8 >>by default and you have to show some bloody-mindedness to end up with a POSIX >>locale. > > It can still happen in some corner cases, even on Debian and Ubuntu where > C.UTF-8 is available and e.g. my desktop defaults to en_US.UTF-8. For > example, in an sbuild/schroot environment[*], the default locale is C and I've > seen package build failures because of this. There may be other such "corner > case" environments where this happens too. Right, that's the whole point of the Nick's PEP 538 and my PEP 540: it's still common to get the POSIX locale. I began to list examples of practical use cases where you get the POSIX locale. https://www.python.org/dev/peps/pep-0540/#posix-locale-used-by-mistake I'm not sure about the title of the section: "POSIX locale used by mistake". Barry: About chroot, why do you get a C locale? Is it because no locale is explicitly configured? Or because no locale is installed in the chroot? Would it work if we had a tool to copy the locale from the host when creating the chroot: env vars and the data files required by the locale (if any)? The chroot issue seems close to the reported chroot issue: http://bugs.python.org/issue28180 I understand that it's more a configuration issue, than a deliberate choice to use the POSIX locale. Again, the user requirement is that Python 3 should just work without any kind of specific configuration, as other classic UNIX tools. 
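(For anyone who wants to check what a given environment actually hands to Python, a quick way to inspect it is shown below; the exact output depends on the platform and Python version:)

    import locale
    import sys

    # Values Python derived from the environment at startup.
    print(sys.getfilesystemencoding())         # e.g. 'ascii' under the POSIX locale on Linux
    print(locale.getpreferredencoding(False))  # e.g. 'ANSI_X3.4-1968', an ASCII alias
    print(sys.stdout.encoding, sys.stdout.errors)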
Victor From barry at python.org Fri Jan 6 19:06:23 2017 From: barry at python.org (Barry Warsaw) Date: Fri, 6 Jan 2017 19:06:23 -0500 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> <20170106044922.GM3887@ando.pearwood.info> <20170106162026.7edc10a0@subdivisions.wooz.org> Message-ID: <20170106190623.2cb3ef60@subdivisions.wooz.org> On Jan 06, 2017, at 11:33 PM, Victor Stinner wrote: >Barry: About chroot, why do you get a C locale? Is it because no >locale is explicitly configured? Or because no locale is installed in >the chroot? For some reason it's not configured: % schroot -u root -c sid-amd64 (sid-amd64)# locale LANG= LANGUAGE= LC_CTYPE="POSIX" LC_NUMERIC="POSIX" LC_TIME="POSIX" LC_COLLATE="POSIX" LC_MONETARY="POSIX" LC_MESSAGES="POSIX" LC_PAPER="POSIX" LC_NAME="POSIX" LC_ADDRESS="POSIX" LC_TELEPHONE="POSIX" LC_MEASUREMENT="POSIX" LC_IDENTIFICATION="POSIX" LC_ALL= (sid-amd64)# export LC_ALL=C.UTF-8 (sid-amd64)# locale LANG= LANGUAGE= LC_CTYPE="C.UTF-8" LC_NUMERIC="C.UTF-8" LC_TIME="C.UTF-8" LC_COLLATE="C.UTF-8" LC_MONETARY="C.UTF-8" LC_MESSAGES="C.UTF-8" LC_PAPER="C.UTF-8" LC_NAME="C.UTF-8" LC_ADDRESS="C.UTF-8" LC_TELEPHONE="C.UTF-8" LC_MEASUREMENT="C.UTF-8" LC_IDENTIFICATION="C.UTF-8" LC_ALL=C.UTF-8 I'm not sure why that's the default inside a chroot. I thought there was a bug or discussion about this, but I can't find it right now. Generally when this happens, exporting this environment variable in your debian/rules file is the way to work around the default. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 801 bytes Desc: OpenPGP digital signature URL: From victor.stinner at gmail.com Fri Jan 6 22:01:57 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Sat, 7 Jan 2017 04:01:57 +0100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: <20170106190623.2cb3ef60@subdivisions.wooz.org> References: <20170105171638.GA4217@phdru.name> <20170106044922.GM3887@ando.pearwood.info> <20170106162026.7edc10a0@subdivisions.wooz.org> <20170106190623.2cb3ef60@subdivisions.wooz.org> Message-ID: 2017-01-07 1:06 GMT+01:00 Barry Warsaw : > For some reason it's not configured: (...) Ok, thanks for the information. > I'm not sure why that's the default inside a chroot. I found at least one good reason to use the POSIX locale to build a package: it helps to get reproductible builds, see: https://reproducible-builds.org/docs/locales/ I used it as an example in my new rationale: https://www.python.org/dev/peps/pep-0540/#it-s-not-a-bug-you-must-fix-your-locale-is-not-an-acceptable-answer I tried to explain how using LANG=C can be a smart choice in some cases, and so that Python 3 should do its best to not annoy the user with Unicode errors. I also started to list cases where you get the POSIX locale "by mistake". As I wrote previously, I'm not sure that it's correct to add "by mistake". https://www.python.org/dev/peps/pep-0540/#posix-locale-used-by-mistake By the way, I tried to force the POSIX locale in my benchmarking "perf" module. The idea is to get more reproductible results between heterogeneous computers. But I got a bug report. So I decided to copy the locale by default and add an opt-in --no-locale option to ignore the locale (force the POSIX locale). 
https://github.com/haypo/perf/issues/15

Victor

From steve.dower at python.org Sat Jan 7 02:08:27 2017
From: steve.dower at python.org (Steve Dower)
Date: Fri, 6 Jan 2017 23:08:27 -0800
Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode
In-Reply-To: <20170106162432.5e6968c9@subdivisions.wooz.org>
References: <20170106162432.5e6968c9@subdivisions.wooz.org>
Message-ID:

Passing universal_newlines will use whatever locale.getpreferredencoding() returns (which at least on Windows is useless enough that I added the encoding and errors parameters in 3.6). So it sounds like it'll only actually do Unicode on Linux if enough of the planets have aligned, which is what Victor is trying to do, but you can't force the other process to use a particular encoding.

universal_newlines may become a bad choice if the default encoding no longer matches what the environment says, and personally, I wouldn't lose much sleep over that.

(As an aside, when I was doing all the Unicode changes for Windows in 3.6, I eventually decided that changing locale.getpreferredencoding() was too big a breaking change to ever be a good idea. Perhaps that will be the same result here too, but I'm nowhere near familiar enough with the conventions at play to state that with any certainty.)

Cheers,
Steve

Top-posted from my Windows Phone

-----Original Message-----
From: "Barry Warsaw"
Sent: 1/6/2017 14:04
To: "python-ideas at python.org"
Subject: Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

On Jan 05, 2017, at 05:50 PM, Victor Stinner wrote:

>I guess that all users and most developers are more in the "UNIX mode"
>camp. *If* we want to change the default, I suggest to use the "UNIX
>mode" by default.

FWIW, it seems to be a general and widespread recommendation to always pass universal_newlines=True to Popen and friends when you only want to deal with unicode from subprocesses:

    If encoding or errors are specified, or universal_newlines is true, the
    file objects stdin, stdout and stderr will be opened in text mode using
    the encoding and errors specified in the call or the defaults for
    io.TextIOWrapper.

Cheers,
-Barry

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ma3yuki.8mamo10 at gmail.com Sat Jan 7 03:37:09 2017
From: ma3yuki.8mamo10 at gmail.com (Masayuki YAMAMOTO)
Date: Sat, 7 Jan 2017 17:37:09 +0900
Subject: [Python-ideas] New PyThread_tss_ C-API for CPython
In-Reply-To:
References:
Message-ID:

2016-12-31 16:42 GMT+09:00 Nick Coghlan :

> On 31 December 2016 at 08:24, Masayuki YAMAMOTO
> wrote:
>
>> I have read the discussion, and I agree that a structure should be used
>> as Py_tss_t instead of a platform-specific data type. Just as Steve said
>> that Py_tss_t should be genuinely treated as an opaque type, the key state
>> checking should be provided as macros or inline functions with a name like
>> PyThread_tss_is_created. Well, I'd like to pin down the specification a
>> bit more :)
>>
>> If PyThread_tss_create is called with an already created key, it is a
>> no-op, but should the function succeed or fail? In my opinion, it is
>> better to return a failure, because there is a high possibility that code
>> calling PyThread_tss_create multiple times for one key is incorrect.
>>
>
> That's not what we currently do for the EnsureGIL autoTLS key and the
> tracemalloc key though - the reentrant key creation is part of
> "create-if-needed" flows where the key creation is silently skipped if the
> key already exists.
>
> Changing that would require some further research into how we ended up
> with the current approach, while carrying it over into the new API design
> would be the default option.
>

Yes, as you pointed out, my suggestion changes the API semantics and does not inherit the "create-if-needed" behaviour. I checked the code again: the current approach works well enough, and I haven't found a strong benefit to changing the semantics. So I agree with you and withdraw my suggestion. Well, I'm going to update the patch based on this result.

Best regards,
Masayuki

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com Sat Jan 7 06:50:48 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 7 Jan 2017 21:50:48 +1000
Subject: [Python-ideas] PEP 538: Coercing the legacy C locale to C.UTF-8
Message-ID:

Hi folks,

Many of you would have seen Victor's recent PEP proposing the introduction of a new "UTF-8" mode that tells Python to use UTF-8 by default in the legacy C locale (similar to the way CPython behaves on Mac OS X, Android and iOS), as well as allowing explicit selection of that mode regardless of the current locale settings.

That was prompted by my proposal in PEP 538 to start coercing the legacy C locale to C.UTF-8 (when we have the ability and opportunity to do so), and otherwise at least warn that we don't expect the legacy C locale to work properly.

That PEP has now been through its initial round of review on the Python Linux SIG, and updated to address both the feedback received there and some of the points Victor raised in PEP 540. The rendered version is available at https://www.python.org/dev/peps/pep-0538/ and the plain text version is included inline below.

Folks that have already read PEP 540 may want to start with the new section that looks at the way the two PEPs are potentially complementary to each other rather than competitive:
https://www.python.org/dev/peps/pep-0538/#relationship-with-other-peps

In particular, the approach in PEP 540 may be a better last resort alternative than setting "LC_CTYPE=en_US.UTF-8" on platforms that don't provide either C.UTF-8 or C.utf8 (which is what the current draft of PEP 538 proposes).

Cheers,
Nick.

===============================

PEP: 538
Title: Coercing the legacy C locale to C.UTF-8
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 28-Dec-2016
Python-Version: 3.7

Abstract
========

An ongoing challenge with Python 3 on \*nix systems is the conflict between needing to use the configured locale encoding by default for consistency with other C/C++ components in the same process and those invoked in subprocesses, and the fact that the standard C locale (as defined in POSIX:2001) specifies a default text encoding of ASCII, which is entirely inadequate for the development of networked services and client applications in a multilingual world.

This PEP proposes that the way the CPython implementation handles the default C locale be changed such that:

* the standalone CPython binary will automatically attempt to coerce the ``C`` locale to ``C.UTF-8`` (preferred), ``C.utf8`` or ``en_US.UTF-8`` unless the new ``PYTHONCOERCECLOCALE`` environment variable is set to ``0``
* if the subsequent runtime initialization process detects that the legacy ``C`` locale remains active (e.g.
locale coercion is disabled, or the runtime is embedded in an application other than the main CPython binary), it will emit a warning on stderr that use of the legacy ``C`` locale's default ASCII text encoding may cause various Unicode compatibility issues Explicitly configuring the ``C.UTF-8`` or ``en_US.UTF-8`` locales has already been used successfully for a number of years (including by the PEP author) to get Python 3 running reliably in environments where no locale is otherwise configured (such as Docker containers). With this change, any \*nix platform that does *not* offer at least one of the ``C.UTF-8``, ``C.utf8`` or ``en_US.UTF-8`` locales as part of its standard configuration would only be considered a fully supported platform for CPython 3.7+ deployments when a locale other than the default ``C`` locale is configured explicitly. Redistributors (such as Linux distributions) with a narrower target audience than the upstream CPython development team may also choose to opt in to this behaviour for the Python 3.6.x series by applying the necessary changes as a downstream patch when first introducing Python 3.6.0. Background ========== While the CPython interpreter is starting up, it may need to convert from the ``char *`` format to the ``wchar_t *`` format, or from one of those formats to ``PyUnicodeObject *``, before its own text encoding handling machinery is fully configured. It handles these cases by relying on the operating system to do the conversion and then ensuring that the text encoding name reported by ``sys.getfilesystemencoding()`` matches the encoding used during this early bootstrapping process. On Apple platforms (including both Mac OS X and iOS), this is straightforward, as Apple guarantees that these operations will always use UTF-8 to do the conversion. On Windows, the limitations of the ``mbcs`` format used by default in these conversions proved sufficiently problematic that PEP 528 and PEP 529 were implemented to bypass the operating system supplied interfaces for binary data handling and force the use of UTF-8 instead. On Android, the locale settings are of limited relevance (due to most applications running in the UTF-16-LE based Dalvik environment) and there's limited value in preserving backwards compatibility with other locale aware C/C++ components in the same process (since it's a relatively new target platform for CPython), so CPython bypasses the operating system provided APIs and hardcodes the use of UTF-8 (similar to its behaviour on Apple platforms). On non-Apple and non-Android \*nix systems however, these operations are handled using the C locale system in glibc, which has the following characteristics [4_]: * by default, all processes start in the ``C`` locale, which uses ``ASCII`` for these conversions. This is almost never what anyone doing multilingual text processing actually wants (including CPython and C/C++ GUI frameworks). * calling ``setlocale(LC_ALL, "")`` reconfigures the active locale based on the locale categories configured in the current process environment * if the locale requested by the current environment is unknown, or no specific locale is configured, then the default ``C`` locale will remain active The specific locale category that covers the APIs that CPython depends on is ``LC_CTYPE``, which applies to "classification and conversion of characters, and to multibyte and wide characters" [5_]. 
Accordingly, CPython includes the following key calls to ``setlocale``: * in the main ``python`` binary, CPython calls ``setlocale(LC_ALL, "")`` to configure the entire C locale subsystem according to the process environment. It does this prior to making any calls into the shared CPython library * in ``Py_Initialize``, CPython calls ``setlocale(LC_CTYPE, "")``, such that the configured locale settings for that category *always* match those set in the environment. It does this unconditionally, and it *doesn't* revert the process state change in ``Py_Finalize`` (This summary of the locale handling omits several technical details related to exactly where and when the text encoding declared as part of the locale settings is used - see PEP 540 for further discussion, as these particular details matter more when decoupling CPython from the declared C locale than they do when overriding the locale with one based on UTF-8) These calls are usually sufficient to provide sensible behaviour, but they can still fail in the following cases: * SSH environment forwarding means that SSH clients will often forward client locale settings to servers that don't have that locale installed. This leads to CPython running in the default ASCII-based C locale * some process environments (such as Linux containers) may not have any explicit locale configured at all. As with unknown locales, this leads to CPython running in the default ASCII-based C locale The simplest way to deal with this problem for currently released versions of CPython is to explicitly set a more sensible locale when launching the application. For example:: LC_ALL=C.UTF-8 LANG=C.UTF-8 python3 ... In the specific case of Docker containers and similar technologies, the appropriate locale setting can be specified directly in the container image definition. Another common failure case is developers specifying ``LANG=C`` in order to see otherwise translated user interface messages in English, rather than the more narrowly scoped ``LC_MESSAGES=C``. Relationship with other PEPs ============================ This PEP shares a common problem statement with PEP 540 (improving Python 3's behaviour in the default C locale), but diverges markedly in the proposed solution: * PEP 540 proposes to entirely decouple CPython's default text encoding from the C locale system in that case, allowing text handling inconsistencies to arise between CPython and other C/C++ components running in the same process and in subprocesses. This approach aims to make CPython behave less like a locale-aware C/C++ application, and more like C/C++ independent language runtimes like the JVM, .NET CLR, Go, Node.js, and Rust * this PEP proposes to instead override the legacy C locale with a more recently defined locale that uses UTF-8 as its default text encoding. This means that the text encoding override will apply not only to CPython, but also to any locale aware extension modules loaded into the current process, as well as to locale aware C/C++ applications invoked in subprocesses that inherit their environment from the parent process. 
This approach aims to retain CPython's traditional strong support for integration with other components written in C and C++, while actively helping to push forward the adoption and standardisation of the C.UTF-8 locale as a Unicode-aware replacement for the legacy C locale While the two PEPs present alternate proposed behavioural improvements that align with the interests of different parts of the Python user community, they don't actually conflict at a technical level. That means it would be entirely possible to implement both of them, and end up with a situation where redistributors, application integrators, and end users can choose between: * coercing the default ASCII based C locale to a UTF-8 based locale * instructing CPython to ignore the C locale and use UTF-8 instead * doing both of the above (with this option as the default legacy C locale handling) * forcing use of the default ASCII based C locale by setting both PYTHONCOERCECLOCALE=0 and PYTHONUTF8=0 If this approach was taken, then the proposed modifications to PEP 11 would be adjusted to indicate that the only unsupported configurations are those where both the legacy C locale coercion and the C locale text encoding bypass are disabled. Given such a hybrid implementation, it would also be reasonable to drop the ``en_US.UTF-8`` legacy fallback from the list of UTF-8 locales tried as a coercion target and instead rely solely on the C locale text encoding bypass in such cases. Motivation ========== While Linux container technologies like Docker, Kubernetes, and OpenShift are best known for their use in web service development, the related container formats and execution models are also being adopted for Linux command line application development. Technologies like Gnome Flatpak [7_] and Ubunty Snappy [8_] further aim to bring these same techniques to Linux GUI application development. When using Python 3 for application development in these contexts, it isn't uncommon to see text encoding related errors akin to the following:: $ docker run --rm fedora:25 python3 -c 'print("??????")' Unable to decode the command from the command line: UnicodeEncodeError: 'utf-8' codec can't encode character '\udce2' in position 7: surrogates not allowed $ docker run --rm ncoghlan/debian-python python3 -c 'print("??????")' Unable to decode the command from the command line: UnicodeEncodeError: 'utf-8' codec can't encode character '\udce2' in position 7: surrogates not allowed Even though the same command is likely to work fine when run locally:: $ python3 -c 'print("??????")' ?????? The source of the problem can be seen by instead running the ``locale`` command in the three environments:: $ locale | grep -E 'LC_ALL|LC_CTYPE|LANG' LANG=en_AU.UTF-8 LC_CTYPE="en_AU.UTF-8" LC_ALL= $ docker run --rm fedora:25 locale | grep -E 'LC_ALL|LC_CTYPE|LANG' LANG= LC_CTYPE="POSIX" LC_ALL= $ docker run --rm ncoghlan/debian-python locale | grep -E 'LC_ALL|LC_CTYPE|LANG' LANG= LANGUAGE= LC_CTYPE="POSIX" LC_ALL= In this particular example, we can see that the host system locale is set to "en_AU.UTF-8", so CPython uses UTF-8 as the default text encoding. By contrast, the base Docker images for Fedora and Debian don't have any specific locale set, so they use the POSIX locale by default, which is an alias for the ASCII-based default C locale. 
The simplest way to get Python 3 (regardless of the exact version) to behave sensibly in Fedora and Debian based containers is to run it in the ``C.UTF-8`` locale that both distros provide:: $ docker run --rm -e LANG=C.UTF-8 fedora:25 python3 -c 'print("??????")' ?????? $ docker run --rm -e LANG=C.UTF-8 ncoghlan/debian-python python3 -c 'print("??????")' ?????? $ docker run --rm -e LANG=C.UTF-8 fedora:25 locale | grep -E 'LC_ALL|LC_CTYPE|LANG' LANG=C.UTF-8 LC_CTYPE="C.UTF-8" LC_ALL= $ docker run --rm -e LANG=C.UTF-8 ncoghlan/debian-python locale | grep -E 'LC_ALL|LC_CTYPE|LANG' LANG=C.UTF-8 LANGUAGE= LC_CTYPE="C.UTF-8" LC_ALL= The Alpine Linux based Python images provided by Docker, Inc, already use the C.UTF-8 locale by default:: $ docker run --rm python:3 python3 -c 'print("??????")' ?????? $ docker run --rm python:3 locale | grep -E 'LC_ALL|LC_CTYPE|LANG' LANG=C.UTF-8 LANGUAGE= LC_CTYPE="C.UTF-8" LC_ALL= Similarly, for custom container images (i.e. those adding additional content on top of a base distro image), a more suitable locale can be set in the image definition so everything just works by default. However, it would provide a much nicer and more consistent user experience if CPython were able to just deal with this problem automatically rather than relying on redistributors or end users to handle it through system configuration changes. While the glibc developers are working towards making the C.UTF-8 locale universally available for use by glibc based applications like CPython [6_], this unfortunately doesn't help on platforms that ship older versions of glibc without that feature, and also don't provide C.UTF-8 as an on-disk locale the way Debian and Fedora do. For these platforms, the best widely available fallback option is the ``en_US.UTF-8`` locale, which while still being unfortunately Anglo-centric, is at least significantly less Anglo-centric than the ASCII text encoding assumption in the default C locale. In the specific case of C locale coercion, the Anglo-centrism implied by the use of ``en_US.UTF-8`` can be mitigated by configuring only the ``LC_CTYPE`` locale category, rather than overriding all the locale categories:: $ docker run --rm -e LANG=C.UTF-8 centos/python-35-centos7 python3 -c 'print("??????")' Unable to decode the command from the command line: UnicodeEncodeError: 'utf-8' codec can't encode character '\udce2' in position 7: surrogates not allowed $ docker run --rm -e LC_CTYPE=en_US.UTF-8 centos/python-35-centos7 python3 -c 'print("??????")' ?????? Specification ============= To better handle the cases where CPython would otherwise end up attempting to operate in the ``C`` locale, this PEP proposes that CPython automatically attempt to coerce the legacy ``C`` locale to a UTF-8 based locale when it is run as a standalone command line application. It further proposes to emit a warning on stderr if the legacy ``C`` locale is in effect at the point where the language runtime itself is initialized, in order to warn system and application integrators that they're running CPython in an unsupported configuration. Legacy C locale coercion in the standalone Python interpreter binary -------------------------------------------------------------------- When run as a standalone application, CPython has the opportunity to reconfigure the C locale before any locale dependent operations are executed in the process. 
This means that it can change the locale settings not only for the CPython runtime, but also for any other C/C++ components running in the current process (e.g. as part of extension modules), as well as in subprocesses that inherit their environment from the current process. After calling ``setlocale(LC_ALL, "")`` to initialize the locale settings in the current process, the main interpreter binary will be updated to include the following call:: const char *ctype_loc = setlocale(LC_CTYPE, NULL); This cryptic invocation is the API that C provides to query the current locale setting without changing it. Given that query, it is possible to check for exactly the ``C`` locale with ``strcmp``:: ctype_loc != NULL && strcmp(ctype_loc, "C") == 0 # true only in the C locale Given this information, CPython can then attempt to coerce the locale to one that uses UTF-8 rather than ASCII as the default encoding. Three such locales will be tried: * ``C.UTF-8`` (available at least in Debian, Ubuntu, and Fedora 25+, and expected to be available by default in a future version of glibc) * ``C.utf8`` (available at least in HP-UX) * ``en_US.UTF-8`` (available at least in RHEL and CentOS) For ``C.UTF-8`` and ``C.utf8``, the coercion will be implemented by actually setting the ``LANG`` and ``LC_ALL`` environment variables to the candidate locale name, such that future calls to ``setlocale()`` will see them, as will other components looking for those settings (such as GUI development frameworks). The last fallback isn't ideal as a coercion target (as it changes more than just the default text encoding), but has the benefit of currently being more widely available than the C.UTF-8 locale. To minimize the chance of side effects, only the ``LC_CTYPE`` environment variable would be set when using this legacy fallback option, with the other locale categories being left alone. Given time, more environments are expected to provide a ``C.UTF-8`` locale by default, so falling all the way back to the ``en_US.UTF-8`` option is expected to become less common. When this locale coercion is activated, the following warning will be printed on stderr, with the warning containing whichever locale was successfully configured:: Python detected LC_CTYPE=C, LC_ALL & LANG set to C.UTF-8 (set PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour). When falling all the way back to the ``en_US.UTF-8`` locale, the message would be slightly different:: Python detected LC_CTYPE=C, LC_CTYPE set to en_US.UTF-8 (set PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour). This locale coercion will mean that the standard Python binary should once again "just work" in the three main failure cases we're aware of (missing locale settings, SSH forwarding of unknown locales, and developers explicitly requesting ``LANG=C``), as long as the target platform provides at least one of the candidate UTF-8 based environments. If ``PYTHONCOERCECLOCALE=0`` is set, or none of the candidate locales is successfully configured, then initialization will continue as usual in the C locale and the Unicode compatibility warning described in the next section will be emitted just as it would for any other application. The interpreter will always check for the ``PYTHONCOERCECLOCALE`` environment variable (even when running under the ``-E`` or ``-I`` switches), as the locale coercion check necessarily takes place before any command line argument processing. 
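(For illustration only, the decision logic described above can be sketched in Python; the real change happens in the C-level ``main()`` before the interpreter is initialized, and the actual proposal only sets ``LC_CTYPE`` rather than ``LC_ALL``/``LANG`` when falling back to ``en_US.UTF-8``)::

    import locale
    import os
    import sys

    _CANDIDATE_LOCALES = ["C.UTF-8", "C.utf8", "en_US.UTF-8"]

    def coerce_c_locale(environ):
        # Query the current LC_CTYPE setting without changing it
        # (the Python-level equivalent of setlocale(LC_CTYPE, NULL)).
        if locale.setlocale(locale.LC_CTYPE) != "C":
            return  # a real locale is already configured; leave it alone
        for candidate in _CANDIDATE_LOCALES:
            try:
                locale.setlocale(locale.LC_CTYPE, candidate)
            except locale.Error:
                continue  # candidate locale not available on this system
            environ["LC_ALL"] = environ["LANG"] = candidate
            print("Coercing the legacy C locale to", candidate, file=sys.stderr)
            return

    coerce_c_locale(os.environ)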
Changes to the runtime initialization process --------------------------------------------- By the time that ``Py_Initialize`` is called, arbitrary locale-dependent operations may have taken place in the current process. This means that by the time it is called, it is *too late* to switch to a different locale - doing so would introduce inconsistencies in decoded text, even in the context of the standalone Python interpreter binary. Accordingly, when ``Py_Initialize`` is called and CPython detects that the configured locale is still the default ``C`` locale, the following warning will be issued:: Python runtime initialized with LC_CTYPE=C (a locale with default ASCII encoding), which may cause Unicode compatibility problems. Using C.UTF-8 (if available) as an alternative Unicode-compatible locale is recommended. In this case, no actual change will be made to the locale settings. Instead, the warning informs both system and application integrators that they're running Python 3 in a configuration that we don't expect to work properly. New build-time configuration options ------------------------------------ While both of the above behaviours would be enabled by default, they would also have new associated configuration options and preprocessor definitions for the benefit of redistributors that want to override those default settings. The locale coercion behaviour would be controlled by the flag ``--with[out]-c-locale-coercion``, which would set the ``PY_COERCE_C_LOCALE`` preprocessor definition. The locale warning behaviour would be controlled by the flag ``--with[out]-c-locale-warning``, which would set the ``PY_WARN_ON_C_LOCALE`` preprocessor definition. On platforms where they would have no effect (e.g. Mac OS X, iOS, Android, Windows) these preprocessor variables would always be undefined. Platform Support Changes ======================== A new "Legacy C Locale" section will be added to PEP 11 that states: * as of Python 3.7, the legacy C locale is no longer officially supported, and any Unicode handling issues that occur only in that locale and cannot be reproduced in an appropriately configured non-ASCII locale will be closed as "won't fix" * as of Python 3.7, \*nix platforms are expected to provide at least one of ``C.UTF-8``, ``C.utf8`` or ``en_US.UTF-8`` as an alternative to the legacy ``C`` locale. On platforms which don't yet provide any of these locales, an explicit non-ASCII locale setting will be needed to configure a fully supported environment for running Python 3.7+ Rationale ========= Improving the handling of the C locale -------------------------------------- It has been clear for some time that the C locale's default encoding of ``ASCII`` is entirely the wrong choice for development of modern networked services. Newer languages like Rust and Go have eschewed that default entirely, and instead made it a deployment requirement that systems be configured to use UTF-8 as the text encoding for operating system interfaces. Similarly, Node.js assumes UTF-8 by default (a behaviour inherited from the V8 JavaScript engine) and requires custom build settings to indicate it should use the system locale settings for locale-aware operations. Both the JVM and the .NET CLR use UTF-16-LE as their primary encoding for passing text between applications and the underlying platform. 
The challenge for CPython has been the fact that in addition to being used for network service development, it is also extensively used as an embedded scripting language in larger applications, and as a desktop application development language, where it is more important to be consistent with other C/C++ components sharing the same process, as well as with the user's desktop locale settings, than it is with the emergent conventions of modern network service development. The core premise of this PEP is that for *all* of these use cases, the default "C" locale is the wrong choice, and furthermore that the following assumptions are valid: * in desktop application use cases, the process locale will *already* be configured appropriately, and if it isn't, then that is an operating system level problem that needs to be reported to and resolved by the operating system provider * in network service development use cases (especially those based on Linux containers), the process locale may not be configured *at all*, and if it isn't, then the expectation is that components will impose their own default encoding the way Rust, Go and Node.js do, rather than trusting the legacy C default encoding of ASCII the way CPython currently does Using "strict" error handling by default ---------------------------------------- By coercing the locale away from the legacy C default and its assumption of ASCII as the preferred text encoding, this PEP also disables the implicit use of the "surrogateescape" error handler on the standard IO streams that was introduced in Python 3.5. This is deliberate, as while UTF-8 as the preferred text encoding is a good working assumption for network service development and for more recent releases of client operating systems, it still isn't a universally valid assumption. In particular, GB 18030 [12_] is a Chinese national text encoding standard that handles all Unicode code points, but is incompatible with both ASCII and UTF-8. Similarly, Shift-JIS [13_] and ISO-2022-JP [14_] remain in widespread use in Japan, and are incompatible with both ASCII and UTF-8. Using strict error handling on the standard streams means that attempting to pass information from a host system using one of these encodings into a container application that is assuming the use of UTF-8 or vice-versa is likely to cause an immediate Unicode encoding or decoding error, rather than potentially causing silent data corruption. Dropping official support for Unicode handling in the legacy C locale --------------------------------------------------------------------- We've been trying to get strict bytes/text separation to work reliably in the legacy C locale for over a decade at this point. Not only haven't we been able to get it to work, neither has anyone else - the only viable alternatives identified have been to pass the bytes along verbatim without eagerly decoding them to text (Python 2.x, Ruby, etc), or else to ignore the nominal C/C++ locale encoding entirely and assume the use of either UTF-8 (PEP 540, Rust, Go, Node.js, etc) or UTF-16-LE (JVM, .NET CLR). While this PEP ensures that developers that need to do so can still opt-in to running their Python code in the legacy C locale, it also makes clear that we *don't* expect Python 3's Unicode handling to be reliable in that configuration, and the recommended alternative is to use a more appropriate locale setting. 
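To make the strict-versus-``surrogateescape`` trade-off discussed above concrete, the following snippet (using Latin-1 purely for brevity; the same applies to GB 18030 or Shift-JIS data) shows both behaviours::

    data = "héllo".encode("latin-1")   # bytes from a non-UTF-8 host encoding

    try:
        data.decode("utf-8")           # strict error handling (the default)
    except UnicodeDecodeError as exc:
        print("strict:", exc)          # the mismatch surfaces immediately

    # surrogateescape instead smuggles the undecodable bytes through as
    # lone surrogates: the original bytes round-trip, but re-encoding the
    # text as real UTF-8 will fail later rather than here.
    text = data.decode("utf-8", errors="surrogateescape")
    assert text.encode("utf-8", errors="surrogateescape") == data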
Providing implicit locale coercion only when running standalone --------------------------------------------------------------- Over the course of Python 3.x development, multiple attempts have been made to improve the handling of incorrect locale settings at the point where the Python interpreter is initialised. The problem that emerged is that this is ultimately *too late* in the interpreter startup process - data such as command line arguments and the contents of environment variables may have already been retrieved from the operating system and processed under the incorrect ASCII text encoding assumption well before ``Py_Initialize`` is called. The problems created by those inconsistencies were then even harder to diagnose and debug than those created by believing the operating system's claim that ASCII was a suitable encoding to use for operating system interfaces. This was the case even for the default CPython binary, let alone larger C/C++ applications that embed CPython as a scripting engine. The approach proposed in this PEP handles that problem by moving the locale coercion as early as possible in the interpreter startup sequence when running standalone: it takes place directly in the C-level ``main()`` function, even before calling in to the `Py_Main()`` library function that implements the features of the CPython interpreter CLI. The ``Py_Initialize`` API then only gains an explicit warning (emitted on ``stderr``) when it detects use of the ``C`` locale, and relies on the embedding application to specify something more reasonable. Querying LC_CTYPE for C locale detection ---------------------------------------- ``LC_CTYPE`` is the actual locale category that CPython relies on to drive the implicit decoding of environment variables, command line arguments, and other text values received from the operating system. As such, it makes sense to check it specifically when attempting to determine whether or not the current locale configuration is likely to cause Unicode handling problems. Setting both LANG & LC_ALL for C.UTF-8 locale coercion ------------------------------------------------------ Python is often used as a glue language, integrating other C/C++ ABI compatible components in the current process, and components written in arbitrary languages in subprocesses. Setting ``LC_ALL`` to ``C.UTF-8`` imposes a locale setting override on all C/C++ components in the current process and in any subprocesses that inherit the current environment. Setting ``LANG`` to ``C.UTF-8`` ensures that even components that only check the ``LANG`` fallback for their locale settings will still use ``C.UTF-8``. Together, these should ensure that when the locale coercion is activated, the switch to the C.UTF-8 locale will be applied consistently across the current process and any subprocesses that inherit the current environment. Allowing restoration of the legacy behaviour -------------------------------------------- The CPython command line interpreter is often used to investigate faults that occur in other applications that embed CPython, and those applications may still be using the C locale even after this PEP is implemented. Providing a simple on/off switch for the locale coercion behaviour makes it much easier to reproduce the behaviour of such applications for debugging purposes, as well as making it easier to reproduce the behaviour of older 3.x runtimes even when running a version with this change applied. 
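As a purely illustrative example, a test harness could reproduce the legacy behaviour by running a child interpreter with the proposed coercion switched off (current Python releases simply ignore the ``PYTHONCOERCECLOCALE`` setting)::

    import os
    import subprocess
    import sys

    # Force the legacy C locale and disable the proposed coercion in the child.
    env = dict(os.environ, PYTHONCOERCECLOCALE="0", LC_ALL="C", LANG="C")
    subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.getfilesystemencoding())"],
        env=env,
    )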
Implementation ============== NOTE: The currently posted draft implementation is for a previous iteration of the PEP prior to the incorporation of the feedback noted in [11_]. It was broadly the same in concept (i.e. coercing the legacy C locale to one based on UTF-8), but differs in several details. A draft implementation of the change (including test cases) has been posted to issue 28180 [1_], which is an end user request that ``sys.getfilesystemencoding()`` default to ``utf-8`` rather than ``ascii``. Backporting to earlier Python 3 releases ======================================== Backporting to Python 3.6.0 --------------------------- If this PEP is accepted for Python 3.7, redistributors backporting the change specifically to their initial Python 3.6.0 release will be both allowed and encouraged. However, such backports should only be undertaken either in conjunction with the changes needed to also provide the C.UTF-8 locale by default, or else specifically for platforms where that locale is already consistently available. Backporting to other 3.x releases --------------------------------- While the proposed behavioural change is seen primarily as a bug fix addressing Python 3's current misbehaviour in the default ASCII-based C locale, it still represents a reasonable significant change in the way CPython interacts with the C locale system. As such, while some redistributors may still choose to backport it to even earlier Python 3.x releases based on the needs and interests of their particular user base, this wouldn't be encouraged as a general practice. Acknowledgements ================ The locale coercion approach proposed in this PEP is inspired directly by Armin Ronacher's handling of this problem in the ``click`` command line utility development framework [2_]:: $ LANG=C python3 -c 'import click; cli = click.command()(lambda:None); cli()' Traceback (most recent call last): ... RuntimeError: Click will abort further execution because Python 3 was configured to use ASCII as encoding for the environment. Either run this under Python 2 or consult http://click.pocoo.org/python3/ for mitigation steps. This system supports the C.UTF-8 locale which is recommended. You might be able to resolve your issue by exporting the following environment variables: export LC_ALL=C.UTF-8 export LANG=C.UTF-8 The change was originally proposed as a downstream patch for Fedora's system Python 3.6 package [3_], and then reformulated as a PEP for Python 3.7 with a section allowing for backports to earlier versions by redistributors. The initial draft was posted to the Python Linux SIG for discussion [10_] and then amended based on both that discussion and Victor Stinner's work in PEP 540 [11_]. The "??????" string used in the Unicode handling examples throughout this PEP is taken from Ned Batchelder's excellent "Pragmatic Unicode" presentation [9_]. References ========== .. [1] CPython: sys.getfilesystemencoding() should default to utf-8 (http://bugs.python.org/issue28180) .. [2] Locale configuration required for click applications under Python 3 (http://click.pocoo.org/5/python3/#python-3-surrogate-handling) .. [3] Fedora: force C.UTF-8 when Python 3 is run under the C locale (https://bugzilla.redhat.com/show_bug.cgi?id=1404918) .. [4] GNU C: How Programs Set the Locale ( https://www.gnu.org/software/libc/manual/html_node/Setting-the-Locale.html) .. [5] GNU C: Locale Categories (https://www.gnu.org/software/libc/manual/html_node/Locale-Categories.html) .. 
[6] glibc C.UTF-8 locale proposal (https://sourceware.org/glibc/wiki/Proposals/C.UTF-8) .. [7] GNOME Flatpak (http://flatpak.org/) .. [8] Ubuntu Snappy (https://www.ubuntu.com/desktop/snappy) .. [9] Pragmatic Unicode (http://nedbatchelder.com/text/unipain.html) .. [10] linux-sig discussion of initial PEP draft (https://mail.python.org/pipermail/linux-sig/2017-January/000014.html) .. [11] Feedback notes from linux-sig discussion and PEP 540 (https://github.com/python/peps/issues/171) .. [12] GB 18030 (https://en.wikipedia.org/wiki/GB_18030) .. [13] Shift-JIS (https://en.wikipedia.org/wiki/Shift_JIS) .. [14] ISO-2022 (https://en.wikipedia.org/wiki/ISO/IEC_2022) Copyright ========= This document has been placed in the public domain under the terms of the CC0 1.0 license: https://creativecommons.org/publicdomain/zero/1.0/ -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From turnbull.stephen.fw at u.tsukuba.ac.jp Sat Jan 7 11:47:21 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Sun, 8 Jan 2017 01:47:21 +0900 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: Message-ID: <22641.7065.453418.747867@turnbull.sk.tsukuba.ac.jp> INADA Naoki writes: > I want UTF-8 mode is enabled by default (opt-out option) even if > locale is not POSIX, > like `PYTHONLEGACYWINDOWSFSENCODING`. > > Users depends on locale know what locale is and how to configure it. > They can understand difference between locale mode and UTF-8 mode > and they can opt-out UTF-8 mode. > But many people lives in "UTF-8 everywhere" world, and don't know > about locale. I find all this very strange from someone with what looks like a Japanese name. I see mojibake and non-Unicode encodings around me all the time. Caveat: I teach at a University that prides itself on being the most international of Japanese national universities, so in my daily work I see Japanese in 4 different encodings (5 if you count the UTF-16 used internally by MS Office), Chinese in 3 different (claimed) encodings, and occasionally Russian in at least two encodings, ..., uh, I could go on but won't. In any case, the biggest problems are legacy email programs and busted websites in Japanese, plus email that is labeled "GB2312" but actually conforms to GBK (and this is a reply in Japanese to a Chinese applicant writing in Japanese encoded as GBK). I agree that people around me mostly know only two encodings: "works for me" and "mojibake", but they also use locales configured for them by technical staff. On top of that, international students (the most likely victims of "UTF-8 by default" because students are the biggest Python users) typically have non-Japanese locales set on their imported computers. I'm not going to say my experience is typical enough to block "UTF-8 by default", but let's do this very carefully with thought. From ncoghlan at gmail.com Sat Jan 7 20:08:01 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 8 Jan 2017 11:08:01 +1000 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: <22641.7065.453418.747867@turnbull.sk.tsukuba.ac.jp> References: <22641.7065.453418.747867@turnbull.sk.tsukuba.ac.jp> Message-ID: On 8 January 2017 at 02:47, Stephen J. Turnbull wrote: > I agree that people around me mostly know only two encodings: "works > for me" and "mojibake", but they also use locales configured for them > by technical staff. 
On top of that, international students (the most > likely victims of "UTF-8 by default" because students are the biggest > Python users) typically have non-Japanese locales set on their > imported computers. > > I'm not going to say my experience is typical enough to block "UTF-8 > by default", but let's do this very carefully with thought. Unsurprisingly (given where I work [1]), one of my key concerns is to enable large Python using institutions to be able to keep moving forward, regardless of whether they've fully standardised their internal environments on UTF-8 or not. As such, while I'm entirely in favour of pushing people towards UTF-8 as the default choice everywhere, I also want to make sure that system and application integrators, including the folks responsible for defining the Standard Operating Environments in large organisations, get warnings of potential problems when they arise, and continue to get encoding errors when we have definitive evidence of a compatibiliy problem. For me, that boils down to: - if a locale is properly configured, we'll continue to respect it - if we're ignoring or changing the locale setting without an explicit config option, we'll emit a warning on stderr that we're doing so (*without* using the warnings system, so there's no way to turn it into an exception) - if a UTF-8 based Linux container is run on a GB-18030/ISO-2022/Shift-JIS/etc host and tries to exchange locally encoded data with that host (rather than exchanging UTF-8 encoded data over a network connection), getting an exception is preferable to silently corrupting the data stream (I think I'll add something along those lines to PEP 538 as a new "Core Design Principles" section) Cheers, Nick. [1] https://docs.python.org/devguide/motivations.html#published-entries -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From songofacandy at gmail.com Sun Jan 8 21:21:41 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Mon, 9 Jan 2017 11:21:41 +0900 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: <22641.7065.453418.747867@turnbull.sk.tsukuba.ac.jp> References: <22641.7065.453418.747867@turnbull.sk.tsukuba.ac.jp> Message-ID: On Sun, Jan 8, 2017 at 1:47 AM, Stephen J. Turnbull wrote: > INADA Naoki writes: > > > I want UTF-8 mode is enabled by default (opt-out option) even if > > locale is not POSIX, > > like `PYTHONLEGACYWINDOWSFSENCODING`. > > > > Users depends on locale know what locale is and how to configure it. > > They can understand difference between locale mode and UTF-8 mode > > and they can opt-out UTF-8 mode. > > But many people lives in "UTF-8 everywhere" world, and don't know > > about locale. > > I find all this very strange from someone with what looks like a > Japanese name. I see mojibake and non-Unicode encodings around me all > the time. Caveat: I teach at a University that prides itself on being > the most international of Japanese national universities, so in my > daily work I see Japanese in 4 different encodings (5 if you count the > UTF-16 used internally by MS Office), Chinese in 3 different (claimed) > encodings, and occasionally Russian in at least two encodings, ..., > uh, I could go on but won't. In any case, the biggest problems are > legacy email programs and busted websites in Japanese, plus email that > is labeled "GB2312" but actually conforms to GBK (and this is a reply > in Japanese to a Chinese applicant writing in Japanese encoded as GBK). 
Since I work on tech company, and use Linux for most only "server-side" program, I don't live such a situation. But when I see non UTF-8 text, I don't change locale to read such text. (Actually speaking, locale doesn't solve mojibake because it doesn't change my terminal emulator's encoding). And I don't change my terminal emulator setting only for read such a text. What I do is convert it to UTF-8 through command like `view text-from-windows.txt ++enc=cp932` So there are no problem when Python always use UTF-8 for fsencoding and stdio encoding. > > I agree that people around me mostly know only two encodings: "works > for me" and "mojibake", but they also use locales configured for them > by technical staff. On top of that, international students (the most > likely victims of "UTF-8 by default" because students are the biggest > Python users) typically have non-Japanese locales set on their > imported computers. Hmm, Which OS do they use? There are no problem in macOS and Windows. Do they use Linux with locale with encoding other than UTF-8, and their terminal emulator uses non-UTF-8 encoding? As my feeling, UTF-8 start dominating from about 10 years ago, and ja_JP.EUC_JP (it was most common locale for Japanese befoer UTF-8) is complete legacy. There is only one machine (which is in LAN, lives from 10+ years ago, /usr/bin/python is Python 1.5!), I can ssh which has ja_JP.eucjp locale. From simon58500 at bigpond.com Mon Jan 9 06:25:45 2017 From: simon58500 at bigpond.com (Simon Lovell) Date: Mon, 9 Jan 2017 19:25:45 +0800 Subject: [Python-ideas] Python Reviewed Message-ID: <69e3c5d4-d64b-063e-758e-2b0ac1720daa@bigpond.com> Python Reviewed Having used a lot of languages a little bit and not finding satisfactory answers to these in some cases often asked questions, I thought I'd join this group to make a post on the virtues and otherwise of python. The Good: Syntactically significant new lines Syntactically significant indenting Different types of array like structures for different situations Mostly simple and clear structures Avoiding implicit structures like C++ references which add only negative value Avoiding overly complicated chaining expressions like "while(*d++=*s++);" Single syntax for block statements (well, sort of. I'm ignoring lines like "if a=b: c=d") Lack of a with statement which only obscures the code The Bad: Colons at the end of if/while/for blocks. Most of the arguments in favour of this decision boil down to PEP 20.2 "Explicit is better than implicit". Well, no. if/while/for blocks are already explicit. Adding the colon makes it doubly explicit and therefore redundant. There is no reason I can see why this colon can't be made optional except for possibly PEP20.13 "There should be one-- and preferably only one --obvious way to do it". I don't agree that point is sufficient to require colons. No end required for if/while/for blocks. This is particularly a problem when placing code into text without fixed width fonts. It also is a potential problem with tab expansion tricking the programmer. This could be done similarly to requiring declarations in Fortran, which if "implicit none" was added to the top of the program, declarations are required. So add a "Block Close Mandatory" (or similar) keyword to enforce this. In practice there is usually a blank line placed at the end of blocks to try to signal this to someone reading the code. 
Makes the code less readable and I would refer to PEP20.7 "Readability counts" This code block doesn't compile, even given that function "process" takes one string parameter: f=open(file) endwhile="" while (line=f.readline())!=None: process(line) endwhile I note that many solutions have been proposed to this. In C, it is the ability to write "while(line=fgets(f))" instead of "while((line=fgets(f))!=NULL)" which causes the confusion. No solutions have been accepted to the current method which is tacky: f=open(file) endwhile="" endif="" while True: line=f.readline if line = None: break endif process(line) endwhile Inadequacy of PEP249 - Python Database Specification. This only supports dynamic SQL but SQL and particularly select statements should be easier to work with in the normal cases where you don't need such statements. e.g: endselect="" idList = select from identities where surname = 'JONES': idVar = id forenameVar = forename surnameVar = surname dobVar = dob endselect endfor="" for id in idList: print id.forenameVar, id.dobVar endfor as opposed to what is presently required in the select case which is: curs = connection.cursor() curs.execute("select id, forename, surname, dob from identities where surname = 'JONES'") idList=curs.fetchall() endfor="" for id in idList: print id[1], id[3] endfor I think the improvement in readibility for the first option should be plain to all even in the extremely simple case I've shown. This is the sort of thing which should be possible in any language which works with a database but somehow the IT industry has lost it in the 1990s/2000s. Similarly an upgraded syntax for the insert/values statement which the SQL standard has mis-specified to make the value being inserted too far away from the column name. Should be more like: endinsert="" Insert into identities: id = 1 forename = 'John' surname = 'Smith' dob = '01-Jan-1970' endinsert One of the major problems with the status quo is the lack of named result columns. The other is that the programmer is required to convert the where clause into a string. The functionality of dynamic where/from clauses can still be provided without needing to rely on numbered result columns like so: endselect="" idList = select from identities where :where_clause: id = id forename = forename surname = surname dob = dob endselect Ideally, the bit after the equals sign would support all syntaxes allowed by the host database server which probably means it needs to be free text passed to the server. Where a string variable should be passed, the :variable syntax could be supported but this is not often required Variables never set to anything do not error until they are used, at least in implementations of Python 2 I have tried. e.g. UnlikelyCondition = False endif="" if UnlikelyCondition: print x endif The above code runs fine until UnlikelyCondition is set to True No do-while construct else keyword at the end of while loops is not obvious to those not familiar with it. Something more like whenFalse would be clearer Changing print from a statement to a function in Python 3 adds no positive value that I can see Upper delimiters being exclusive while lower delimiters are inclusive. This is very counter intuitive. e.g. range(1,4) returns [1,2,3]. Better to have the default base as one rather than zero IMO. Of course, the programmer should always be able to define the lower bound. This cannot be changed, of course. 
Lack of a single character in a method to refer to an attribute instead of a local variable, similar to C's "*" for dereferencing a pointer Inability to make simple chained assignments e.g. "a = b = 0" Conditional expression ( if else ) in Python is less intuitive than in C ( ? : ). Ref PEP308. Why BDFL chose the syntax he did is not at all clear. The Ugly: Persisting with the crapulence from C where a non zero integer is true and zero is false - only ever done because C lacked a boolean data type. This is a flagrant violation of PEP 20.2 "Explicit is better than implicit" and should be removed without providing backwards compatibility. From rosuav at gmail.com Mon Jan 9 08:31:42 2017 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 10 Jan 2017 00:31:42 +1100 Subject: [Python-ideas] Python Reviewed In-Reply-To: <69e3c5d4-d64b-063e-758e-2b0ac1720daa@bigpond.com> References: <69e3c5d4-d64b-063e-758e-2b0ac1720daa@bigpond.com> Message-ID: On Mon, Jan 9, 2017 at 10:25 PM, Simon Lovell wrote: > Python Reviewed > > Having used a lot of languages a little bit and not finding satisfactory > answers to these in some cases often asked questions, I thought I'd join > this group to make a post on the virtues and otherwise of python. I think this thread belongs on python-list at python.org, where you'll find plenty of people happy to discuss why Python is and/or shouldn't be the way it is. A couple of responses to just a couple of your points. > The Good: > Syntactically significant new lines > Syntactically significant indenting > > The Bad: > No end required for if/while/for blocks. This is particularly a problem > when placing code into text without fixed width fonts. It also is a > potential problem with tab expansion tricking the programmer. If indentation and line endings are significant, you shouldn't need end markers. They don't buy you anything. In any case, I've never missed them; in fact, Python code follows the "header and lines" concept that I've worked with in many, MANY data files for decades (think of the sectioned config file format, for example). > This code block doesn't compile, even given that function "process" > takes one string parameter: > f=open(file) > endwhile="" > while (line=f.readline())!=None: > process(line) > endwhile > > I note that many solutions have been proposed to this. In C, it is > the ability to write "while(line=fgets(f))" instead of > "while((line=fgets(f))!=NULL)" which causes the confusion. No solutions have > been accepted to the current method which is tacky: > f=open(file) > endwhile="" > endif="" > while True: > line=f.readline > if line = None: > break > endif > process(line) > endwhile Here's a better way: for line in open(file): process(line) If you translate C code to Python, sure, it'll sometimes come out even uglier than the C original. But there's often a Pythonic way to write things. > Inadequacy of PEP249 - Python Database Specification. This only supports > dynamic SQL but SQL and particularly select statements should be easier to > work with in the normal cases where you don't need such statements. e.g: > endselect="" > idList = select from identities where surname = 'JONES': > idVar = id > forenameVar = forename > surnameVar = surname > dobVar = dob > endselect > > endfor="" > for id in idList: > print id.forenameVar, id.dobVar > endfor You're welcome to propose something like this. 
I suspect you could build an SQL engine that uses a class to create those bindings - something like: class people(identities): id, forename, surname, dob where="surname = 'JONES'" for person in people: print(person.forename, person.dob) Side point: "forename" and "surname" are inadvisable fields. http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/ > One of the major problems with the status quo is the lack of named > result columns. The other is that the programmer is required to convert the > where clause into a string. The functionality of dynamic where/from clauses > can still be provided without needing to rely on numbered result columns > like so: > endselect="" > idList = select from identities where :where_clause: > id = id > forename = forename > surname = surname > dob = dob > endselect That's easy enough to do with a namedtuple. > Variables never set to anything do not error until they are used, at > least in implementations of Python 2 I have tried. e.g. > UnlikelyCondition = False > endif="" > if UnlikelyCondition: > print x > endif > > The above code runs fine until UnlikelyCondition is set to True That's because globals and builtins could be created dynamically. It's a consequence of not having variable declarations. You'll find a lot of editors/linters will flag this, though. > Changing print from a statement to a function in Python 3 adds no > positive value that I can see Adds heaps of positive value to a lot of people. You simply haven't met the situations where it's better. It's sufficiently better that I often use __future__ to pull it in even in 2.7-only projects. > Lack of a single character in a method to refer to an attribute instead > of a local variable, similar to C's "*" for dereferencing a pointer Ehh. "self." isn't that long. Python isn't AWK. > Inability to make simple chained assignments e.g. "a = b = 0" Really? Works fine. You can chain assignment like that. > Conditional expression ( if else ) > in Python is less intuitive than in C ( ? : > ). Ref PEP308. Why BDFL chose the syntax he did is not at all > clear. I agree with you on this one - specifically, because the order of evaluation is "middle then outside", instead of left-to-right. > The Ugly: > Persisting with the crapulence from C where a non zero integer is true > and zero is false - only ever done because C lacked a boolean data type. > This is a flagrant violation of PEP 20.2 "Explicit is better than implicit" > and should be removed without providing backwards compatibility. In Python, *everything* is either true or false. Anything that represents "something" is true, and anything that represents "nothing" is false. An empty list is false, but a list with items in it is true. This is incredibly helpful and most definitely not ugly; Python is not REXX. ChrisA From simon58500 at bigpond.com Mon Jan 9 08:50:12 2017 From: simon58500 at bigpond.com (Simon Lovell) Date: Mon, 9 Jan 2017 21:50:12 +0800 Subject: [Python-ideas] Python reviewed In-Reply-To: References: Message-ID: <84652a00-c4ef-57f3-cda0-31792b85eae9@bigpond.com> Hmm, Thanks Chris. I thought I was posting this to the correct place. I've never seen that "for line in open ..." after googling it many times! Why is this question so often asked then? Re:Indentation making end block markers not needed; well yes they aren't /needed/. However, they are useful for readability purposes. Perhaps if I use it some more I'll see that they aren't but I doubt it. 
Re: PEP249 & SQL, I thought I was proposing something like that, but I don't think it can be tacked on later - it needs to be an innate part of Python to work as cleanly as 4GL languages. Re: your named tuple suggestion, wouldn't that mean that the naming is divorced from the result column names - that is part of what shouldn't be. Re: Everything being true or false. I don't see the value of that. Only boolean data should be valid in boolean contexts. I don't really see how that can be argued.
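On the named-columns point: with the existing DB-API the field names need not be divorced from the result columns, because ``cursor.description`` carries the column names from the query itself. A small sketch using the stdlib ``sqlite3`` driver; the helper name, table and data are invented here to match the thread's example.

    import sqlite3
    from collections import namedtuple

    def fetchall_named(cursor):
        # Field names come from cursor.description, i.e. from the result
        # columns themselves, so the naming stays attached to the query.
        fields = [column[0] for column in cursor.description]
        Row = namedtuple("Row", fields)
        return [Row(*values) for values in cursor.fetchall()]

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE identities (id INTEGER, forename TEXT, surname TEXT, dob TEXT)")
    conn.execute("INSERT INTO identities VALUES (1, 'John', 'Jones', '01-Jan-1970')")

    curs = conn.cursor()
    curs.execute("SELECT id, forename, surname, dob FROM identities WHERE surname = ?", ("Jones",))
    for person in fetchall_named(curs):
        print(person.forename, person.dob)    # named access, no numbered columns

(sqlite3 also ships ``sqlite3.Row``, which gives name-based access directly via ``conn.row_factory = sqlite3.Row``.)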
From bussonniermatthias at gmail.com  Mon Jan  9 09:32:49 2017
From: bussonniermatthias at gmail.com (Matthias Bussonnier)
Date: Mon, 9 Jan 2017 15:32:49 +0100
Subject: [Python-ideas] Python reviewed
In-Reply-To: <84652a00-c4ef-57f3-cda0-31792b85eae9@bigpond.com>
References: <84652a00-c4ef-57f3-cda0-31792b85eae9@bigpond.com>
Message-ID:

On Mon, Jan 9, 2017 at 2:50 PM, Simon Lovell wrote:
> Hmm, Thanks Chris. I thought I was posting this to the correct place.
>
> I've never seen that "for line in open ..." after googling it many times!
> Why is this question so often asked then? > The distinction and the explanation of this is the first result of the google search "Python process file line by line" http://stackoverflow.com/questions/11130312/line-by-line-file-processing-for-loop-vs-with You probably haven't came across that because you think and search using C terms. Once you get used to context managers they are incredibly useful. I would advise to watch talks like "Beyond Pep 8" [1], comparing the same program in Python and Java. > > Re:Indentation making end block markers not needed; well yes they aren't > /needed/. However, they are useful for readability purposes. Perhaps if I > use it some more I'll see that they aren't but I doubt it. I use to like end markers. Though not having them make the code quite shorted. When you have 1 line condition, or context manager or loop, it literally adds 50% more line to your condition/loop/contextmanager. As the next line is anyway de-indented, you see your block anyway. > > > Re:Everything being true of false. I don't see the value of that. Only > boolean data should be valid in boolean contexts. I don't really see how > that can be argued. > Things are true-ish or false-ish, if you prefer. This allows idioms like if mycontainer: process_my_container(mycontainer) And you can process it only if you have items and it is not none. Your own object can raise if they get casted to bools, so if you really like your object to not behave like a bool, you can. It's not because they are truthy or falsy that they compare equal: >>> if [] == False: ... print('[] equals False') ... else: ... print('[] not equals...') ... [] not equals... >>> if not []:print('[] is Falsy') ... [] is Falsy Also Boolean in Python are singletons (like None) , so you will see comparison by identity `... is None` ( `... is False`, `... is True` rarely) if you really care about only being in boolean context you can. Also Python is "ducktyped", if it quack and walks like a duck , then it's a duck. If an object defines how to behave as a bool then that's great. You can then use your own object that are truthy-falsy and carry more information by still using other libraries. if not allowed: raise OhNoooe('You can't because', allowed.reason) -- M [1]:https://www.youtube.com/watch?v=wf-BqAjZb8M Sorry Simon for double Mail, I forgot to reply-all. From barry at python.org Mon Jan 9 11:15:46 2017 From: barry at python.org (Barry Warsaw) Date: Mon, 9 Jan 2017 11:15:46 -0500 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode References: <20170106162432.5e6968c9@subdivisions.wooz.org> Message-ID: <20170109111546.4c9e7edb@subdivisions.wooz.org> On Jan 06, 2017, at 11:08 PM, Steve Dower wrote: >Passing universal_newlines will use whatever locale.getdefaultencoding() There is no locale.getdefaultencoding(); I think you mean locale.getpreferredencoding(False). (See the "Changed in version 3.3" note in $17.5.1.1 of the stdlib docs.) >universal_newlines may become a bad choice if the default encoding no longer >matches what the environment says, and personally, I wouldn't lose much sleep >over that. universal_newlines is also problematic because it's misnamed from the more common motivation to use it. Very often people do want to open std* in text mode (and thus trade in Unicodes), but they rarely equate that to "universal newlines". So the option is just more hidden magical side-effect and cargo-culted lore. 
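To make the distinction above concrete, a small sketch (the child emits ASCII only, so it runs under any locale): ``universal_newlines=True`` decodes with whatever ``locale.getpreferredencoding(False)`` reports, whereas the ``encoding`` parameter added in 3.6 spells the codec out explicitly.

    import locale
    import subprocess
    import sys

    # The codec that text mode (universal_newlines=True) will use to decode output:
    print("preferred encoding:", locale.getpreferredencoding(False))

    child = [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"]

    # Locale-dependent text mode: the decode step silently follows the locale.
    print("child stdout encoding:",
          subprocess.check_output(child, universal_newlines=True).strip())

    # Explicit text mode (3.6+): decoding no longer depends on what the environment says.
    print("decoded explicitly:",
          subprocess.check_output(child, encoding="utf-8").strip())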
It's certainly *useful* though, and I think we want to be sure that we don't break existing code that uses it for this purpose. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 801 bytes Desc: OpenPGP digital signature URL: From turnbull.stephen.fw at u.tsukuba.ac.jp Mon Jan 9 13:42:38 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Tue, 10 Jan 2017 03:42:38 +0900 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <22641.7065.453418.747867@turnbull.sk.tsukuba.ac.jp> Message-ID: <22643.55710.512378.478006@turnbull.sk.tsukuba.ac.jp> INADA Naoki writes: > But when I see non UTF-8 text, I don't change locale to read such > text. Nobody does. The problem is if people have locales set for non-UTF-8, which Chinese people often do ("GB18030 isn't just a good idea, it's the law"). Especially forcing stdout to something other than the locale is likely to mess things up. > As my feeling, UTF-8 start dominating from about 10 years ago, and > ja_JP.EUC_JP (it was most common locale for Japanese before UTF-8) is > complete legacy. My university's internal systems typically produce database output (class registration lists and the like) in Shift JIS, but that's not reliable. Some departments still have their home pages in EUC-JP, and pages where the meta http-equiv elements disagree with the content are not unusual. Private sector may be up to date, but academic sector (and from the state of e-stat.go.jp, government in general, I suspect) is stuck in the Jomon era. I don't know that there's going to be a problem, but the idea of implicitly forcing an encoding different from the locale seems likely to cause confusion to me. Aside from Nick's special case of containers supplied by a vendor different from the host OS, I don't really see why this is a good idea. I think it's best to go with the locale that is set (or not), unless we have very good reason to believe that by far most users would be surprised by that, and those who aren't surprised are mostly expert enough to know how to deal with a forced UTF-8 environment if they *don't* want it. A user-selected option is another matter. From turnbull.stephen.fw at u.tsukuba.ac.jp Mon Jan 9 13:45:14 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Tue, 10 Jan 2017 03:45:14 +0900 Subject: [Python-ideas] Python reviewed In-Reply-To: <84652a00-c4ef-57f3-cda0-31792b85eae9@bigpond.com> References: <84652a00-c4ef-57f3-cda0-31792b85eae9@bigpond.com> Message-ID: <22643.55866.291758.483624@turnbull.sk.tsukuba.ac.jp> Simon Lovell writes: > Hmm, Thanks Chris. I thought I was posting this to the correct > place. Well, you didn't actually make any specific suggestions, and you describe it as a "review" rather than an RFE. > I've never seen that "for line in open ..." after googling it many > times! Why is this question so often asked then? Lot of C programmers out there, I guess. It's in all the textbooks and references I have, and in the tutorial: https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects > Re:PEP249 & SQL, I thought I was proposing something like that but it > can't be tacked on later I don't think - needs to be an inate part of > Python to work as cleanly as 4gl languages. You should take a look at SQLAlchemy and other Python database managers. I don't know what your "untackable it" is so I can't be more specific. 
Note that PEP 249 is not intended to provide an API for ordinary Python programmers' use. It was expected that convenient management modules would be provided on top of the DBAPI. PEP 249 is intended to provide an API between the backend drivers and the database manager modules, so that any manager could easily interface with any driver. > Re:Everything being true of false. I don't see the value of > that. Only boolean data should be valid in boolean contexts. I > don't really see how that can be argued. There's only a point in arguing it if you think that data types are fundamentally mutually exclusive. But they're not in object-oriented languages like Python. Something can be object, boolean, and str all at the same time. (The presence of a type called boolean is a red herring here. True and False are merely the representative boolean values, a convenience for programmers who want them.) In Python, all data is boolean, unambiguously being interpreted as "true" or "false" in boolean contexts. As far as boolean contexts are concerned, there are an infinite number of objects equivalent to True and another bunch (currently not infinite) equivalent to False. It could be argued that this leads to programmers making bugs, but I personally haven't found it so, in Python or Lisp, and I find the lack of it very annoying when I'm writing Scheme since it's so similar to Lisp. > > The Bad: > > Colons at the end of if/while/for blocks. Most of the arguments > > in favour of this decision boil down to PEP 20.2 "Explicit is > > better than implicit". I seem to recall that this has to do with an implementation requirement, that the syntax be parseable with an LL parser. > > This could be done similarly to requiring declarations in > > Fortran, which if "implicit none" was added to the top of the > > program, declarations are required. It could, but won't. That's a pragma, and Guido hates pragmas. It's certainly not worth a keyword. > > "while((line=fgets(f))!=NULL)" which causes the confusion. No solutions > > have been accepted to the current method which is tacky: > > f=open(file) > > endwhile="" > > endif="" > > while True: > > line=f.readline > > if line = None: > > break > > endif > > process(line) > > endwhile Aside: I really find those suite terminators gratuitous; if I need something "stronger" than a dedent, an empty line is much prettier than they are IMO. Actually the accepted loop-and-a-half idiom is f = open(file) line = f.readline() while line: process(line) line = f.readline() "for line in f" makes that unnecessary in this case, but there do remain cases where loop-and-a-half is needed because of the lack of an assignment expression. > > else keyword at the end of while loops is not obvious to those > > not familiar with it. Something more like whenFalse would be > > clearer Keywords are expensive in that every Python programmer needs to know all of them to avoid using one as an identifier. So it is a general design principle of Python to avoid adding new ones where possible. It turns out that for ... else is rarely used even by experts, and the main use case is extremely idiomatic, so probably no harm is done. > > Changing print from a statement to a function in Python 3 adds no > > positive value that I can see It eliminates a keyword, makes it possible to experiment with different implementations, and allows printing in the middle of expressions (although since print() always returns None, that's not as useful as it could be). 
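A short sketch of what the function form allows that the statement could not - being passed around, silenced, or pre-configured - with the helper names invented purely for illustration:

    import sys
    from functools import partial

    def process_items(items, report=print):
        # Because print is an ordinary callable, callers can swap in a logger,
        # a test spy, or a no-op without touching the loop body.
        for item in items:
            report("processing", item)

    process_items(["a", "b"])                              # default: regular print
    process_items(["a", "b"], report=lambda *args: None)   # silence it

    # Keyword arguments replace the old magic redirection syntax:
    error = partial(print, file=sys.stderr, flush=True)
    error("something went wrong")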
> > Upper delimiters being exclusive while lower delimiters are > > inclusive. This is very counter intuitive. If you say so. But it is convenient because list == list[:n] + list[n:]. > > Conditional expression ( if else > > ) in Python is less intuitive than in C ( > > ? : ). Ref PEP308. Why BDFL chose the > > syntax he did is not at all clear. I seem to recall that ?: was out because Guido at the time was adamantly against use of ? as syntax in Python, so we were kinda stuck with keywords. He didn't want to have non-unary expressions start with keywords (would have caused ugliness in the parser, I guess) and he did want to reuse keywords. " then else " was suggested but most who posted thought it less readable than the syntax chosen. YMMV, of course. > > The Ugly: > > Persisting with the crapulence from C where a non zero integer is > > true and zero is false - only ever done because C lacked a > > boolean data type. There were at least four reasons for this in C, in fact, and none have anything to do with the inability to add a Boolean type to a programming language: (0) that's the way the hardware works (1) the idiom "if (ptr)" (2) the idiom "while (i--)" (3) the idiom "while (*dest++ = *src++)" all of which compiled[1] nicely to efficient machine code without an optimization pass, and some take advantage of useful characteristics of the Unix OS. Today one might argue that these are an attractive nuisance and too cute to be allowed to live, but at that time not so much was known about the kind of mistakes programmers like to make. Footnotes: [1] On modern machines (3) is handled more efficiently by specialized instructions. From stephanh42 at gmail.com Mon Jan 9 14:04:17 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Mon, 9 Jan 2017 20:04:17 +0100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: <22643.55710.512378.478006@turnbull.sk.tsukuba.ac.jp> References: <22641.7065.453418.747867@turnbull.sk.tsukuba.ac.jp> <22643.55710.512378.478006@turnbull.sk.tsukuba.ac.jp> Message-ID: Hi Stephen, 2017-01-09 19:42 GMT+01:00 Stephen J. Turnbull < turnbull.stephen.fw at u.tsukuba.ac.jp>: > > Private sector may be up to date, but academic sector > (and from the state of e-stat.go.jp, government in general, I suspect) > is stuck in the Jomon era. > I went to that page, checked the HTML and found: Admittedly, the page is in HTML 4.01, but then the Jomon era predates HTML5 by about 16,000 years, so I'll cut them some slack. Anyway, I am quite willing to believe that the situation is as dire as you describe on Windows. However, on OS X, Apple enforces UTF-8. And the Linux vendors are moving in that direction too. And the proposal under discussion is specifically about Linux So, again I am wondering if there are many people who end up with a *Linux* system which has a non-UTF-8 locale. For example, if you use the Ubuntu graphical installer, it asks for your language and then gives you the UTF-8 locale which comes with that. You have to really dive into the non-graphical configuration to get yourself a non-UTF8 locale. Stephan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From songofacandy at gmail.com Mon Jan 9 14:12:47 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Tue, 10 Jan 2017 04:12:47 +0900 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: <22643.55710.512378.478006@turnbull.sk.tsukuba.ac.jp> References: <22641.7065.453418.747867@turnbull.sk.tsukuba.ac.jp> <22643.55710.512378.478006@turnbull.sk.tsukuba.ac.jp> Message-ID: > > The problem is if people have locales set for non-UTF-8, which Chinese > people often do ("GB18030 isn't just a good idea, it's the law"). > Especially forcing stdout to something other than the locale is likely > to mess things up. Oh, I didn't know non-UTF-8 is used for LC_CTYPE in these years! > > > As my feeling, UTF-8 start dominating from about 10 years ago, and > > ja_JP.EUC_JP (it was most common locale for Japanese before UTF-8) is > > complete legacy. > > My university's internal systems typically produce database output > (class registration lists and the like) in Shift JIS, but that's not > reliable. Some departments still have their home pages in EUC-JP, and > pages where the meta http-equiv elements disagree with the content are > not unusual. Private sector may be up to date, but academic sector > (and from the state of e-stat.go.jp, government in general, I suspect) > is stuck in the Jomon era. I talked about LC_CTYPE. We have some legacy files too. But it's not relating to neither of fsencoding nor stdio encoding. > > I don't know that there's going to be a problem, but the idea of > implicitly forcing an encoding different from the locale seems > likely to cause confusion to me. Aside from Nick's special case of > containers supplied by a vendor different from the host OS, I don't > really see why this is a good idea. I think it's best to go with the > locale that is set (or not), unless we have very good reason to > believe that by far most users would be surprised by that, and those > who aren't surprised are mostly expert enough to know how to deal with > a forced UTF-8 environment if they *don't* want it. > > A user-selected option is another matter. > Yes. This is balance matter. Some people are surprised by Python may not use UTF-8 even when writing source code in UTF-8, unlike most of other languages. (Not only rust, Go, node.js, but also Ruby, Perl, or even C!) And some people are surprised because they used locale to tell terminal encoding (which is not UTF-8) to some commands, and Python ~3.6 followed it. I thought later group is very small, and more smaller when 3.7 is released. And if we can drop locale support in the future, we will be able to remove some very dirty code in Python/fileutil.c. That's why I prefer locale-free UTF-8 mode by default, and locale-aware mode as opt-in. But I'm OK we start to ignore C locale, sure. 
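For anyone following along, a quick way to see which encodings a given interpreter actually picked; the values differ by platform, locale and Python version, and the ``showenc.py`` file name is just a placeholder.

    import locale
    import sys

    # Run under different environments, e.g. `LANG=C python3 showenc.py`
    # versus a UTF-8 locale, to see whether the C locale is in play.
    print("LC_CTYPE locale:     ", locale.setlocale(locale.LC_CTYPE, ""))
    print("preferred encoding:  ", locale.getpreferredencoding(False))
    print("filesystem encoding: ", sys.getfilesystemencoding())
    print("stdio encodings:     ", sys.stdin.encoding, sys.stdout.encoding)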
From steve at pearwood.info Mon Jan 9 14:40:34 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 10 Jan 2017 06:40:34 +1100 Subject: [Python-ideas] Python Reviewed In-Reply-To: <69e3c5d4-d64b-063e-758e-2b0ac1720daa@bigpond.com> References: <69e3c5d4-d64b-063e-758e-2b0ac1720daa@bigpond.com> Message-ID: <20170109194033.GN3887@ando.pearwood.info> On Mon, Jan 09, 2017 at 07:25:45PM +0800, Simon Lovell wrote: > The Good: > Syntactically significant new lines > Syntactically significant indenting > Different types of array like structures for different situations > Mostly simple and clear structures > Avoiding implicit structures like C++ references which add only > negative value > Avoiding overly complicated chaining expressions like > "while(*d++=*s++);" > Single syntax for block statements (well, sort of. I'm ignoring > lines like "if a=b: c=d") > Lack of a with statement which only obscures the code Python has a `with` statement. As a newcomer to this community, and apparently the language as well, do you understand how obnoxious and arrogant it comes across for you to declare what is and isn't "good" and "bad" about the language, particularly when you appear to know the language very well? Most of your comments aren't even a little bit objective, they're subjective judgements based on (I expect) what you are used to, and nothing more. Matters of taste, at best, and yet you're stating them as objective fact. So if you feel that my response is a tad blunt or even brusque, perhaps you can understand why. > The Bad: > Colons at the end of if/while/for blocks. Most of the arguments in > favour of this decision boil down to PEP 20.2 "Explicit is better than > implicit". This is the first time I've heard this ridiculous explanation for the use of colons! Where did you get it from? It sounds like the sort of nonsense that gets highly voted on StackOverflow. The reason for colons is a FAQ: https://docs.python.org/3/faq/design.html#why-are-colons-required-for-the-if-while-def-class-statements > No end required for if/while/for blocks. Thank you Guido, for avoiding needing all those redundant and unnecessary "end" statements that other languages waste my time with. > This code block doesn't compile, even given that function "process" > takes one string parameter: > f=open(file) > endwhile="" > while (line=f.readline())!=None: > process(line) > endwhile Assignment is not and should not be an expression, at very least not using the same = (pseudo-)operator which is used as an assignment statement. It may be acceptible with some other syntax that cannot be easily mistyped when you want == equality, but not = alone. But what is that obfuscated code meant to do? Operate on the file, one line at a time? This is simpler: with open(file) as f: for line in f: process(line) and it has the advantage of also automatically closing the file when the `with` block gets exited. > Inadequacy of PEP249 - Python Database Specification. [...] I cannot comment on this. > Variables never set to anything do not error until they are used, Fair enough, that is a minor annoyance occasionally. > No do-while construct What do you mean by "do-while" and how do you expect it to differ from "while"? > else keyword at the end of while loops is not obvious to those not > familiar with it. Something more like whenFalse would be clearer Indeed. for...else and while...else are much more accurately described as for...then, while...then. The code in the "else" block is *unconditionally* executed following the for... 
or while... block, which makes the block a "then" rather than "else". To avoid that block, you have to jump out of the entire compound block, using "break", "return" or "raise". But we're stuck with the name now. > Changing print from a statement to a function in Python 3 adds no > positive value that I can see Consistency: print doesn't need to be a special cased statement. It does nothing special that a function can't do. So why make it a statement? As a function, it can be passed around as a first class value, used as a callback, monkey-patched or shadowed or mocked as needed. None of these things are possible with a statement. It can use the same ordinary syntax as other functions, instead of the bizarre and ugly special case syntax used as a statement: # Python 3 print(value, end='', file=sys.stderr) # Python 2 print >>sys.stderr, value, # note the magical trailing comma If Python was invented today, what arguments could anyone make for having print be a statement instead of a function? "It saves typing two parentheses." Anything else? > Upper delimiters being exclusive while lower delimiters are > inclusive. Dijkstra explained it well: http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html Half-open ranges are much superior for avoiding off-by-one errors than closed ranges. > This is very counter intuitive. e.g. range(1,4) returns > [1,2,3]. Better to have the default base as one rather than zero IMO. Of > course, the programmer should always be able to define the lower bound. > This cannot be changed, of course. It is true that starting counting at zero takes a bit of getting used to, and there are times when it is more convenient to start at one. But no one solution is ideal all the time, and we have to pick one system or the other, or else we end up with an overly complex syntax with marginal utility. > Lack of a single character in a method to refer to an attribute > instead of a local variable, similar to C's "*" for dereferencing a pointer The lack of single-character syntax for many things helps prevents Python code looking like line noise. > Inability to make simple chained assignments e.g. "a = b = 0" Python does support chained assignment. > Conditional expression ( if else > ) in Python is less intuitive than in C ( ? > : ). Ref PEP308. Why BDFL chose the syntax he > did is not at all clear. What is "intuitive" to people who have learned C and C-influenced languages is a perplexing, mysterious enigma almost impossible to google for or look up in books. flag ? 1 : 2 means nothing to people who haven't memorized what it means. But Python's ternary syntax is actual grammatically correct English: 1 if flag else 2 and it puts one of the values first, in the most important part, instead of the flag. > The Ugly: > Persisting with the crapulence from C where a non zero integer is > true and zero is false - only ever done because C lacked a boolean data > type. This is a flagrant violation of PEP 20.2 "Explicit is better than > implicit" and should be removed without providing backwards compatibility. Thanks for your opinion, but I prefer the status quo. Welcome to the list. 
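Since the for...else behaviour described earlier in this message trips up so many newcomers, a short illustrative sketch (not part of the original email): the else block runs only when the loop finishes without hitting break, which is why "then" describes it better than "else".

def find(needle, haystack):
    for i, item in enumerate(haystack):
        if item == needle:
            print("found at index", i)
            break
    else:
        # runs only if the for loop completed without a break
        print(needle, "not found")

find(3, [1, 2, 3, 4])   # found at index 2
find(9, [1, 2, 3, 4])   # 9 not found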
-- Steve From ckaynor at zindagigames.com Mon Jan 9 15:26:45 2017 From: ckaynor at zindagigames.com (Chris Kaynor) Date: Mon, 9 Jan 2017 12:26:45 -0800 Subject: [Python-ideas] Python Reviewed In-Reply-To: <20170109194033.GN3887@ando.pearwood.info> References: <69e3c5d4-d64b-063e-758e-2b0ac1720daa@bigpond.com> <20170109194033.GN3887@ando.pearwood.info> Message-ID: On Mon, Jan 9, 2017 at 11:40 AM, Steven D'Aprano wrote: > On Mon, Jan 09, 2017 at 07:25:45PM +0800, Simon Lovell wrote: >> Lack of a with statement which only obscures the code >Python has a `with` statement. I suspect Simon means similar to the VB with statement, which allows an object to become the default namespace. Basically, rather than: object.alpha() object.beta() you can do: with object: alpha() beta() or some slight variant thereof. Both cases do the same thing. Personally, I vastly prefer the explicit naming of self, and it should be noted that the style guides I have seen for C/C++ code have required member variable names to be prefixed with something like "m_" or just "_" to keep them from getting confused with local variables. Python solves this problem by requiring the object reference, and generally, I have found that much of the time it does not add that much extra to the code to have the explicit references, while making it very clear what is intended. >> No do-while construct > > What do you mean by "do-while" and how do you expect it to differ from > "while"? This one is somewhat common in other languages, and a do-while executes at least once (checks the condition at the end), while while executes at least zero times (checks the condition at the start). In C: do { ..block.. } while (condition); is the same as: ..block.. while (condition) { ..block.. } where both "..block.." are the same. The exact same result can be gotten with (in Python): while True: ..block.. if condition: break That said, I've only occasionally had use for the construct, and the most common case I've seen for it is multi-line macros in C/C++ where it is needed to get proper handling to require a semicolon at the end of the macro invocation and handle the optional braces in the flow control structures. Almost always, if checking the condition at the start is not good enough, I almost always want to check the condition somewhere in the middle instead, so the "while True:" works better. >> This is very counter intuitive. e.g. range(1,4) returns >> [1,2,3]. Better to have the default base as one rather than zero IMO. Of >> course, the programmer should always be able to define the lower bound. >> This cannot be changed, of course. > > It is true that starting counting at zero takes a bit of getting used > to, and there are times when it is more convenient to start at one. But > no one solution is ideal all the time, and we have to pick one system or > the other, or else we end up with an overly complex syntax with marginal > utility. I've had to deal with mixed zero-based and one-based in coding before (and still often), and it was a huge pain. Zero-based works much better for many practical programming tasks, but can be difficult to get used to. I work in game development, so it is not uncommon to have mixed (artists and designers like one-based as they are generally non-technical). It is pretty annoying when may variable names have to be post fixed by "number" or "index" to try to keep it straight (and that fails half the time). 
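To make the do-while equivalence described above concrete, a small illustrative sketch (not from the thread) of the usual Python spelling of a loop whose body must run at least once, with the exit test at the bottom:

import random

total = 0
while True:
    roll = random.randint(1, 6)   # the body always runs at least once
    total += roll
    if roll == 6:                 # C's "while (roll != 6)" test, inverted
        break
print("rolled a six; running total was", total)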
The worst I had to deal with regarding it was in code that was crossing C++ and LUA, where C++ is zero-based and LUA is one-based - it was extremely difficult to remember all the needed +1s and -1s in the boundary code... >> Lack of a single character in a method to refer to an attribute >> instead of a local variable, similar to C's "*" for dereferencing a pointer > > The lack of single-character syntax for many things helps prevents > Python code looking like line noise. I would not recommend this, however it should be noted that Python assigns no special meaning to the name "self", and the variable could be named anything you want in your methods, including single character names. You would still need to type a minimum of two characters though. This is perfectly valid Python code, though would be against many (if not most) style guides: class MyObject: def __init__(s, myArg1, myArg2): s.myArg1 = myArg1 s.myArg2 = myArg2 a = MyObject(1, 2) print(a.myArg1, a.myArg2) # prints "1 2" If the variable "s" were renamed to "self", you would get the exact same result, however the code would match most style guides, and linters will likely stop complaining. From ned at nedbatchelder.com Mon Jan 9 15:56:33 2017 From: ned at nedbatchelder.com (Ned Batchelder) Date: Mon, 9 Jan 2017 15:56:33 -0500 Subject: [Python-ideas] Python Reviewed In-Reply-To: References: <69e3c5d4-d64b-063e-758e-2b0ac1720daa@bigpond.com> Message-ID: <43c02e7c-f63a-18d3-4261-3a16e1396108@nedbatchelder.com> On 1/9/17 8:31 AM, Chris Angelico wrote: > On Mon, Jan 9, 2017 at 10:25 PM, Simon Lovell wrote: >> Python Reviewed >> >> Having used a lot of languages a little bit and not finding satisfactory >> answers to these in some cases often asked questions, I thought I'd join >> this group to make a post on the virtues and otherwise of python. > I think this thread belongs on python-list at python.org, where you'll > find plenty of people happy to discuss why Python is and/or shouldn't > be the way it is. I think this is the only reasonable response to this posting on this mailing list. Simon: quoting from the Python-Ideas info page: "This list is to contain discussion of speculative language ideas for Python for possible inclusion into the language." Your comments, while interesting, don't make specific proposals for changes to Python. python-list at python.org is good for general discussion. If you do intend to make specific proposals, you'll have to put a lot more work into them. Proposals should be focused and specific; one thread with a dozen ideas makes discussion impossible. It helps to understand the language and its history. Many of your reactions to Python have been expressed many times before, so there are well-documented discussions and rationales for Python being the way it is. Doing some research beforehand can save you some work. Finally, backward compatibility is a serious consideration. Proposals containing new keywords, for example, are nearly impossible to get approved. Welcome to the community, --Ned. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikhailwas at gmail.com Mon Jan 9 17:33:36 2017 From: mikhailwas at gmail.com (Mikhail V) Date: Mon, 9 Jan 2017 23:33:36 +0100 Subject: [Python-ideas] Python Reviewed In-Reply-To: <69e3c5d4-d64b-063e-758e-2b0ac1720daa@bigpond.com> References: <69e3c5d4-d64b-063e-758e-2b0ac1720daa@bigpond.com> Message-ID: On 9 January 2017 at 12:25, Simon Lovell wrote: > Python Reviewed > > The Good : > ... > The Bad: > ... 
I agree with many points, but: > No end required for if/while/for blocks. .. Makes the code less readable Nope, it makes code significantly more readable. I'm sort of past master in such questions so there is very little chance of BSing me. In some cases I'd want some space after the block, so probably future IDEs will allow placing small vertical indents to help with that. > No do-while construct I don't think it is needed much, I never came up with thoughts that I want it. If I'd design a syntax from scratch, there would be only infinite loop and break. > Conditional expression in Python is less intuitive than in C Probably, but conditional expressions are IMO not needed and I'd remove them just not to pollute the syntax. Mikhail -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Mon Jan 9 20:09:25 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 10 Jan 2017 12:09:25 +1100 Subject: [Python-ideas] Python Reviewed In-Reply-To: <20170109194033.GN3887@ando.pearwood.info> References: <69e3c5d4-d64b-063e-758e-2b0ac1720daa@bigpond.com> <20170109194033.GN3887@ando.pearwood.info> Message-ID: <20170110010923.GO3887@ando.pearwood.info> On Tue, Jan 10, 2017 at 06:40:34AM +1100, Steven D'Aprano wrote: > particularly when you appear to know the language very well? Of course I mean "don't appear". -- Steve From simon58500 at bigpond.com Mon Jan 9 20:12:05 2017 From: simon58500 at bigpond.com (Simon Lovell) Date: Tue, 10 Jan 2017 09:12:05 +0800 Subject: [Python-ideas] Python reviewed Message-ID: Thanks for the feedback guys. A few quick comments: Re: Colons. I'm sure I've seen that FAQ before. I may be arrogant but I can't take it seriously. Being "slightly" easier to read is hardly a reason for a mandatory structure. Re: PEP249. I thought I'd detailed quite a bit of what I thought should be possible. Is there a forum for advancing this? Re: do-while - that is a loop construct that executes once before evaluating the condition. Supported by most languages. Re: Counters starting at zero vs one, Fortran has a neat solution to this for arrays if not strings - allow the programmer to select the starting index. I've seen -1 and 1000, for example. I can't say I'm convinced by Dijkstra's argument but it is somewhat esoteric because it isn't changing. When I've programmed for loops in C, sometimes you want zero based and sometimes one based, while most times you don't really care. To make it readable I would wherever possible write either: for (i=0;i<j;i++) or for (i=1;i<=j;i++) // In both cases always executing j times Rgds From chris.barker at noaa.gov (Chris Barker) Subject: [Python-ideas] Python reviewed In-Reply-To: References: <84652a00-c4ef-57f3-cda0-31792b85eae9@bigpond.com> <22643.55866.291758.483624@turnbull.sk.tsukuba.ac.jp> Message-ID: not even sure why I'm engaging, but.... Note 1) Many of these issues have been widely discussed all over the internet -- I don't think I've seen anything new here. So it would have been nice to do some more research before posting. Now into the fray! > Re: Everything being true or false. I don't see the value of > > that. Only boolean data should be valid in boolean contexts. I actually agree with that -- but there are a lot of nice conveniences from Python's idea of "truthiness", too. > > > The Bad: > > > Colons at the end of if/while/for blocks. Most of the arguments > > > in favour of this decision boil down to PEP 20.2 "Explicit is > > > better than implicit". > > I seem to recall that this has to do with an implementation > requirement, that the syntax be parseable with an LL parser. 
I don't think so -- but I DO think that this was a usability issue that was investigated early in Python's development. (Or maybe even ABC's development) -- in fact, I suspect it is one of the few programming language syntax decisions (in any language) that went through any kind of formal usability testing. I didn't bring it up -- so I'll leave the googling to others. > Actually the accepted loop-and-a-half idiom is > > f = open(file) > line = f.readline() > while line: > process(line) > line = f.readline() > I used to write that style, but I've never liked it, so went with: f = open(file) while True: line = f.readline() if not line: break process(line) The while True and check for a break is kinda ugly, but I think less ugly than two separate calls to readline() and, of course, we now have: for line in f: process(line) which is cleaner and easier than anything I've seen in any other language -- so WHAT exactly was the original complaint??? > > else keyword at the end of while loops is not obvious to those > > > not familiar with it. Something more like whenFalse would be > > > clearer > I've kinda wanted a "do-until" loop of some sort sometimes, but frankly, not that badly :-) > > > Changing print from a statement to a function in Python 3 adds no > > > positive value that I can see > yeah, yeah yeah -- PLEASE go read a number of py3 justifications and rants! This is really a dead horse. > > > Upper delimiters being exclusive while lower delimiters are > > > inclusive. This is very counter intuitive. > but SO much better than the alternative! This was also tested some, I think maybe by Dijkstra of C++ fame. But these identities are REALLY, REALLY useful: s[:n] + s[n:] == s len(s[:n]) == n len(s[:-n]) == n len(s[n:i]) == i - n (maybe a few others?) These prevent a HUGE number of off by one errors and extra code adding and subtracting one all over the place -- this is almost as important to Python's usability as the significant indentation :-) > > Conditional expression ( if else > > > ) in Python is less intuitive than in C ( > > > ? : ). Ha Ha Ha Ha!!! I could read and understand Python's version the first time I saw it -- I still have to look up the C version (not much of a C programmer). Maybe you think it's wordy -- but it's very readable. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Mon Jan 9 20:26:00 2017 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 9 Jan 2017 20:26:00 -0500 Subject: [Python-ideas] Python reviewed In-Reply-To: References: Message-ID: On Mon, Jan 9, 2017 at 8:12 PM, Simon Lovell wrote: > Re: Colons. I'm sure I've seen that FAQ before. I may be arrogant but I > can't take it seriously. Being "slightly" easier to read is hardly a reason > for a mandatory structure. > "Readability counts." Did you notice that you placed a redundant ":" in every comment of yours after "Re"? I don't think your message would look better without them. Another advantage of having : is that it allows smart editors to detect and highlight errors sooner and to better perform auto-indentation. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chris.barker at noaa.gov Mon Jan 9 20:26:19 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 9 Jan 2017 17:26:19 -0800 Subject: [Python-ideas] Python reviewed In-Reply-To: References: Message-ID: On Mon, Jan 9, 2017 at 5:12 PM, Simon Lovell wrote: > Re: Counters starting at zero vs one, Fortran has a neat solution to this > for arrays if not strings - allow the programmer to select the starting > index. I liked that back in the day, but I think it's really better if it's always the same. And see my other note for why the zero-based and open ended slicing is fabulous -- indexing really needs to match slicing. One more: since you mentioned Fortran -- it's a common use-case for an array to model some sort of regular spaced grid, so: x = start_x + i*delta_x really easy and logical math for figuring out where you are on a grid (and the reverse calculation) -- this is a pain with 1-based indexing.... (of course, C does this for pointer math for the same reason...) When I've programmed for loops in C, sometimes you want zero based and > sometimes one based, while most times you don't really care. To make it > readable I would wherever possible write either: > for (i=0;i<j;i++) or > for (i=1;i<=j;i++) // In both cases always executing j times > in Python, you never write that code anyway. Most of the time, it's for item in sequence: no indexes at all. or: for i in range(N): ... indexes, but you don't care or for i, item in enumerate(seq): ... or for item1, item2 in zip(seq1, seq2): ... i.e. you almost never care what the starting index is! -CHB > Rgds > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
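Picking up the half-open-slice identities quoted a couple of messages above, a quick hedged check (an illustration, not from the original messages; the "last n characters" identity is stated here in the len(s[-n:]) form, and the valid ranges of n are assumed rather than taken from the thread):

s = "abcdefgh"
for n in range(len(s) + 1):
    assert s[:n] + s[n:] == s      # splitting at n never loses or duplicates anything
    assert len(s[:n]) == n         # "the first n characters" really has n characters
for n in range(1, len(s) + 1):
    assert len(s[-n:]) == n        # "the last n characters" also has n characters
n, i = 2, 6
assert len(s[n:i]) == i - n        # the length of s[n:i] is simply i - n
print("all of the half-open identities hold for", s)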
URL: From chris.barker at noaa.gov Mon Jan 9 20:32:41 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 9 Jan 2017 17:32:41 -0800 Subject: [Python-ideas] Python Reviewed In-Reply-To: References: <69e3c5d4-d64b-063e-758e-2b0ac1720daa@bigpond.com> Message-ID: Oh, Here is the history of the colon: http://python-history.blogspot.com/2009/02/early-language-design-and-development.html -CHB On Mon, Jan 9, 2017 at 5:30 PM, Chris Barker wrote: > I just noticed a logical inconsistency here: > > The Good: >> Syntactically significant indenting > > > >> The Bad: >> Colons at the end of if/while/for blocks. >> > > >> No end required for if/while/for blocks. > > > Huh? if you have Syntactically significant indenting, then an "end" > indicator is redundant. and right above, you say you don't like the > redundant colon.... > > which is it?? > > It also is a potential problem with tab expansion tricking the programmer. > > > I do wish tabs had been banned from indentation from the beginning.... > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Mon Jan 9 20:38:50 2017 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 9 Jan 2017 20:38:50 -0500 Subject: [Python-ideas] Python reviewed In-Reply-To: References: <84652a00-c4ef-57f3-cda0-31792b85eae9@bigpond.com> <22643.55866.291758.483624@turnbull.sk.tsukuba.ac.jp> Message-ID: On Mon, Jan 9, 2017 at 8:19 PM, Chris Barker wrote: > > I think maybe by Dijkstra of C++ fame. Dijkstra is famous for many things, but C++ is another Dutchman's fault. Dijkstra's famous works include "GOTO Considered Harmful" [1] and "How do we tell truths that might hurt?" [2]. [1]: http://wiki.c2.com/?GotoConsideredHarmful [2]: http://www.cs.utexas.edu/users/EWD/transcriptions/EWD04xx/EWD498.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From simon58500 at bigpond.com Mon Jan 9 20:44:31 2017 From: simon58500 at bigpond.com (Simon Lovell) Date: Tue, 10 Jan 2017 09:44:31 +0800 Subject: [Python-ideas] Python reviewed In-Reply-To: References: Message-ID: <451e10fa-f437-2132-8a0f-a167b78fac6a@bigpond.com> Also in Python you can use: for x in range (1,j+1): to loop j times. Although it does read as though it is looping j+1 times to those not familiar. One more comment I wanted to make about end blocks, is that a respectable editor will add them for you, together with the indentation of the next line. EditPlus 2 did it best in my experience although I think I just haven't seen a well configured alternative. I very rarely forget the block closer but I do sometimes forget the colon. Regarding the logical inconsistency of my argument, well I am saying that I would prefer my redundancy at the end of the loop rather than the beginning. To say that the status quo is better is to say that you prefer your redundancy at the beginning. Fair enough, I'm happy to respect your opinion there. I still struggle to see why it should be mandatory though? 
For those who prefer to have the block closing delimiters this way, is the need for a keyword (could be a command line option) really the objection? I'll have a detailed look at your colon link a bit later. On 10/01/17 09:26, Chris Barker wrote: > On Mon, Jan 9, 2017 at 5:12 PM, Simon Lovell > wrote: > > Re: Counters starting at zero vs one, Fortran has a neat solution > to this for arrays if not strings - allow the programmer to select > the starting index. > > > I liked that back in the day, but I think it's really better if it's > always the same. > > and see my other note for why the zero-based and open ended slicing is > fabulous -- indexing really needs to match slicing. > > ONe more: > > since you mentioned Fortran -- it's a common use-case for an array to > model some sort of regular spaced grid, so: > > x = start_x + i*delta_x > > really easy and logical math for figuring out where you are on a grid > (and the reverse calculation) -- this is a pain with 1-based indexing.... > > (of course, C does this for pointer math for the same reason...) > > When I've programmed for loops in C, sometimes you want zero based > and sometimes one based, while most times you don't really care. > To make it readable I would wherever possible write either: > for (i=0;i for (i=1;i<=j;i++) // In both cases always executing j times > > > in pyton, you never right that code anyway. most of the time, it's > > for item in sequence: > > no indexes at all. > > or: > > for i in range(N): > ... > > indexes, but you dont care > > or > > for i, item in enumerate(seq): > ... > > or for item1, item2 in zip(sequence): > ... > > i.e you almost never care what the starting index is! > > -CHB > > > > Rgds > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From ned at nedbatchelder.com Mon Jan 9 20:50:07 2017 From: ned at nedbatchelder.com (Ned Batchelder) Date: Mon, 9 Jan 2017 20:50:07 -0500 Subject: [Python-ideas] Python reviewed In-Reply-To: <451e10fa-f437-2132-8a0f-a167b78fac6a@bigpond.com> References: <451e10fa-f437-2132-8a0f-a167b78fac6a@bigpond.com> Message-ID: On 1/9/17 8:44 PM, Simon Lovell wrote: > > Also in Python you can use: > > for x in range (1,j+1): > > to loop j times. Although it does read as though it is looping j+1 > times to those not familiar. > > One more comment I wanted to make about end blocks, is that a > respectable editor will add them for you, together with the > indentation of the next line. EditPlus 2 did it best in my experience > although I think I just haven't seen a well configured alternative. I > very rarely forget the block closer but I do sometimes forget the colon. > > Regarding the logical inconsistency of my argument, well I am saying > that I would prefer my redundancy at the end of the loop rather than > the beginning. To say that the status quo is better is to say that you > prefer your redundancy at the beginning. Fair enough, I'm happy to > respect your opinion there. I still struggle to see why it should be > mandatory though? 
For those who prefer to have the block closing > delimiters this way, is the need for a keyword (could be a command > line option) really the objection? > Can you clarify something for us? Are you proposing to add end-block syntax? You've said you prefer it, and that you miss it. But you haven't said, "I am proposing that Python should change." This list is about proposing changes to Python. Are you proposing that Python change in this way? You understand that this is a significant shift from a syntax that has been in place for a quarter century? Perhaps you should give yourself time to get used to Python as Python is. --Ned. > > I'll have a detailed look at your colon link a bit later. > > > On 10/01/17 09:26, Chris Barker wrote: >> On Mon, Jan 9, 2017 at 5:12 PM, Simon Lovell > > wrote: >> >> Re: Counters starting at zero vs one, Fortran has a neat solution >> to this for arrays if not strings - allow the programmer to >> select the starting index. >> >> >> I liked that back in the day, but I think it's really better if it's >> always the same. >> >> and see my other note for why the zero-based and open ended slicing >> is fabulous -- indexing really needs to match slicing. >> >> ONe more: >> >> since you mentioned Fortran -- it's a common use-case for an array to >> model some sort of regular spaced grid, so: >> >> x = start_x + i*delta_x >> >> really easy and logical math for figuring out where you are on a grid >> (and the reverse calculation) -- this is a pain with 1-based indexing.... >> >> (of course, C does this for pointer math for the same reason...) >> >> When I've programmed for loops in C, sometimes you want zero >> based and sometimes one based, while most times you don't really >> care. To make it readable I would wherever possible write either: >> for (i=0;i> for (i=1;i<=j;i++) // In both cases always executing j times >> >> >> in pyton, you never right that code anyway. most of the time, it's >> >> for item in sequence: >> >> no indexes at all. >> >> or: >> >> for i in range(N): >> ... >> >> indexes, but you dont care >> >> or >> >> for i, item in enumerate(seq): >> ... >> >> or for item1, item2 in zip(sequence): >> ... >> >> i.e you almost never care what the starting index is! >> >> -CHB >> >> >> >> >> >> Rgds >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> >> >> >> -- >> >> Christopher Barker, Ph.D. >> Oceanographer >> >> Emergency Response Division >> NOAA/NOS/OR&R (206) 526-6959 voice >> 7600 Sand Point Way NE (206) 526-6329 fax >> Seattle, WA 98115 (206) 526-6317 main reception >> >> Chris.Barker at noaa.gov > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rosuav at gmail.com Mon Jan 9 21:56:36 2017 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 10 Jan 2017 13:56:36 +1100 Subject: [Python-ideas] Python reviewed In-Reply-To: <451e10fa-f437-2132-8a0f-a167b78fac6a@bigpond.com> References: <451e10fa-f437-2132-8a0f-a167b78fac6a@bigpond.com> Message-ID: On Tue, Jan 10, 2017 at 12:44 PM, Simon Lovell wrote: > Regarding the logical inconsistency of my argument, well I am saying that I > would prefer my redundancy at the end of the loop rather than the beginning. > To say that the status quo is better is to say that you prefer your > redundancy at the beginning. Fair enough, I'm happy to respect your opinion > there. I still struggle to see why it should be mandatory though? For those > who prefer to have the block closing delimiters this way, is the need for a > keyword (could be a command line option) really the objection? Actually, Python does have a way to enable optional block closing directives. They're a little more compact than "endfor" and "endwhile" etc, and they're optional, so the compiler won't require you to use them (that would break heaps of libraries), but try this: -- cut -- import sys for arg in sys.argv: if arg == "hello": print("Hello, sir/madam") #if #for -- cut -- Okay, okay, that's a bit of a cheat, but still, if you really truly want "endfor", all you have to do is spell it "#for" and it'll be accepted. Don't expect experienced Python programmers to accept this at code review though. (And if you insist on a command line option, "python3 -X hashblockend" will do that for you. It won't actually DO anything though.) ChrisA From python at lucidity.plus.com Mon Jan 9 22:29:38 2017 From: python at lucidity.plus.com (Erik) Date: Tue, 10 Jan 2017 03:29:38 +0000 Subject: [Python-ideas] Python reviewed In-Reply-To: <451e10fa-f437-2132-8a0f-a167b78fac6a@bigpond.com> References: <451e10fa-f437-2132-8a0f-a167b78fac6a@bigpond.com> Message-ID: On 10/01/17 01:44, Simon Lovell wrote: > Regarding the logical inconsistency of my argument, well I am saying > that I would prefer my redundancy at the end of the loop rather than the > beginning. To say that the status quo is better is to say that you > prefer your redundancy at the beginning. It's not really that one prefers redundancy anywhere. It's more a question of: a) Does the redundancy have any (however small) benefit? b) How "expensive" is the redundancy (in this case, that equates to mandatory characters typed and subsequent screen noise when reading the code). I don't understand how a "redundancy" of a trailing colon in any statement that will introduce a new level of indentation is worse than having to remember to type "end" when a dedent (which is zero characters) does that. Trailing colon "cost": 1 * (0.n) Block end "cost": (len("end") + len(statement_text)) * 1.0 > I still struggle to see why it should be > mandatory though? That looks like a statement, but you've ended it with a question mark. Are you asking if you still struggle? I can't tell. Perhaps it's just the correct use of punctuation that you're objecting to ;) > One more comment I wanted to make about end blocks, is that a > respectable editor will add them for you, You are now asking me to write code with what you describe as a "respectable" editor. I use vim, which is very respectable, thank you. You'd like me to use "EditPlus 2" or equivalent. I struggle to see why that should be mandatory. Thanks for starting an entertaining thread, though ;) E. 
From simon58500 at bigpond.com Tue Jan 10 00:18:19 2017 From: simon58500 at bigpond.com (Simon Lovell) Date: Tue, 10 Jan 2017 13:18:19 +0800 Subject: [Python-ideas] Python Reviewed In-Reply-To: References: <69e3c5d4-d64b-063e-758e-2b0ac1720daa@bigpond.com> Message-ID: <9ff56983-a543-14c7-f2a6-d060f454fdab@bigpond.com> Hi Kyle, I don't see the harm caused from having a do-while construct. Not the most used construct but it is sometimes useful and not having it means you need to either have a first time through marker or a break at the end of a "while True:" loop. I would say that None should also be non-boolean. Otherwise you are saying that things which might be None would be True if not None. Re: SQLAlchemy, this does not resolve the issues satisfactorily to me. Re: half-loop. The idea that you have the same code before entry and at the end of the loop is really ugly and raises the potential for errors. I can remember doing something similarly ugly with this for looping through a cursor but I can't recall why I couldn't just do a .fetchall() and then loop through the results. Maybe I ultimately changed it to do precisely that. Perhaps you have too big a data set to load into memory all at once though and can't do it that way. Anyway, the SQL processing is all too difficult in Python and Java and nearly all modern languages. Re: Conditional expression. You could have: " = if then else Oh one last thing (I hope), the argument for having the current slice notation by Dijkstra, that it looks messy to have a loop where the contents are never executed or can no longer be executed is ridiculous! That *should* look messy. for range(1,1): means executing once to me. If you had 1 based, two of four of the other idioms would work the same: s[:n] + s[n:] == s // doesn't work. I don't think it should work though len(s[:n]) == n // works len(s[:-n]) == n // rather independent but would still work if language is otherwise unchanged. len(s[n:i]) == i - n // doesn't work. Does it need to? Rgds -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Tue Jan 10 00:31:20 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 09 Jan 2017 21:31:20 -0800 Subject: [Python-ideas] Python Reviewed In-Reply-To: <9ff56983-a543-14c7-f2a6-d060f454fdab@bigpond.com> References: <69e3c5d4-d64b-063e-758e-2b0ac1720daa@bigpond.com> <9ff56983-a543-14c7-f2a6-d060f454fdab@bigpond.com> Message-ID: <587471A8.4060508@stoneleaf.us> On 01/09/2017 09:18 PM, Simon Lovell wrote: [snip] This is not the place for this conversation. Please take it to Python List. -- ~Ethan~ From steve at pearwood.info Tue Jan 10 05:42:18 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 10 Jan 2017 21:42:18 +1100 Subject: [Python-ideas] Python reviewed In-Reply-To: <451e10fa-f437-2132-8a0f-a167b78fac6a@bigpond.com> References: <451e10fa-f437-2132-8a0f-a167b78fac6a@bigpond.com> Message-ID: <20170110104218.GP3887@ando.pearwood.info> On Tue, Jan 10, 2017 at 09:44:31AM +0800, Simon Lovell wrote: > Also in Python you can use: > > for x in range (1,j+1): > > to loop j times. Although it does read as though it is looping j+1 times > to those not familiar. *shrug* To those "not familiar", most language features are mysterious and it is easy to guess wrong. What's the difference between foo[x+1] and foo(x+1)? In Python, the first is a key or index lookup and the second is a function call; but in Mathematica, the first is a function call and the second is foo multiplied by x+1. 
Python prides itself in having a much easier learning curve than many languages, with syntax that is close to "executable pseudo-code", but that doesn't mean that there is *nothing* to learn. > One more comment I wanted to make about end blocks, [...] If I never have to see code like: end end end end end end again, it will be too soon. > I still struggle to see why it should be > mandatory though? For those who prefer to have the block closing > delimiters this way, is the need for a keyword (could be a command line > option) really the objection? It's not mandatory -- there are dozens of other languages you can use that will satisfy your urge for a redundant "end" block marker. But for *Python*, it is mandatory because it is Guido's language, not yours. When you design your own language, you can design it to be as complicated or simple, as baroque or plain as you like. Think about what you are asking for: a command-line option that controls whether or not the interpreter requires "end" after each block. Now every single library module needs to be written twice, once with "end", once without. Otherwise, it won't even compile for half the users. If all you care about is something physically at the end of the block, without the compiler enforcing it, then Python already supports this, using your choice of keyword: with open(filename) as f: for line in f: if condition: while something: spam(line) #end #finish #done #conclusion Problem solved. -- Steve From rob.cliffe at btinternet.com Tue Jan 10 07:06:29 2017 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Tue, 10 Jan 2017 12:06:29 +0000 Subject: [Python-ideas] Python Reviewed In-Reply-To: <9ff56983-a543-14c7-f2a6-d060f454fdab@bigpond.com> References: <69e3c5d4-d64b-063e-758e-2b0ac1720daa@bigpond.com> <9ff56983-a543-14c7-f2a6-d060f454fdab@bigpond.com> Message-ID: <2e064f62-2b0c-20d1-40a8-8001fee47e2e@btinternet.com> On 10/01/2017 05:18, Simon Lovell wrote: > Hi Kyle, > > I don't see the harm caused from having a do-while construct. Not the > most used construct but it is sometimes useful and not having it means > you need to either have a first time through marker or a break at the > end of a "while True:" loop. > > > I would say that None should also be non-boolean. Otherwise you are > saying that things which might be None would be True if not None. > > > Re: SQLAlchemy, this does not resolve the issues satisfactorily to me. > > > Re: half-loop. The idea that you have the same code before entry and > at the end of the loop is really ugly and raises the potential for > errors. I can remember doing something similarly ugly with this for > looping through a cursor but I can't recall why I couldn't just do a > .fetchall() and then loop through the results. Maybe I ultimately > changed it to do precisely that. Perhaps you have too big a data set > to load into memory all at once though and can't do it that way. > Anyway, the SQL processing is all too difficult in Python and Java and > nearly all modern languages. > > > Re: Conditional expression. You could have: " = if > then else requires more concentration than it should. However, this means > another keyword*. > > * General comment: I posted this because Googling didn't give me a > satisfactory answer to why Python is the way that it is. I think I see > it now. Guido hates keywords. That last sentence is a ridiculous (and insulting) statement. 
Adding a keyword to Python means that all Python code ever written potentially becomes incompatible with the next Python release (if it might use that keyword as an identifier). So the bar for adding a *new* keyword is necessarily very high. > I don't find this particularly logical but it is what it is and it > also isn't going to change. That seems also to explain the else > keyword at the end of the while loop. > > > Anyway, I think this discussion has reached its natural conclusion here. Rob Cliffe > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From guido at python.org Tue Jan 10 10:29:12 2017 From: guido at python.org (Guido van Rossum) Date: Tue, 10 Jan 2017 07:29:12 -0800 Subject: [Python-ideas] How to respond to trolling Message-ID: Was it really necessary for all the usual folks on this list to engage with the "Python review" threads? I think a much more effective response would have been a resounding silence. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Tue Jan 10 10:34:00 2017 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Tue, 10 Jan 2017 09:34:00 -0600 Subject: [Python-ideas] Python Reviewed In-Reply-To: <9ff56983-a543-14c7-f2a6-d060f454fdab@bigpond.com> References: <69e3c5d4-d64b-063e-758e-2b0ac1720daa@bigpond.com> <9ff56983-a543-14c7-f2a6-d060f454fdab@bigpond.com> Message-ID: I just want to point ONE thing out: On Jan 9, 2017 11:18 PM, "Simon Lovell" wrote: * General comment: I posted this because Googling didn't give me a satisfactory answer to why Python is the way that it is. I think I see it now. Guido hates keywords. I don't find this particularly logical but it is what it is and it also isn't going to change. That seems also to explain the else keyword at the end of the while loop. No, it's because every new keyword that you add has the potential to break code that uses that as a variable name. Python is used by thousands of places across the globe right now. It would be suicidal to break half of that because someone felt the need for a new keyword. I feel like there are two things you're missing: 1. The stark majority of the "review" you made is taking about stuff that simply isn't going to change. Again, too much code to break. 2. The entirety of the "review" is your opinion. You may love the `end` keyword and enjoy using it (for some reason), but that doesn't mean it's *objectively* better. It just means it's better for you. Python is the way it is because that's the way it is, and we like it that way. _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -- Ryan (????) Yoko Shimomura > ryo (supercell/EGOIST) > Hiroyuki Sawano >> everyone else http://kirbyfan64.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From levkivskyi at gmail.com Tue Jan 10 10:43:10 2017 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Tue, 10 Jan 2017 16:43:10 +0100 Subject: [Python-ideas] How to respond to trolling In-Reply-To: References: Message-ID: On 10 January 2017 at 16:29, Guido van Rossum wrote: > I think a much more effective response would have been a resounding > silence. > I agree. 
-- Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: From ned at nedbatchelder.com Tue Jan 10 10:56:51 2017 From: ned at nedbatchelder.com (Ned Batchelder) Date: Tue, 10 Jan 2017 10:56:51 -0500 Subject: [Python-ideas] How to respond to trolling In-Reply-To: References: Message-ID: On 1/10/17 10:43 AM, Ivan Levkivskyi wrote: > On 10 January 2017 at 16:29, Guido van Rossum > wrote: > > I think a much more effective response would have been a > resounding silence. > > > I agree. > I don't like to use the term "trolling" except for people who are trying to annoy people. I think the recent thread was misguided, but not malicious. I do agree that the thread should have ended at "unless you are seriously proposing a change to the language, this is not the right list." --Ned. -------------- next part -------------- An HTML attachment was scrubbed... URL: From thane.brimhall at gmail.com Tue Jan 10 11:36:43 2017 From: thane.brimhall at gmail.com (Thane Brimhall) Date: Tue, 10 Jan 2017 08:36:43 -0800 (PST) Subject: [Python-ideas] api suggestions for the cProfile module In-Reply-To: References: Message-ID: <4fc1586d-60c8-420c-90dc-2c73b689570e@googlegroups.com> I hate to be "that guy" but... bump! Does anyone have thoughts on this topic? I assume the silence is because this suggestion is too trivial to matter. /Thane On Tuesday, December 20, 2016 at 5:51:49 PM UTC-7, Thane Brimhall wrote: > > I use cProfile a lot, and would like to suggest three backwards-compatible > improvements to the API. > > 1: When using cProfile on a specific piece of code I often use the > enable() and disable() methods. It occurred to me that this would be an > obvious place to use a context manager. > > 2: Enhance the `print_stats` method on Profile to accept more options > currently available only through the pstats.Stats class. For example, > strip_dirs could be a boolean argument, and limit could accept an int. This > would reduce the number of cases you'd need to use the more complex API. > > 3: I often forget which string keys are available for sorting. It would be > nice to add an enum for these so a user could have their linter and IDE > check that value pre-runtime. Since it would subclass `str` and `Enum` it > would still work with all currently existing code. > > The current documentation contains the following code: > > import cProfile, pstats, io > pr = cProfile.Profile() > pr.enable() > # ... do something ... > pr.disable() > s = io.StringIO() > sortby = 'cumulative' > ps = pstats.Stats(pr, stream=s).sort_stats(sortby) > ps.print_stats() > print(s.getvalue()) > > While the code below doesn't exactly match the functionality above (eg. > not using StringIO), I envision the context manager working like this, > along with some adjustments on how to get the stats from the profiler: > > import cProfile, pstats > with cProfile.Profile() as pr: > # ... do something ... > pr.print_stats(sort=pstats.Sort.cumulative, limit=10, strip_dirs=True) > > As you can see, the code is shorter and somewhat more self-documenting. > The best thing about these suggestions is that as far as I can tell they > would be backwards-compatible API additions. > > What do you think? Thank you in advance for your time! > > /Thane > -------------- next part -------------- An HTML attachment was scrubbed... 
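A rough, hedged sketch of what the proposed additions could look like, written as a subclass so it runs against today's cProfile and pstats. Nothing here is in the standard library as of 3.6: the Sort values simply mirror a few of pstats' existing string sort keys, and the context-manager methods and extra print_stats keywords are the proposal, not current API.

import cProfile
import pstats
from enum import Enum

class Sort(str, Enum):
    # mirrors a few of the string keys pstats already accepts
    calls = "calls"
    cumulative = "cumulative"
    time = "time"
    tottime = "tottime"

class Profile(cProfile.Profile):
    # hypothetical context-manager support
    def __enter__(self):
        self.enable()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.disable()
        return False  # never suppress exceptions

    # hypothetical convenience wrapper around pstats.Stats
    def print_stats(self, sort=Sort.cumulative, limit=None, strip_dirs=False):
        stats = pstats.Stats(self)
        if strip_dirs:
            stats.strip_dirs()
        key = getattr(sort, "value", sort)   # accept Sort members or plain strings
        stats.sort_stats(key)
        if limit is None:
            stats.print_stats()
        else:
            stats.print_stats(limit)

# usage mirroring the example in the proposal; printing happens after the
# block so the report itself is not profiled
with Profile() as pr:
    sum(i * i for i in range(100000))
pr.print_stats(sort=Sort.cumulative, limit=10, strip_dirs=True)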
URL: From ethan at stoneleaf.us Tue Jan 10 11:57:21 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 10 Jan 2017 08:57:21 -0800 Subject: [Python-ideas] api suggestions for the cProfile module In-Reply-To: <4fc1586d-60c8-420c-90dc-2c73b689570e@googlegroups.com> References: <4fc1586d-60c8-420c-90dc-2c73b689570e@googlegroups.com> Message-ID: <58751271.80706@stoneleaf.us> On 01/10/2017 08:36 AM, Thane Brimhall wrote: > Does anyone have thoughts on this topic? I assume the silence is because > this suggestion is too trivial to matter. Sometimes it's just a matter of timing. :) > I use cProfile a lot, and would like to suggest three backwards-compatible > improvements to the API. > > 1: When using cProfile on a specific piece of code I often use the > enable() and disable() methods. It occurred to me that this would > be an obvious place to use a context manager. Absolutely. > 2: Enhance the `print_stats` method on Profile to accept more options > currently available only through the pstats.Stats class. For example, > strip_dirs could be a boolean argument, and limit could accept an int. > This would reduce the number of cases you'd need to use the more complex > API. I don't have much experience with cProfile, but this seems reasonable. > 3: I often forget which string keys are available for sorting. It would > be nice to add an enum for these so a user could have their linter and > IDE check that value pre-runtime. Since it would subclass `str` and > `Enum` it would still work with all currently existing code. Absolutely! :) > The current documentation contains the following code: > > import cProfile, pstats, io > pr = cProfile.Profile() > pr.enable() > # ... do something ... > pr.disable() > s = io.StringIO() > sortby = 'cumulative' > ps = pstats.Stats(pr, stream=s).sort_stats(sortby) > ps.print_stats() > print(s.getvalue()) > > While the code below doesn't exactly match the functionality above (eg. not > using StringIO), I envision the context manager working like this, along > with some adjustments on how to get the stats from the profiler: > > import cProfile, pstats > with cProfile.Profile() as pr: > # ... do something ... > pr.print_stats(sort=pstats.Sort.cumulative, limit=10, strip_dirs=True) > > As you can see, the code is shorter and somewhat more self-documenting. The > best thing about these suggestions is that as far as I can tell they would > be backwards-compatible API additions. The `pr.print_stats... line should not be inside the `with` block unless you want to profile that part as well. These suggestions seem fairly uncontroversial. Have you opened an issue on the issue tracker? The fun part of the patch will be the C code, but a Python proof-of-concept would be useful. -- ~Ethan~ From greg at krypto.org Tue Jan 10 13:01:30 2017 From: greg at krypto.org (Gregory P. Smith) Date: Tue, 10 Jan 2017 18:01:30 +0000 Subject: [Python-ideas] api suggestions for the cProfile module In-Reply-To: <58751271.80706@stoneleaf.us> References: <4fc1586d-60c8-420c-90dc-2c73b689570e@googlegroups.com> <58751271.80706@stoneleaf.us> Message-ID: At a glance, all of these sound like good modernizing enhancements for cprofile. It just takes someone to contribute the work. :) On Tue, Jan 10, 2017, 8:57 AM Ethan Furman wrote: > On 01/10/2017 08:36 AM, Thane Brimhall wrote: > > > Does anyone have thoughts on this topic? I assume the silence is because > > this suggestion is too trivial to matter. > > Sometimes it's just a matter of timing. 
:) > > > I use cProfile a lot, and would like to suggest three > backwards-compatible > > improvements to the API. > > > > 1: When using cProfile on a specific piece of code I often use the > > enable() and disable() methods. It occurred to me that this would > > be an obvious place to use a context manager. > > Absolutely. > > > 2: Enhance the `print_stats` method on Profile to accept more options > > currently available only through the pstats.Stats class. For example, > > strip_dirs could be a boolean argument, and limit could accept an int. > > This would reduce the number of cases you'd need to use the more complex > > API. > > I don't have much experience with cProfile, but this seems reasonable. > > > 3: I often forget which string keys are available for sorting. It would > > be nice to add an enum for these so a user could have their linter and > > IDE check that value pre-runtime. Since it would subclass `str` and > > `Enum` it would still work with all currently existing code. > > Absolutely! :) > > > The current documentation contains the following code: > > > > import cProfile, pstats, io > > pr = cProfile.Profile() > > pr.enable() > > # ... do something ... > > pr.disable() > > s = io.StringIO() > > sortby = 'cumulative' > > ps = pstats.Stats(pr, stream=s).sort_stats(sortby) > > ps.print_stats() > > print(s.getvalue()) > > > > While the code below doesn't exactly match the functionality above (eg. > not > > using StringIO), I envision the context manager working like this, along > > with some adjustments on how to get the stats from the profiler: > > > > import cProfile, pstats > > with cProfile.Profile() as pr: > > # ... do something ... > > pr.print_stats(sort=pstats.Sort.cumulative, limit=10, > strip_dirs=True) > > > > As you can see, the code is shorter and somewhat more self-documenting. > The > > best thing about these suggestions is that as far as I can tell they > would > > be backwards-compatible API additions. > > The `pr.print_stats... line should not be inside the `with` block unless > you want to profile that part as well. > > These suggestions seem fairly uncontroversial. Have you opened an issue > on the issue tracker? > > The fun part of the patch will be the C code, but a Python > proof-of-concept would be useful. > > -- > ~Ethan~ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thane.brimhall at gmail.com Tue Jan 10 12:16:52 2017 From: thane.brimhall at gmail.com (Thane Brimhall) Date: Tue, 10 Jan 2017 09:16:52 -0800 (PST) Subject: [Python-ideas] api suggestions for the cProfile module In-Reply-To: <58751271.80706@stoneleaf.us> References: <4fc1586d-60c8-420c-90dc-2c73b689570e@googlegroups.com> <58751271.80706@stoneleaf.us> Message-ID: <946388a6-cb58-4742-9e0a-d1b559ab3298@googlegroups.com> Thanks for getting back to me on this! Yes timing can be a big factor. :) Turns out this gave me opportunity to look a little further back in the archives and someone suggested a very similar API change in November, so maybe more people than just me would want a feature like this. Good call on putting the print_stats outside of the context block. Kinda meta to profile the profiler... If the next step is to open an issue on the tracker, I'll do that. I can work on a Python proof-of-concept to attach there as well. 
Again, thanks for your feedback! /Thane On Tuesday, January 10, 2017 at 9:57:54 AM UTC-7, Ethan Furman wrote: > > On 01/10/2017 08:36 AM, Thane Brimhall wrote: > > > Does anyone have thoughts on this topic? I assume the silence is because > > this suggestion is too trivial to matter. > > Sometimes it's just a matter of timing. :) > > > I use cProfile a lot, and would like to suggest three > backwards-compatible > > improvements to the API. > > > > 1: When using cProfile on a specific piece of code I often use the > > enable() and disable() methods. It occurred to me that this would > > be an obvious place to use a context manager. > > Absolutely. > > > 2: Enhance the `print_stats` method on Profile to accept more options > > currently available only through the pstats.Stats class. For example, > > strip_dirs could be a boolean argument, and limit could accept an int. > > This would reduce the number of cases you'd need to use the more > complex > > API. > > I don't have much experience with cProfile, but this seems reasonable. > > > 3: I often forget which string keys are available for sorting. It would > > be nice to add an enum for these so a user could have their linter and > > IDE check that value pre-runtime. Since it would subclass `str` and > > `Enum` it would still work with all currently existing code. > > Absolutely! :) > > > The current documentation contains the following code: > > > > import cProfile, pstats, io > > pr = cProfile.Profile() > > pr.enable() > > # ... do something ... > > pr.disable() > > s = io.StringIO() > > sortby = 'cumulative' > > ps = pstats.Stats(pr, stream=s).sort_stats(sortby) > > ps.print_stats() > > print(s.getvalue()) > > > > While the code below doesn't exactly match the functionality above (eg. > not > > using StringIO), I envision the context manager working like this, > along > > with some adjustments on how to get the stats from the profiler: > > > > import cProfile, pstats > > with cProfile.Profile() as pr: > > # ... do something ... > > pr.print_stats(sort=pstats.Sort.cumulative, limit=10, > strip_dirs=True) > > > > As you can see, the code is shorter and somewhat more self-documenting. > The > > best thing about these suggestions is that as far as I can tell they > would > > be backwards-compatible API additions. > > The `pr.print_stats... line should not be inside the `with` block unless > you want to profile that part as well. > > These suggestions seem fairly uncontroversial. Have you opened an issue > on the issue tracker? > > The fun part of the patch will be the C code, but a Python > proof-of-concept would be useful. > > -- > ~Ethan~ > _______________________________________________ > Python-ideas mailing list > Python... at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jelle.zijlstra at gmail.com Tue Jan 10 14:18:23 2017 From: jelle.zijlstra at gmail.com (Jelle Zijlstra) Date: Tue, 10 Jan 2017 11:18:23 -0800 Subject: [Python-ideas] api suggestions for the cProfile module Message-ID: 2017-01-10 8:57 GMT-08:00 Ethan Furman : > On 01/10/2017 08:36 AM, Thane Brimhall wrote: > > Does anyone have thoughts on this topic? I assume the silence is because >> this suggestion is too trivial to matter. >> > > Sometimes it's just a matter of timing. :) > > I use cProfile a lot, and would like to suggest three backwards-compatible >> improvements to the API. 
>> >> 1: When using cProfile on a specific piece of code I often use the >> enable() and disable() methods. It occurred to me that this would >> be an obvious place to use a context manager. >> > > Absolutely. > > 2: Enhance the `print_stats` method on Profile to accept more options >> currently available only through the pstats.Stats class. For example, >> strip_dirs could be a boolean argument, and limit could accept an int. >> This would reduce the number of cases you'd need to use the more complex >> API. >> > > I don't have much experience with cProfile, but this seems reasonable. > > 3: I often forget which string keys are available for sorting. It would >> be nice to add an enum for these so a user could have their linter and >> IDE check that value pre-runtime. Since it would subclass `str` and >> `Enum` it would still work with all currently existing code. >> > > Absolutely! :) > > The current documentation contains the following code: >> >> import cProfile, pstats, io >> pr = cProfile.Profile() >> pr.enable() >> # ... do something ... >> pr.disable() >> s = io.StringIO() >> sortby = 'cumulative' >> ps = pstats.Stats(pr, stream=s).sort_stats(sortby) >> ps.print_stats() >> print(s.getvalue()) >> >> While the code below doesn't exactly match the functionality above (eg. >> not >> using StringIO), I envision the context manager working like this, along >> with some adjustments on how to get the stats from the profiler: >> >> import cProfile, pstats >> with cProfile.Profile() as pr: >> # ... do something ... >> pr.print_stats(sort=pstats.Sort.cumulative, limit=10, >> strip_dirs=True) >> >> As you can see, the code is shorter and somewhat more self-documenting. >> The >> best thing about these suggestions is that as far as I can tell they >> would >> be backwards-compatible API additions. >> > > The `pr.print_stats... line should not be inside the `with` block unless > you want to profile that part as well. > > These suggestions seem fairly uncontroversial. Have you opened an issue > on the issue tracker? > > The fun part of the patch will be the C code, but a Python > proof-of-concept would be useful. > > These changes may not even require C code, since (contrary to its name) cProfile actually is implemented partly in Python. For example, the context manager change could be made simply by adding __enter__ and __exit__ to the cProfile.Profile class. > -- > ~Ethan~ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thane.brimhall at gmail.com Tue Jan 10 15:34:51 2017 From: thane.brimhall at gmail.com (Thane Brimhall) Date: Tue, 10 Jan 2017 12:34:51 -0800 (PST) Subject: [Python-ideas] api suggestions for the cProfile module In-Reply-To: References: Message-ID: That is great news because I'd be happy to do the implementation myself if it only requires Python. (Sadly I'm not proficient in C.) I'll be coding over the next couple days preparing an example implementation, then I will open an issue on the bug tracker. /Thane On Tuesday, January 10, 2017 at 12:19:14 PM UTC-7, Jelle Zijlstra wrote: > > > 2017-01-10 8:57 GMT-08:00 Ethan Furman >: > >> On 01/10/2017 08:36 AM, Thane Brimhall wrote: >> >> Does anyone have thoughts on this topic? I assume the silence is because >>> this suggestion is too trivial to matter. 
>>> >> >> Sometimes it's just a matter of timing. :) >> >> I use cProfile a lot, and would like to suggest three backwards-compatible >>> improvements to the API. >>> >>> 1: When using cProfile on a specific piece of code I often use the >>> enable() and disable() methods. It occurred to me that this would >>> be an obvious place to use a context manager. >>> >> >> Absolutely. >> >> 2: Enhance the `print_stats` method on Profile to accept more options >>> currently available only through the pstats.Stats class. For example, >>> strip_dirs could be a boolean argument, and limit could accept an int. >>> This would reduce the number of cases you'd need to use the more complex >>> API. >>> >> >> I don't have much experience with cProfile, but this seems reasonable. >> >> 3: I often forget which string keys are available for sorting. It would >>> be nice to add an enum for these so a user could have their linter and >>> IDE check that value pre-runtime. Since it would subclass `str` and >>> `Enum` it would still work with all currently existing code. >>> >> >> Absolutely! :) >> >> The current documentation contains the following code: >>> >>> import cProfile, pstats, io >>> pr = cProfile.Profile() >>> pr.enable() >>> # ... do something ... >>> pr.disable() >>> s = io.StringIO() >>> sortby = 'cumulative' >>> ps = pstats.Stats(pr, stream=s).sort_stats(sortby) >>> ps.print_stats() >>> print(s.getvalue()) >>> >>> While the code below doesn't exactly match the functionality above (eg. >>> not >>> using StringIO), I envision the context manager working like this, along >>> with some adjustments on how to get the stats from the profiler: >>> >>> import cProfile, pstats >>> with cProfile.Profile() as pr: >>> # ... do something ... >>> pr.print_stats(sort=pstats.Sort.cumulative, limit=10, >>> strip_dirs=True) >>> >>> As you can see, the code is shorter and somewhat more self-documenting. >>> The >>> best thing about these suggestions is that as far as I can tell they >>> would >>> be backwards-compatible API additions. >>> >> >> The `pr.print_stats... line should not be inside the `with` block unless >> you want to profile that part as well. >> >> These suggestions seem fairly uncontroversial. Have you opened an issue >> on the issue tracker? >> >> The fun part of the patch will be the C code, but a Python >> proof-of-concept would be useful. >> >> These changes may not even require C code, since (contrary to its name) > cProfile actually is implemented partly in Python. For example, the context > manager change could be made simply by adding __enter__ and __exit__ to the > cProfile.Profile class. > >> -- >> ~Ethan~ >> _______________________________________________ >> Python-ideas mailing list >> Python... at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From prometheus235 at gmail.com Tue Jan 10 16:03:26 2017 From: prometheus235 at gmail.com (Nick Timkovich) Date: Tue, 10 Jan 2017 21:03:26 +0000 Subject: [Python-ideas] How to respond to trolling In-Reply-To: References: Message-ID: If you're proposing throwing half of Python's current syntax in the bin, this isn't the right list either. If not marginally malicious, I think it's delusional to think a post to Language X's lists by someone who recommends multiple breaking changes would ever be accepted. 
The correct response (if any) would be to use another language or write your own transpiler that better agrees with your aesthetic. Nick > I don't like to use the term "trolling" except for people who are trying to > annoy people. I think the recent thread was misguided, but not malicious. > I do agree that the thread should have ended at "unless you are seriously > proposing a change to the language, this is not the right list." > > > --Ned > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Jan 10 16:58:23 2017 From: guido at python.org (Guido van Rossum) Date: Tue, 10 Jan 2017 13:58:23 -0800 Subject: [Python-ideas] How to respond to trolling In-Reply-To: References: Message-ID: Whether the intent was to annoy or just to provoke, the effect was dozens of messages with people falling over each other trying to engage the OP, who clearly was ignorant of most language design issues and uninterested in learning, and threw some insults in for good measure. The respondents should have known better. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Tue Jan 10 17:55:57 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 10 Jan 2017 17:55:57 -0500 Subject: [Python-ideas] api suggestions for the cProfile module In-Reply-To: <4fc1586d-60c8-420c-90dc-2c73b689570e@googlegroups.com> References: <4fc1586d-60c8-420c-90dc-2c73b689570e@googlegroups.com> Message-ID: On 1/10/2017 11:36 AM, Thane Brimhall wrote: > Does anyone have thoughts on this topic? I assume the silence is because > this suggestion is too trivial to matter. 1 and 3 don't really need discussion here. 2 perhaps. I would open 3 separate enhancement issues. As near as I can tell, profile and cProfile have the same API and to the extent it is true, this should be maintained. > On Tuesday, December 20, 2016 at 5:51:49 PM UTC-7, Thane Brimhall wrote: > > I use cProfile a lot, and would like to suggest three > backwards-compatible improvements to the API. > > 1: When using cProfile on a specific piece of code I often use the > enable() and disable() methods. It occurred to me that this would be > an obvious place to use a context manager. > > 2: Enhance the `print_stats` method on Profile to accept more > options currently available only through the pstats.Stats class. For > example, strip_dirs could be a boolean argument, and limit could > accept an int. This would reduce the number of cases you'd need to > use the more complex API. > > 3: I often forget which string keys are available for sorting. It > would be nice to add an enum for these so a user could have their > linter and IDE check that value pre-runtime. Since it would subclass > `str` and `Enum` it would still work with all currently existing code. -- Terry Jan Reedy From rainventions at gmail.com Tue Jan 10 17:55:04 2017 From: rainventions at gmail.com (Ryan Birmingham) Date: Tue, 10 Jan 2017 17:55:04 -0500 Subject: [Python-ideas] How to respond to trolling In-Reply-To: References: Message-ID: I think that replying with an almost canned response, like the one Ned proposed ("unless you are seriously proposing a change to the language, this is not the right list."), would help discourage other list members from responding where responses aren't necessary. 
-Ryan Birmingham On 10 January 2017 at 16:58, Guido van Rossum wrote: > Whether the intent was to annoy or just to provoke, the effect was dozens > of messages with people falling over each other trying to engage the OP, > who clearly was ignorant of most language design issues and uninterested in > learning, and threw some insults in for good measure. The respondents > should have known better. > > -- > --Guido van Rossum (python.org/~guido) > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Tue Jan 10 18:36:01 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 11 Jan 2017 10:36:01 +1100 Subject: [Python-ideas] How to respond to trolling In-Reply-To: References: Message-ID: <20170110233600.GQ3887@ando.pearwood.info> On Tue, Jan 10, 2017 at 07:29:12AM -0800, Guido van Rossum wrote: > Was it really necessary for all the usual folks on this list to engage with > the "Python review" threads? I think a much more effective response would > have been a resounding silence. Giving a newcomer the Silent Treatment because they've questioned some undocumented set of features not open to change is not Open, Considerate or Respectful (the CoC). Even if their ideas are ignorant or ill-thought out, we must give them the benefit of the doubt and assume they are making their comments in good faith rather than trolling. Shunning is a particularly nasty form of passive-aggression, as the person being shunned doesn't even get any hint as to what they have done to bring it on. It's one thing to ignore an unrepentant troublemaker or troll after numerous warnings -- that's the old Usenet "plonk" or kill-file treatment -- but greeting a newcome who has inadvertently (we must assume good faith) crossed a line in that way is hostile behaviour. I don't think it is necessary for somebody to explicitly say the magic words "I propose this as a change..." for it to be obvious that the OP was suggesting his "review" to initiate a discussion for ways Python should change. I don't know whether the OP has learned anything from his treatment here. But I know he wouldn't learn anything except that the Python community is closed-minded and unwelcoming if he had been greeted with silence. -- Steve From chris.barker at noaa.gov Tue Jan 10 22:39:42 2017 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Tue, 10 Jan 2017 19:39:42 -0800 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> <20170106044922.GM3887@ando.pearwood.info> Message-ID: <3443043241353663718@unknownmsgid> >> How common is this problem? > > Last 2 or 3 years, I don't recall having be bitten by such issue. We just got bitten by this on our CI server. 
Granted, we could fix it by properly configuring docker, but it would have been nice if it " just worked" -CHB From pavol.lisy at gmail.com Tue Jan 10 23:44:00 2017 From: pavol.lisy at gmail.com (Pavol Lisy) Date: Wed, 11 Jan 2017 05:44:00 +0100 Subject: [Python-ideas] How to respond to trolling In-Reply-To: <20170110233600.GQ3887@ando.pearwood.info> References: <20170110233600.GQ3887@ando.pearwood.info> Message-ID: On 1/11/17, Steven D'Aprano wrote: > On Tue, Jan 10, 2017 at 07:29:12AM -0800, Guido van Rossum wrote: >> Was it really necessary for all the usual folks on this list to engage >> with >> the "Python review" threads? I think a much more effective response would >> have been a resounding silence. > > Giving a newcomer the Silent Treatment because they've questioned some > undocumented set of features not open to change is not Open, Considerate > or Respectful (the CoC). Even if their ideas are ignorant or ill-thought > out, we must give them the benefit of the doubt and assume they are > making their comments in good faith rather than trolling. I think that in this case: not all(people) == any(people) So in my humble opinion it is good to say something by some (which reflect CoC culture of python community) and be silent like zen master (and most people did it) by others :) BTW. This discussion could be inspiring and we could prepare PEP where we could give some hints how to do (stupid?) things in unified and clever way! ;) For example: 1. if you want to have endblocks then use "# endfor" instead of "# for" 2. if you want to replace "self" by one letters then use ? (CANADIAN SYLLABICS FINAL GRAVE) see https://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html etc. Some editors could probably syntax highlight these constructs (and maybe hide dot too in case of replacing self :P) in future (at least in fancy mode :) ( Some of us could test idea through vi conceal feature for example to http://gnosis.cx/bin/.vim/after/syntax/python.vim insert something like syntax match pyNiceStatement "\ References: <20170110233600.GQ3887@ando.pearwood.info> Message-ID: <22645.49466.968872.108760@turnbull.sk.tsukuba.ac.jp> Steven D'Aprano writes: > Giving a newcomer the Silent Treatment because they've questioned some > undocumented set of features not open to change is not Open, Considerate > or Respectful (the CoC). Even if their ideas are ignorant or ill-thought > out, we must give them the benefit of the doubt and assume they are > making their comments in good faith rather than trolling. Honest question: do you think that response has to be done in public? (Whether Guido intended "private" as an alternative or not is a red herring, irrelevant to my question.) I would prefer answers at GitHub: https://github.com/python/overload-sig/issues/5. but that's up to respondents. (Will summarize responses privately and in other channels to that issue. This is an experiment for the Overload SIG: https://mail.python.org/mm3/mailman3/lists/overload-sig at python.org/.) Steve From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Jan 11 00:23:24 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Wed, 11 Jan 2017 14:23:24 +0900 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: <3443043241353663718@unknownmsgid> References: <20170105171638.GA4217@phdru.name> <20170106044922.GM3887@ando.pearwood.info> <3443043241353663718@unknownmsgid> Message-ID: <22645.49484.967553.775552@turnbull.sk.tsukuba.ac.jp> Chris Barker - NOAA Federal writes: > >> How common is this problem? 
> > > > Last 2 or 3 years, I don't recall having be bitten by such issue. > > We just got bitten by this on our CI server. Granted, we could fix it > by properly configuring docker, but it would have been nice if it " > just worked" Of course. The question is not "should cb at noaa properly configure docker?", it's "Can docker properly configure docker (soon enough)? And if not, should we configure Python?" The third question depends on whether fixing it for you breaks things for others. From songofacandy at gmail.com Wed Jan 11 00:49:29 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Wed, 11 Jan 2017 14:49:29 +0900 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: <22645.49484.967553.775552@turnbull.sk.tsukuba.ac.jp> References: <20170105171638.GA4217@phdru.name> <20170106044922.GM3887@ando.pearwood.info> <3443043241353663718@unknownmsgid> <22645.49484.967553.775552@turnbull.sk.tsukuba.ac.jp> Message-ID: > > Of course. The question is not "should cb at noaa properly configure > docker?", it's "Can docker properly configure docker (soon enough)? > And if not, should we configure Python?" The third question depends > on whether fixing it for you breaks things for others. When talking about general Docker image, using C locale is OK for most cases. In other words, images using C locale is properly configured. All of node.js, Ruby, Perl, Go and Rust application can talk UTF-8 in docker using C locale, without special configuration. Locale dependent application is very special in this area. From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Jan 11 02:05:01 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Wed, 11 Jan 2017 16:05:01 +0900 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> <20170106044922.GM3887@ando.pearwood.info> <3443043241353663718@unknownmsgid> <22645.49484.967553.775552@turnbull.sk.tsukuba.ac.jp> Message-ID: <22645.55581.671261.687104@turnbull.sk.tsukuba.ac.jp> INADA Naoki writes: > When talking about general Docker image, using C locale is OK for > most cases. In other words, images using C locale is properly > configured. s/properly/compatibly/. "Proper" has strong connotations of "according to protocol". Configuring LC_CTYPE for ASCII expecting applications to say "You're lying!" and spew UTF-8 anyway is not "proper". That kind of thing makes me very nervous, and I think justifiably so. And it's only *sufficient* to justify a change to Python's defaults if Python checks for and accurately identifies when it's in a container. Anyway, I need to look more carefully at the actual PEPs and see if there's something concrete to worry about. But remember, we have about 18 months to chew over this if necessary -- I'm only asking for a few more days (until after the "cripple the minds of Japanese youth day", er, "University Admissions Center Examination" this weekend ;-). Steve From songofacandy at gmail.com Wed Jan 11 02:31:09 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Wed, 11 Jan 2017 16:31:09 +0900 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: <22645.55581.671261.687104@turnbull.sk.tsukuba.ac.jp> References: <20170105171638.GA4217@phdru.name> <20170106044922.GM3887@ando.pearwood.info> <3443043241353663718@unknownmsgid> <22645.49484.967553.775552@turnbull.sk.tsukuba.ac.jp> <22645.55581.671261.687104@turnbull.sk.tsukuba.ac.jp> Message-ID: > > That kind of thing makes me very nervous, and I think justifiably so. 
> And it's only *sufficient* to justify a change to Python's defaults if > Python checks for and accurately identifies when it's in a container. > In my company, we use network boot servers. To reduce boot image, the image is built with minimalistic approach too. So there were only C locale in most of our servers, and many people in my company had bitten by this problem. I teach them to adding `export PYTHONIOENCODING=utf-8` in their .bashrc. But they had bitten again when using cron. So this problem is not only for docker container. Since UTF-8 dominated, many people including me use C locale to avoid unintentional behavior of commands seeing locale (sort, ls, date, bash, etc...). And use non C locale only for reading non English output from some command, like `man` or `hg help`. It's for i18n / l10n, but not for changing encoding. People live in UTF-8 world are never helped by changing encoding by locale. They are only bitten by the behavior. > Anyway, I need to look more carefully at the actual PEPs and see if > there's something concrete to worry about. But remember, we have > about 18 months to chew over this if necessary -- I'm only asking for > a few more days (until after the "cripple the minds of Japanese youth > day", er, "University Admissions Center Examination" this weekend ;-). > > Steve Off course. And both PEP doesn't propose default behavior except C locale. So there are 36+ months about changing default behavior. I hope 36 months is enough for people using legacy systems are moving to UTF-8 world. Regards, From songofacandy at gmail.com Wed Jan 11 03:17:46 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Wed, 11 Jan 2017 17:17:46 +0900 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> <20170106044922.GM3887@ando.pearwood.info> <3443043241353663718@unknownmsgid> <22645.49484.967553.775552@turnbull.sk.tsukuba.ac.jp> <22645.55581.671261.687104@turnbull.sk.tsukuba.ac.jp> Message-ID: Here is one example of locale pitfall. --- # from http://unix.stackexchange.com/questions/169739/why-is-coreutils-sort-slower-than-python $ cat letters.py import string import random def main(): for _ in range(1_000_000): c = random.choice(string.ascii_letters) print(c) main() $ python3 letters.py > letters.txt $ LC_ALL=C time sort letters.txt > /dev/null 0.35 real 0.32 user 0.02 sys $ LC_ALL=C.UTF-8 time sort letters.txt > /dev/null 0.36 real 0.33 user 0.02 sys $ LC_ALL=ja_JP.UTF-8 time sort letters.txt > /dev/null 11.03 real 10.95 user 0.04 sys $ LC_ALL=en_US.UTF-8 time sort letters.txt > /dev/null 11.05 real 10.97 user 0.04 sys --- This is why some engineer including me use C locale on Linux, at least when there are no C.UTF-8 locale. Off course, we can use LC_CTYPE=en_US.UTF-8, instead of LANG or LC_ALL. (I wonder if we can use LC_CTYPE=UTF-8...) But I dislike current situation that "people should learn how to configure locale properly, and pitfall of non-C locale, only for using UTF-8 on Python". From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Jan 11 04:36:08 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. 
Turnbull) Date: Wed, 11 Jan 2017 18:36:08 +0900 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> <20170106044922.GM3887@ando.pearwood.info> <3443043241353663718@unknownmsgid> <22645.49484.967553.775552@turnbull.sk.tsukuba.ac.jp> <22645.55581.671261.687104@turnbull.sk.tsukuba.ac.jp> Message-ID: <22645.64648.890661.647417@turnbull.sk.tsukuba.ac.jp> INADA Naoki writes: > Off course, we can use LC_CTYPE=en_US.UTF-8, instead of LANG or LC_ALL. You can also use LC_COLLATE=C. > (I wonder if we can use LC_CTYPE=UTF-8...) Syntactically incorrect: that means the language UTF-8. "LC_TYPE=.UTF-8" might work, but IIRC the language tag is required, the region and encoding are optional. Thus ja_JP, ja.UTF-8 are OK, but .UTF-8 is not. Rant follows: > But I dislike current situation that "people should learn how to > configure locale properly, and pitfall of non-C locale, only for > using UTF-8 on Python". You can use a distro that implements and defaults to the C.utf-8 locale, and presumably you'll be OK tomorrow, well before 3.7 gets released. (If there are no leftover mines in the field, I don't see a good reason to wait for 3.8 given the known deficiencies of the C locale and the precedent of PEPs 528/529.) Really, we're catering to users who won't set their locales properly and insist on old distros. For Debian, C.utf-8 was suggested in 2009[1], and that RFE refers to other distros that had already implemented it. I have all the sympathy in the world for them -- systems *should* Just Work -- but I'm going to lean against kludges if they mean punishing people who actually learn about and conform to applicable standards (and that includes well-motivated, properly- documented, and carefully-implemented platform-specific extensions), or use systems designed by developers who do.[2] Footnotes: [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=609306 [2] I know how bad standards can suck -- I'm a Mailman developer, looking at you RFC 561, er, 5322. While I'm all for nonconformism if you take responsibility for any disasters that result, developers who conform on behalf of their users are heroes. From songofacandy at gmail.com Wed Jan 11 05:15:43 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Wed, 11 Jan 2017 19:15:43 +0900 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: <22645.64648.890661.647417@turnbull.sk.tsukuba.ac.jp> References: <20170105171638.GA4217@phdru.name> <20170106044922.GM3887@ando.pearwood.info> <3443043241353663718@unknownmsgid> <22645.49484.967553.775552@turnbull.sk.tsukuba.ac.jp> <22645.55581.671261.687104@turnbull.sk.tsukuba.ac.jp> <22645.64648.890661.647417@turnbull.sk.tsukuba.ac.jp> Message-ID: > > > (I wonder if we can use LC_CTYPE=UTF-8...) > > Syntactically incorrect: that means the language UTF-8. > "LC_TYPE=.UTF-8" might work, but IIRC the language tag is required, > the region and encoding are optional. Thus ja_JP, ja.UTF-8 are OK, > but .UTF-8 is not. I'm sorry. I know it, but I'm not good at English. I meant "I wish posix allowed LC_CTYPE=UTF-8 setting." It's just my desire. > > Rant follows: > > > But I dislike current situation that "people should learn how to > > configure locale properly, and pitfall of non-C locale, only for > > using UTF-8 on Python". > > You can use a distro that implements and defaults to the C.utf-8 > locale, and presumably you'll be OK tomorrow, well before 3.7 gets > released. Many people use new Python on legacy Linux which don't have C.UTF-8 locale. 
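As an illustrative aside, what "configuring the locale" changes from Python's point of view can be seen with a few lines (exact output varies by platform and Python version):

    import locale
    import sys

    # What Python derived from LC_ALL / LC_CTYPE / LANG at startup:
    print(locale.getpreferredencoding(False))  # e.g. 'ANSI_X3.4-1968' under LANG=C
    print(sys.getfilesystemencoding())         # e.g. 'ascii' under the POSIX locale
    print(sys.stdout.encoding)                 # what print() encodes to

Running a check like this once under LANG=C and once under a UTF-8 locale makes the difference visible, e.g. why the same script can raise UnicodeEncodeError under cron but not in an interactive shell.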
I learned how to configure locale for using UTF-8 on Python. But I don't want to force people to learn it, only for Python. > > Really, we're catering to users who won't set their locales properly > and insist on old distros. For Debian, C.utf-8 was suggested in > 2009[1], and that RFE refers to other distros that had already > implemented it. CentOS 7 (and RHEL 7, maybe) seems don't provide C.UTF-8 by default. It means C.UTF-8 is not "universal available" locale at least next 5 years. $ cat /etc/redhat-release CentOS Linux release 7.2.1511 (Core) $ locale -a | grep ^C C > I have all the sympathy in the world for them -- > systems *should* Just Work -- but I'm going to lean against kludges > if they mean punishing people who actually learn about and conform to > applicable standards (and that includes well-motivated, properly- > documented, and carefully-implemented platform-specific extensions), > or use systems designed by developers who do.[2] > > Footnotes: > [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=609306 > > [2] I know how bad standards can suck -- I'm a Mailman developer, > looking at you RFC 561, er, 5322. While I'm all for nonconformism if > you take responsibility for any disasters that result, developers who > conform on behalf of their users are heroes. From stephanh42 at gmail.com Wed Jan 11 05:46:24 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Wed, 11 Jan 2017 11:46:24 +0100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> <20170106044922.GM3887@ando.pearwood.info> <3443043241353663718@unknownmsgid> <22645.49484.967553.775552@turnbull.sk.tsukuba.ac.jp> <22645.55581.671261.687104@turnbull.sk.tsukuba.ac.jp> Message-ID: Hi INADA Naoki, (Sorry, I am unsure if INADA or Naoki is your first name...) While I am very much in favour of everything working "out of the box", an issue is that we don't have control over external code (be it Python extensions or external processes invoked from Python). And that code will only look at LANG/LC_TYPE and ignore any cleverness we build into Python. For example, this may mean that a built-in Python string sort will give you a different ordering than invoking the external "sort" command. I have been bitten by this kind of issues, leading to spurious "diffs" if you try to use sorting to put strings into a canonical order. So my feeling is that people are ultimately not being helped by Python trying to be "nice", since they will be bitten by locale issues anyway. IMHO ultimately better to educate them to configure the locale. (I realise that people may reasonably disagree with this assessment ;-) ) I would then recommend to set to en_US.UTF-8, which is slower and less elegant but at least more widely supported. By the way, I know a bit how Node.js deals with locales, and it doesn't try to compensate for "C" locales either. But what it *does* do is that Node never uses the locale settings to determine the encoding of a file: you either have to specify it explicitly OR it defaults to UTF-8 (the latter on output only). So in this respect it is by specification immune against misconfiguration of the encoding. However, other stuff (e.g. date formatting) will still be influenced by the "C" locale as usual. Stephan 2017-01-11 9:17 GMT+01:00 INADA Naoki : > Here is one example of locale pitfall. 
> > --- > # from http://unix.stackexchange.com/questions/169739/why-is- > coreutils-sort-slower-than-python > > $ cat letters.py > import string > import random > > def main(): > for _ in range(1_000_000): > c = random.choice(string.ascii_letters) > print(c) > > main() > > $ python3 letters.py > letters.txt > > $ LC_ALL=C time sort letters.txt > /dev/null > 0.35 real 0.32 user 0.02 sys > > $ LC_ALL=C.UTF-8 time sort letters.txt > /dev/null > 0.36 real 0.33 user 0.02 sys > > $ LC_ALL=ja_JP.UTF-8 time sort letters.txt > /dev/null > 11.03 real 10.95 user 0.04 sys > > $ LC_ALL=en_US.UTF-8 time sort letters.txt > /dev/null > 11.05 real 10.97 user 0.04 sys > --- > > This is why some engineer including me use C locale on Linux, > at least when there are no C.UTF-8 locale. > > Off course, we can use LC_CTYPE=en_US.UTF-8, instead of LANG or LC_ALL. > (I wonder if we can use LC_CTYPE=UTF-8...) > > But I dislike current situation that "people should learn > how to configure locale properly, and pitfall of non-C locale, only for > using UTF-8 on Python". > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From encukou at gmail.com Wed Jan 11 06:22:57 2017 From: encukou at gmail.com (Petr Viktorin) Date: Wed, 11 Jan 2017 12:22:57 +0100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> <20170106044922.GM3887@ando.pearwood.info> <3443043241353663718@unknownmsgid> <22645.49484.967553.775552@turnbull.sk.tsukuba.ac.jp> <22645.55581.671261.687104@turnbull.sk.tsukuba.ac.jp> Message-ID: On 01/11/2017 11:46 AM, Stephan Houben wrote: > Hi INADA Naoki, > > (Sorry, I am unsure if INADA or Naoki is your first name...) > > While I am very much in favour of everything working "out of the box", > an issue is that we don't have control over external code > (be it Python extensions or external processes invoked from Python). > > And that code will only look at LANG/LC_TYPE and ignore any cleverness > we build into Python. > > For example, this may mean that a built-in Python string sort will give you > a different ordering than invoking the external "sort" command. > I have been bitten by this kind of issues, leading to spurious "diffs" if > you try to use sorting to put strings into a canonical order. AFAIK, this would not be a problem under PEP 538, which effectively treats the "C" locale as "C.UTF-8". Strings of Unicode codepoints and the corresponding UTF-8-encoded bytes sort the same way. Is that wrong, or do you have a better example of trouble with using "C.UTF-8" instead of "C"? > So my feeling is that people are ultimately not being helped by > Python trying to be "nice", since they will be bitten by locale issues > anyway. IMHO ultimately better to educate them to configure the locale. > (I realise that people may reasonably disagree with this assessment ;-) ) > > I would then recommend to set to en_US.UTF-8, which is slower and > less elegant but at least more widely supported. What about the spurious diffs you'd get when switching from "C" to "en_US.UTF-8"? $ LC_ALL=en_US.UTF-8 sort file.txt a a A A $ LC_ALL=C sort file.txt A A a a > By the way, I know a bit how Node.js deals with locales, and it doesn't try > to compensate for "C" locales either. 
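An illustrative aside on the sorting question: UTF-8 is designed so that byte-wise ordering matches code point ordering, which is easy to check:

    >>> words = ['Zürich', 'abc', 'café', 'zebra']
    >>> sorted(words) == [b.decode('utf-8')
    ...                   for b in sorted(w.encode('utf-8') for w in words)]
    True

So sorting str objects (code points) and sorting the corresponding UTF-8 bytes give the same order; it is locale-aware collation, as in the en_US.UTF-8 example below, that changes the result.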
But what it *does* do is that > Node never uses the locale settings to determine the encoding of a file: > you either have to specify it explicitly OR it defaults to UTF-8 (the > latter on output only). > So in this respect it is by specification immune against > misconfiguration of the encoding. > However, other stuff (e.g. date formatting) will still be influenced by > the "C" locale > as usual. I believe the main problem is that the "C" locale really means two very different things: a) Text is encoded as 7-bit ASCII; higher codepoints are an error b) No encoding was specified In both cases, treating "C" as "C.UTF-8" is not bad: a) For 7-bit "text", there's no real difference between these locales b) UTF-8 is a much better default than ASCII From songofacandy at gmail.com Wed Jan 11 06:27:43 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Wed, 11 Jan 2017 20:27:43 +0900 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> <20170106044922.GM3887@ando.pearwood.info> <3443043241353663718@unknownmsgid> <22645.49484.967553.775552@turnbull.sk.tsukuba.ac.jp> <22645.55581.671261.687104@turnbull.sk.tsukuba.ac.jp> Message-ID: On Wed, Jan 11, 2017 at 7:46 PM, Stephan Houben wrote: > Hi INADA Naoki, > > (Sorry, I am unsure if INADA or Naoki is your first name...) Never mind, I don't care about name ordering. (INADA is family name). > > While I am very much in favour of everything working "out of the box", > an issue is that we don't have control over external code > (be it Python extensions or external processes invoked from Python). > > And that code will only look at LANG/LC_TYPE and ignore any cleverness > we build into Python. > I'm sorry, could you give me more concrete example? My opinion is +1 to PEP 540, there should be an option to ignore locale setting. (And I hope it will be default setting in future version.) What is your concern? > For example, this may mean that a built-in Python string sort will give you > a different ordering than invoking the external "sort" command. > I have been bitten by this kind of issues, leading to spurious "diffs" if > you try to use sorting to put strings into a canonical order. > > So my feeling is that people are ultimately not being helped by > Python trying to be "nice", since they will be bitten by locale issues > anyway. IMHO ultimately better to educate them to configure the locale. > (I realise that people may reasonably disagree with this assessment ;-) ) > > I would then recommend to set to en_US.UTF-8, which is slower and > less elegant but at least more widely supported. But someone can't accept 30x slower only sorting ASCII text. At least, infrastructure engineer in my company loves C locale. New Python programmer (e.g. there are many data scientists learning Python) may want to work on Linux server, and learning about locale is not their concern. Web programmers are same. Just want to print UTF-8. Learning about locale may not worth enough for them. But I think there should be an option, and I want to use it. > > By the way, I know a bit how Node.js deals with locales, and it doesn't try > to compensate for "C" locales either. But what it *does* do is that > Node never uses the locale settings to determine the encoding of a file: > you either have to specify it explicitly OR it defaults to UTF-8 (the latter > on output only). > So in this respect it is by specification immune against misconfiguration of > the encoding. > However, other stuff (e.g. 
date formatting) will still be influenced by the > "C" locale > as usual. > > > Stephan > Yes. Both of PEP 538 and 540 is about encoding. I'm sorry about my misleading word "locale-free". There should be locale support for time formatting, at least UTF-8 locale. Regards, From chris.barker at noaa.gov Wed Jan 11 11:50:48 2017 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Wed, 11 Jan 2017 08:50:48 -0800 Subject: [Python-ideas] Python reviewed In-Reply-To: <350231e0-0ef0-46da-d0ff-e5e7999196a4@bigpond.com> References: <350231e0-0ef0-46da-d0ff-e5e7999196a4@bigpond.com> Message-ID: <-6326646733162116282@unknownmsgid> for range(1,1): means executing once to me. The indexing/slicing approach was designed for indexing and slicing. Then it made sense to have range() match. But range() is not part of the for construction. It is a convenience function for providing an iterable of integers. And you are welcome to write your own range-like iterable if you want. But if you want to look once, you'd use range(1), not range(1,2) anyway. Clear as day. And if you use: range(n, n+I), it is very clear that you will loop i times. s[:n] + s[n:] == s // doesn't work. I don't think it should work though Have you ever used a 1-based and closed-end indexing language that supported slicing? I have (matlab), and these kinds of constructions are really ugly and prone to error. It's not that you want to be able to divide a sequence and immediately put it back together, it's that you often want to do one thing with the first part of a sequence, and another with the second part, and you don't want them to overlap. len(s[:n]) == n // works len(s[:-n]) == n // rather independent but would still work if language is otherwise unchanged. len(s[n:i]) == i - n // doesn't work. Does it need to? It's not that it HAS to - it's that it's much less likely that you will make off by one errors if it does. -CHB -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Jan 11 11:58:03 2017 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Wed, 11 Jan 2017 08:58:03 -0800 Subject: [Python-ideas] How to respond to trolling In-Reply-To: References: Message-ID: <-5099315068893620308@unknownmsgid> > the effect was dozens of messages with people falling over each other trying to engage the OP, Sure -- but all in one thread > The respondents should have known better. But we like to kibitz-- that's why (many of us) are on this list. Anyway, all (most anyway) of the points brought up are : A) not going to change B) have been justified / explained in multiple blog posts, wiki pages, and what have you. So perhaps the best response would be: "These are all fairly core Python design decisions -- do some googling to find out why." But it made me think that it would be good to have a single place that addresses these kinds of thing to point people to. There was the old "python warts" page, but this would be a "python features page" Maybe I'll start that if I find the roundtoits. Or, if it already exists -- someone please point me to it. -CHB From g.rodola at gmail.com Wed Jan 11 12:38:19 2017 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Wed, 11 Jan 2017 18:38:19 +0100 Subject: [Python-ideas] api suggestions for the cProfile module In-Reply-To: References: Message-ID: On Wed, Dec 21, 2016 at 1:50 AM, Thane Brimhall wrote: > I use cProfile a lot, and would like to suggest three backwards-compatible > improvements to the API. 
> > 1: When using cProfile on a specific piece of code I often use the > enable() and disable() methods. It occurred to me that this would be an > obvious place to use a context manager. > I think this makes sense. I did that in https://bugs.python.org/issue9285 but unfortunately I got stuck and the issue remained stagnant. Signaling it here just in case somebody has some insights on how to proceed. > 2: Enhance the `print_stats` method on Profile to accept more options > currently available only through the pstats.Stats class. For example, > strip_dirs could be a boolean argument, and limit could accept an int. This > would reduce the number of cases you'd need to use the more complex API. > I'm not sure about this. I agree the current API is not the nicest one. I use a wrapper on top of cProfile which does this: stats = pstats.Stats(file.name) if strip_dirs: stats.strip_dirs() if isinstance(sort, (tuple, list)): stats.sort_stats(*sort) else: stats.sort_stats(sort) stats.print_stats(lines) With your proposal we would have 2 ways of doing the same thing and I'm not entirely sure that is good. 3: I often forget which string keys are available for sorting. It would be > nice to add an enum for these so a user could have their linter and IDE > check that value pre-runtime. Since it would subclass `str` and `Enum` it > would still work with all currently existing code. > > The current documentation contains the following code: > > import cProfile, pstats, io > pr = cProfile.Profile() > pr.enable() > # ... do something ... > pr.disable() > s = io.StringIO() > sortby = 'cumulative' > ps = pstats.Stats(pr, stream=s).sort_stats(sortby) > ps.print_stats() > print(s.getvalue()) > > While the code below doesn't exactly match the functionality above (eg. > not using StringIO), I envision the context manager working like this, > along with some adjustments on how to get the stats from the profiler: > > import cProfile, pstats > with cProfile.Profile() as pr: > # ... do something ... > pr.print_stats(sort=pstats.Sort.cumulative, limit=10, strip_dirs=True) > > As you can see, the code is shorter and somewhat more self-documenting. > The best thing about these suggestions is that as far as I can tell they > would be backwards-compatible API additions. > > What do you think? Thank you in advance for your time! > > /Thane > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Giampaolo - http://grodola.blogspot.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From xavier.combelle at gmail.com Wed Jan 11 15:02:35 2017 From: xavier.combelle at gmail.com (Xavier Combelle) Date: Wed, 11 Jan 2017 21:02:35 +0100 Subject: [Python-ideas] How to respond to trolling In-Reply-To: References: Message-ID: I did not read the thread, but it looks like the insult should be a red flag and a good time to stop immediately and baning the troll Le 10/01/2017 ? 22:58, Guido van Rossum a ?crit : > Whether the intent was to annoy or just to provoke, the effect was > dozens of messages with people falling over each other trying to > engage the OP, who clearly was ignorant of most language design issues > and uninterested in learning, and threw some insults in for good > measure. The respondents should have known better. 
> > -- > --Guido van Rossum (python.org/~guido ) > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ckaynor at zindagigames.com Wed Jan 11 16:04:54 2017 From: ckaynor at zindagigames.com (Chris Kaynor) Date: Wed, 11 Jan 2017 13:04:54 -0800 Subject: [Python-ideas] How to respond to trolling In-Reply-To: References: Message-ID: On Wed, Jan 11, 2017 at 12:02 PM, Xavier Combelle wrote: > I did not read the thread, but it looks like the insult should be a red flag > and a good time to stop immediately > and baning the troll Personally, when I read the original posting, there is quite a bit of it that comes across as arrogant and ignorant, but none that comes across as insulting. As such, I agree with some of the other replies to this thread: some reply was needed to the original thread. While the thread belonged on python-list, it was also not fully off-topic for python-ideas: while the wording was of a review of Python, and it was not worded as actually suggesting changes, it could be read as indirectly proposing changes or new features. As such, I feel that any reply to the thread should at least aim to point the poster to the correct forum (in this case, python-list), and it is not unreasonable to answer some of the points as though they are in fact suggesting changes - likely with links to the rational of the original decisions, or at least enough information for the original poster to fairly easily find such rationals themselves (eg a tutorial page or faq). From victor.stinner at gmail.com Wed Jan 11 17:15:56 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 11 Jan 2017 23:15:56 +0100 Subject: [Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode) Message-ID: Hi, I also implemented my PEP 540, you can now test it! Use the latest patch attached to: http://bugs.python.org/issue29240 I made multiple changes since the first version of my PEP: * The UTF-8 Strict mode now only uses strict for inputs and outputs: it keeps surrogateescape for operating system data. Read the "Use the strict error handler for operating system data" alternative for the rationale. * The POSIX locale now enables the UTF-8 mode. See the "Don't modify the encoding of the POSIX locale" alternative for the rationale. * Specify the priority between -X utf8, PYTHONUTF8, PYTHONIOENCODING, etc. The PEP version 3 has a longer rationale with more example. IMHO the "List a directory into stdout" use case is the most representative case of "UNIX should just work" thing and encoding issues: https://www.python.org/dev/peps/pep-0540/#list-a-directory-into-stdout It reads data from the operating system (directory content) and writes it into an output (stdout). It combines two things which are similar but different in subtle ways. I included example with commands and their output to this use case, to have a more "real world" example instead of a long list of theorical things :-) Read the PEP 540 online (HTML): https://www.python.org/dev/peps/pep-0540/ Full text below. 
Victor

PEP: 540
Title: Add a new UTF-8 mode
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 5-January-2016
Python-Version: 3.7

Abstract
========

Add a new UTF-8 mode, disabled by default, to ignore the locale and
force the usage of the UTF-8 encoding.

Basically, the UTF-8 mode behaves as Python 2: it "just works" and
doesn't bother users with encodings, but it can produce mojibake. The
UTF-8 mode can be configured as strict to prevent mojibake.

New ``-X utf8`` command line option and ``PYTHONUTF8`` environment
variable are added to control the UTF-8 mode. The POSIX locale enables
the UTF-8 mode.

Rationale
=========

"It's not a bug, you must fix your locale" is not an acceptable answer
----------------------------------------------------------------------

Since Python 3.0 was released in 2008, the usual answer to users getting
Unicode errors is to ask developers to fix their code to handle Unicode
properly. Most applications and Python modules were fixed, but users
keep reporting Unicode errors regularly: see the long list of issues in
the `Links`_ section below.

In fact, a second class of bug comes from a locale which is not properly
configured. The usual answer to such a bug report is: "it is not a bug,
you must fix your locale".

Technically, the answer is correct, but from a practical point of view,
the answer is not acceptable. In many cases, "fixing the issue" is a
hard task. Moreover, sometimes, the usage of the POSIX locale is
deliberate.

A good example of a concrete issue is build systems which create a fresh
environment for each build using a chroot, a container, a virtual
machine or something else to get reproducible builds. Such a setup
usually uses the POSIX locale. To get 100% reproducible builds, the
POSIX locale is a good choice: see the `Locales section of
reproducible-builds.org `_.

UNIX users don't expect Unicode errors, since the common command line
tools like ``cat``, ``grep`` or ``sed`` never fail with Unicode errors.
These users expect that Python 3 "just works" with any locale and
doesn't bother them with encodings. From their point of view, the bug is
not their locale but is obviously Python 3.

Since Python 2 handles data as bytes, it's rarer to get Unicode errors
in Python 2 than in Python 3. This also explains why users perceive
Python 3 as the root cause of their Unicode errors.

Some users expect that Python 3 just works with any locale and so don't
bother with mojibake, whereas some developers are working hard to
prevent mojibake and so expect that Python 3 fails early before creating
mojibake.

Since different groups of users have different expectations, there is no
silver bullet which solves all issues at once. Last but not least,
backward compatibility should be preserved whenever possible.

Locale and operating system data
--------------------------------

.. _operating system data:

Python uses an encoding called the "filesystem encoding" to decide how
to encode and decode data from/to the operating system:

* file content
* command line arguments: ``sys.argv``
* standard streams: ``sys.stdin``, ``sys.stdout``, ``sys.stderr``
* environment variables: ``os.environ``
* filenames: ``os.listdir(str)`` for example
* pipes: ``subprocess.Popen`` using ``subprocess.PIPE`` for example
* error messages: ``os.strerror(code)`` for example
* user and terminal names: ``os``, ``grp`` and ``pwd`` modules
* host name, UNIX socket path: see the ``socket`` module
* etc.
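For illustration only, the encoding and error handler in question can be
inspected and exercised at runtime; a minimal sketch, whose exact output
depends on the platform and the locale::

    import os
    import sys

    print(sys.getfilesystemencoding())      # e.g. 'utf-8', or 'ascii' with the POSIX locale
    print(sys.getfilesystemencodeerrors())  # 'surrogateescape' on UNIX, 'strict' on Windows
    print(os.fsencode('café'))              # str -> bytes, using the encoding/handler above
    print(os.fsdecode(b'caf\xc3\xa9'))      # bytes -> str, using the encoding/handler above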
At startup, Python calls ``setlocale(LC_CTYPE, "")`` to use the user
``LC_CTYPE`` locale and then stores the locale encoding as the
"filesystem encoding". It's possible to get this encoding using
``sys.getfilesystemencoding()``. In the whole lifetime of a Python
process, the same encoding and error handler are used to encode and
decode data from/to the operating system.

The ``os.fsdecode()`` and ``os.fsencode()`` functions can be used to
decode and encode operating system data. These functions use the
filesystem error handler: ``sys.getfilesystemencodeerrors()``.

.. note::
   In some corner cases, the *current* ``LC_CTYPE`` locale must be used
   instead of ``sys.getfilesystemencoding()``. For example, the ``time``
   module uses the *current* ``LC_CTYPE`` locale to decode timezone
   names.

The POSIX locale and its encoding
---------------------------------

The following environment variables are used to configure the locale, in
this preference order:

* ``LC_ALL``, most important variable
* ``LC_CTYPE``
* ``LANG``

The POSIX locale, also known as "the C locale", is used:

* if the first set variable is set to ``"C"``
* if all these variables are unset, for example when a program is
  started in an empty environment.

The encoding of the POSIX locale must be ASCII or a superset of ASCII.

On Linux, the POSIX locale uses the ASCII encoding.

On FreeBSD and Solaris, ``nl_langinfo(CODESET)`` announces an alias of
the ASCII encoding, whereas the ``mbstowcs()`` and ``wcstombs()``
functions use the ISO 8859-1 encoding (Latin1) in practice. The problem
is that ``os.fsencode()`` and ``os.fsdecode()`` use the
``locale.getpreferredencoding()`` codec. For example, if command line
arguments are decoded by ``mbstowcs()`` and encoded back by
``os.fsencode()``, an ``UnicodeEncodeError`` exception is raised instead
of retrieving the original byte string.

To fix this issue, since Python 3.4, Python checks whether ``mbstowcs()``
really uses the ASCII encoding when the ``LC_CTYPE`` locale is the POSIX
locale and ``nl_langinfo(CODESET)`` returns ``"ASCII"`` (or an alias of
ASCII). If not (the effective encoding is not ASCII), Python uses its
own ASCII codec instead of the ``mbstowcs()`` and ``wcstombs()``
functions for `operating system data`_.

See the `POSIX locale (2016 Edition) `_.

POSIX locale used by mistake
----------------------------

In many cases, the POSIX locale is not really expected by users who get
it by mistake. Examples:

* program started in an empty environment
* user forcing LANG=C to get messages in English
* LANG=C used for bad reasons, without being aware of the ASCII encoding
* SSH shell
* Linux installed with no configured locale
* chroot environment, Docker image, container, ... with no locale
  configured
* user locale set to a non-existing locale, for example a typo in the
  locale name

C.UTF-8 and C.utf8 locales
--------------------------

Some UNIX operating systems provide a variant of the POSIX locale using
the UTF-8 encoding:

* Fedora 25: ``"C.utf8"`` or ``"C.UTF-8"``
* Debian (eglibc 2.13-1, 2011), Ubuntu: ``"C.UTF-8"``
* HP-UX: ``"C.utf8"``

It was proposed to add a ``C.UTF-8`` locale to the glibc: `glibc C.UTF-8
proposal `_.

It is not planned to add such a locale to BSD systems.

Popularity of the UTF-8 encoding
--------------------------------

Python 3 uses UTF-8 by default for Python source files.

On Mac OS X, Windows and Android, Python always uses UTF-8 for operating
system data. For Windows, see the `PEP 529`_: "Change Windows filesystem
encoding to UTF-8".
On Linux, UTF-8 became the de facto standard encoding, replacing legacy
encodings like ISO 8859-1 or ShiftJIS. For example, using different
encodings for filenames and standard streams is likely to create
mojibake, so UTF-8 is now used *everywhere*.

The UTF-8 encoding is the default encoding of the XML and JSON file
formats. In January 2017, UTF-8 was used in `more than 88% of web pages
`_ (HTML, Javascript, CSS, etc.).

See `utf8everywhere.org `_ for more general information on the UTF-8
codec.

.. note::
   Some applications and operating systems (especially Windows) use Byte
   Order Markers (BOM) to indicate the used Unicode encoding: UTF-7,
   UTF-8, UTF-16-LE, etc. BOMs are not well supported and rarely used in
   Python.

Old data stored in different encodings and surrogateescape
----------------------------------------------------------

Even if UTF-8 became the de facto standard, there are still systems in
the wild which don't use UTF-8. And there is a lot of data stored in
different encodings. For example, an old USB key using the ext3
filesystem with filenames encoded to ISO 8859-1.

The Linux kernel and the libc don't decode filenames: a filename is used
as a raw array of bytes. The common solution to support any filename is
to store filenames as bytes and not try to decode them. When displayed
to stdout, mojibake is displayed if the filename and the terminal don't
use the same encoding.

Python 3 promotes Unicode everywhere, including filenames. A solution to
support filenames not decodable from the locale encoding was found: the
``surrogateescape`` error handler (`PEP 383`_), which stores undecodable
bytes as surrogate characters. This error handler is used by default for
`operating system data`_, by ``os.fsdecode()`` and ``os.fsencode()`` for
example (except on Windows, which uses the ``strict`` error handler).

Standard streams
----------------

Python uses the locale encoding for standard streams: stdin, stdout and
stderr. The ``strict`` error handler is used by stdin and stdout to
prevent mojibake.

The ``backslashreplace`` error handler is used by stderr to avoid
Unicode encode errors when displaying non-ASCII text. It is especially
useful when the POSIX locale is used, because this locale usually uses
the ASCII encoding.

The problem is that `operating system data`_ like filenames are decoded
using the ``surrogateescape`` error handler (`PEP 383`_). Displaying a
filename to stdout raises a Unicode encode error if the filename
contains an undecoded byte stored as a surrogate character.

Python 3.6 now uses ``surrogateescape`` for stdin and stdout if the
POSIX locale is used: `issue #19977 `_. The idea is to pass through
`operating system data`_ even if it means mojibake, because most UNIX
applications work like that. Most UNIX applications store filenames as
bytes, usually simply because bytes are first-class citizens in the used
programming language, whereas Unicode is badly supported.

.. note::
   The encoding and/or the error handler of standard streams can be
   overridden with the ``PYTHONIOENCODING`` environment variable.

Proposal
========

Changes
-------

Add a new UTF-8 mode, disabled by default, to ignore the locale and
force the usage of the UTF-8 encoding with the ``surrogateescape`` error
handler, instead of the locale encoding (with the ``strict`` or
``surrogateescape`` error handler depending on the case).

Basically, the UTF-8 mode behaves as Python 2: it "just works" and
doesn't bother users with encodings, but it can produce mojibake.
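For illustration, this is what the ``surrogateescape`` error handler
described above does with a byte that is not valid UTF-8 (an indicative
interpreter session)::

    >>> b'caf\xe9'.decode('utf-8', 'surrogateescape')
    'caf\udce9'
    >>> 'caf\udce9'.encode('utf-8', 'surrogateescape')
    b'caf\xe9'

The undecodable byte ``0xE9`` round-trips through the surrogate
character ``U+DCE9``; in the default UTF-8 mode such bytes are simply
passed through, which is where mojibake can appear.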
It can be configured as strict to prevent mojibake: the UTF-8 encoding is then
used with the ``strict`` error handler for inputs and outputs, but the
``surrogateescape`` error handler is still used for `operating system data`_.

A new ``-X utf8`` command line option and a ``PYTHONUTF8`` environment variable
are added to control the UTF-8 mode. The UTF-8 mode is enabled by ``-X utf8``
or ``PYTHONUTF8=1``. The UTF-8 mode is configured as strict by
``-X utf8=strict`` or ``PYTHONUTF8=strict``. Other option values fail with an
error.

The POSIX locale enables the UTF-8 mode. In this case, the UTF-8 mode can be
explicitly disabled by ``-X utf8=0`` or ``PYTHONUTF8=0``.

The ``-X utf8`` option has priority over the ``PYTHONUTF8`` environment
variable. For example, ``PYTHONUTF8=0 python3 -X utf8`` enables the UTF-8 mode.

Encoding and error handler
--------------------------

The UTF-8 mode changes the default encoding and error handler used by open(),
os.fsdecode(), os.fsencode(), sys.stdin, sys.stdout and sys.stderr:

============================  =======================  ==========================  ==========================
Function                      Default                  UTF-8 or POSIX locale       UTF-8 Strict
============================  =======================  ==========================  ==========================
open()                        locale/strict            **UTF-8/surrogateescape**   **UTF-8**/strict
os.fsdecode(), os.fsencode()  locale/surrogateescape   **UTF-8**/surrogateescape   **UTF-8**/surrogateescape
sys.stdin, sys.stdout         locale/strict            **UTF-8/surrogateescape**   **UTF-8**/strict
sys.stderr                    locale/backslashreplace  **UTF-8**/backslashreplace  **UTF-8**/backslashreplace
============================  =======================  ==========================  ==========================

By comparison, Python 3.6 uses:

============================  =======================  ==========================
Function                      Default                  POSIX locale
============================  =======================  ==========================
open()                        locale/strict            locale/strict
os.fsdecode(), os.fsencode()  locale/surrogateescape   locale/surrogateescape
sys.stdin, sys.stdout         locale/strict            locale/**surrogateescape**
sys.stderr                    locale/backslashreplace  locale/backslashreplace
============================  =======================  ==========================

The UTF-8 mode uses the ``surrogateescape`` error handler instead of the strict
mode for convenience: the idea is that data not encoded to UTF-8 is passed
through "Python" without being modified, as raw bytes.

The ``PYTHONIOENCODING`` environment variable has priority over the UTF-8 mode
for standard streams. For example, ``PYTHONIOENCODING=latin1 python3 -X utf8``
uses the Latin1 encoding for stdin, stdout and stderr.

Encodings used by ``open()``, highest priority first:

* *encoding* and *errors* parameters (if set)
* UTF-8 mode
* ``os.device_encoding(fd)``
* ``locale.getpreferredencoding(False)``

Rationale
---------

The UTF-8 mode is disabled by default to keep hard Unicode errors when encoding
or decoding `operating system data`_ fails, and to keep backward compatibility.
The user is responsible for explicitly enabling the UTF-8 mode, and so is
better prepared for mojibake than if the UTF-8 mode were enabled *by default*.

The UTF-8 mode should be used on systems known to be configured with UTF-8,
where most applications speak UTF-8. It prevents Unicode errors if the user
overrides a locale *by mistake* or if a Python program is started with no
locale configured (and so with the POSIX locale).
Most UNIX applications handle `operating system data`_ as bytes, so the
``LC_ALL``, ``LC_CTYPE`` and ``LANG`` environment variables have a limited
impact on how this data is handled by the application.

The Python UTF-8 mode should help to make Python more interoperable with the
other UNIX applications in the system, assuming that *UTF-8* is used everywhere
and that users *expect* UTF-8.

Ignoring the ``LC_ALL``, ``LC_CTYPE`` and ``LANG`` environment variables in
Python is more convenient, since they are more commonly misconfigured *by
mistake* (configured to use an encoding different from UTF-8, whereas the
system uses UTF-8), rather than being misconfigured by intent.

Expected mojibake and surrogate character issues
------------------------------------------------

The UTF-8 mode only affects code running directly in Python, especially code
written in pure Python. The other code, called "external code" here, is not
aware of this mode. Examples:

* C libraries called by Python modules like OpenSSL
* The application code when Python is embedded in an application

In the UTF-8 mode, Python uses the ``surrogateescape`` error handler which
stores bytes not decodable from UTF-8 as surrogate characters.

If the external code uses the locale and the locale encoding is UTF-8, it
should work fine.

External code using bytes
^^^^^^^^^^^^^^^^^^^^^^^^^

If the external code processes data as bytes, surrogate characters are not an
issue since they are only used inside Python. Python encodes surrogate
characters back to bytes at the edges, before calling external code.

The UTF-8 mode can produce mojibake, since Python and the external code do not
necessarily interpret the bytes using the same encoding, but it's a deliberate
choice. The UTF-8 mode can be configured as strict to prevent mojibake and fail
early when data is not decodable from UTF-8 or not encodable to UTF-8.

External code using text
^^^^^^^^^^^^^^^^^^^^^^^^

If the external code uses a text API, for example using the ``wchar_t*`` C
type, mojibake should not occur, but the external code can fail on surrogate
characters.

Use Cases
=========

The following use cases were written to help understand the impact of the
chosen encodings and error handlers on concrete examples.

The "Always work?" results were written to prove the benefit of having a UTF-8
mode which works with any data and any locale, compared to the existing old
Python versions.

The "Mojibake?" column shows that ignoring the locale causes a practical issue:
the UTF-8 mode produces mojibake if the terminal doesn't use the UTF-8
encoding.

List a directory into stdout
----------------------------

Script listing the content of the current directory into stdout::

    import os

    for name in os.listdir(os.curdir):
        print(name)

Result:

========================  =============  ==========
Python                    Always work?   Mojibake?
========================  =============  ==========
Python 2                  **Yes**        **Yes**
Python 3                  No             No
Python 3.5, POSIX locale  **Yes**        **Yes**
UTF-8 mode                **Yes**        **Yes**
UTF-8 Strict mode         No             No
========================  =============  ==========

"No" means that the script can fail on decoding or encoding a filename
depending on the locale or the filename. To be able to always work, the program
must be able to produce mojibake. Mojibake is more user friendly than an error
with a truncated or empty output.

Example with a directory which contains a file called ``b'xxx\xff'`` (the byte
``0xFF`` is invalid in UTF-8).
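Such a file can be created from Python itself. A minimal sketch (the directory
name is arbitrary; this assumes a filesystem that accepts arbitrary bytes in
filenames, so it does not apply to Windows)::

    import os

    os.makedirs("testdir", exist_ok=True)

    # Create a file whose name is not decodable from UTF-8
    with open(os.path.join(b"testdir", b"xxx\xff"), "w"):
        pass

    print(os.listdir("testdir"))   # ['xxx\udcff'] with surrogateescape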
Default and UTF-8 Strict mode fail on ``print()`` with an encode error::

    $ python3.7 ../ls.py
    Traceback (most recent call last):
      File "../ls.py", line 5, in <module>
        print(name)
    UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' ...

    $ python3.7 -X utf8=strict ../ls.py
    Traceback (most recent call last):
      File "../ls.py", line 5, in <module>
        print(name)
    UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' ...

The UTF-8 mode, POSIX locale, Python 2 and the UNIX ``ls`` command work but
display mojibake::

    $ python3.7 -X utf8 ../ls.py
    xxx?

    $ LC_ALL=C /python3.6 ../ls.py
    xxx?

    $ python2 ../ls.py
    xxx?

    $ ls
    'xxx'$'\377'

List a directory into a text file
---------------------------------

Similar to the previous example, except that the listing is written into a text
file::

    import os

    names = os.listdir(os.curdir)
    with open("/tmp/content.txt", "w") as fp:
        for name in names:
            fp.write("%s\n" % name)

Result:

========================  =============  ==========
Python                    Always work?   Mojibake?
========================  =============  ==========
Python 2                  **Yes**        **Yes**
Python 3                  No             No
Python 3.5, POSIX locale  No             No
UTF-8 mode                **Yes**        **Yes**
UTF-8 Strict mode         No             No
========================  =============  ==========

"Yes" implies that mojibake can be produced. "No" means that the script can
fail on decoding or encoding a filename depending on the locale or the
filename. Typical error::

    $ LC_ALL=C python3 test.py
    Traceback (most recent call last):
      File "test.py", line 5, in <module>
        fp.write("%s\n" % name)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

Display Unicode characters into stdout
--------------------------------------

Very basic example used to illustrate a common issue: display the euro sign
(U+20AC: €)::

    print("euro: \u20ac")

Result:

========================  =============  ==========
Python                    Always work?   Mojibake?
========================  =============  ==========
Python 2                  No             No
Python 3                  No             No
Python 3.5, POSIX locale  No             No
UTF-8 mode                **Yes**        **Yes**
UTF-8 Strict mode         **Yes**        **Yes**
========================  =============  ==========

The UTF-8 and UTF-8 Strict modes will always encode the euro sign as UTF-8. If
the terminal uses a different encoding, we get mojibake.

Replace a word in a text
------------------------

The following script replaces the word "apple" with "orange". It reads input
from stdin and writes the output into stdout::

    import sys

    text = sys.stdin.read()
    sys.stdout.write(text.replace("apple", "orange"))

Result:

========================  =============  ==========
Python                    Always work?   Mojibake?
========================  =============  ==========
Python 2                  **Yes**        **Yes**
Python 3                  No             No
Python 3.5, POSIX locale  **Yes**        **Yes**
UTF-8 mode                **Yes**        **Yes**
UTF-8 Strict mode         No             No
========================  =============  ==========

Producer-consumer model using pipes
-----------------------------------

Let's say that we have a "producer" program which writes data into its stdout
and a "consumer" program which reads data from its stdin.

On a shell, such programs are run with the command::

    producer | consumer

The question is whether these programs will work with any data and any locale.
UNIX users don't expect Unicode errors, and so expect that such programs "just
work".

If the producer only produces ASCII output, no error should occur. Let's say
that the producer writes at least one non-ASCII character (at least one byte in
the range ``0x80..0xff``).

To simplify the problem, let's say that the consumer has no output (it doesn't
write its result into a file or to stdout).
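A Python program can also stay entirely at the bytes level, like the "Bytes
producer" and "Bytes consumer" defined below, by using the ``buffer`` attribute
of the standard streams. A minimal sketch of a consumer-and-producer that never
fails with a Unicode error, whatever the locale::

    import sys

    # Read and write raw bytes: nothing is decoded or encoded, so no
    # Unicode error can occur, and the data is passed through unchanged.
    data = sys.stdin.buffer.read()
    sys.stdout.buffer.write(data)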
A "Bytes producer" is an application which cannot fail with a Unicode error and produces bytes into stdout. Let's say that a "Bytes consumer" does not decode stdin but stores data as bytes: such consumer always work. Common UNIX command line tools like ``cat``, ``grep`` or ``sed`` are in this category. Many Python 2 applications are also in this category. "Python producer" and "Python consumer" are producer and consumer implemented in Python. Bytes producer, Bytes consumer ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ It always work, but it is out of the scope of this PEP since it doesn't involve Python. Python producer, Bytes consumer ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Python producer:: print("euro: \u20ac") Result: ======================== ============ ========= Python Always work? Mojibake? ======================== ============ ========= Python 2 No No Python 3 No No Python 3.5, POSIX locale No No UTF-8 mode **Yes** **Yes** UTF-8 Strict mode No No ======================== ============ ========= The question here is not if the consumer is able to decode the input, but if Python is able to produce its ouput. So it's similar to the `Display Unicode characters into stdout`_ case. UTF-8 modes work with any locale since the consumer doesn't try to decode its stdin. Bytes producer, Python consumer ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Python consumer:: import sys text = sys.stdin.read() result = text.replace("apple", "orange") # ignore the result Result: ======================== ============ ========= Python Always work? Mojibake? ======================== ============ ========= Python 2 **Yes** **Yes** Python 3 No No Python 3.5, POSIX locale **Yes** **Yes** UTF-8 mode **Yes** **Yes** UTF-8 Strict mode No No ======================== ============ ========= Python 3 fails on decoding stdin depending on the input and the locale. Python producer, Python consumer ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Python producer:: print("euro: \u20ac") Python consumer:: import sys text = sys.stdin.read() result = text.replace("apple", "orange") # ignore the result Result, same Python version used for the producer and the consumer: ======================== ============ ========= Python Always work? Mojibake? ======================== ============ ========= Python 2 No No Python 3 No No Python 3.5, POSIX locale No No UTF-8 mode **Yes** **Yes** UTF-8 Strict mode No No ======================== ============ ========= This case combines a Python producer with a Python consumer, so the result is the subset of `Python producer, Bytes consumer`_ and `Bytes producer, Python consumer`_. Backward Compatibility ====================== The main backward incompatible change is that the UTF-8 encoding is now used by default if the locale is POSIX. Since the UTF-8 encoding is used with the ``surrogateescape`` error handler, encoding errors should not occur and so the change should not break applications. The more likely source of trouble comes from external libraries. Python can decode successfully data from UTF-8, but a library using the locale encoding can fail to encode the decoded text back to bytes. Hopefully, encoding text in a library is a rare operation. Very few libraries expect text, most libraries expect bytes and even manipulate bytes internally. The PEP only changes the default behaviour if the locale is POSIX. For other locales, the *default* behaviour is unchanged. 
Alternatives
============

Don't modify the encoding of the POSIX locale
---------------------------------------------

A first version of the PEP did not change the encoding and error handler used
for the POSIX locale.

The problem is that adding the ``-X utf8`` command line option or setting the
``PYTHONUTF8`` environment variable is not possible in some cases, or at least
not convenient.

Moreover, many users simply expect that Python 3 behaves as Python 2: it
doesn't bother them with encodings and "just works" in all cases. These users
don't worry about mojibake, or even expect mojibake because of complex
documents using multiple incompatible encodings.

Always use UTF-8
----------------

Python already always uses the UTF-8 encoding on Mac OS X, Android and Windows.
Since UTF-8 became the de facto encoding, it makes sense to always use it on
all platforms with any locale.

The risk is to introduce mojibake if the locale uses a different encoding,
especially for locales other than the POSIX locale.

Force UTF-8 for the POSIX locale
--------------------------------

An alternative to always using UTF-8 in any case is to only use UTF-8 when the
``LC_CTYPE`` locale is the POSIX locale.

The `PEP 538`_ "Coercing the legacy C locale to C.UTF-8" by Nick Coghlan
proposes to implement that using the ``C.UTF-8`` locale.

Use the strict error handler for operating system data
-------------------------------------------------------

Using the ``surrogateescape`` error handler for `operating system data`_
creates surprising surrogate characters. No Python codec (except ``utf-7``)
accepts surrogates, so encoding text coming from the operating system is likely
to raise an error. The problem is that the error comes late, very far from
where the data was read.

The ``strict`` error handler could be used instead to decode
(``os.fsdecode()``) and encode (``os.fsencode()``) operating system data, to
raise encoding errors as soon as possible. It helps to find bugs more quickly.

The main drawback of this strategy is that it doesn't work in practice.
Python 3 is designed on top of Unicode strings. Most functions expect Unicode
and produce Unicode. Even if many operating system functions have two flavors,
bytes and Unicode, the Unicode flavor is used in most cases. There are good
reasons for that: Unicode is more convenient in Python 3, and using Unicode
helps to support the full Unicode Character Set (UCS) on Windows (even if
Python now uses UTF-8 there since Python 3.6, see the `PEP 528`_ and the
`PEP 529`_).

For example, if ``os.fsdecode()`` used ``utf8/strict``, ``os.listdir(str)``
would fail to list the filenames of a directory if a single filename is not
decodable from UTF-8. As a consequence, ``shutil.rmtree(str)`` would fail to
remove a directory. Undecodable filenames, environment variables, etc. are
simply too common to make this alternative viable.
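The two flavors mentioned above can be seen side by side with ``os.listdir()``.
A minimal sketch, reusing the directory with the undecodable filename from the
earlier example::

    import os

    # str flavor: filenames are decoded using the filesystem encoding and
    # error handler (surrogateescape by default on UNIX)
    print(os.listdir("testdir"))    # ['xxx\udcff']

    # bytes flavor: raw filenames, nothing is decoded at all
    print(os.listdir(b"testdir"))   # [b'xxx\xff']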
Links ===== PEPs: * `PEP 538 `_: "Coercing the legacy C locale to C.UTF-8" * `PEP 529 `_: "Change Windows filesystem encoding to UTF-8" * `PEP 528 `_: "Change Windows console encoding to UTF-8" * `PEP 383 `_: "Non-decodable Bytes in System Character Interfaces" Main Python issues: * `Issue #29240: Implementation of the PEP 540: Add a new UTF-8 mode `_ * `Issue #28180: sys.getfilesystemencoding() should default to utf-8 `_ * `Issue #19977: Use "surrogateescape" error handler for sys.stdin and sys.stdout on UNIX for the C locale `_ * `Issue #19847: Setting the default filesystem-encoding `_ * `Issue #8622: Add PYTHONFSENCODING environment variable `_: added but reverted because of many issues, read the `Inconsistencies if locale and filesystem encodings are different `_ thread on the python-dev mailing list Incomplete list of Python issues related to Unicode errors, especially with the POSIX locale: * 2016-12-22: `LANG=C python3 -c "import os; os.path.exists('\xff')" `_ * 2014-07-20: `issue #22016: Add a new 'surrogatereplace' output only error handler `_ * 2014-04-27: `Issue #21368: Check for systemd locale on startup if current locale is set to POSIX `_ -- read manually /etc/locale.conf when the locale is POSIX * 2014-01-21: `Issue #20329: zipfile.extractall fails in Posix shell with utf-8 filename `_ * 2013-11-30: `Issue #19846: Python 3 raises Unicode errors with the C locale `_ * 2010-05-04: `Issue #8610: Python3/POSIX: errors if file system encoding is None `_ * 2013-08-12: `Issue #18713: Clearly document the use of PYTHONIOENCODING to set surrogateescape `_ * 2013-09-27: `Issue #19100: Use backslashreplace in pprint `_ * 2012-01-05: `Issue #13717: os.walk() + print fails with UnicodeEncodeError `_ * 2011-12-20: `Issue #13643: 'ascii' is a bad filesystem default encoding `_ * 2011-03-16: `issue #11574: TextIOWrapper should use UTF-8 by default for the POSIX locale `_, thread on python-dev: `Low-Level Encoding Behavior on Python 3 `_ * 2010-04-26: `Issue #8533: regrtest: use backslashreplace error handler for stdout `_, regrtest fails with Unicode encode error if the locale is POSIX Some issues are real bug in applications which must set explicitly the encoding. Well, it just works in the common case (locale configured correctly), so what? But the program "suddenly" fails when the POSIX locale is used (probably for bad reasons). Such bug is not well understood by users. Example of such issue: * 2013-11-21: `pip: open() uses the locale encoding to parse Python script, instead of the encoding cookie `_ -- pip must use the encoding cookie to read a Python source code file * 2011-01-21: `IDLE 3.x can crash decoding recent file list `_ Prior Art ========= Perl has a ``-C`` command line option and a ``PERLUNICODE`` environment varaible to force UTF-8: see `perlrun `_. It is possible to configure UTF-8 per standard stream, on input and output streams, etc. Copyright ========= This document has been placed in the public domain. From victor.stinner at gmail.com Wed Jan 11 17:54:10 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 11 Jan 2017 23:54:10 +0100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> <22638.64668.809353.547149@turnbull.sk.tsukuba.ac.jp> Message-ID: 2017-01-06 10:50 GMT+01:00 M.-A. Lemburg : > Victor: I think you are taking the UTF-8 idea a bit too far. 
> Nick was trying to address the situation where the locale is > set to "C", or rather not set at all (in which case the lib C > defaults to the "C" locale). The latter is a fairly standard > situation when piping data on Unix or when spawning processes > which don't inherit the current OS environment. My PEP 540 is different than Nick's PEP 538, even for the POSIX locale. I propose to always use the surrogateescape error handler, whereas Nick wants to keep the strict error handler for inputs and outputs. https://www.python.org/dev/peps/pep-0540/#encoding-and-error-handler The surrogateescape error handler is useful to write programs which work as pipes, as cat, grep, sed, ... UNIX program: https://www.python.org/dev/peps/pep-0540/#producer-consumer-model-using-pipes You can get the behaviour of Nick's PEP 538 using my UTF-8 Strict mode. Compare "UTF-8 mode" and "UTF-8 Strict mode" lines in the tables of my use case. The UTF-8 mode always works, but can produce mojibake, whereas UTF-8 Strict doesn't produce mojibake but can fail depending on data and the locale. IMHO most users prefers usability ("just work") over correctness (prevent mojibake). So Nick and me don't have exaclty the same scope and use cases. Victor From songofacandy at gmail.com Wed Jan 11 19:23:14 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Thu, 12 Jan 2017 09:23:14 +0900 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> <22638.64668.809353.547149@turnbull.sk.tsukuba.ac.jp> Message-ID: > My PEP 540 is different than Nick's PEP 538, even for the POSIX > locale. I propose to always use the surrogateescape error handler, > whereas Nick wants to keep the strict error handler for inputs and > outputs. > https://www.python.org/dev/peps/pep-0540/#encoding-and-error-handler > > The surrogateescape error handler is useful to write programs which > work as pipes, as cat, grep, sed, ... UNIX program: > https://www.python.org/dev/peps/pep-0540/#producer-consumer-model-using-pipes > > You can get the behaviour of Nick's PEP 538 using my UTF-8 Strict > mode. Compare "UTF-8 mode" and "UTF-8 Strict mode" lines in the tables > of my use case. The UTF-8 mode always works, but can produce mojibake, > whereas UTF-8 Strict doesn't produce mojibake but can fail depending > on data and the locale. > > IMHO most users prefers usability ("just work") over correctness > (prevent mojibake). > I'm ?0 to surrogateescape by default. I feel +1 for stdout and -1 for stdin. In output case, surrogateescape is weaker than strict, but it only allows surrgateescaped binary. If program carefully use surrogateescaped decode, surrogateescape on stdout is safe enough. On the other hand, surrogateescape is very weak for input. It accepts arbitrary bytes. It should be used carefully. But I agree different encoding handler between stdin/stdout is not beautiful. That's why I'm ?0. FYI, when http://bugs.python.org/issue15216 is merged, we can change error handler easily: ``sys.stdout.set_encoding(errors='surrogateescape')`` So it's controllable from Python. Some program which handles filenames may prefer surrogateescape, and some program like CGI may prefer strict UTF-8 because JSON and HTML5 shouldn't contain arbitrary bytes. 
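A similar effect can already be achieved today by re-wrapping the underlying
binary buffer of a stream; a minimal sketch (not the ``set_encoding()`` API
proposed in issue 15216):

    import io
    import sys

    # Replace sys.stdout with a wrapper that keeps the same binary buffer
    # and encoding but uses the surrogateescape error handler.
    sys.stdout = io.TextIOWrapper(
        sys.stdout.buffer,
        encoding=sys.stdout.encoding,
        errors="surrogateescape",
        line_buffering=True,
    )

    # A filename containing an undecodable byte can now be printed
    print(b"xxx\xff".decode("utf-8", "surrogateescape"))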
From steve at pearwood.info Wed Jan 11 20:05:19 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 12 Jan 2017 12:05:19 +1100 Subject: [Python-ideas] How to respond to trolling In-Reply-To: <22645.49466.968872.108760@turnbull.sk.tsukuba.ac.jp> References: <20170110233600.GQ3887@ando.pearwood.info> <22645.49466.968872.108760@turnbull.sk.tsukuba.ac.jp> Message-ID: <20170112010517.GR3887@ando.pearwood.info> On Wed, Jan 11, 2017 at 02:23:06PM +0900, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > > Giving a newcomer the Silent Treatment because they've questioned some > > undocumented set of features not open to change is not Open, Considerate > > or Respectful (the CoC). Even if their ideas are ignorant or ill-thought > > out, we must give them the benefit of the doubt and assume they are > > making their comments in good faith rather than trolling. > > Honest question: do you think that response has to be done in public? "Has to be done in public" in the sense of being mandatory? No. There are pros and cons to both public and private messaging. But in the sense of preferred, yes, I do think so. Private responses could be the idiosyncratic response of a single weirdo who doesn't speak for the community. Public responses that don't get contradicted demonstrate community aggreement, and offer the OP a way to engage if they are willing to ask questions, learn from the answers, and moderate their tone. -- Steve From steve at pearwood.info Wed Jan 11 21:37:41 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 12 Jan 2017 13:37:41 +1100 Subject: [Python-ideas] Things that won't change (proposed PEP) Message-ID: <20170112023740.GS3887@ando.pearwood.info> I have a proposal for an Informational PEP that lists things which will not change in Python. If accepted, it could be linked to from the signup page for the mailing list, and be the one obvious place to point newcomers to if they propose the same old cliches. Thoughts? * * * * * * * * * * PEP: XXX Title: Things that won't change in Python Version: $Revision$ Last-Modified: $Date$ Author: Steven D'Aprano Status: Draft Type: Informational Content-Type: text/x-rst Created: 11-Jan-2017 Post-History: 12-Jan-2017 Abstract ======== This PEP documents things which will not change in future versions of Python. Rationale ========= This PEP hopes to reduce the noise on `Python-Ideas `_ and other mailing lists. If you have a proposal for future Python development, and it requires changing one of the things listed here, it is dead in the water and has **no chance of being accepted**, either because the benefit is too little, the cost of changing the language (including backwards compatibility) is too high, or simply because it goes against the design preferred by the BDFL. Many of these things are already listed in the `FAQs `_. You should be familiar with both Python and the FAQs before proposing changes to the language. Just because something is not listed here does not necessarily mean that it will be changed. Each proposal will be weighed on its merits, costs compared to benefits. Sometimes the decision will come down to a matter of subjective taste, in which case the BDFL has the final say. Language Direction ================== Python 3 -------- This shouldn't need saying, but Python 3 will not be abandoned. Python 2.8 ---------- There will be `no official Python 2.8 `_ , although third parties are welcome to fork the language, backport Python 3 features, and maintain the hybrid themselves. 
Just don't call it "Python 2.8", or any other name which gives the impression that it is maintained by the official Python core developers. Type checking ------------- Duck-typing remains a fundamental part of Python and `type checking `_ will not be mandatory. Even if the Python interpreter someday gains a built-in type checker, it will remain optional. Syntax ====== Braces ------ It doesn't matter whether optional or mandatory, whether spelled ``{ }`` like in C, or ``BEGIN END`` like in Pascal, braces to delimit code blocks are not going to happen. For another perspective on this, try running ``from __future__ import braces`` at the interactive interpreter. (There is a *tiny* loophole here: multi-statement lambda, or Ruby-like code blocks have not been ruled out. Such a feature may require some sort of block delimiter -- but it won't be braces, as they clash with the syntax for dicts and sets.) Colons after statements that introduce a block ---------------------------------------------- Statements which introduce a code block, such as ``class``, ``def``, or ``if``, require a colon. Colons have been found to increase readability. See the `FAQ `_ for more detail. End statements -------------- Python does not use ``END`` statements following blocks. Given significant indentation, they are redundant and add noise to the source code. If you really want end markers, use a comment ``# end``. Explicit self ------------- Explicit ``self`` is a feature, not a bug. See the `FAQ `_ for more detail. Print function -------------- The ``print`` statement in Python 1 and 2 was a mistake that Guido long regretted. Now that it has been corrected in Python 3, it will not be reverted back to a statement. See `PEP 3105 `_ for more details. Significant indentation ----------------------- `Significant indentation `_ is a core feature of Python. Other syntax ------------ Python will not use ``$`` as syntax. Guido doesn't like it. (But it is okay to use ``$`` in DSLs like template strings and regular expressions.) Built-in Functions And Types ============================ Strings ------- Strings are `immutable `_ and represent Unicode code points, not bytes. Bools ----- ``bool`` is a subclass of ``int``, with ``True == 1`` and ``False == 0``. This is mostly for historical reasons, but the benefit of changing it now is too low to be worth breaking backwards compatibility and the enormous disruption it would cause. Built-in functions ------------------ Python is an object-oriented language, but it is not *purely* object-oriented. Not everything needs to be `a method of some object `_, and functions have their advantages. See the `FAQ `_ for more detail. Other Language Features ======================= The interactive interpreter --------------------------- The default prompt is ``>>> ``. Guido likes it that way. Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: From rosuav at gmail.com Wed Jan 11 21:52:08 2017 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 12 Jan 2017 13:52:08 +1100 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: <20170112023740.GS3887@ando.pearwood.info> References: <20170112023740.GS3887@ando.pearwood.info> Message-ID: On Thu, Jan 12, 2017 at 1:37 PM, Steven D'Aprano wrote: > I have a proposal for an Informational PEP that lists things which will > not change in Python. 
If accepted, it could be linked to from the signup > page for the mailing list, and be the one obvious place to point > newcomers to if they propose the same old cliches. > > +1. Sits in a similar place to PEP 3099; can some sort of appropriate/memorable number be picked for this? ChrisA From phd at phdru.name Wed Jan 11 22:17:55 2017 From: phd at phdru.name (Oleg Broytman) Date: Thu, 12 Jan 2017 04:17:55 +0100 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: <20170112023740.GS3887@ando.pearwood.info> References: <20170112023740.GS3887@ando.pearwood.info> Message-ID: <20170112031755.GA7523@phdru.name> On Thu, Jan 12, 2017 at 01:37:41PM +1100, Steven D'Aprano wrote: > Explicit self > ------------- > > Explicit ``self`` is a feature, not a bug. See the > `FAQ `_ > for more detail. If one thinks that ``self`` is too long and tedious to write she can use ``s`` instead. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From chris.barker at noaa.gov Wed Jan 11 23:28:31 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 11 Jan 2017 20:28:31 -0800 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: <20170112031755.GA7523@phdru.name> References: <20170112023740.GS3887@ando.pearwood.info> <20170112031755.GA7523@phdru.name> Message-ID: I think this is a fine idea, but I also think we could use a more verbose FAC: Frequently Asked Criticisms Some of the same things, but it would focus on the "why" of many of the issues. Many of the things people (newbies, mostly) complain about are simply taste, or legacy that isn't worth changing. But many are deliberate design decisions that were thoroughly thought out, and have been well explained in various places (i.e. zero-based indexing and open-on-the-right slicing). It would be good to have it all in one place. Maybe this PEP could be extended to include that, but it doesn't feel PEP-like to me. -CHB On Wed, Jan 11, 2017 at 7:17 PM, Oleg Broytman wrote: > On Thu, Jan 12, 2017 at 01:37:41PM +1100, Steven D'Aprano < > steve at pearwood.info> wrote: > > Explicit self > > ------------- > > > > Explicit ``self`` is a feature, not a bug. See the > > `FAQ be-used-explicitly-in-method-definitions-and-calls>`_ > > for more detail. > > If one thinks that ``self`` is too long and tedious to write she can > use ``s`` instead. > > Oleg. > -- > Oleg Broytman http://phdru.name/ phd at phdru.name > Programmers don't die, they just GOSUB without RETURN. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikhailwas at gmail.com Wed Jan 11 23:39:19 2017 From: mikhailwas at gmail.com (Mikhail V) Date: Thu, 12 Jan 2017 05:39:19 +0100 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: <20170112023740.GS3887@ando.pearwood.info> References: <20170112023740.GS3887@ando.pearwood.info> Message-ID: On 12 January 2017 at 03:37, Steven D'Aprano wrote: > I have a proposal for an Informational PEP that lists things which will > not change in Python. 
If accepted, it could be linked to from the signup > page for the mailing list, and be the one obvious place to point > newcomers to if they propose the same old cliches. > > Thoughts? > Excellent idea, I was going to ask about such list during my own attempts here. And my first though is about "will not change". Like: never ever change or like: will not change in 10 years or 20 years. And on this occassion, I'd look forward to some alternative informal proposals repository. For example something like "futuristic corner" for some unusual topics but potentially useful for future development. Now there is "informational PEP" section, but not very clear if it is more informal than normal PEPs or how would one go with merely marginal topics, including those things, which will most obviously not change. And probably your idea is exactly against this attitude, hard to say. Mikhail -------------- next part -------------- An HTML attachment was scrubbed... URL: From simon58500 at bigpond.com Wed Jan 11 23:13:03 2017 From: simon58500 at bigpond.com (Simon Lovell) Date: Thu, 12 Jan 2017 12:13:03 +0800 Subject: [Python-ideas] How to respond to trolling (Guido van Rossum) Message-ID: <31b49e4a-e33e-26e0-5cc1-32187a4cd639@bigpond.com> I feel I have to respond to this one. More than half of what I suggested could have and should be implemented. In particular the truthiness of non-boolean data and the lack of a reasonable SQL syntax. Several other points have been discussed endlessly on the internet but without a satisfactory (IMO) answer being given. I don't know what is meant by some insults having been thrown in. Calling truthiness of non boolean data "Ugly" is an insult? It is ugly. Yes, I should have double checked the chained assignment before posting and perhaps including some things which weren't changing added negative value. Regarding this comment. 'I use vim, which is very respectable, thank you. You'd like me to use "EditPlus 2" or equivalent', I think you should familiarise yourself with the "map!" function in vi and vim - put it in your .exrc file or .vimrc (vim only). e.g. "map! if if ^M ^Mendif". Regarding the with function, to those not familiar with what I was referring to that is a construct in Delphi and some other languages which works like this: ReallyLongFileDescriptor=open("file") with ReallyLongFileDescriptor: x=readline() // note the lack of "ReallyLongFileDescriptor." print x Delphi is even worse in that you can add more than one prefix in your with statement. Yes, you can put #endif at the end of every "if" statement etc. That requires a checker in the vein of the ccheck of yore to be enforced. These things aren't desirable. From pavol.lisy at gmail.com Wed Jan 11 23:56:28 2017 From: pavol.lisy at gmail.com (Pavol Lisy) Date: Thu, 12 Jan 2017 05:56:28 +0100 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: <20170112023740.GS3887@ando.pearwood.info> References: <20170112023740.GS3887@ando.pearwood.info> Message-ID: On 1/12/17, Steven D'Aprano wrote: > This shouldn't need saying, but Python 3 will not be abandoned. Except Python 4 would come. 
From chris.barker at noaa.gov Thu Jan 12 00:05:39 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 11 Jan 2017 21:05:39 -0800 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> <22638.64668.809353.547149@turnbull.sk.tsukuba.ac.jp> Message-ID: It seems to me that having a C locale can mean two things: 1) It really is meant to be ASCII 2) It's mis-configured (or un-configured), meaning the system encoding is unknown. if (2) then utf-8 is a fine default. if (2), then there are two options: 1) Everything on the sytsem really is ASCII -- in which case, utf-8 would "just work" -- no problem. 2) There are non-ascii file names, etc. on this supposedly ASCII system. In which case, do folks expect their Python programs to find these issues and raise errors? They may well expect that their Python program will not let them try to save a non ASCII filename, for instance. But I suspect that they wouldn't want it to raise an obscure encoding error -- but rather would want the app to do somethign friendly. So I see no downside to using utf-8 when the C locale is defined. -CHB On Wed, Jan 11, 2017 at 4:23 PM, INADA Naoki wrote: > > My PEP 540 is different than Nick's PEP 538, even for the POSIX > > locale. I propose to always use the surrogateescape error handler, > > whereas Nick wants to keep the strict error handler for inputs and > > outputs. > > https://www.python.org/dev/peps/pep-0540/#encoding-and-error-handler > > > > The surrogateescape error handler is useful to write programs which > > work as pipes, as cat, grep, sed, ... UNIX program: > > https://www.python.org/dev/peps/pep-0540/#producer- > consumer-model-using-pipes > > > > You can get the behaviour of Nick's PEP 538 using my UTF-8 Strict > > mode. Compare "UTF-8 mode" and "UTF-8 Strict mode" lines in the tables > > of my use case. The UTF-8 mode always works, but can produce mojibake, > > whereas UTF-8 Strict doesn't produce mojibake but can fail depending > > on data and the locale. > > > > IMHO most users prefers usability ("just work") over correctness > > (prevent mojibake). > > > > I'm ?0 to surrogateescape by default. I feel +1 for stdout and -1 for > stdin. > > In output case, surrogateescape is weaker than strict, but it only allows > surrgateescaped binary. If program carefully use surrogateescaped decode, > surrogateescape on stdout is safe enough. > > On the other hand, surrogateescape is very weak for input. It accepts > arbitrary bytes. > It should be used carefully. > > But I agree different encoding handler between stdin/stdout is not > beautiful. > That's why I'm ?0. > > > FYI, when http://bugs.python.org/issue15216 is merged, we can change > error handler easily: ``sys.stdout.set_encoding( > errors='surrogateescape')`` > > So it's controllable from Python. Some program which handles filenames may > prefer surrogateescape, and some program like CGI may prefer strict > UTF-8 because > JSON and HTML5 shouldn't contain arbitrary bytes. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From songofacandy at gmail.com Thu Jan 12 01:03:15 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Thu, 12 Jan 2017 15:03:15 +0900 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: <20170112023740.GS3887@ando.pearwood.info> References: <20170112023740.GS3887@ando.pearwood.info> Message-ID: > Built-in functions > > ------------------ > > > > Python is an object-oriented language, but it is not *purely* > > object-oriented. Not everything needs to be `a method of some object `_, > > and functions have their advantages. See the > > `FAQ `_ > > for more detail. > I don't like this FAQ entry. See this issue: https://bugs.python.org/issue27671 From prometheus235 at gmail.com Thu Jan 12 01:42:46 2017 From: prometheus235 at gmail.com (Nick Timkovich) Date: Thu, 12 Jan 2017 00:42:46 -0600 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: References: <20170112023740.GS3887@ando.pearwood.info> Message-ID: Why mention sys.ps1 == '>>> ', is that some inside joke I'm unaware of? That is one of the easier things to modify (with a sitecustomize.py or whatever). On Thu, Jan 12, 2017 at 12:03 AM, INADA Naoki wrote: > > Built-in functions > > > > ------------------ > > > > > > > > Python is an object-oriented language, but it is not *purely* > > > > object-oriented. Not everything needs to be `a method of some object < > http://steve-yegge.blogspot.com.au/2006/03/execution-in- > kingdom-of-nouns.html>`_, > > > > and functions have their advantages. See the > > > > `FAQ python-use-methods-for-some-functionality-e-g-list-index- > but-functions-for-other-e-g-len-list>`_ > > > > for more detail. > > > > I don't like this FAQ entry. See this issue: https://bugs.python.org/ > issue27671 > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Jan 12 01:44:00 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 12 Jan 2017 16:44:00 +1000 Subject: [Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode) In-Reply-To: References: Message-ID: On 12 January 2017 at 08:15, Victor Stinner wrote: > Hi, > > I also implemented my PEP 540, you can now test it! Use the latest > patch attached to: > > http://bugs.python.org/issue29240 > > > I made multiple changes since the first version of my PEP: > > * The UTF-8 Strict mode now only uses strict for inputs and outputs: > it keeps surrogateescape for operating system data. Read the "Use the > strict error handler for operating system data" alternative for the > rationale. > > * The POSIX locale now enables the UTF-8 mode. See the "Don't modify > the encoding of the POSIX locale" alternative for the rationale. > > * Specify the priority between -X utf8, PYTHONUTF8, PYTHONIOENCODING, etc. Thanks Victor, I really like this version, and the next time I update PEP 538 I'm going to replace the en_US.UTF-8 fallback in the current proposal with a dependency on this PEP. My one comment would be that in the summary tables, "Always works" isn't the right phrase to describe potentially corrupting text data instead of throwing an exception :) Instead, I think it would make sense to retitle that column as "Exception?" 
such that: * the ideal state is "No exception, no mojibake", which is what we'll now get when assuming (or forcing) UTF-8 is the correct thing to do, and will also continue to get when the locale is set appropriately (e.g. when handling GB18030 on Chinese systems) * the problematic behaviour of earlier Python 3.x versions was "Yes exception, no mojibake" when it assumed ASCII instead of UTF-8 * the problematic behaviour of Python 2.x in the specific examples given is "No exception, yes mojibake", and potentially even "Yes exception, yes mojibake" in cases where the implicit ASCII-based decoding could be encountered PEP 538 would then be a follow-on PEP that attempts to resolve the ASCII locale encoding problem not only for CPython itself, but also for any other C/C++ components sharing the same process, or launched in subprocesses that inherit the current environment. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Jan 12 01:59:58 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 12 Jan 2017 16:59:58 +1000 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: <22645.55581.671261.687104@turnbull.sk.tsukuba.ac.jp> References: <20170105171638.GA4217@phdru.name> <20170106044922.GM3887@ando.pearwood.info> <3443043241353663718@unknownmsgid> <22645.49484.967553.775552@turnbull.sk.tsukuba.ac.jp> <22645.55581.671261.687104@turnbull.sk.tsukuba.ac.jp> Message-ID: On 11 January 2017 at 17:05, Stephen J. Turnbull wrote: > Anyway, I need to look more carefully at the actual PEPs and see if > there's something concrete to worry about. But remember, we have > about 18 months to chew over this if necessary FWIW, I'm hoping to backport whatever improved handling of the C locale that we agree on for Python 3.7+ to the initial system Python 3.6.0 release in Fedora 26 [1] - hence the section about redistributor backports in PEP 538. While the problems with the C locale have been known for a while, this latest attempt to do something about it started as an idea I had for a downstream Fedora-specific patch (which became PEP 538), while that PEP in turn served as motivation for Victor to write PEP 540 as an alternative approach that didn't depend on having the C.UTF-8 locale available. With the F26 Alpha at the end of February and the F26 Beta in late April, I'm hoping we can agree on a way forward without requiring months to make a decision :) > -- I'm only asking for a few more days Yeah, while I'd prefer not to see the discussions drag out indefinitely, there's also still plenty of time for folks to consider the PEPs closely and formulate their perspective. Cheers, Nick. [1] https://fedoraproject.org/wiki/Releases/26/Schedule -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From phd at phdru.name Thu Jan 12 02:04:38 2017 From: phd at phdru.name (Oleg Broytman) Date: Thu, 12 Jan 2017 08:04:38 +0100 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: References: <20170112023740.GS3887@ando.pearwood.info> Message-ID: <20170112070438.GA16425@phdru.name> On Thu, Jan 12, 2017 at 12:42:46AM -0600, Nick Timkovich wrote: > Why mention sys.ps1 == '>>> ', is that some inside joke I'm unaware of? > That is one of the easier things to modify (with a sitecustomize.py or > whatever). With PYTHONSTARTUP. 
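A minimal sketch of that approach (the file path is arbitrary):

    # file: ~/.pythonstartup.py
    # run "export PYTHONSTARTUP=~/.pythonstartup.py" in the shell so the
    # interactive interpreter executes this file before the first prompt
    import sys

    sys.ps1 = "py> "   # primary prompt, instead of '>>> '
    sys.ps2 = "....  " # continuation prompt, instead of '... '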
> On Thu, Jan 12, 2017 at 12:03 AM, INADA Naoki > wrote: > > > > Built-in functions > > > > > > ------------------ > > > > > > > > > > > > Python is an object-oriented language, but it is not *purely* > > > > > > object-oriented. Not everything needs to be `a method of some object < > > http://steve-yegge.blogspot.com.au/2006/03/execution-in- > > kingdom-of-nouns.html>`_, > > > > > > and functions have their advantages. See the > > > > > > `FAQ > python-use-methods-for-some-functionality-e-g-list-index- > > but-functions-for-other-e-g-len-list>`_ > > > > > > for more detail. > > > > > > > I don't like this FAQ entry. See this issue: https://bugs.python.org/ > > issue27671 Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From greg.ewing at canterbury.ac.nz Thu Jan 12 01:48:08 2017 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 12 Jan 2017 19:48:08 +1300 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: References: <20170112023740.GS3887@ando.pearwood.info> <20170112031755.GA7523@phdru.name> Message-ID: <587726A8.60107@canterbury.ac.nz> Chris Barker wrote: > Frequently Asked Criticisms Doesn't quite make sense -- one doesn't "ask" criticisms. How about: FCLAP - Frequent Criticisms Levelled Against Python -- Greg From songofacandy at gmail.com Thu Jan 12 03:45:44 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Thu, 12 Jan 2017 17:45:44 +0900 Subject: [Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode) In-Reply-To: References: Message-ID: > Thanks Victor, I really like this version, and the next time I update > PEP 538 I'm going to replace the en_US.UTF-8 fallback in the current > proposal with a dependency on this PEP. > When using en_US.UTF-8 as fallback, pleas override only LC_CTYPE, instead of LC_ALL. As I described in other thread, LC_COLLATE may cause unintentional performance regression and behavior changes. From p.f.moore at gmail.com Thu Jan 12 03:54:16 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 12 Jan 2017 08:54:16 +0000 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: References: <20170112023740.GS3887@ando.pearwood.info> Message-ID: On 12 January 2017 at 04:39, Mikhail V wrote: > And my first though is about "will not change". Like: never ever change or > like: will not change in 10 years or 20 years. Like: "Please don't waste people's time trying to start a discussion about them". In 10 or 20 years, if opinions have changed, the PEP can be updated. Paul From george at fischhof.hu Thu Jan 12 04:05:47 2017 From: george at fischhof.hu (George Fischhof) Date: Thu, 12 Jan 2017 10:05:47 +0100 Subject: [Python-ideas] OS related file operations (copy, move, delete, rename...) should be placed into one module Message-ID: Hi There, OS related file operations (copy, move, delete, rename...) should be placed into one module... As it quite confusing that they are in two moduls (os and shutil). I have read that one is higher level than other, but actually to use them I have to think which function can be found in which module. It is confuse for beginners, and makes the usage more complex instead of make it more simple (as Zen of Python says ;-) ) An alias could good, not to cause incompatibility. Best regards, George -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From z+py+pyideas at m0g.net Thu Jan 12 04:22:45 2017 From: z+py+pyideas at m0g.net (Guyzmo) Date: Thu, 12 Jan 2017 10:22:45 +0100 Subject: [Python-ideas] Simon's ideas [Was: How to respond to trolling (Guido van Rossum)] In-Reply-To: <31b49e4a-e33e-26e0-5cc1-32187a4cd639@bigpond.com> References: <31b49e4a-e33e-26e0-5cc1-32187a4cd639@bigpond.com> Message-ID: <20170112092245.lml7rfzz3p7rbtze@BuGz.eclipse.m0g.net> Hello Simon, I'm mostly lurking this mailing list and this is my first post, so hello everybody ?. On Thu, Jan 12, 2017 at 12:13:03PM +0800, Simon Lovell wrote: > I feel I have to respond to this one. This discussion hasn't much to do on this mailing list and it's only generating noise. Please, would be kind enough to keep discussing this on python-list (aka comp.lang.python) where it belongs? And eventually, once discussion settles on realistic changes that /can/ be added to python you might want to submit a PEP: http://legacy.python.org/dev/peps/pep-0001/ To quote the above linked document, I believe this applies to your situation: > Asking the Python community first if an idea is original helps prevent > too much time being spent on something that is guaranteed to be > rejected based on prior discussions (searching the internet does not > always do the trick). It also helps to make sure the idea is > applicable to the entire community and not just the author. And read 5 times the following part before posting here again: > Just because an idea sounds good to the author does not mean it will > work for most people in most areas where Python is used. Even though I can only believe this is not your intent, in the end it looks pretty clear that many people, including myself, are being annoyed by these threads making the signal/noise ratio of this list very low. As?I've read the original mail I knew it would end up in a low signal/noise ratio discussion because even I wanted to lecture you, Simon, about languages, grammar and compilers. Instead I killed the original thread (plonk!), as I find little interest in this discussion, but it keeps on respawning as some posters are breaking the threads. So please, be kind and have some netiquette. ? Thank you, -- Bernard `Guyzmo` Pratz From george at fischhof.hu Thu Jan 12 04:04:54 2017 From: george at fischhof.hu (George Fischhof) Date: Thu, 12 Jan 2017 10:04:54 +0100 Subject: [Python-ideas] Settable defaulting to decimal instead of float Message-ID: Hi There, Settable defaulting to decimal instead of float It would be good to be able to use decimal automatically instead of float if there is a setting. For example an environment variable or a flag file. Where and when accuracy is more important than speed, the user could set this flag, and calculate with decimal numbers as learned in the school. I think several people would use this function Best regards, George -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Thu Jan 12 05:30:04 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 12 Jan 2017 10:30:04 +0000 Subject: [Python-ideas] Settable defaulting to decimal instead of float In-Reply-To: References: Message-ID: On 12 January 2017 at 09:04, George Fischhof wrote: > Hi There, > > Settable defaulting to decimal instead of float > > It would be good to be able to use decimal automatically instead of float if > there is a setting. For example an environment variable or a flag file. 
> > Where and when accuracy is more important than speed, the user could set > this flag, and calculate with decimal numbers as learned in the school. > > I think several people would use this function > > Best regards, > George If by "defaulting" you mean having floating point literals default to decimal, this has been discussed before and would break a lot of code, so is realistically not going to happen. If that's not what you mean, pretty much everything else can be done in your code (use "Decimal()" in place of "float()", etc). It's unlikely that there's a practical suggestion here that hasn't been discussed before and rejected, but if you have something specific to suggest, then you'll have to clarify your proposal with a lot more detail. But I wouldn't bother unless you can also demonstrate that your proposal avoids breaking backward compatibility (and "users can choose whether to use it via a flag" isn't sufficient - it doesn't help library modules for instance) as it will have no chance of being accepted unless it's backward compatible. Regards, Paul From victor.stinner at gmail.com Thu Jan 12 05:28:33 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 12 Jan 2017 11:28:33 +0100 Subject: [Python-ideas] Settable defaulting to decimal instead of float In-Reply-To: References: Message-ID: George requested this feature on the bug tracker: http://bugs.python.org/issue29223 George was asked to start a discusson on this list. I posted the following comment before closing the issue: You are not the first one to propose the idea. 2012: "make decimal the default non-integer instead of float?" https://mail.python.org/pipermail/python-ideas/2012-September/016250.html 2014: "Python Numbers as Human Concept Decimal System" https://mail.python.org/pipermail/python-ideas/2014-March/026436.html Related discussions: 2008: "Decimal literal?" https://mail.python.org/pipermail/python-ideas/2008-December/002379.html 2015: "Python Float Update" https://mail.python.org/pipermail/python-ideas/2015-June/033787.html PEP 240 "Adding a Rational Literal to Python". Victor From mal at egenix.com Thu Jan 12 05:47:48 2017 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 12 Jan 2017 11:47:48 +0100 Subject: [Python-ideas] Settable defaulting to decimal instead of float In-Reply-To: References: Message-ID: On 12.01.2017 10:04, George Fischhof wrote: > Hi There, > > Settable defaulting to decimal instead of float > > It would be good to be able to use decimal automatically instead of float > if there is a setting. For example an environment variable or a flag file. > > Where and when accuracy is more important than speed, the user could set > this flag, and calculate with decimal numbers as learned in the school. > > I think several people would use this function I don't think having this configurable is a good idea. Too much code would break as a result and become unusable for people wanting to use such functionality, so it would be of limited value. The above is similar to what we had in Python 2 for enabling Unicode literals everywhere (the -U option). It was added as experiment at the time. No one used it due to the massive breakage it caused in the stdlib. Why not explicitly code for your use case ? -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jan 12 2017) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... 
http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From p.f.moore at gmail.com Thu Jan 12 06:20:49 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 12 Jan 2017 11:20:49 +0000 Subject: [Python-ideas] Settable defaulting to decimal instead of float In-Reply-To: References: Message-ID: On 12 January 2017 at 10:28, Victor Stinner wrote: > George requested this feature on the bug tracker: > http://bugs.python.org/issue29223 > > George was asked to start a discusson on this list. I posted the > following comment before closing the issue: > > You are not the first one to propose the idea. OK, but without additional detail (for example, how would the proposed flag work, if the main module imports module A, then would float literals in A be decimal or binary? Both could be what the user wants) it's hard to comment. And as you say, most of this has been discussed before, so I'd like to see references back to the previous discussions in any proposal, with explanations of how the new proposal addresses the objections raised previously. Paul From ncoghlan at gmail.com Thu Jan 12 06:51:53 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 12 Jan 2017 21:51:53 +1000 Subject: [Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode) In-Reply-To: References: Message-ID: On 12 January 2017 at 18:45, INADA Naoki wrote: >> Thanks Victor, I really like this version, and the next time I update >> PEP 538 I'm going to replace the en_US.UTF-8 fallback in the current >> proposal with a dependency on this PEP. > > When using en_US.UTF-8 as fallback, pleas override only LC_CTYPE, > instead of LC_ALL. Yep, "LC_CTYPE=en_US.UTF-8" is what the current version of PEP 538 has as a fallback if neither C.UTF-8 nor C.utf8 is available. However, that's also the part of PEP 538 that I think can just be dropped entirely and replaced with PEP 540's "assume UTF-8" behaviour instead. I was already considering that as an option when I last updated the PEP (see https://www.python.org/dev/peps/pep-0538/#relationship-with-other-peps ), and now that Victor actually has a working reference implementation for the PEP 540 approach, I think it's the right way to go. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From elazarg at gmail.com Thu Jan 12 06:59:33 2017 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Thu, 12 Jan 2017 11:59:33 +0000 Subject: [Python-ideas] Settable defaulting to decimal instead of float In-Reply-To: References: Message-ID: I think such proposals are special cases of a general theme: a compiler pragma, similar to "from __future__", to make Python support domain-specific syntax in the current file. Whether it's decimal literals or matrix/vector literals etc. I think it will be nice to make some tool, external to Python, that will allow defining such "sibling languages" (transpiled into Python) easily and uniformly. Elazar ?????? ??? ??, 12 ????' 2017, 13:21, ??? Paul Moore ?: > On 12 January 2017 at 10:28, Victor Stinner > wrote: > > George requested this feature on the bug tracker: > > http://bugs.python.org/issue29223 > > > > George was asked to start a discusson on this list. 
I posted the > > following comment before closing the issue: > > > > You are not the first one to propose the idea. > > OK, but without additional detail (for example, how would the proposed > flag work, if the main module imports module A, then would float > literals in A be decimal or binary? Both could be what the user wants) > it's hard to comment. And as you say, most of this has been discussed > before, so I'd like to see references back to the previous discussions > in any proposal, with explanations of how the new proposal addresses > the objections raised previously. > > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Jan 12 07:07:52 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 12 Jan 2017 22:07:52 +1000 Subject: [Python-ideas] Settable defaulting to decimal instead of float In-Reply-To: References: Message-ID: On 12 January 2017 at 20:30, Paul Moore wrote: > It's unlikely that there's a practical suggestion here that hasn't > been discussed before and rejected There's one practical decimal-literal-related suggestion which hasn't been rejected yet: adding a true decimal literal based on decimal128 semantics *without* configurable context support (so compile time constant folding can work normally rather than all operations needing to be deferred to runtime). Folks that wanted to fiddle with the context settings would still need to use decimal.Decimal objects, but there'd also be a readily available builtin base10 counterpart to the binary "float" type. As far as I know the main barrier to that approach is simply the lack of folks with the time, interest, and expertise needed to implement, review, and document it, rather than it being an objectionable proposal at the language design level. (There would be some concerns around potential confusion between when to use the default binary literals and when to use the decimal literals, but those concerns arise anyway - the discrepancies between binary and decimal arithmetic are just one of those unfortunate facts of computing at this point) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephanh42 at gmail.com Thu Jan 12 07:13:50 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Thu, 12 Jan 2017 13:13:50 +0100 Subject: [Python-ideas] Settable defaulting to decimal instead of float In-Reply-To: References: Message-ID: Something like: from __syntax__ import decimal_literal which would feed the rest of the file through the "decimal_literal" transpiler. (and not influence anything in other files). Not sure if you would want to support multiple transpilers per file. Note that Racket has something similar with their initial "#lang ..." directive. That only allows a single "language". Possibly wisely so. Stephan 2017-01-12 12:59 GMT+01:00 ????? : > I think such proposals are special cases of a general theme: a compiler > pragma, similar to "from __future__", to make Python support > domain-specific syntax in the current file. Whether it's decimal literals > or matrix/vector literals etc. > > I think it will be nice to make some tool, external to Python, that will > allow defining such "sibling languages" (transpiled into Python) easily and > uniformly. > > Elazar > > ?????? ??? ??, 12 ????' 
2017, 13:21, ??? Paul Moore ? >: > >> On 12 January 2017 at 10:28, Victor Stinner >> wrote: >> > George requested this feature on the bug tracker: >> > http://bugs.python.org/issue29223 >> > >> > George was asked to start a discusson on this list. I posted the >> > following comment before closing the issue: >> > >> > You are not the first one to propose the idea. >> >> OK, but without additional detail (for example, how would the proposed >> flag work, if the main module imports module A, then would float >> literals in A be decimal or binary? Both could be what the user wants) >> it's hard to comment. And as you say, most of this has been discussed >> before, so I'd like to see references back to the previous discussions >> in any proposal, with explanations of how the new proposal addresses >> the objections raised previously. >> >> Paul >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Thu Jan 12 07:17:44 2017 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 12 Jan 2017 23:17:44 +1100 Subject: [Python-ideas] Settable defaulting to decimal instead of float In-Reply-To: References: Message-ID: On Thu, Jan 12, 2017 at 11:07 PM, Nick Coghlan wrote: > As far as I know the main barrier to that approach is simply the lack > of folks with the time, interest, and expertise needed to implement, > review, and document it, rather than it being an objectionable > proposal at the language design level. (There would be some concerns > around potential confusion between when to use the default binary > literals and when to use the decimal literals, but those concerns > arise anyway - the discrepancies between binary and decimal arithmetic > are just one of those unfortunate facts of computing at this point) Most of the time one of my students talks to me about decimal vs binary, they're thinking that a decimal literal (or converting the default non-integer literal to be decimal) is a panacea to the "0.1 + 0.2 != 0.3" problem. Perhaps the real solution is a written-up explanation of why binary floating point is actually a good thing, and not just a backward-compatibility requirement? ChrisA From flying-sheep at web.de Thu Jan 12 04:36:58 2017 From: flying-sheep at web.de (Philipp A.) Date: Thu, 12 Jan 2017 09:36:58 +0000 Subject: [Python-ideas] OS related file operations (copy, move, delete, rename...) should be placed into one module In-Reply-To: References: Message-ID: Hi George, While the old ?let?s treat strings as paths? modules are split up like you said, pathlib can do what they do and more: https://docs.python.org/3/library/pathlib.html It?s also prettier and easier to use, especially when using autocompletion (just type ?path.is? and see what you can test the path for) Best, Philipp George Fischhof schrieb am Do., 12. Jan. 2017 um 10:06 Uhr: > Hi There, > > OS related file operations (copy, move, delete, rename...) should be > placed into one module... > As it quite confusing that they are in two moduls (os and shutil). 
> > I have read that one is higher level than other, but actually to use them > I have to think which function can be found in which module. > > It is confuse for beginners, and makes the usage more complex instead of > make it more simple (as Zen of Python says ;-) ) > > An alias could good, not to cause incompatibility. > > Best regards, > George > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Thu Jan 12 07:20:55 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 12 Jan 2017 12:20:55 +0000 Subject: [Python-ideas] Settable defaulting to decimal instead of float In-Reply-To: References: Message-ID: On 12 January 2017 at 12:07, Nick Coghlan wrote: > On 12 January 2017 at 20:30, Paul Moore wrote: >> It's unlikely that there's a practical suggestion here that hasn't >> been discussed before and rejected > > There's one practical decimal-literal-related suggestion which hasn't > been rejected yet: adding a true decimal literal based on decimal128 > semantics *without* configurable context support (so compile time > constant folding can work normally rather than all operations needing > to be deferred to runtime). AIUI, this would be an explicit decimal literal, something like 1.5D, rather than simply making all *existing* float literals decimal. Which isn't what I understood the OP to be asking for. Anyway, the OP needs to clarify. Paul From stephanh42 at gmail.com Thu Jan 12 07:36:35 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Thu, 12 Jan 2017 13:36:35 +0100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> <20170106044922.GM3887@ando.pearwood.info> <3443043241353663718@unknownmsgid> <22645.49484.967553.775552@turnbull.sk.tsukuba.ac.jp> <22645.55581.671261.687104@turnbull.sk.tsukuba.ac.jp> Message-ID: Hi Petr, 2017-01-11 12:22 GMT+01:00 Petr Viktorin : > > For example, this may mean that a built-in Python string sort will give you >> a different ordering than invoking the external "sort" command. >> I have been bitten by this kind of issues, leading to spurious "diffs" if >> you try to use sorting to put strings into a canonical order. >> > > AFAIK, this would not be a problem under PEP 538, which effectively treats > the "C" locale as "C.UTF-8". Strings of Unicode codepoints and the > corresponding UTF-8-encoded bytes sort the same way. > ...and this is also something new I learned. > > Is that wrong, or do you have a better example of trouble with using > "C.UTF-8" instead of "C"? After long deliberation, it seems I cannot find any source of trouble. +1 So my feeling is that people are ultimately not being helped by >> Python trying to be "nice", since they will be bitten by locale issues >> anyway. IMHO ultimately better to educate them to configure the locale. >> (I realise that people may reasonably disagree with this assessment ;-) ) >> >> I would then recommend to set to en_US.UTF-8, which is slower and >> less elegant but at least more widely supported. >> > > What about the spurious diffs you'd get when switching from "C" to > "en_US.UTF-8"? 
> That taught me to explicitly invoke "sort" using LANG=en_US.UTF-8 sort > > I believe the main problem is that the "C" locale really means two very > different things: > > a) Text is encoded as 7-bit ASCII; higher codepoints are an error > b) No encoding was specified > > In both cases, treating "C" as "C.UTF-8" is not bad: > a) For 7-bit "text", there's no real difference between these locales > b) UTF-8 is a much better default than ASCII > > A "C" locale also means that a program should not *output* non-ASCII characters, unless when explicitly being fed in (like in the case of "cat" or "sort" or the "ls" equivalent from PEP-540). So for example, a program might want to print fancy Unicode box characters to show progress, and check sys.stderr.encoding to see if it can do so. However, under a "C" locale it should not do so since for example the terminal is unlikely to display the fancy box characters properly. Note that the current PEP 540 proposal would be that sys.stderr is in UTF-8 /backslashreplace encoding under the "C" locale. I think this may be a minor concern ultimately, but it would be nice if we had some API to at least reliable answer the question "can I safely output non-ASCII myself?" Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephanh42 at gmail.com Thu Jan 12 07:50:49 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Thu, 12 Jan 2017 13:50:49 +0100 Subject: [Python-ideas] Settable defaulting to decimal instead of float In-Reply-To: References: Message-ID: Hi Chris, 2017-01-12 13:17 GMT+01:00 Chris Angelico : > > Most of the time one of my students talks to me about decimal vs > binary, they're thinking that a decimal literal (or converting the > default non-integer literal to be decimal) is a panacea to the "0.1 + > 0.2 != 0.3" problem. Indeed. Decimal also doesn't solve the 1/3 issue. I don't understand why people always talk about Decimal, if you want math to work "right" you probably want fractions. (With the understanding that this is for still limited value of "right".) > Perhaps the real solution is a written-up > explanation of why binary floating point is actually a good thing, and > not just a backward-compatibility requirement? > I have sometimes considered writing up "Why the aliens of Epsilon Eridani, whose computers use 13-valued logic, still use floating point numbers with base 2." (Short overview: analysis form first principles shows that the base should be: 1. an integral number > 1 and 2. as small as possible (to minmax the relative rounding error)) The list of candidate bases satisfying these criteria is: 2. Stephan > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From george at fischhof.hu Thu Jan 12 08:09:34 2017 From: george at fischhof.hu (George Fischhof) Date: Thu, 12 Jan 2017 14:09:34 +0100 Subject: [Python-ideas] Settable defaulting to decimal instead of float Message-ID: 2017-01-12 13:13 GMT+01:00 Stephan Houben : > Something like: > > from __syntax__ import decimal_literal > > which would feed the rest of the file through the "decimal_literal" > transpiler. > (and not influence anything in other files). > > Not sure if you would want to support multiple transpilers per file. 
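For readers following along, here is a minimal sketch of the kind of check Stephan is describing, assuming nothing beyond the standard library (the helper name and the box-drawing character are illustrative, not anything proposed in the thread). It asks the stream for its advertised encoding and falls back to plain ASCII when the fancy character would not encode; as noted above, this only answers "can I encode it?", not "will the terminal actually render it?".

---
import sys

def can_encode_for(stream, text):
    """Best-effort check: can `text` be encoded for `stream` without errors?

    Only the stream's advertised encoding is consulted, so a permissive
    error handler (e.g. backslashreplace) will still report success.
    """
    encoding = getattr(stream, "encoding", None) or "ascii"
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# Fall back to plain ASCII when the fancy box character is not safe.
bar_char = "\u2588" if can_encode_for(sys.stdout, "\u2588") else "#"
print(bar_char * 10)
---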
> > Note that Racket has something similar with their initial "#lang ..." > directive. > That only allows a single "language". Possibly wisely so. > > Stephan > > > 2017-01-12 12:59 GMT+01:00 ????? : > >> I think such proposals are special cases of a general theme: a compiler >> pragma, similar to "from __future__", to make Python support >> domain-specific syntax in the current file. Whether it's decimal literals >> or matrix/vector literals etc. >> >> I think it will be nice to make some tool, external to Python, that will >> allow defining such "sibling languages" (transpiled into Python) easily and >> uniformly. >> >> Elazar >> >> ?????? ??? ??, 12 ????' 2017, 13:21, ??? Paul Moore ?> >: >> >>> On 12 January 2017 at 10:28, Victor Stinner >>> wrote: >>> > George requested this feature on the bug tracker: >>> > http://bugs.python.org/issue29223 >>> > >>> > George was asked to start a discusson on this list. I posted the >>> > following comment before closing the issue: >>> > >>> > You are not the first one to propose the idea. >>> >>> OK, but without additional detail (for example, how would the proposed >>> flag work, if the main module imports module A, then would float >>> literals in A be decimal or binary? Both could be what the user wants) >>> it's hard to comment. And as you say, most of this has been discussed >>> before, so I'd like to see references back to the previous discussions >>> in any proposal, with explanations of how the new proposal addresses >>> the objections raised previously. >>> >>> Paul >> >> Most of the time one of my students talks to me about decimal vs > binary, they're thinking that a decimal literal (or converting the > default non-integer literal to be decimal) is a panacea to the "0.1 + > 0.2 != 0.3" problem. Perhaps the real solution is a written-up > explanation of why binary floating point is actually a good thing, and > not just a backward-compatibility requirement? > > ChrisA from __future__ import use_decimal_instead_of_float or any other import would be very good. The most important thing in my point of view is that I do not want to convert every variable every time to decimal. Accuracy is important for me (yes, 0.1 + 0.2 should equal to 0.3 , no more, no less ;-) ) And if it is mentioned, I would like to ask why binary floating point is "better". It is faster, I agree, but why "better"? Actually I create test automation (I am a senior QA eng), the fastest test case runs for about 1-2 minutes. I do not know the exact time difference between binary and decimal arithmetic, but I do not care with this. My test would run some microseconds faster. It does not matter at a minute range. In the tests I calculate with numbers with 4 decimal digits, and I need exact match. ;-) Actually I have a new colleague, they did image analysis, and the calculated much calculations, and they used a library (not python) that is accurate as well. Becasue accuracy was more important for them as well. BR George -------------- next part -------------- An HTML attachment was scrubbed... 
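A minimal sketch of the behaviour George is describing, using only the existing decimal module (an illustration, not a proposal from the thread). The Decimal values are built from strings on purpose: Decimal(0.1) would inherit the binary float's representation error.

---
from decimal import Decimal

# Binary floats cannot represent 0.1, 0.2 or 0.3 exactly:
print(0.1 + 0.2 == 0.3)        # False
print(0.1 + 0.2)               # 0.30000000000000004

# Decimal keeps the base-10 digits that were written down:
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

# For "4 decimal digits, exact match" checks, quantize to the precision
# that matters before comparing:
four_dp = Decimal("0.0001")
measured = Decimal("1.23456789").quantize(four_dp)
expected = Decimal("1.2346")
print(measured == expected)    # True
---

Counting in the smallest unit with plain integers gives the same exactness without a new type, which is where the thread goes next.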
URL: From rosuav at gmail.com Thu Jan 12 08:25:54 2017 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 13 Jan 2017 00:25:54 +1100 Subject: [Python-ideas] Settable defaulting to decimal instead of float In-Reply-To: References: Message-ID: On Thu, Jan 12, 2017 at 11:50 PM, Stephan Houben wrote: > 2017-01-12 13:17 GMT+01:00 Chris Angelico : >> >> Most of the time one of my students talks to me about decimal vs >> binary, they're thinking that a decimal literal (or converting the >> default non-integer literal to be decimal) is a panacea to the "0.1 + >> 0.2 != 0.3" problem. > > > Indeed. Decimal also doesn't solve the > 1/3 > issue. > > I don't understand why people always talk about Decimal, > if you want math to work "right" you probably want fractions. > > (With the understanding that this is for still limited value of "right".) My usual go-to response is that if you want perfect arithmetic with no rounding errors, your *ONLY* option is symbolic math, where sqrt(8) returns 2?2. It's pretty obvious that this gets unwieldy really fast, and rounding errors are a fact of life :) Rationals have their own problems (eg it's nearly impossible to eyeball them for size once they get big), and still don't solve everything else. >> Perhaps the real solution is a written-up >> explanation of why binary floating point is actually a good thing, and >> not just a backward-compatibility requirement? > > > I have sometimes considered writing up "Why the aliens of Epsilon Eridani, > whose computers use 13-valued logic, still use floating point numbers > with base 2." > > (Short overview: analysis form first principles shows that the base should > be: > 1. an integral number > 1 and > 2. as small as possible (to minmax the relative rounding error)) > > The list of candidate bases satisfying these criteria is: 2. > That's exactly the sort of thing I'm talking about. Among other things, only binary floating point guarantees that x <= (x+y)/2 <= y for any x <= y. (At least, I think only binary - I know decimal can't ensure that, and I haven't tested everything in between.) You're way more an expert on this than I am - my skill consists of reading what other people have written and echoing it to people :) ChrisA From rosuav at gmail.com Thu Jan 12 08:28:08 2017 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 13 Jan 2017 00:28:08 +1100 Subject: [Python-ideas] Settable defaulting to decimal instead of float In-Reply-To: References: Message-ID: On Fri, Jan 13, 2017 at 12:09 AM, George Fischhof wrote: > from __future__ import use_decimal_instead_of_float > or any other import would be very good. > The most important thing in my point of view is that I do not want to > convert every variable every time to decimal. > Accuracy is important for me (yes, 0.1 + 0.2 should equal to 0.3 , no more, > no less ;-) ) > > And if it is mentioned, I would like to ask why binary floating point is > "better". It is faster, I agree, but why "better"? > > Actually I create test automation (I am a senior QA eng), the fastest test > case runs for about 1-2 minutes. I do not know the exact time difference > between binary and decimal arithmetic, but I do not care with this. My test > would run some microseconds faster. It does not matter at a minute range. > > In the tests I calculate with numbers with 4 decimal digits, and I need > exact match. ;-) > > Actually I have a new colleague, they did image analysis, and the calculated > much calculations, and they used a library (not python) that is accurate as > well. 
Because accuracy was more important for them as well. Sounds like you want integer arithmetic. When you're working with something that goes to a known number of decimal places (eg money), it's often easiest and safest to redefine your unit (eg cents rather than dollars, or millimeters instead of meters) so you can use integers everywhere. Accurate AND fast! ChrisA From tjreedy at udel.edu Thu Jan 12 08:50:30 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 12 Jan 2017 08:50:30 -0500 Subject: [Python-ideas] Settable defaulting to decimal instead of float In-Reply-To: References: Message-ID: On 1/12/2017 8:09 AM, George Fischhof wrote: > And if it is mentioned, I would like to ask why binary floating point is > "better". It is faster, I agree, but why "better"? Binary numbers are more evenly spread out. Consider successive two-digit numbers .99, 1.0, 1.1. The difference between the first two is .01 and that between the next pair is .1, 10 times as large. This remains true for .999, 1.00, 1.01 or any other fixed number of digits. For binary floats, the gap size only doubles. When I used slide rules, which have about 3 digits of accuracy, some decades ago, this defect of decimal numbers was readily apparent. -- Terry Jan Reedy From george at fischhof.hu Thu Jan 12 09:36:45 2017 From: george at fischhof.hu (George Fischhof) Date: Thu, 12 Jan 2017 15:36:45 +0100 Subject: [Python-ideas] Settable defaulting to decimal instead of float In-Reply-To: References: Message-ID: Thank You, Terry George 2017-01-12 14:50 GMT+01:00 Terry Reedy : > On 1/12/2017 8:09 AM, George Fischhof wrote: > > And if it is mentioned, I would like to ask why binary floating point is >> "better". It is faster, I agree, but why "better"? >> > > Binary numbers are more evenly spread out. Consider successive two-digit > numbers .99, 1.0, 1.1. The difference between the first two is .01 and > that between the next pair is .1, 10 times as large. This remains true for > .999, 1.00, 1.01 or any other fixed number of digits. For binary floats, > the gap size only doubles. When I used slide rules, which have about 3 > digits of accuracy, some decades ago, this defect of decimal numbers was > readily apparent. > > -- > Terry Jan Reedy > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Thu Jan 12 10:12:07 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 12 Jan 2017 16:12:07 +0100 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> <22638.64668.809353.547149@turnbull.sk.tsukuba.ac.jp> Message-ID: 2017-01-12 1:23 GMT+01:00 INADA Naoki : > I'm ±0 to surrogateescape by default. I feel +1 for stdout and -1 for stdin. The use case is to be able to write a Python 3 program which works with UNIX pipes without failing with encoding errors: https://www.python.org/dev/peps/pep-0540/#producer-consumer-model-using-pipes If you want something stricter, there is the UTF-8 Strict mode which prevents mojibake everywhere. I'm not sure that the UTF-8 Strict mode is really useful. When I implemented it, I quickly understood that using strict *everywhere* is just a dead end: it would fail in too many places.
https://www.python.org/dev/peps/pep-0540/#use-the-strict-error-handler-for-operating-system-data I'm not even sure yet that a Python 3 with stdin using strict is "usable". > In output case, surrogateescape is weaker than strict, but it only allows > surrgateescaped binary. If program carefully use surrogateescaped decode, > surrogateescape on stdout is safe enough. What do you mean that "carefully use surrogateescaped decode"? The rationale for using surrogateescape on stdout is to support this use case: https://www.python.org/dev/peps/pep-0540/#list-a-directory-into-stdout > On the other hand, surrogateescape is very weak for input. It accepts > arbitrary bytes. > It should be used carefully. In my experience with the Python bug tracker, almost nobody understands Unicode and locales. For the "Producer-consumer model using pipes" use case, encoding issues of Python 3.6 can be a blocker issue. Some developers may prefer a different programming language which doesn't bother them with Unicode: basicall, *all* other programming languages, no? > But I agree different encoding handler between stdin/stdout is not beautiful. > That's why I'm ?0. That's why there are two modes: UTF-8 and UTF-8 Strict. But I'm not 100% sure yet, on which encodings and error handlers should be used ;-) I started to play with my PEP 540 implementation. I already had to update the PEP 540 and its implementation for Windows. On Windows, os.fsdecode/fsencode now uses surrogatepass, not surrogateescape (Python 3.5 uses strict on Windows). Victor From random832 at fastmail.com Thu Jan 12 10:25:40 2017 From: random832 at fastmail.com (Random832) Date: Thu, 12 Jan 2017 10:25:40 -0500 Subject: [Python-ideas] Settable defaulting to decimal instead of float In-Reply-To: References: Message-ID: <1484234740.1453970.845596392.54F1F93C@webmail.messagingengine.com> On Thu, Jan 12, 2017, at 06:20, Paul Moore wrote: > On 12 January 2017 at 10:28, Victor Stinner > wrote: > > George requested this feature on the bug tracker: > > http://bugs.python.org/issue29223 > > > > George was asked to start a discusson on this list. I posted the > > following comment before closing the issue: > > > > You are not the first one to propose the idea. > > OK, but without additional detail (for example, how would the proposed > flag work, if the main module imports module A, then would float > literals in A be decimal or binary? Both could be what the user wants) > it's hard to comment. And as you say, most of this has been discussed > before, so I'd like to see references back to the previous discussions > in any proposal, with explanations of how the new proposal addresses > the objections raised previously. Having them be decimal is impossible and it therefore hadn't even occurred to me it might be what the user wanted. Though what might be interesting would be to have a mode or variant of the language where *the float type* is, say, decimal128. The documentation carefully avoids guaranteeing any specific representation, or even that it is binary, and the existence of float_info.radix suggests that it may not always be the case. Implementing such a thing would be difficult, of course, and making it switchable at runtime would be even harder. 
Though since PyFloat_FromDouble/PyFloat_AsDouble would still necessarily be part of the API, it would simply be a case of "modules that are unaware that float may use C double for its internal representation may silently lose precision" Question: Is Py_AS_DOUBLE, which directly accesses a struct field, part of the stable ABI? If so, it may be necessary for every float object to continue carrying around a C double as an 'alternate representation'. Defining it unconditionally to have both representations would also make it somewhat easier to make the behavior a runtime switch, since it would simply change which value is considered authoritative, though it would make everything unconditionally slower as every time a float is constructed the alternate representation must be calculated. From victor.stinner at gmail.com Thu Jan 12 10:29:41 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 12 Jan 2017 16:29:41 +0100 Subject: [Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode) In-Reply-To: References: Message-ID: 2017-01-12 9:45 GMT+01:00 INADA Naoki : > As I described in other thread, LC_COLLATE may cause unintentional performance > regression and behavior changes. Since Python 3 uses mostly text, not bytes, LC_COLLATE should not really impact Python applications. Locales set by setlocale() are not inherited by child processes. At least, I understand that setting the locale in Python doesn't impact the performance of child processes: -- import locale, subprocess locale.setlocale(locale.LC_COLLATE, "fr_FR.UTF-8") subprocess.call("sort long_text.txt > /dev/null", shell=True) -- But the LC_COLLATE locale can be used by C libraries called from Python through Python extensions implemented in C. Victor From victor.stinner at gmail.com Thu Jan 12 10:34:05 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 12 Jan 2017 16:34:05 +0100 Subject: [Python-ideas] Settable defaulting to decimal instead of float In-Reply-To: References: Message-ID: 2017-01-12 13:13 GMT+01:00 Stephan Houben : > Something like: > from __syntax__ import decimal_literal IMHO you can already implement that with a third party library, see for example: https://github.com/lihaoyi/macropy It also reminds me my PEP 511 which would open the gate for any kind of Python preprocessor :-) https://www.python.org/dev/peps/pep-0511/ Victor From turnbull.stephen.fw at u.tsukuba.ac.jp Thu Jan 12 10:44:41 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Fri, 13 Jan 2017 00:44:41 +0900 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: <587726A8.60107@canterbury.ac.nz> References: <20170112023740.GS3887@ando.pearwood.info> <20170112031755.GA7523@phdru.name> <587726A8.60107@canterbury.ac.nz> Message-ID: <22647.42089.114486.294200@turnbull.sk.tsukuba.ac.jp> Greg Ewing writes: > Chris Barker wrote: > > Frequently Asked Criticisms > > Doesn't quite make sense -- one doesn't "ask" criticisms. > > How about: > > FCLAP - Frequent Criticisms Levelled Against Python It reads better if you don't insist that they be frequent. (This may only play in America.) From turnbull.stephen.fw at u.tsukuba.ac.jp Thu Jan 12 10:45:06 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. 
Turnbull) Date: Fri, 13 Jan 2017 00:45:06 +0900 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> <20170106044922.GM3887@ando.pearwood.info> <3443043241353663718@unknownmsgid> <22645.49484.967553.775552@turnbull.sk.tsukuba.ac.jp> <22645.55581.671261.687104@turnbull.sk.tsukuba.ac.jp> Message-ID: <22647.42114.961750.899368@turnbull.sk.tsukuba.ac.jp> Stephan Houben writes: > I think this may be a minor concern ultimately, but it would be > nice if we had some API to at least reliable answer the question > "can I safely output non-ASCII myself?" You can't; stdout might be a TTY, pipe, or socket in which case you have no way to determine that. From p.f.moore at gmail.com Thu Jan 12 10:48:36 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 12 Jan 2017 15:48:36 +0000 Subject: [Python-ideas] Settable defaulting to decimal instead of float In-Reply-To: References: Message-ID: On 12 January 2017 at 15:34, Victor Stinner wrote: > 2017-01-12 13:13 GMT+01:00 Stephan Houben : >> Something like: >> from __syntax__ import decimal_literal > > IMHO you can already implement that with a third party library, see for example: > https://github.com/lihaoyi/macropy > > It also reminds me my PEP 511 which would open the gate for any kind > of Python preprocessor :-) > https://www.python.org/dev/peps/pep-0511/ PEP 302 (import hooks) pretty much did that years ago :-) Just write your own processor to translate a new filetype into bytecode, and register it as an import hook. There was a web framework that did that for templates not long after PEP 302 got implemented (can't recall the name any more). Paul From turnbull.stephen.fw at u.tsukuba.ac.jp Thu Jan 12 10:50:04 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Fri, 13 Jan 2017 00:50:04 +0900 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> <22638.64668.809353.547149@turnbull.sk.tsukuba.ac.jp> Message-ID: <22647.42412.563855.37358@turnbull.sk.tsukuba.ac.jp> Chris Barker writes: > 2) There are non-ascii file names, etc. on this supposedly ASCII system. In > which case, do folks expect their Python programs to find these issues and > raise errors? They may well expect that their Python program will not let > them try to save a non ASCII filename, for instance. Actually, IME, just like you, they expect it to DTRT, which for *them* is to save it in Shift-JIS or Alternativj or UTF-totally-whacked as their other programs do. > So I see no downside to using utf-8 when the C locale is defined. You don't have much incentive to look for one, and I doubt you have the experience of the edge cases (if you do, please correct me), so that does not surprise me. I'm not saying there are such cases here, I just want a little time to look harder. Steve From victor.stinner at gmail.com Thu Jan 12 10:25:56 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 12 Jan 2017 16:25:56 +0100 Subject: [Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode) In-Reply-To: References: Message-ID: 2017-01-12 9:45 GMT+01:00 INADA Naoki : > When using en_US.UTF-8 as fallback, pleas override only LC_CTYPE, > instead of LC_ALL. > As I described in other thread, LC_COLLATE may cause unintentional performance > regression and behavior changes. Does it work to use a locale with encoding A for LC_CTYPE and a locale with encoding B for LC_MESSAGES (and others)? Is there a risk of mojibake? 
Or do we expect that the POSIX locale speaks ASCII, and so it should work for use UTF-8 for LC_CTYPE since UTF-8 is able to decode messages encoded ASCII? Victor From phd at phdru.name Thu Jan 12 11:10:01 2017 From: phd at phdru.name (Oleg Broytman) Date: Thu, 12 Jan 2017 17:10:01 +0100 Subject: [Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode) In-Reply-To: References: Message-ID: <20170112161001.GA19119@phdru.name> On Thu, Jan 12, 2017 at 04:25:56PM +0100, Victor Stinner wrote: > 2017-01-12 9:45 GMT+01:00 INADA Naoki : > > When using en_US.UTF-8 as fallback, pleas override only LC_CTYPE, > > instead of LC_ALL. > > As I described in other thread, LC_COLLATE may cause unintentional performance > > regression and behavior changes. > > Does it work to use a locale with encoding A for LC_CTYPE and a locale > with encoding B for LC_MESSAGES (and others)? Is there a risk of It does when B is a subset of A (ascii and koi8; ascii and utf8, e.g.) > mojibake? Or do we expect that the POSIX locale speaks ASCII, and so > it should work for use UTF-8 for LC_CTYPE since UTF-8 is able to > decode messages encoded ASCII? That works for me: $ echo $LC_CTYPE ru_RU.UTF-8 $ echo $LC_COLLATE ru_RU.UTF-8 $ echo $LANG C $ date Thu Jan 12 19:06:13 MSK 2017 > Victor Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From ericfahlgren at gmail.com Thu Jan 12 11:16:09 2017 From: ericfahlgren at gmail.com (Eric Fahlgren) Date: Thu, 12 Jan 2017 08:16:09 -0800 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: References: <20170112023740.GS3887@ando.pearwood.info> <20170112031755.GA7523@phdru.name> Message-ID: On Wed, Jan 11, 2017 at 8:28 PM, Chris Barker wrote: > Many of the things people (newbies, mostly) complain about are simply > taste, or legacy that isn't worth changing. > ?A lot of that sort of thing is idiomatic, so I point people here and say "just do it that way and you'll be happier in the long run."? ? http://stupidpythonideas.blogspot.com/2015/05/why-following-idioms-matters.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Thu Jan 12 11:17:49 2017 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Thu, 12 Jan 2017 08:17:49 -0800 Subject: [Python-ideas] OS related file operations (copy, move, delete, rename...) should be placed into one module In-Reply-To: References: Message-ID: <1682891988816860097@unknownmsgid> I agree that this has been a bit of a wart for a long time. While the old ?let?s treat strings as paths? modules are split up like you said, pathlib can do what they do and more: https://docs.python.org/3/library/pathlib.html Exactly -- this is The Solution. It combines paths themselves with things you are likely to do with paths. It may well lack some nice features. If so, suggestions for that would be the way to go. The usefulness of pathlib has been hampered by the fact that path objects couldn't be used in many stdlib functions. However, that has been remedied in 3.6: - A new file system path protocol has been implemented to support path-like objects . All standard library functions operating on paths have been updated to work with the new protocol. So we have a nice way forward. -CHB It?s also prettier and easier to use, especially when using autocompletion (just type ?path.is? and see what you can test the path for) Best, Philipp George Fischhof schrieb am Do., 12. Jan. 
2017 um 10:06 Uhr: > Hi There, > > OS related file operations (copy, move, delete, rename...) should be > placed into one module... > As it quite confusing that they are in two moduls (os and shutil). > > I have read that one is higher level than other, but actually to use them > I have to think which function can be found in which module. > > It is confuse for beginners, and makes the usage more complex instead of > make it more simple (as Zen of Python says ;-) ) > > An alias could good, not to cause incompatibility. > > Best regards, > George > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mehaase at gmail.com Thu Jan 12 11:55:41 2017 From: mehaase at gmail.com (Mark E. Haase) Date: Thu, 12 Jan 2017 11:55:41 -0500 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: References: <20170112023740.GS3887@ando.pearwood.info> <20170112031755.GA7523@phdru.name> Message-ID: I don't think an informational PEP would make threads like Python Review shorter and/or more productive. The OP clearly didn't do much research, so it seems unlikely he would read an informational PEP. Moreover, the bikeshedding about what goes into this PEP will inevitably lead to a troll who isn't satisfied with the explanation of a particular item, or notices that a particular item isn't included in the PEP, and then we're right back to the same problem: litigating Python complaints that have already been discussed many times on this list. We can't change everybody on the internet, but we might be able to change our own behavior. In that spirit, maybe we just need a canned reply that can be used when a thread has indicators of low quality: > Hi, this appears to be your first post to python-ideas. This purpose of this list is to discuss speculative language ideas for Python. If an idea gains traction, it can then be discussed and honed into a detailed proposal. Your post does not fit with the purpose of the list, either because it is too broad or because it doesn't contain enough technical details about your proposal. You may wish to improve your proposal by focusing on a single subject, researching historical conversations on that subject, and adding more technical details. Alternatively, you may wish to post on python-list[1] instead, which is a general purpose list that does not have the same constraints as this list. > > As a reminder to other list users, please do not encourage low-quality posts by engaging with them. > > 1. https://mail.python.org/mailman/listinfo/python-list Stack Overflow does something similar, where they have canned responses to low-quality questions. This makes it easy for the community to self-moderate in a respectful manner. On Thu, Jan 12, 2017 at 11:16 AM, Eric Fahlgren wrote: > > > On Wed, Jan 11, 2017 at 8:28 PM, Chris Barker > wrote: > >> Many of the things people (newbies, mostly) complain about are simply >> taste, or legacy that isn't worth changing. >> > > ?A lot of that sort of thing is idiomatic, so I point people here and say > "just do it that way and you'll be happier in the long run."? 
> > ?http://stupidpythonideas.blogspot.com/2015/05/why- > following-idioms-matters.html > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Thu Jan 12 12:10:35 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 12 Jan 2017 18:10:35 +0100 Subject: [Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode) In-Reply-To: <20170112161001.GA19119@phdru.name> References: <20170112161001.GA19119@phdru.name> Message-ID: 2017-01-12 17:10 GMT+01:00 Oleg Broytman : >> Does it work to use a locale with encoding A for LC_CTYPE and a locale >> with encoding B for LC_MESSAGES (and others)? Is there a risk of > > It does when B is a subset of A (ascii and koi8; ascii and utf8, e.g.) My question is more when A and B encodings are not compatible. Ah yes, date, thank you for the example. Here is my example using LC_TIME locale to format a date and LC_CTYPE to decode a byte string: date.py: --- import locale, time locale.setlocale(locale.LC_ALL, "") b = time.strftime("%a") encoding=locale.getpreferredencoding() try: u = b.decode(encoding) except UnicodeError: u = '' else: u = repr(u) print("bytes: %r, text: %s, encoding: %r" % (b, u, encoding)) --- When all locales are the same, it works fine: ? (U+baa9) is the expected result $ LC_TIME=ko_KR.euckr LANG=ko_KR.euckr python2 date.py bytes: '\xb8\xf1', text: u'\ubaa9', encoding: 'EUC-KR' You get mojibake if LC_CTYPE uses the Latin1 encoding whereas LC_TIME uses the EUC-KR encoding: you get "??" (U+00b8, U+00f1). $ LC_TIME=ko_KR.euckr LANG=fr_FR python2 date.py bytes: '\xb8\xf1', text: u'\xb8\xf1', encoding: 'ISO-8859-1' The program can also fail with UnicodeDecodeError: $ LC_TIME=ko_KR.euckr LANG=fr_FR.UTF-8 python2 date.py bytes: '\xb8\xf1', text: , encoding: 'UTF-8' Well, since we are talking about the POSIX locale which usually uses ASCII, it shouldn't be an issue in practice for the PEP 538. I was just curious :-) Victor From chris.barker at noaa.gov Thu Jan 12 12:16:04 2017 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Thu, 12 Jan 2017 09:16:04 -0800 Subject: [Python-ideas] Settable defaulting to decimal instead of float In-Reply-To: References: Message-ID: <7565624482040125012@unknownmsgid> While I think some variation of: from __optional__ import decimal_literal Might be reasonable, I'd probably rather see something like: X = 1.1D However: (thank you Chris and Stephen) -- Decimal is NOT a panacea, nor any more "accurate" than binary floating point. It is still floating point, it is still imprecise, it still can't represent all rational numbers, even within a range. The ONLY advantage is that it gives people the warm and fuzzies because they are used to being able to represent 1/10 exactly, while not representing 1/3 exactly. But 1/10 is only special BECAUSE of Decimal representation itself. I actually think that the Decimal docs over-sell its usefulness. For instance, it really isn't more suitable for money than binary floating point if you round your outputs. Decimal does provide variable precision, which does help. With a 64 bit float, you lose cent precision around a trillion dollars. But that's a precision issue, not a binary vs Decimal issue. And a float128 would work fine for more money than I'll ever have to deal with! 
If you really want to do money right, you should use a money type that is exact and follows the appropriate accounting rules for rounding. Probably fixed point. (There are a couple money packages on pypi -- no idea if they are any good) In short: I'm wary of the idea that most people would be better off with Decimal. It's really a special purpose type, and I think it's better if the issues with floating point precision make themselves obvious sooner than later. -CHB Sorry for the top-post -- I hate this phone email client.... Sent from my iPhone > On Jan 12, 2017, at 7:49 AM, Paul Moore wrote: > >> On 12 January 2017 at 15:34, Victor Stinner wrote: >> 2017-01-12 13:13 GMT+01:00 Stephan Houben : >>> Something like: >>> from __syntax__ import decimal_literal >> >> IMHO you can already implement that with a third party library, see for example: >> https://github.com/lihaoyi/macropy >> >> It also reminds me my PEP 511 which would open the gate for any kind >> of Python preprocessor :-) >> https://www.python.org/dev/peps/pep-0511/ > > PEP 302 (import hooks) pretty much did that years ago :-) Just write > your own processor to translate a new filetype into bytecode, and > register it as an import hook. There was a web framework that did that > for templates not long after PEP 302 got implemented (can't recall the > name any more). > > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From phd at phdru.name Thu Jan 12 12:18:39 2017 From: phd at phdru.name (Oleg Broytman) Date: Thu, 12 Jan 2017 18:18:39 +0100 Subject: [Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode) In-Reply-To: References: <20170112161001.GA19119@phdru.name> Message-ID: <20170112171839.GA22400@phdru.name> On Thu, Jan 12, 2017 at 06:10:35PM +0100, Victor Stinner wrote: > 2017-01-12 17:10 GMT+01:00 Oleg Broytman : > >> Does it work to use a locale with encoding A for LC_CTYPE and a locale > >> with encoding B for LC_MESSAGES (and others)? Is there a risk of > > > > It does when B is a subset of A (ascii and koi8; ascii and utf8, e.g.) > > My question is more when A and B encodings are not compatible. [skip time example] > Well, since we are talking about the POSIX locale which usually uses > ASCII, it shouldn't be an issue in practice for the PEP 538. I was > just curious :-) Of course you get mojibake. You can get mojibake even with compatible encodings: $ echo $LC_CTYPE ru_RU.KOI8-R $ LC_TIME=ru_RU.UTF-8 date ???? ?????? 12 20:14:08 MSK 2017 ^^^^^^^^^^^^^^^^^ mojibake! $ echo $LC_CTYPE ru_RU.UTF-8 $ LC_TIME=ru_RU.KOI8-R date ?? ??? 12 20:15:20 MSK 2017 ^^^^^^ mojibake! > Victor Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From chris.barker at noaa.gov Thu Jan 12 12:18:35 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 12 Jan 2017 09:18:35 -0800 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: <587726A8.60107@canterbury.ac.nz> References: <20170112023740.GS3887@ando.pearwood.info> <20170112031755.GA7523@phdru.name> <587726A8.60107@canterbury.ac.nz> Message-ID: On Wed, Jan 11, 2017 at 10:48 PM, Greg Ewing wrote: > Chris Barker wrote: > >> Frequently Asked Criticisms >> > > Doesn't quite make sense -- one doesn't "ask" criticisms. 
> I know, but I like that you can pronounce it the same a "FAQ" > FCLAP - Frequent Criticisms Levelled Against Python Sure -- the title is the least important bit.... -CHB > > -- > Greg > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Thu Jan 12 13:04:43 2017 From: random832 at fastmail.com (Random832) Date: Thu, 12 Jan 2017 13:04:43 -0500 Subject: [Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode) In-Reply-To: References: <20170112161001.GA19119@phdru.name> Message-ID: <1484244283.1491998.845750680.3CCC838D@webmail.messagingengine.com> On Thu, Jan 12, 2017, at 12:10, Victor Stinner wrote: > 2017-01-12 17:10 GMT+01:00 Oleg Broytman : > >> Does it work to use a locale with encoding A for LC_CTYPE and a locale > >> with encoding B for LC_MESSAGES (and others)? Is there a risk of > > > > It does when B is a subset of A (ascii and koi8; ascii and utf8, e.g.) > > My question is more when A and B encodings are not compatible. > > Ah yes, date, thank you for the example. Here is my example using > LC_TIME locale to format a date and LC_CTYPE to decode a byte string: Time and messages seem to behave differently - everything I tested (including python 2 os.strerror) seems to ignore the LC_MESSAGES encoding and use the LC_CTYPE encoding, including resulting in a bunch of question marks when it's "C". From phd at phdru.name Thu Jan 12 13:13:40 2017 From: phd at phdru.name (Oleg Broytman) Date: Thu, 12 Jan 2017 19:13:40 +0100 Subject: [Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode) In-Reply-To: <1484244283.1491998.845750680.3CCC838D@webmail.messagingengine.com> References: <20170112161001.GA19119@phdru.name> <1484244283.1491998.845750680.3CCC838D@webmail.messagingengine.com> Message-ID: <20170112181340.GA25123@phdru.name> On Thu, Jan 12, 2017 at 01:04:43PM -0500, Random832 wrote: > On Thu, Jan 12, 2017, at 12:10, Victor Stinner wrote: > > 2017-01-12 17:10 GMT+01:00 Oleg Broytman : > > >> Does it work to use a locale with encoding A for LC_CTYPE and a locale > > >> with encoding B for LC_MESSAGES (and others)? Is there a risk of > > > > > > It does when B is a subset of A (ascii and koi8; ascii and utf8, e.g.) > > > > My question is more when A and B encodings are not compatible. > > > > Ah yes, date, thank you for the example. Here is my example using > > LC_TIME locale to format a date and LC_CTYPE to decode a byte string: > > Time and messages seem to behave differently - everything I tested > (including python 2 os.strerror) seems to ignore the LC_MESSAGES > encoding and use the LC_CTYPE encoding, including resulting in a bunch > of question marks when it's "C". Works for me as expected: $ echo $LC_CTYPE ru_RU.KOI8-R $ LC_MESSAGES=ru_RU.KOI8-R mc mc speaks to me in Russian... $ LC_MESSAGES=C mc ...English. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. 
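As a rough way to see this from Python itself, here is a small Unix-only sketch (my own illustration; locale.nl_langinfo and locale.LC_MESSAGES are not available on Windows). It prints what each category is set to and which codeset actually governs text decoding; in practice that codeset follows LC_CTYPE, regardless of the encoding named in LC_MESSAGES.

---
import locale

# Adopt whatever the environment specifies (LANG, LC_CTYPE, LC_MESSAGES, ...).
locale.setlocale(locale.LC_ALL, "")

# The codeset used for encoding/decoding text; it is derived from LC_CTYPE.
print("codeset:", locale.nl_langinfo(locale.CODESET))

# Current setting of each category; calling setlocale() without a second
# argument only queries, it does not change anything.
for name in ("LC_CTYPE", "LC_MESSAGES", "LC_TIME", "LC_COLLATE"):
    print(name, "=", locale.setlocale(getattr(locale, name)))
---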
From abrault at mapgears.com Thu Jan 12 13:17:56 2017 From: abrault at mapgears.com (Alexandre Brault) Date: Thu, 12 Jan 2017 13:17:56 -0500 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: References: <20170112023740.GS3887@ando.pearwood.info> <20170112031755.GA7523@phdru.name> <587726A8.60107@canterbury.ac.nz> Message-ID: <0272fe0c-dfb6-e4bf-100c-c535a261ba84@mapgears.com> Frequently Addressed Criticisms would solve both issues, imo Alex On 2017-01-12 12:18 PM, Chris Barker wrote: > On Wed, Jan 11, 2017 at 10:48 PM, Greg Ewing > > wrote: > > Chris Barker wrote: > > Frequently Asked Criticisms > > > Doesn't quite make sense -- one doesn't "ask" criticisms. > > > I know, but I like that you can pronounce it the same a "FAQ" > > > FCLAP - Frequent Criticisms Levelled Against Python > > > Sure -- the title is the least important bit.... > > -CHB > > > > > > -- > Greg > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Thu Jan 12 14:09:24 2017 From: toddrjen at gmail.com (Todd) Date: Thu, 12 Jan 2017 14:09:24 -0500 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: <20170112023740.GS3887@ando.pearwood.info> References: <20170112023740.GS3887@ando.pearwood.info> Message-ID: On Wed, Jan 11, 2017 at 9:37 PM, Steven D'Aprano wrote: > I have a proposal for an Informational PEP that lists things which will > > not change in Python. If accepted, it could be linked to from the signup > > page for the mailing list, and be the one obvious place to point > > newcomers to if they propose the same old cliches. > > > > Thoughts? > > > > > > > > > > * * * * * * * * * * > > > > > > PEP: XXX > > Title: Things that won't change in Python > > Version: $Revision$ > > Last-Modified: $Date$ > > Author: Steven D'Aprano > > Status: Draft > > Type: Informational > > Content-Type: text/x-rst > > Created: 11-Jan-2017 > > Post-History: 12-Jan-2017 > > > > > > Abstract > > ======== > > > > This PEP documents things which will not change in future versions of > Python. > > > > > > Rationale > > ========= > > > > This PEP hopes to reduce the noise on `Python-Ideas < > https://mail.python.org/mailman/listinfo/python-ideas>`_ > > and other mailing lists. If you have a proposal for future Python > > development, and it requires changing one of the things listed here, it > > is dead in the water and has **no chance of being accepted**, either > because > > the benefit is too little, the cost of changing the language (including > > backwards compatibility) is too high, or simply because it goes against > > the design preferred by the BDFL. > > > > Many of these things are already listed in the `FAQs < > https://docs.python.org/3/faq/design.html>`_. 
> > You should be familiar with both Python and the FAQs before proposing > > changes to the language. > > > > Just because something is not listed here does not necessarily mean that > > it will be changed. Each proposal will be weighed on its merits, costs > > compared to benefits. Sometimes the decision will come down to a matter > > of subjective taste, in which case the BDFL has the final say. > > > > > > Language Direction > > ================== > > > > Python 3 > > -------- > > > > This shouldn't need saying, but Python 3 will not be abandoned. > > > > > > Python 2.8 > > ---------- > > > > There will be `no official Python 2.8 peps/pep-0404/>`_ , > > although third parties are welcome to fork the language, backport Python > > 3 features, and maintain the hybrid themselves. Just don't call it > > "Python 2.8", or any other name which gives the impression that it > > is maintained by the official Python core developers. > > > > > > Type checking > > ------------- > > > > Duck-typing remains a fundamental part of Python and `type checking < > https://www.python.org/dev/peps/pep-0484/#non-goals>`_ > > will not be mandatory. Even if the Python interpreter someday gains a > > built-in type checker, it will remain optional. > > > > > > Syntax > > ====== > > > > Braces > > ------ > > > > It doesn't matter whether optional or mandatory, whether spelled ``{ }`` > > like in C, or ``BEGIN END`` like in Pascal, braces to delimit code blocks > > are not going to happen. > > > > For another perspective on this, try running ``from __future__ import > braces`` > > at the interactive interpreter. > > > > (There is a *tiny* loophole here: multi-statement lambda, or Ruby-like code > > blocks have not been ruled out. Such a feature may require some sort of > > block delimiter -- but it won't be braces, as they clash with the syntax > > for dicts and sets.) > > > > > > Colons after statements that introduce a block > > ---------------------------------------------- > > > > Statements which introduce a code block, such as ``class``, ``def``, or > > ``if``, require a colon. Colons have been found to increase readability. > > See the `FAQ colons-required-for-the-if-while-def-class-statements>`_ > > for more detail. > > > > > > End statements > > -------------- > > > > Python does not use ``END`` statements following blocks. Given significant > > indentation, they are redundant and add noise to the source code. If you > > really want end markers, use a comment ``# end``. > > > > > > Explicit self > > ------------- > > > > Explicit ``self`` is a feature, not a bug. See the > > `FAQ be-used-explicitly-in-method-definitions-and-calls>`_ > > for more detail. > > > > > > Print function > > -------------- > > > > The ``print`` statement in Python 1 and 2 was a mistake that Guido long > > regretted. Now that it has been corrected in Python 3, it will not be > > reverted back to a statement. See `PEP 3105 peps/pep-3105/>`_ > > for more details. > > > > > > Significant indentation > > ----------------------- > > > > `Significant indentation faq/design.html#why-does-python-use-indentation-for-grouping-of-statements > >`_ > > is a core feature of Python. > > > > > > Other syntax > > ------------ > > > > Python will not use ``$`` as syntax. Guido doesn't like it. (But it > > is okay to use ``$`` in DSLs like template strings and regular > > expressions.) 
> > > > > > Built-in Functions And Types > > ============================ > > > > Strings > > ------- > > > > Strings are `immutable python-strings-immutable>`_ > > and represent Unicode code points, not bytes. > > > > > > Bools > > ----- > > > > ``bool`` is a subclass of ``int``, with ``True == 1`` and ``False == 0``. > > This is mostly for historical reasons, but the benefit of changing it now > > is too low to be worth breaking backwards compatibility and the enormous > > disruption it would cause. > > > > > > Built-in functions > > ------------------ > > > > Python is an object-oriented language, but it is not *purely* > > object-oriented. Not everything needs to be `a method of some object < > http://steve-yegge.blogspot.com.au/2006/03/execution-in- > kingdom-of-nouns.html>`_, > > and functions have their advantages. See the > > `FAQ python-use-methods-for-some-functionality-e-g-list-index- > but-functions-for-other-e-g-len-list>`_ > > for more detail. > > > > > > Other Language Features > > ======================= > > > > The interactive interpreter > > --------------------------- > > > > The default prompt is ``>>> ``. Guido likes it that way. > > > > > > Copyright > > ========= > > > > This document has been placed in the public domain. > > > > > > > > .. > > Local Variables: > > mode: indented-text > > indent-tabs-mode: nil > > sentence-end-double-space: t > > fill-column: 70 > > coding: utf-8 > > End: > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > I like the idea a lot. Some suggested changes: 1. In the introduction, I might add a sentence saying that flags or some other options to change these behaviors will also not be added. 2. I might combine the "braces" and "end statements" into a single section, perhaps "code blocks". At the very least I would have them adjacent to each other in the list. 3. I would add in the "explicit self" section that "self" will also not be made a keyword. 4. I think either adding a bit more detail about the rationale for the decisions, or perhaps better yet having an entry in the FAQ explaining the rationale for each of these decisions that is linked from here, would help avoid arguments (or maybe it could encourage arguments about the rationale, this is debatable). 5. I would add some examples for the "built-in functions" part, such as "length" and "del". 6. I don't think the "Strings" part makes sense to those not already familiar with unicode. I think a couple more sentences explaining the difference, and mentioning that the python 2 behavior is what won't be used, is necessary so such people understand what they should avoid suggesting. 7. I am not sure what "Python will not use ``$`` as syntax." means. Are you referring to a particular common use of "$", or that it won't be used at all for any reason? If the latter, I would add "?" to that as well. 8. In the "python 3" section, perhaps mention that "python 4" will be a continuation of python 3 and will be a major backwards-compatibility break like python 2 -> 3 was. Some sections I would suggest adding 1. A section about indexing, and include that the built-in "range" follows the indexing rules. You can mention that custom classes can use whatever indexing rules they want but breaking the python convention will make it very, very hard for others to use the class. 
You can also mention that users can also make a custom range function that follows whatever rules they like. 2. Assignment will never be allowed inside expressions. 3. From what I understand, Guido doesn't want a "range" literal. 4. Things like "range", "map", "dict.keys", "dict.values", and "dict.items" will never go back to the Python 2 behavior of returning lists. Also clarify that the python 3 versions do not return iterators, they return instances of special-purpose, read-only classes (views in the case of the "dict" methods). 5. "and" and "or" are short-circuiting operations that return one of the two values given. They will never be non-short-circuiting and they will never coerce returned values to boolean. 6. "xor" will not be added. 7. "if" will not be added to the ordinary "for" statement. 8. Options to change the behavior of built-in classes in incompatible ways will not be added (I may be wrong about this one, but I think it is a good rule). -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Thu Jan 12 14:11:31 2017 From: toddrjen at gmail.com (Todd) Date: Thu, 12 Jan 2017 14:11:31 -0500 Subject: [Python-ideas] OS related file operations (copy, move, delete, rename...) should be placed into one module In-Reply-To: <1682891988816860097@unknownmsgid> References: <1682891988816860097@unknownmsgid> Message-ID: On Thu, Jan 12, 2017 at 11:17 AM, Chris Barker - NOAA Federal < chris.barker at noaa.gov> wrote: > I agree that this has been a bit of a wart for a long time. > > While the old ?let?s treat strings as paths? modules are split up like you > said, pathlib can do what they do and more: https://docs.python.org/ > 3/library/pathlib.html > > > Exactly -- this is The Solution. It combines paths themselves with things > you are likely to do with paths. > > It may well lack some nice features. If so, suggestions for that would be > the way to go. > Can such suggestions go here or should someone start a new thread? -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Thu Jan 12 14:33:41 2017 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 12 Jan 2017 20:33:41 +0100 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: <20170112023740.GS3887@ando.pearwood.info> References: <20170112023740.GS3887@ando.pearwood.info> Message-ID: Good evening to everybody, On 12.01.2017 03:37, Steven D'Aprano wrote: > I have a proposal for an Informational PEP that lists things which will > > not change in Python. If accepted, it could be linked to from the signup > > page for the mailing list, and be the one obvious place to point > > newcomers to if they propose the same old cliches. > > > > Thoughts? Let me first express my general thoughts on this topic. First of all, I am anti-censor and pro-change. So you can imagine that I am not overly excited to see such a document form for Python in general. If it's just to prevent spam on this mailing list and to reduce the signal/noise ratio here, I tend to agree with you if this document were to be attached to this mailing list only. I don't think Python as the umbrella term for a language, an ecosystem, etc. will benefit from preventing change and from banning thoughts no matter how strange they may seem first and to some people. Alright, after that's sorted out, I took my time to go through the list below in case the document will be accepted or made official in any form. 
So, please find my comment inserted there and some general thoughts at the very end. > > PEP: XXX > > Title: Things that won't change in Python This a very absolute-sounding title. Maybe inserting a "(most likely)" between "that" and "won't" will give it the right nudge. > > Version: $Revision$ > > Last-Modified: $Date$ > > Author: Steven D'Aprano > > Status: Draft > > Type: Informational > > Content-Type: text/x-rst > > Created: 11-Jan-2017 > > Post-History: 12-Jan-2017 > > > > > > Abstract > > ======== > > > > This PEP documents things which will not change in future versions of Python. Same "(most likely)" here. > > > > > > Rationale > > ========= > > > > This PEP hopes to reduce the noise on `Python-Ideas `_ > > and other mailing lists. If you have a proposal for future Python > > development, and it requires changing one of the things listed here, it > > is dead in the water and has **no chance of being accepted**, either because > > the benefit is too little, the cost of changing the language (including > > backwards compatibility) is too high, or simply because it goes against > > the design preferred by the BDFL. > > > > Many of these things are already listed in the `FAQs `_. > > You should be familiar with both Python and the FAQs before proposing > > changes to the language. > > > > Just because something is not listed here does not necessarily mean that > > it will be changed. Each proposal will be weighed on its merits, costs > > compared to benefits. Sometimes the decision will come down to a matter > > of subjective taste, in which case the BDFL has the final say. > I like these paragraphs. > > > > Language Direction > > ================== > > > > Python 3 > > -------- > > > > This shouldn't need saying, but Python 3 will not be abandoned. > > Don't think this section is necessary. It's more like a project management decision not a real change. > > > Python 2.8 > > ---------- > > > > There will be `no official Python 2.8 `_ , > > although third parties are welcome to fork the language, backport Python > > 3 features, and maintain the hybrid themselves. Just don't call it > > "Python 2.8", or any other name which gives the impression that it > > is maintained by the official Python core developers. > Same here. > Type checking > > ------------- > [...] Okay. > Syntax > > ====== > > > > Braces > > ------ > > [...] Okay but a bit long. Especially the loophole description plays against the intention of the document; which is natural because we talk about change here. So, nobody knows; and neither do the authors of this document. Not saying, the loophole description should be removed. It's a perfect summary of the current situation but shifts the focus of the document and dilutes its core message. > Colons after statements that introduce a block > > ---------------------------------------------- > > > > [...] Okay. > End statements > > -------------- > > [...] Okay. > Explicit self > > ------------- > > [...] Okay. > Print function > > -------------- > > [...] Works for me, although the newbies I know of definitely disagree here. ;-) > Significant indentation > > ----------------------- > > [...] Okay. > Other syntax > > ------------ > > [...] Okay. > Built-in Functions And Types > > ============================ > > Strings > > ------- > > > [...] Okay. > Bools > > ----- > > [...] Okay. > Built-in functions > > ------------------ > > > > Python is an object-oriented language, but it is not *purely* > > object-oriented. 
Not everything needs to be `a method of some object `_, > > and functions have their advantages. See the > > `FAQ `_ > > for more detail. This is about questioning built-in functions in general, right? That didn't come across fast. Maybe, something like this could help: "and functions (especially the built-in functions) have their advantages". > Other Language Features > > ======================= > > > > The interactive interpreter > > --------------------------- > > [...] Really? Who cares anyway? Nevermind, it's okay. > Copyright > > ========= > > > > This document has been placed in the public domain. > Generally speaking, I would rather describe this document as the "Guideline for Python-Ideas" which should be promoted upfront. I also get the impression that this document could use some conciseness and focus if you were to push it forward as "things that won't change". Otherwise, it just looks more like a guideline. (which I welcome as you can imagine) Furthermore, I still don't know if an informational PEP is the right platform this kind of document especially considering the title, the circumstances by which it was born and the purpose it is supposed to serve. Best regards, Sven From pavol.lisy at gmail.com Thu Jan 12 14:36:13 2017 From: pavol.lisy at gmail.com (Pavol Lisy) Date: Thu, 12 Jan 2017 20:36:13 +0100 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: References: <20170112023740.GS3887@ando.pearwood.info> Message-ID: On 1/12/17, Todd wrote: > 5. "and" and "or" are short-circuiting operations that return one of the > two values given. They will never be non-short-circuiting and they will > never coerce returned values to boolean. This one bring question how deep this PEP could be because we could show what is "impossible" to change but also how to satisfy some needs without changing language. & and | are non-short-circuiting and thanks to fact that boolean is subclass of int (see other section of this PEP!) we could use it nicely. From p.f.moore at gmail.com Thu Jan 12 14:48:46 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 12 Jan 2017 19:48:46 +0000 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: References: <20170112023740.GS3887@ando.pearwood.info> Message-ID: On 12 January 2017 at 19:33, Sven R. Kunze wrote: > This a very absolute-sounding title. Maybe inserting a "(most likely)" > between "that" and "won't" will give it the right nudge. Well, the whole point of the document is to stop people falsely hoping that they can "persuade" people to change things, so absolute statements seem appropriate here. Paul From toddrjen at gmail.com Thu Jan 12 14:51:53 2017 From: toddrjen at gmail.com (Todd) Date: Thu, 12 Jan 2017 14:51:53 -0500 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: References: <20170112023740.GS3887@ando.pearwood.info> Message-ID: On Thu, Jan 12, 2017 at 2:33 PM, Sven R. Kunze wrote: > Good evening to everybody, > > On 12.01.2017 03:37, Steven D'Aprano wrote: > >> I have a proposal for an Informational PEP that lists things which will >> >> not change in Python. If accepted, it could be linked to from the signup >> >> page for the mailing list, and be the one obvious place to point >> >> newcomers to if they propose the same old cliches. >> >> >> >> Thoughts? >> > > > Let me first express my general thoughts on this topic. > > First of all, I am anti-censor and pro-change. 
> > So you can imagine that I am not overly excited to see such a document > form for Python in general. If it's just to prevent spam on this mailing > list and to reduce the signal/noise ratio here, I tend to agree with you if > this document were to be attached to this mailing list only. > > I don't think Python as the umbrella term for a language, an ecosystem, > etc. will benefit from preventing change and from banning thoughts no > matter how strange they may seem first and to some people. > > > Alright, after that's sorted out, I took my time to go through the list > below in case the document will be accepted or made official in any form. > So, please find my comment inserted there and some general thoughts at the > very end. > > There is no "censorship" or "banning thoughts" going on here. Even with this PEP, people are free to think about and talk about how Python could work differently all they want. What this PEP does is tell them that certain decisions have been made about how the Python language is going to work, so they should be aware that such talk isn't going to actually result in any changes to the language. It is a matter about being honest and realistic about what Python is and is not, about what parts of the language are considered defining features. No one will be banned from telling python developers they think one of these features should be changed, but they can know ahead of time that such requests won't be productive and will be able to include that information in their decision about how much time to spend on such requests. So I think the importance of this PEP extend beyond just the python-ideas list. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikhailwas at gmail.com Thu Jan 12 16:48:52 2017 From: mikhailwas at gmail.com (Mikhail V) Date: Thu, 12 Jan 2017 22:48:52 +0100 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: References: <20170112023740.GS3887@ando.pearwood.info> Message-ID: On 12 January 2017 at 20:09, Todd wrote: > > 4. I think either adding a bit more detail about the rationale for the > decisions, > Would be nice. But then someone must tinker with it. 7. I am not sure what "Python will not use ``$`` as syntax." means. Are > you referring to a particular common use of "$", or that it won't be used > at all for any reason? If the latter, I would add "?" to that as well. > I think this means that $ sign does make the code look ugly and distracts from reading. So it is avoided not to mess up readability. Mikhail -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Thu Jan 12 17:25:22 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 12 Jan 2017 17:25:22 -0500 Subject: [Python-ideas] OS related file operations (copy, move, delete, rename...) should be placed into one module In-Reply-To: References: <1682891988816860097@unknownmsgid> Message-ID: On 1/12/2017 2:11 PM, Todd wrote: > On Thu, Jan 12, 2017 at 11:17 AM, Chris Barker - NOAA Federal > > wrote: > > I agree that this has been a bit of a wart for a long time. > >> While the old ?let?s treat strings as paths? modules are split up >> like you said, pathlib can do what they do and >> more: https://docs.python.org/3/library/pathlib.html >> > > Exactly -- this is The Solution. It combines paths themselves with > things you are likely to do with paths. > > It may well lack some nice features. If so, suggestions for that > would be the way to go. 
> > > Can such suggestions go here or should someone start a new thread? Start a new thread for 'Add x to pathlib'. -- Terry Jan Reedy From tjreedy at udel.edu Thu Jan 12 17:47:43 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 12 Jan 2017 17:47:43 -0500 Subject: [Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode) In-Reply-To: References: <20170112161001.GA19119@phdru.name> Message-ID: On 1/12/2017 12:10 PM, Victor Stinner wrote: > Ah yes, date, thank you for the example. Here is my example using > LC_TIME locale to format a date and LC_CTYPE to decode a byte string: > > date.py: > --- > import locale, time > locale.setlocale(locale.LC_ALL, "") > b = time.strftime("%a") > encoding=locale.getpreferredencoding() > try: > u = b.decode(encoding) > except UnicodeError: > u = '' > else: > u = repr(u) > print("bytes: %r, text: %s, encoding: %r" % (b, u, encoding)) Since b is a string, b.decode is an AttributeError on 3.x. What am I missing? Was this for 2.x only? -- Terry Jan Reedy From brett at python.org Thu Jan 12 17:39:54 2017 From: brett at python.org (Brett Cannon) Date: Thu, 12 Jan 2017 22:39:54 +0000 Subject: [Python-ideas] How to respond to trolling (Guido van Rossum) In-Reply-To: <31b49e4a-e33e-26e0-5cc1-32187a4cd639@bigpond.com> References: <31b49e4a-e33e-26e0-5cc1-32187a4cd639@bigpond.com> Message-ID: On Wed, 11 Jan 2017 at 20:56 Simon Lovell wrote: > I feel I have to respond to this one. > And as list admin I feel I now have to reply to this to help explain why people reacted the way they have. > > > More than half of what I suggested could have and should be implemented. > It's this sort of attitude which puts people off. It is your *opinion* that it should be implemented, not a matter of fact as you have stated it. Just because something could be done doesn't mean it should be done. You're allowed to have your opinion, but stating it as anything but your opinion does not engender anyone to your opinion. > In particular the truthiness of non-boolean data and the lack of a > reasonable SQL syntax. Several other points have been discussed > endlessly on the internet but without a satisfactory (IMO) answer being > given. I disagree, but that's fine since, as you said, that's your opinion and you're allowed to not like the decisions we have made in designing Python. > I don't know what is meant by some insults having been thrown in. > Calling truthiness of non boolean data "Ugly" is an insult? It is ugly. > Now *that *is insulting to me. Once again, you are allowed to disagree and say you don't like how truthiness is handled in Python, but you flat-out stating something is ugly insults all the time and effort that me and the other core developers have put into Python to try and make it the best language we can with the constraints we have to work within. Put another way, would you find it reasonable to walk up to me at a conference and just say straight to my face "the way truthiness is implemented is ugly"? Or would you more likely come up to me and say "I don't happen to like how truthiness is implemented, could we have a chat as to why it is the way it is so I can understand how it came to be this way?" Notice how the former puts you on offensive footing like you're lecturing me while the latter is you asking a question to try and understand why something is the way it is that you happen to not like. One approach is respectful of the volunteer effort me and everyone else puts into Python, the other is not. 
This list exists to be open to people's ideas, but those ideas must be communicated in a considerate, respectful manner or else they will be ignored (and those three tenants are directly from the Code of Conduct). So I am politely asking you -- and reminding everyone else -- to simply be respectful and considerate of everyone here who is trying to have an open conversation. My rule of thumb is to talk as if you're asking a complete stranger to do you a favour (which you in fact are since you're asking strangers to read your email and to take its contents seriously). If we all did that then we wouldn't have issues here with how people communicate. -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Thu Jan 12 18:14:58 2017 From: random832 at fastmail.com (Random832) Date: Thu, 12 Jan 2017 18:14:58 -0500 Subject: [Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode) In-Reply-To: <20170112181340.GA25123@phdru.name> References: <20170112161001.GA19119@phdru.name> <1484244283.1491998.845750680.3CCC838D@webmail.messagingengine.com> <20170112181340.GA25123@phdru.name> Message-ID: <1484262898.1562380.846108992.478C5D03@webmail.messagingengine.com> On Thu, Jan 12, 2017, at 13:13, Oleg Broytman wrote: > Works for me as expected: > > $ echo $LC_CTYPE > ru_RU.KOI8-R > > $ LC_MESSAGES=ru_RU.KOI8-R mc > > mc speaks to me in Russian... > > $ LC_MESSAGES=C mc I meant LC_CTYPE=C. Or, for that matter, UTF-8 etc. From random832 at fastmail.com Thu Jan 12 18:21:43 2017 From: random832 at fastmail.com (Random832) Date: Thu, 12 Jan 2017 18:21:43 -0500 Subject: [Python-ideas] How to respond to trolling (Guido van Rossum) In-Reply-To: References: <31b49e4a-e33e-26e0-5cc1-32187a4cd639@bigpond.com> Message-ID: <1484263303.1563492.846110464.0DD718F5@webmail.messagingengine.com> On Thu, Jan 12, 2017, at 17:39, Brett Cannon wrote: > On Wed, 11 Jan 2017 at 20:56 Simon Lovell wrote: > > I don't know what is meant by some insults having been thrown in. > > Calling truthiness of non boolean data "Ugly" is an insult? It is ugly. > > Now *that *is insulting to me. Once again, you are allowed to disagree > and > say you don't like how truthiness is handled in Python, but you flat-out > stating something is ugly insults all the time and effort that me and the > other core developers have put into Python to try and make it the best > language we can with the constraints we have to work within. Just out of curiosity... in your estimation, what is a "wart", and why is the term "wart" used for it? I mean, this is an accepted term that the Python community uses to refer to things, that is not generally regarded to be cause for an accusation of personally insulting anyone, right? I haven't stepped into an alternate universe? The only thing that "python features regarded as 'warts'" and "the skin condition called 'warts'" have in common, to connect them to even allow such an analogy to form, is that they are both regarded as negative to a commonly held sense of aesthetics - or, in a word, that they are 'ugly'. 
From ethan at stoneleaf.us Thu Jan 12 18:40:33 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 12 Jan 2017 15:40:33 -0800 Subject: [Python-ideas] How to respond to trolling (Guido van Rossum) In-Reply-To: <1484263303.1563492.846110464.0DD718F5@webmail.messagingengine.com> References: <31b49e4a-e33e-26e0-5cc1-32187a4cd639@bigpond.com> <1484263303.1563492.846110464.0DD718F5@webmail.messagingengine.com> Message-ID: <587813F1.1080502@stoneleaf.us> On 01/12/2017 03:21 PM, Random832 wrote: > Just out of curiosity... in your estimation, what is a "wart", and why > is the term "wart" used for it? I mean, this is an accepted term that > the Python community uses to refer to things [...] I do not see any difference between calling something a "wart" and calling something "ugly". The sticking point in this case is highlighted by your statement, "an accepted term *by the Python community*" [emphasis added]. In other words, it is equally offensive for a stranger to come in and start branding this or that as warts as it is for that same stranger to come in and start declaring this or that as ugly. -- ~Ethan~ From greg.ewing at canterbury.ac.nz Thu Jan 12 17:20:25 2017 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 13 Jan 2017 11:20:25 +1300 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: <22647.42089.114486.294200@turnbull.sk.tsukuba.ac.jp> References: <20170112023740.GS3887@ando.pearwood.info> <20170112031755.GA7523@phdru.name> <587726A8.60107@canterbury.ac.nz> <22647.42089.114486.294200@turnbull.sk.tsukuba.ac.jp> Message-ID: <58780129.1000305@canterbury.ac.nz> Stephen J. Turnbull wrote: > Greg Ewing writes: > > FCLAP - Frequent Criticisms Levelled Against Python > > It reads better if you don't insist that they be frequent. (This may > only play in America.) Criticisms Frequently Levelled Against Python would be another possibility... -- Greg From phd at phdru.name Thu Jan 12 19:51:05 2017 From: phd at phdru.name (Oleg Broytman) Date: Fri, 13 Jan 2017 01:51:05 +0100 Subject: [Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode) In-Reply-To: <1484262898.1562380.846108992.478C5D03@webmail.messagingengine.com> References: <20170112161001.GA19119@phdru.name> <1484244283.1491998.845750680.3CCC838D@webmail.messagingengine.com> <20170112181340.GA25123@phdru.name> <1484262898.1562380.846108992.478C5D03@webmail.messagingengine.com> Message-ID: <20170113005105.GA9483@phdru.name> On Thu, Jan 12, 2017 at 06:14:58PM -0500, Random832 wrote: > On Thu, Jan 12, 2017, at 13:13, Oleg Broytman wrote: > > Works for me as expected: > > > > $ echo $LC_CTYPE > > ru_RU.KOI8-R > > > > $ LC_MESSAGES=ru_RU.KOI8-R mc > > > > mc speaks to me in Russian... > > > > $ LC_MESSAGES=C mc > > I meant LC_CTYPE=C. Or, for that matter, UTF-8 etc. $ LC_CTYPE=C LC_MESSAGES=ru_RU.KOI8-R mc Brouhaha! mc tries to talk in Russian but converts Russian texts to ascii. Everything is "?????" :-D $ echo $LC_CTYPE ru_RU.UTF-8 $ LC_MESSAGES=ru_RU.UTF-8 mc Russian in utf-8, no problem. What did you expect? I think I did it not the way you wanted. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. 
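For the Python side of the same experiment, a small sketch (POSIX assumed) of what the interpreter derives from the locale environment. Under LC_CTYPE=C the preferred encoding is typically ASCII, which is precisely the situation the proposed UTF-8 mode targets:

---
import locale, sys

# Adopt whatever LC_ALL / LC_CTYPE / LANG specify.
locale.setlocale(locale.LC_ALL, "")

# Under LC_CTYPE=C these usually report ASCII ("ANSI_X3.4-1968");
# under a *.UTF-8 locale they report UTF-8.  Exact strings vary by
# platform and Python version.
print("preferred encoding: ", locale.getpreferredencoding(False))
print("filesystem encoding:", sys.getfilesystemencoding())
print("stdout encoding:    ", sys.stdout.encoding)
---

Running it once with LC_ALL=C and once with a UTF-8 locale shows the difference directly.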
From python at lucidity.plus.com Thu Jan 12 19:57:55 2017 From: python at lucidity.plus.com (Erik) Date: Fri, 13 Jan 2017 00:57:55 +0000 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: References: <20170112023740.GS3887@ando.pearwood.info> Message-ID: <6c7283be-fbd0-bfb4-28d1-a95a64a8e1f1@lucidity.plus.com> On 12/01/17 19:51, Todd wrote: > > On Thu, Jan 12, 2017 at 2:33 PM, Sven R. Kunze > wrote: > First of all, I am anti-censor and pro-change. > There is no "censorship" or "banning thoughts" going on here. Even with > this PEP, people are free to think about and talk about how Python could > work differently all they want. What this PEP does is tell them that > certain decisions have been made about how the Python language is going > to work, so they should be aware that such talk isn't going to actually > result in any changes to the language. By saying that "these are things that will not change", then you _are_ sort of banning talk about them (if, as you assert, "such talk isn't going to actually result in any changes to the language" then you are saying don't waste your breath, we won't even consider your arguments). I think I get Sven's point. A long time ago, someone probably said "Python will never have any sort of type declarations.". But now there is type hinting. It's not the same thing, I know, but such a declaration in a PEP might have prevented people from even spending time considering hinting. Instead, if the PEP collected - for each 'frequently' suggested change - a summary of the reasons WHY each aspect is designed the way it is (with links to archived discussions or whatever) then that IMO that would be a good resource to cite in a canned response to such suggestions. It's not that "these things will never change", it's more of a "you need to provide a solid argument why your suggestion is different to, and better than, the cited suggestions that have already been rejected". Probably a lot of work to gather all the references though. But it could start out with one or two and grow from there. Add to it as and when people bring up the same old stuff next time. E. From chris.barker at noaa.gov Thu Jan 12 19:58:07 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 12 Jan 2017 16:58:07 -0800 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: <22647.42412.563855.37358@turnbull.sk.tsukuba.ac.jp> References: <20170105171638.GA4217@phdru.name> <22638.64668.809353.547149@turnbull.sk.tsukuba.ac.jp> <22647.42412.563855.37358@turnbull.sk.tsukuba.ac.jp> Message-ID: On Thu, Jan 12, 2017 at 7:50 AM, Stephen J. Turnbull wrote: > > So I see no downside to using utf-8 when the C locale is defined. > > You don't have much incentive to look for one, and I doubt you have > the experience of the edge cases (if you do, please correct me), so > that does not surprise me. > that's correct. I left out a sentence: This is a good time for others' with experience with the ugly edge cases to speak up! The real challenge is that "output" has three (at least :-) ) use cases: 1) Passing on data the came from input from the same system: Victors' "Unix pipe style". In that case, if a supposedly ASCII-based system has non ascii data, most users would want it to get passed through unchanged. They not likely to expect their python program to enforce their locale (unless it was a program designed to do that - but then it could be explicit about things). 2) The program generating data itself: the mentioned "outputting boxes to the console" example. 
I think that folks writing these programs should consider whether they really need non-ascii output -- but if they do do this -- I"d image most folks would rather see weird characters in the console than have the program crash. So these both point to utf-8 (with surrogateescape) 3) A program getting input from a user, or a data file, or...... (like a filename, etc). This may be a program intended to be able to handle unicode filenames, etc. (this is my use-case :-) ) -- what should it do when run on an ASCII-only system? This is the tough one -- if the C-locale indicated "non configured", then users would likely want the _something_ written to the FS, rather than a program crash: so utf-8. However, if the system really is properly configured to be ASCII only, then they may want a program to never write non-ascii to the filesystem. However, ultimately, I think it's up to the application developer, rather than to Python itself (Or the sysadmin for the OS that it's running on) to know whether the app is supposed to support non-ascii filenames, etc. i.e. one should expect that running a unicode-aware app on an ascii-only filesystem is going to lead to problems anyway. So I think the "never crash" option is the better one in this imperfect trade-off. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Thu Jan 12 20:09:00 2017 From: brett at python.org (Brett Cannon) Date: Fri, 13 Jan 2017 01:09:00 +0000 Subject: [Python-ideas] How to respond to trolling (Guido van Rossum) In-Reply-To: <1484263303.1563492.846110464.0DD718F5@webmail.messagingengine.com> References: <31b49e4a-e33e-26e0-5cc1-32187a4cd639@bigpond.com> <1484263303.1563492.846110464.0DD718F5@webmail.messagingengine.com> Message-ID: On Thu, 12 Jan 2017 at 15:22 Random832 wrote: > On Thu, Jan 12, 2017, at 17:39, Brett Cannon wrote: > > On Wed, 11 Jan 2017 at 20:56 Simon Lovell > wrote: > > > I don't know what is meant by some insults having been thrown in. > > > Calling truthiness of non boolean data "Ugly" is an insult? It is ugly. > > > > Now *that *is insulting to me. Once again, you are allowed to disagree > > and > > say you don't like how truthiness is handled in Python, but you flat-out > > stating something is ugly insults all the time and effort that me and the > > other core developers have put into Python to try and make it the best > > language we can with the constraints we have to work within. > > Just out of curiosity... in your estimation, what is a "wart", and why > is the term "wart" used for it? That term has been used since before I got involved in Python so I don't know its history. To me, a "wart" is a design misstep; there were reasons at the time for the design but it has not held up as necessarily the best decision. So to me "wart" is not as bad as "ugly" as it tacitly acknowledges circumstances were quite possibly different back then and 20/20 hindsight is not something we have when making a decision. As a community we have collectively agreed some things are warts in Python because enough people over time have shared the opinion that something was a design misstep. 
> I mean, this is an accepted term that > the Python community uses to refer to things, that is not generally > regarded to be cause for an accusation of personally insulting anyone, > right? I haven't stepped into an alternate universe? You're focusing on the word and not how the word was presented. The fact that Simon started his email with a blanket statement basically saying his ideas were great and right automatically shows arrogance. And then continuing to say that something is ugly matter-of-factly just continued on that theme. I can normally mentally insert an "I think" phrase for people when they make a blanket statement like that when the rest of the email was reasonable, but the posturing of the email as a whole just didn't all for that. We can argue what adjective or noun could have been used forever, but the fact that it was delivered as if in judgment over those who put the time and effort to make the decision all those years ago doesn't ever feel good to the people being judged and ridiculed (and I know this can seem small, but as one of the people being judged regularly I can attest that the constant ridicule contributes to burnout). -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Thu Jan 12 20:12:56 2017 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Thu, 12 Jan 2017 17:12:56 -0800 Subject: [Python-ideas] OS related file operations (copy, move, delete, rename...) should be placed into one module In-Reply-To: References: <1682891988816860097@unknownmsgid> Message-ID: <-7368988991825800210@unknownmsgid> > On Jan 12, 2017, at 2:26 PM, Terry Reedy wrote: > > > Start a new thread for 'Add x to pathlib'. > And do take some time to see if a given suggestion has already been discussed first. -CHB From guido at python.org Thu Jan 12 20:26:06 2017 From: guido at python.org (Guido van Rossum) Date: Thu, 12 Jan 2017 17:26:06 -0800 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: <6c7283be-fbd0-bfb4-28d1-a95a64a8e1f1@lucidity.plus.com> References: <20170112023740.GS3887@ando.pearwood.info> <6c7283be-fbd0-bfb4-28d1-a95a64a8e1f1@lucidity.plus.com> Message-ID: I've not followed this discussion closely, but I would assume that for most things on the "will never change" list the explanation is simply that the cost of changing it while maintaining backward compatibility is too high compared to the benefit of fixing the problem (regardless of whether it really is a problem). People who come in with enthusiastic proposals to fix some pet peeve usually don't have the experience needed to appreciate the difficulty in maintaining backwards compatibility. (A really weird disconnect from reality happens when this is mentioned in the same breath as "please fix the Python 2 vs. 3 problem". :-) I would also guess that for things that are actually controversial (meaning some people hate a feature that other people love), it's much easier to explain why it's too late to change than it is to provide an objective argument for why the status quo is better. Often the status quo is not better per se, it's just better because it's the status quo. from __future__ import no_colons # :-) On Thu, Jan 12, 2017 at 4:57 PM, Erik wrote: > On 12/01/17 19:51, Todd wrote: > >> >> On Thu, Jan 12, 2017 at 2:33 PM, Sven R. Kunze > > wrote: >> First of all, I am anti-censor and pro-change. >> > > There is no "censorship" or "banning thoughts" going on here. 
Even with >> this PEP, people are free to think about and talk about how Python could >> work differently all they want. What this PEP does is tell them that >> certain decisions have been made about how the Python language is going >> to work, so they should be aware that such talk isn't going to actually >> result in any changes to the language. >> > > By saying that "these are things that will not change", then you _are_ > sort of banning talk about them (if, as you assert, "such talk isn't going > to actually result in any changes to the language" then you are saying > don't waste your breath, we won't even consider your arguments). > > I think I get Sven's point. A long time ago, someone probably said "Python > will never have any sort of type declarations.". But now there is type > hinting. It's not the same thing, I know, but such a declaration in a PEP > might have prevented people from even spending time considering hinting. > > Instead, if the PEP collected - for each 'frequently' suggested change - a > summary of the reasons WHY each aspect is designed the way it is (with > links to archived discussions or whatever) then that IMO that would be a > good resource to cite in a canned response to such suggestions. > > It's not that "these things will never change", it's more of a "you need > to provide a solid argument why your suggestion is different to, and better > than, the cited suggestions that have already been rejected". > > Probably a lot of work to gather all the references though. But it could > start out with one or two and grow from there. Add to it as and when people > bring up the same old stuff next time. > > E. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Thu Jan 12 20:27:50 2017 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Thu, 12 Jan 2017 17:27:50 -0800 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: <6c7283be-fbd0-bfb4-28d1-a95a64a8e1f1@lucidity.plus.com> References: <20170112023740.GS3887@ando.pearwood.info> <6c7283be-fbd0-bfb4-28d1-a95a64a8e1f1@lucidity.plus.com> Message-ID: <-521888281998820830@unknownmsgid> > By saying that "these are things that will not change", I agree -- these are not exactly " things that will not change" as they are: "Things that have been discussed (often ad nausium) and considered and definitively rejected" And many of them are: "Part of what makes Python Python" I think some wordsmithing is in order to make that clear. -CHB From guido at python.org Thu Jan 12 20:29:18 2017 From: guido at python.org (Guido van Rossum) Date: Thu, 12 Jan 2017 17:29:18 -0800 Subject: [Python-ideas] How to respond to trolling (Guido van Rossum) In-Reply-To: References: <31b49e4a-e33e-26e0-5cc1-32187a4cd639@bigpond.com> <1484263303.1563492.846110464.0DD718F5@webmail.messagingengine.com> Message-ID: AFAIK the term comes from a piece by Andrew Kuchling titled "Python warts". The topic now has its own wiki page: https://wiki.python.org/moin/PythonWarts I believe that most of the warts are not even design missteps -- they are emergent misfeatures, meaning nobody could have predicted how things would work out. 
On Thu, Jan 12, 2017 at 5:09 PM, Brett Cannon wrote: > > > On Thu, 12 Jan 2017 at 15:22 Random832 wrote: > >> On Thu, Jan 12, 2017, at 17:39, Brett Cannon wrote: >> > On Wed, 11 Jan 2017 at 20:56 Simon Lovell >> wrote: >> > > I don't know what is meant by some insults having been thrown in. >> > > Calling truthiness of non boolean data "Ugly" is an insult? It is >> ugly. >> > >> > Now *that *is insulting to me. Once again, you are allowed to disagree >> > and >> > say you don't like how truthiness is handled in Python, but you flat-out >> > stating something is ugly insults all the time and effort that me and >> the >> > other core developers have put into Python to try and make it the best >> > language we can with the constraints we have to work within. >> >> Just out of curiosity... in your estimation, what is a "wart", and why >> is the term "wart" used for it? > > > That term has been used since before I got involved in Python so I don't > know its history. To me, a "wart" is a design misstep; there were reasons > at the time for the design but it has not held up as necessarily the best > decision. So to me "wart" is not as bad as "ugly" as it tacitly > acknowledges circumstances were quite possibly different back then and > 20/20 hindsight is not something we have when making a decision. As a > community we have collectively agreed some things are warts in Python > because enough people over time have shared the opinion that something was > a design misstep. > > >> I mean, this is an accepted term that >> the Python community uses to refer to things, that is not generally >> regarded to be cause for an accusation of personally insulting anyone, >> right? I haven't stepped into an alternate universe? > > > You're focusing on the word and not how the word was presented. The fact > that Simon started his email with a blanket statement basically saying his > ideas were great and right automatically shows arrogance. And then > continuing to say that something is ugly matter-of-factly just continued on > that theme. I can normally mentally insert an "I think" phrase for people > when they make a blanket statement like that when the rest of the email was > reasonable, but the posturing of the email as a whole just didn't all for > that. > > We can argue what adjective or noun could have been used forever, but the > fact that it was delivered as if in judgment over those who put the time > and effort to make the decision all those years ago doesn't ever feel good > to the people being judged and ridiculed (and I know this can seem small, > but as one of the people being judged regularly I can attest that the > constant ridicule contributes to burnout). > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From songofacandy at gmail.com Thu Jan 12 21:01:01 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Fri, 13 Jan 2017 11:01:01 +0900 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> <22638.64668.809353.547149@turnbull.sk.tsukuba.ac.jp> Message-ID: On Fri, Jan 13, 2017 at 12:12 AM, Victor Stinner wrote: > 2017-01-12 1:23 GMT+01:00 INADA Naoki : >> I'm ?0 to surrogateescape by default. I feel +1 for stdout and -1 for stdin. 
> > The use case is to be able to write a Python 3 program which works > work UNIX pipes without failing with encoding errors: > https://www.python.org/dev/peps/pep-0540/#producer-consumer-model-using-pipes > > If you want something stricter, there is the UTF-8 Strict mode which > prevent mojibake everywhere. I'm not sure that the UTF-8 Strict mode > is really useful. When I implemented it, I quickly understood that > using strict *everywhere* is just a deadend: it would fail in too many > places. > https://www.python.org/dev/peps/pep-0540/#use-the-strict-error-handler-for-operating-system-data > > I'm not even sure yet that a Python 3 with stdin using strict is "usable". > I want http://bugs.python.org/issue15216 is merged in 3.7. It allows application select error handler by straightforward API. So, the problem is "which should be default"? * Program like `ls` can opt-in surrogateescape. * Program want to output valid UTF-8 can opt-out surrogateescape. And I feel former is better, regarding to Python's Zen. But it's not a strong opinion. > >> In output case, surrogateescape is weaker than strict, but it only allows >> surrgateescaped binary. If program carefully use surrogateescaped decode, >> surrogateescape on stdout is safe enough. > > What do you mean that "carefully use surrogateescaped decode"? > > The rationale for using surrogateescape on stdout is to support this use case: > https://www.python.org/dev/peps/pep-0540/#list-a-directory-into-stdout Application which is intended to output surrogateescaped data (filenames) should use surrogateescape, surely. But some application is intended to live in UTF-8 world. For example, think about application reads UTF-8 CSV, and insert it into database. When there is CSV encoded by Shift_JIS accidentally, and it is passed to stdin, error is better than insert it into database silently. > >> On the other hand, surrogateescape is very weak for input. It accepts >> arbitrary bytes. >> It should be used carefully. > > In my experience with the Python bug tracker, almost nobody > understands Unicode and locales. For the "Producer-consumer model > using pipes" use case, encoding issues of Python 3.6 can be a blocker > issue. Some developers may prefer a different programming language > which doesn't bother them with Unicode: basicall, *all* other > programming languages, no? > I agree. Some developer prefer other language (or Python 2) to Python 3, because of "Unicode by default doesn't fit to POSIX". Both of "strict by default" and "weak by default" have downside. > >> But I agree different encoding handler between stdin/stdout is not beautiful. >> That's why I'm ?0. > > That's why there are two modes: UTF-8 and UTF-8 Strict. But I'm not > 100% sure yet, on which encodings and error handlers should be used > ;-) I started to play with my PEP 540 implementation. I already had to > update the PEP 540 and its implementation for Windows. On Windows, > os.fsdecode/fsencode now uses surrogatepass, not surrogateescape > (Python 3.5 uses strict on Windows). > > Victor From songofacandy at gmail.com Thu Jan 12 21:10:44 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Fri, 13 Jan 2017 11:10:44 +0900 Subject: [Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode) In-Reply-To: <1484244283.1491998.845750680.3CCC838D@webmail.messagingengine.com> References: <20170112161001.GA19119@phdru.name> <1484244283.1491998.845750680.3CCC838D@webmail.messagingengine.com> Message-ID: >> >> My question is more when A and B encodings are not compatible. 
>> >> Ah yes, date, thank you for the example. Here is my example using >> LC_TIME locale to format a date and LC_CTYPE to decode a byte string: > > Time and messages seem to behave differently - everything I tested > (including python 2 os.strerror) seems to ignore the LC_MESSAGES > encoding and use the LC_CTYPE encoding, including resulting in a bunch > of question marks when it's "C". > _______________________________________________ For date command, it only sees LC_TIME. LC_CTYPE=en_US.UTF-8 LC_TIME=ja_JP.eucjp date shows mojibake. But it's not a problem, because changing LC_CTYPE from C to C.UTF-8 doesn't break anything. It's broken at start. Use UTF-8 everywhere, anytime is best way to avoid mojibake. From phd at phdru.name Thu Jan 12 21:17:44 2017 From: phd at phdru.name (Oleg Broytman) Date: Fri, 13 Jan 2017 03:17:44 +0100 Subject: [Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode) In-Reply-To: References: <20170112161001.GA19119@phdru.name> <1484244283.1491998.845750680.3CCC838D@webmail.messagingengine.com> Message-ID: <20170113021744.GA17888@phdru.name> On Fri, Jan 13, 2017 at 11:10:44AM +0900, INADA Naoki wrote: > Use UTF-8 everywhere, anytime is best way to avoid mojibake. When you're alone in the Universe -- yes, it helps. But if other people, protocols and data formats use different encodings it doesn't matter which encoding you use -- you'll have mojibake anyway. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From turnbull.stephen.fw at u.tsukuba.ac.jp Thu Jan 12 21:40:09 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Fri, 13 Jan 2017 11:40:09 +0900 Subject: [Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode) In-Reply-To: References: <20170112161001.GA19119@phdru.name> <1484244283.1491998.845750680.3CCC838D@webmail.messagingengine.com> Message-ID: <22648.15881.572533.515071@turnbull.sk.tsukuba.ac.jp> INADA Naoki writes: > But it's not a problem, because changing LC_CTYPE from C to C.UTF-8 > doesn't break anything. It's broken at start. > Use UTF-8 everywhere, anytime is best way to avoid mojibake. Please stop repeating this; it is invalid as an argument. Everybody using Python 3 (which is the only topic for this list) already knows that use of a common universal encoding -- in practice, UTF-8 -- is the way forward (and Windows users also know that the Windows API is a major exception, which proves the rule by being a different *Unicode* transformation format). It is not part of this discussion. The problem is that not everybody does this yet, even today (in fact, that's the source of the problem on containers, people are using the C locale, not C.utf-8!), and some of us have to use or interoperate with systems that don't, even if our own systems do. If your position really is "Screw them, they're stupid -- let them fix their broken systems, it's not our problem", I can understand that but we'll have to agree to disagree. My position is that we need to (1) determine if this change actually can cause problems for Python users on such systems or interoperating with such systems (2) determine how serious the problems are with the "force UTF-8 in certain situations" approach vs. the status quo (3) compare the damage both ways, (4) if there is a conflict, consider whether a modified proposal would work as well or better in more circumstances. I think that is consistent with past Python practice on encoding issues. 
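As a concrete reference for the pass-through property discussed in this thread, a minimal sketch of the surrogateescape round trip that the proposed UTF-8 mode relies on (KOI8-R bytes serve here only as an example of data that is not valid UTF-8):

---
# Undecodable bytes are mapped to lone surrogates on decoding and
# restored unchanged on encoding, so "foreign" data survives a
# decode/encode round trip instead of raising UnicodeDecodeError.
raw = "привет".encode("koi8-r")          # not valid UTF-8
text = raw.decode("utf-8", "surrogateescape")
assert text.encode("utf-8", "surrogateescape") == raw
print(repr(text))   # lone surrogates such as '\udcd0', '\udcd2', ...
---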
From turnbull.stephen.fw at u.tsukuba.ac.jp Thu Jan 12 21:40:24 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Fri, 13 Jan 2017 11:40:24 +0900 Subject: [Python-ideas] Settable defaulting to decimal instead of float In-Reply-To: <7565624482040125012@unknownmsgid> References: <7565624482040125012@unknownmsgid> Message-ID: <22648.15896.121393.873005@turnbull.sk.tsukuba.ac.jp> Chris Barker - NOAA Federal writes: > However: (thank you Chris and Stephen) -- I think you mean "Stephan". :-) From turnbull.stephen.fw at u.tsukuba.ac.jp Thu Jan 12 21:40:37 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Fri, 13 Jan 2017 11:40:37 +0900 Subject: [Python-ideas] How to respond to trolling (Guido van Rossum) In-Reply-To: References: <31b49e4a-e33e-26e0-5cc1-32187a4cd639@bigpond.com> <1484263303.1563492.846110464.0DD718F5@webmail.messagingengine.com> Message-ID: <22648.15909.962623.527971@turnbull.sk.tsukuba.ac.jp> Guido van Rossum writes: > AFAIK the term comes from a piece by Andrew Kuchling titled "Python warts". > The topic now has its own wiki page: > https://wiki.python.org/moin/PythonWarts > > I believe that most of the warts are not even design missteps -- they are > emergent misfeatures, meaning nobody could have predicted how things would > work out. More like surgical scars than warts, as I see it. From turnbull.stephen.fw at u.tsukuba.ac.jp Thu Jan 12 21:43:30 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Fri, 13 Jan 2017 11:43:30 +0900 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: References: <20170112023740.GS3887@ando.pearwood.info> <20170112031755.GA7523@phdru.name> Message-ID: <22648.16082.439222.627813@turnbull.sk.tsukuba.ac.jp> Mark E. Haase writes: > I don't think an informational PEP would make threads like Python Review > shorter and/or more productive. The OP clearly didn't do much research, so > it seems unlikely he would read an informational PEP. But just saying "do your research" (which is quite reasonable without the informational PEP) is much less friendly than including the URL to the informational PEP in the kind of "canned response" you suggest. That's what Steven is aiming at. I'm not sure that a PEP is the best format, as the normal PEP process is not a good match for something that is likely to need to be updated as "good syntax" is discovered for ideas formerly considered un-Pythonic and other languages come up with neat new ideas that don't have obvious Pythonic syntax. Andrew Barnert's blog post (thanks, Chris!) http://stupidpythonideas.blogspot.com/2015/05/why-following-idioms-matters.html is a good start, and Nick Coghlan's "Curious Efficiency" blog has related material, I think. Perhaps pointers to those would be good. > Moreover, the bikeshedding about what goes into this PEP will > inevitably lead to a troll who isn't satisfied with the explanation > of a particular item, or notices that a particular item isn't > included in the PEP, and then we're right back to the same problem: > litigating Python complaints that have already been discussed many > times on this list. I don't see why that has to be the case. The canned response here is "Thank you for your suggestion. The issue tracker is right over that-a-way." A suggestion for your canned response: > Hi, this appears to be your first post to python-ideas. Unfortunately, there are a number of folks around who enjoy discussing non-starters to death. 
That's insulting to them, and therefore against the spirit of the CoC. I'd remove that, and write > This purpose of this list is to discuss speculative language ideas > for Python. If an idea gains traction, it can then be discussed and > honed into a detailed proposal. followed by Your post does not present a clear, coherent proposal. and > Your post does not fit with the purpose of the list, either because > it is too broad or because it doesn't contain enough technical > details about your proposal. You may wish to improve your proposal > by focusing on a single subject, researching historical > conversations on that subject, and adding more technical > details. Alternatively, you may wish to post on python-list[1] > instead, which is a general purpose list that does not have the > same constraints as this list. Of course this presentation is broken, the grammar can be improved easily. > Stack Overflow does something similar, where they have canned > responses to low-quality questions. This makes it easy for the > community to self-moderate in a respectful manner. We have a few of those already. This would be a useful addition. From songofacandy at gmail.com Thu Jan 12 22:50:01 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Fri, 13 Jan 2017 12:50:01 +0900 Subject: [Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode) In-Reply-To: <22648.15881.572533.515071@turnbull.sk.tsukuba.ac.jp> References: <20170112161001.GA19119@phdru.name> <1484244283.1491998.845750680.3CCC838D@webmail.messagingengine.com> <22648.15881.572533.515071@turnbull.sk.tsukuba.ac.jp> Message-ID: On Fri, Jan 13, 2017 at 11:40 AM, Stephen J. Turnbull wrote: > INADA Naoki writes: > > > But it's not a problem, because changing LC_CTYPE from C to C.UTF-8 > > doesn't break anything. It's broken at start. > > Use UTF-8 everywhere, anytime is best way to avoid mojibake. > > Please stop repeating this; it is invalid as an argument. Sorry, I meant "If LC_CTYPE is C or C.UTF-8, all other LC_* should be ASCII or UTF-8. Otherwise, mojibake is not avoidable regardless PEP 538 or 540." I didn't meant forcing C.UTF-8 for LC_TIME too. As you can read, no one propose "Drop non-UTF-8 locale support completely". > > The problem is that not everybody does this yet, even today (in fact, > that's the source of the problem on containers, people are using the C > locale, not C.utf-8!), No. C locale doesn't forbid using UTF-8. It doesn't determine terminal encoding too. It's just a Python's behavior, and it's unlike many other languages commonly used. That's why many people bitten by this problem. For example, vim uses latin-1 by default for C locale. Since C locale means nothing about terminal/file/stdio encoding, using most common byte transparent encoding seems reasonable choice. Off course, vim can be configured to use UTF-8, regardless LC_CTYPE. > and some of us have to use or interoperate with > systems that don't, even if our own systems do. > > If your position really is "Screw them, they're stupid -- let them fix > their broken systems, it's not our problem", I never said such thing. > I can understand that but > we'll have to agree to disagree. My position is that we need to > > (1) determine if this change actually can cause problems for Python > users on such systems or interoperating with such systems Sure. That's what I tested. "If people using non UTF-8 LC_TIME and LC_CTYPE=C, thare is mojibake already. This change doesn't break anything." 
> (2) determine how serious the problems are with the "force UTF-8 in > certain situations" approach vs. the status quo > (3) compare the damage both ways, > (4) if there is a conflict, consider whether a modified proposal would > work as well or better in more circumstances. > > I think that is consistent with past Python practice on encoding > issues. Sure. And there is unavoidable conflict, default behavior should be for more common usage. From songofacandy at gmail.com Thu Jan 12 22:53:52 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Fri, 13 Jan 2017 12:53:52 +0900 Subject: [Python-ideas] PEP 540: Add a new UTF-8 mode In-Reply-To: References: <20170105171638.GA4217@phdru.name> Message-ID: >> If we chose "Always use UTF-8 for fs encoding", I think >> PYTHONFSENCODING envvar should be >> added again. (It should be used from startup: decoding command line argument). > > Last time I implemented PYTHONFSENCODING, I had many major issues: > https://mail.python.org/pipermail/python-dev/2010-October/104509.html > > Do you mean that these issues are now outdated and that you have an > idea how to fix them? > Just a idea: Only "ascii", "utf-8" (default) and "locale" is allowed for PYTHONFSENCODING. From ncoghlan at gmail.com Thu Jan 12 23:04:32 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 13 Jan 2017 14:04:32 +1000 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: <22648.16082.439222.627813@turnbull.sk.tsukuba.ac.jp> References: <20170112023740.GS3887@ando.pearwood.info> <20170112031755.GA7523@phdru.name> <22648.16082.439222.627813@turnbull.sk.tsukuba.ac.jp> Message-ID: On 13 January 2017 at 12:43, Stephen J. Turnbull wrote: > Mark E. Haase writes: > > > I don't think an informational PEP would make threads like Python Review > > shorter and/or more productive. The OP clearly didn't do much research, so > > it seems unlikely he would read an informational PEP. > > But just saying "do your research" (which is quite reasonable without > the informational PEP) is much less friendly than including the URL to > the informational PEP in the kind of "canned response" you suggest. > That's what Steven is aiming at. > > I'm not sure that a PEP is the best format, as the normal PEP process is > not a good match for something that is likely to need to be updated as > "good syntax" is discovered for ideas formerly considered un-Pythonic > and other languages come up with neat new ideas that don't have > obvious Pythonic syntax. Andrew Barnert's blog post (thanks, Chris!) > http://stupidpythonideas.blogspot.com/2015/05/why-following-idioms-matters.html > is a good start, and Nick Coghlan's "Curious Efficiency" blog has > related material, I think. Perhaps pointers to those would be good. Expanding on https://docs.python.org/devguide/langchanges.html would likely be a more useful format than an informational PEP. As a starting point, https://docs.python.org/devguide/faq.html#suggesting-changes should likely be consolidated into that page, and the FAQ entry simplified into a link to a new subsection on that page. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From turnbull.stephen.fw at u.tsukuba.ac.jp Thu Jan 12 23:24:37 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. 
Turnbull) Date: Fri, 13 Jan 2017 13:24:37 +0900 Subject: [Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode) In-Reply-To: References: <20170112161001.GA19119@phdru.name> <1484244283.1491998.845750680.3CCC838D@webmail.messagingengine.com> <22648.15881.572533.515071@turnbull.sk.tsukuba.ac.jp> Message-ID: <22648.22149.382228.267664@turnbull.sk.tsukuba.ac.jp> INADA Naoki writes: > No. C locale doesn't forbid using UTF-8. I'm sorry, but I believe you are completely misunderstanding what this discussion is about. I don't have time to deal with it any more. From srkunze at mail.de Fri Jan 13 10:44:15 2017 From: srkunze at mail.de (Sven R. Kunze) Date: Fri, 13 Jan 2017 16:44:15 +0100 Subject: [Python-ideas] How to respond to trolling (Guido van Rossum) In-Reply-To: <22648.15909.962623.527971@turnbull.sk.tsukuba.ac.jp> References: <31b49e4a-e33e-26e0-5cc1-32187a4cd639@bigpond.com> <1484263303.1563492.846110464.0DD718F5@webmail.messagingengine.com> <22648.15909.962623.527971@turnbull.sk.tsukuba.ac.jp> Message-ID: Moreover, when I read "explicit self" is a wart, then I think, "you have absolutely no idea how fantastic 'explicit self' is". Thus, inferring from a single data-point these seems to be personal "dislike lists". In this regard, I tend to prefer Guido's one before any others if there is even one. On 13.01.2017 03:40, Stephen J. Turnbull wrote: > Guido van Rossum writes: > > > AFAIK the term comes from a piece by Andrew Kuchling titled "Python warts". > > The topic now has its own wiki page: > > https://wiki.python.org/moin/PythonWarts > > > > I believe that most of the warts are not even design missteps -- they are > > emergent misfeatures, meaning nobody could have predicted how things would > > work out. > > More like surgical scars than warts, as I see it. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From chris.barker at noaa.gov Fri Jan 13 12:38:53 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 13 Jan 2017 09:38:53 -0800 Subject: [Python-ideas] Settable defaulting to decimal instead of float In-Reply-To: <22648.15896.121393.873005@turnbull.sk.tsukuba.ac.jp> References: <7565624482040125012@unknownmsgid> <22648.15896.121393.873005@turnbull.sk.tsukuba.ac.jp> Message-ID: On Thu, Jan 12, 2017 at 6:40 PM, Stephen J. Turnbull < turnbull.stephen.fw at u.tsukuba.ac.jp> wrote: > > However: (thank you Chris and Stephen) -- > > I think you mean "Stephan". :-) > Yes -- should have looked back at the thread! -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Fri Jan 13 15:08:45 2017 From: brett at python.org (Brett Cannon) Date: Fri, 13 Jan 2017 20:08:45 +0000 Subject: [Python-ideas] Things that won't change (proposed PEP) In-Reply-To: References: <20170112023740.GS3887@ando.pearwood.info> <20170112031755.GA7523@phdru.name> <22648.16082.439222.627813@turnbull.sk.tsukuba.ac.jp> Message-ID: On Thu, 12 Jan 2017 at 20:05 Nick Coghlan wrote: > On 13 January 2017 at 12:43, Stephen J. Turnbull > wrote: > > Mark E. 
Haase writes: > > > > > I don't think an informational PEP would make threads like Python > Review > > > shorter and/or more productive. The OP clearly didn't do much > research, so > > > it seems unlikely he would read an informational PEP. > > > > But just saying "do your research" (which is quite reasonable without > > the informational PEP) is much less friendly than including the URL to > > the informational PEP in the kind of "canned response" you suggest. > > That's what Steven is aiming at. > > > > I'm not sure that a PEP is the best format, as the normal PEP process is > > not a good match for something that is likely to need to be updated as > > "good syntax" is discovered for ideas formerly considered un-Pythonic > > and other languages come up with neat new ideas that don't have > > obvious Pythonic syntax. Andrew Barnert's blog post (thanks, Chris!) > > > http://stupidpythonideas.blogspot.com/2015/05/why-following-idioms-matters.html > > is a good start, and Nick Coghlan's "Curious Efficiency" blog has > > related material, I think. Perhaps pointers to those would be good. > > Expanding on https://docs.python.org/devguide/langchanges.html would > likely be a more useful format than an informational PEP. > > As a starting point, > https://docs.python.org/devguide/faq.html#suggesting-changes should > likely be consolidated into that page, and the FAQ entry simplified > into a link to a new subsection on that page. >

Do realize the FAQ is gutted in the github branch, so make sure you look at that version of the devguide to know what's planned: https://cpython-devguide.readthedocs.io/en/github/index.html .

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From barry at python.org Fri Jan 13 17:40:03 2017 From: barry at python.org (Barry Warsaw) Date: Fri, 13 Jan 2017 17:40:03 -0500 Subject: [Python-ideas] Things that won't change (proposed PEP) References: <20170112023740.GS3887@ando.pearwood.info> <20170112031755.GA7523@phdru.name> <587726A8.60107@canterbury.ac.nz> <22647.42089.114486.294200@turnbull.sk.tsukuba.ac.jp> <58780129.1000305@canterbury.ac.nz> Message-ID: <20170113174003.46dc5b24@subdivisions.wooz.org>

On Jan 13, 2017, at 11:20 AM, Greg Ewing wrote: >Criticisms Frequently Levelled Against Python Missteps Or Nonfeatures Guido Obviously Ordered, Saddling Everyone

(Yes, okay, I know pythons aren't venomous, but never let facts get in the way of a good, bad, tortured, or mentally mushed Friday-evening backronym.)

Cheers, -Barry

-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 801 bytes Desc: OpenPGP digital signature URL:

From mathieu.tortuyaux at gmail.com Sun Jan 15 00:24:29 2017 From: mathieu.tortuyaux at gmail.com (Mathieu TORTUYAUX) Date: Sun, 15 Jan 2017 00:24:29 -0500 Subject: [Python-ideas] Python dependancies In-Reply-To: References: Message-ID:

Hello everyone,

I'm used to working with Python and contributing to open-source projects, and nowadays many projects need to run with dependencies. So I am wondering whether it would be a good idea to integrate a sniffer into Python to detect whether a project's dependencies are up to date. Each time a Python project is run, the developer would then be aware of whether its dependencies are up to date.

I think this isn't the first time this idea has been submitted, so I am looking forward to your feedback!

Mathieu Tortuyaux

-------------- next part -------------- An HTML attachment was scrubbed...
URL:

From steve at pearwood.info Sun Jan 15 03:30:51 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 15 Jan 2017 19:30:51 +1100 Subject: [Python-ideas] Python dependancies In-Reply-To: References: Message-ID: <20170115083051.GU3887@ando.pearwood.info>

On Sun, Jan 15, 2017 at 12:24:29AM -0500, Mathieu TORTUYAUX wrote: > Hello everyone, > > I'm used to work with python and contribute to open-source projects. And > now, many projects need to run with dependancies. So I wondering, if it > could be a good idea to integrate a sniffer into Python to detecte if > project's dependancies are up to date.

I think such a sniffer would be an excellent third-party project.

When you say "up to date", do you mean that the dependencies are all up to date from your operating system's repositories? (yum, or apt-get, or similar.) The last thing I want is some program complaining that I'm not using version 2.9 of a library when my OS package management only supports 2.6.

> And each time Python project is run developer will be aware if dependancies > are up to date.

That would be awful. It would be pure noise. If I'm running an old version of something, it's because I want, or need, to run an old version. Or because I just don't care -- why should I run the latest version just because it is the latest version?

A sniffer that I can run when I want to run it would be useful. A sniffer that runs automatically would be a PITA.

-- Steve

From pavol.lisy at gmail.com Sun Jan 15 06:09:27 2017 From: pavol.lisy at gmail.com (Pavol Lisy) Date: Sun, 15 Jan 2017 12:09:27 +0100 Subject: [Python-ideas] Python dependancies In-Reply-To: <20170115083051.GU3887@ando.pearwood.info> References: <20170115083051.GU3887@ando.pearwood.info> Message-ID:

On 1/15/17, Steven D'Aprano wrote: > On Sun, Jan 15, 2017 at 12:24:29AM -0500, Mathieu TORTUYAUX wrote: >> And each time Python project is run developer will be aware if >> dependancies >> are up to date. > > That would be awful. It would be pure noise. If I'm running an old > version of something, its because I want, or need, to run an old > version. Or because I just don't care -- why should I run the latest > version just because it is the latest version?

Maybe because of security updates?

> A sniffer that I can run when I want to run it would be useful. A > sniffer that runs automatically would be a PITA.

It depends. I think an OS-level update (for example: apt-get dist-upgrade) could be fine. (If you like to stay on an old version you could freeze it manually - but you risk incompatibilities.) A virtual environment update (for example: conda update --all) could be fine too. The next one I personally would not propose outside of a test virtual environment:

pip install yolk3k
yolk --upgrade  # this upgrades all packages

(I checked yolk -U on my anaconda environment and it found one beta and one release candidate version as upgradeable targets: ipywidgets 5.2.2 (6.0.0.beta6), statsmodels 0.6.1 (0.8.0rc1). Is that acceptable for you, Mathieu? Instead of developing your project you could start to work like a beta-tester. :) )

The whole thing depends on what we want to optimize. If we want our open-source project to work only on the latest versions of its dependencies, then it could be unusable for many, many people! If we want to work only for people on the bleeding edge, or if we want to reduce our testing environments to only the "newest" versions, then some "edge" distribution could help.
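(To make the "run it only when I ask for it" idea concrete: an on-demand check can already be scripted on top of pip today. This is just a sketch, and it assumes a pip new enough, 9.0 or later, to support JSON output from "pip list":)

import json
import subprocess
import sys

def outdated_packages():
    # ask pip which installed packages have newer releases available;
    # --format=json needs pip >= 9
    raw = subprocess.check_output(
        [sys.executable, '-m', 'pip', 'list', '--outdated', '--format=json'])
    return [(pkg['name'], pkg['version'], pkg['latest_version'])
            for pkg in json.loads(raw.decode('utf-8'))]

if __name__ == '__main__':
    for name, installed, latest in outdated_packages():
        print('%s: %s -> %s' % (name, installed, latest))

Something like that stays completely silent until the developer explicitly runs it, which seems closer to what people in this thread would tolerate.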
I mean it is safer to propose run "conda update --all" (because it is much more tested environment) for users than run "yolk --upgrade". So Mathieu - maybe you could try install Anaconda and run regularly "conda update --all" and check if it satisfy your expectations. Or maybe you could check https://pip.pypa.io/en/stable/user_guide/#requirements-files (sorry if I write something too obvious for you) and you and your colleagues could manage your requirements file manually. From jelle.zijlstra at gmail.com Sun Jan 15 10:20:27 2017 From: jelle.zijlstra at gmail.com (Jelle Zijlstra) Date: Sun, 15 Jan 2017 07:20:27 -0800 Subject: [Python-ideas] Python dependancies In-Reply-To: References: Message-ID: 2017-01-14 21:24 GMT-08:00 Mathieu TORTUYAUX : > Hello everyone, > > I'm used to work with python and contribute to open-source projects. And > now, many projects need to run with dependancies. So I wondering, if it > could be a good idea to integrate a sniffer into Python to detecte if > project's dependancies are up to date. > pip already supports something like this: `pip list --outdated` will print out installed packages for which a more recent version is available. > And each time Python project is run developer will be aware if > dependancies are up to date. > > > I think isn't the first time that this idea is submitted. So I am looking > forward your feedbacks ! > > Mathieu Tortuyaux > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From offline at offby1.net Sun Jan 15 11:40:26 2017 From: offline at offby1.net (Chris Rose) Date: Sun, 15 Jan 2017 08:40:26 -0800 Subject: [Python-ideas] [PEP-0541] On issues with reclaiming namespaces in PyPi Message-ID: I want to address one gap in the PEP regarding reclaiming abandoned names: Version reuse. The problem with reusing names is that existing applications or installations that reference the old one, unless they pin the version name precisely. Even in that case, I foresee issues with version collision, especially if the abandoned project was well-versioned in the same model (semver or otherwise) that the new project uses. I'm deeply concerned by the idea of installer code suddenly picking up a new project... with possibly different dependencies on its own, either with old or clashing versions. I recognize it's going to be rare, but these incidents will definitely impact the repeatability of builds depending on PyPi. I think the criteria for reuse of a name must include usage limits; if the package is being downloaded on a steady basis by accounts that can't be shown to belong to known integration systems, reuse should not be allowed. -- Chris R. ====== Not to be taken literally, internally, or seriously. Twitter: http://twitter.com/offby1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From juraj.sukop at gmail.com Sun Jan 15 12:25:59 2017 From: juraj.sukop at gmail.com (Juraj Sukop) Date: Sun, 15 Jan 2017 18:25:59 +0100 Subject: [Python-ideas] Fused multiply-add (FMA) Message-ID: Hello! Fused multiply-add (henceforth FMA) is an operation which calculates the product of two numbers and then the sum of the product and a third number with just one floating-point rounding. 
More concretely: r = x*y + z The value of `r` is the same as if the RHS was calculated with infinite precision and the rounded to a 32-bit single-precision or 64-bit double-precision floating-point number [1]. Even though one FMA CPU instruction might be calculated faster than the two separate instructions for multiply and add, its main advantage comes from the increased precision of numerical computations that involve the accumulation of products. Examples which benefit from using FMA are: dot product [2], compensated arithmetic [3], polynomial evaluation [4], matrix multiplication, Newton's method and many more [5]. C99 includes `fma` function to `math.h` [6] and emulates the calculation if the FMA instruction is not present on the host CPU [7]. PEP 7 states that "Python versions greater than or equal to 3.6 use C89 with several select C99 features" and that "Future C99 features may be added to this list in the future depending on compiler support" [8]. This proposal is then about adding new `fma` function with the following signature to `math` module: math.fma(x, y, z) '''Return a float representing the result of the operation `x*y + z` with single rounding error, as defined by the platform C library. The result is the same as if the operation was carried with infinite precision and rounded to a floating-point number.''' There is a simple module for Python 3 demonstrating the fused multiply-add operation which was build with simple `python3 setup.py build` under Linux [9]. Any feedback is greatly appreciated! Juraj Sukop [1] https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation [2] S. Graillat, P. Langlois, N. Louvet. Accurate dot products with FMA. 2006 [3] S. Graillat, Accurate Floating Point Product and Exponentiation. 2007. [4] S. Graillat, P. Langlois, N. Louvet. Improving the compensated Horner scheme with a Fused Multiply and Add. 2006 [5] J.-M. Muller, N. Brisebarre, F. de Dinechin, C.-P. Jeannerod, V. Lef?vre, G. Melquiond, N. Revol, D. Stehl?, S. Torres. Handbook of Floating-Point Arithmetic. 2010. Chapter 5 [6] ISO/IEC 9899:TC3, "7.12.13.1 The fma functions", Committee Draft - Septermber 7, 2007 [7] https://git.musl-libc.org/cgit/musl/tree/src/math/fma.c [8] https://www.python.org/dev/peps/pep-0007/ [9] https://github.com/sukop/fma -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephanh42 at gmail.com Sun Jan 15 13:52:49 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Sun, 15 Jan 2017 19:52:49 +0100 Subject: [Python-ideas] Fused multiply-add (FMA) In-Reply-To: References: Message-ID: Hi Juraj, I think this would be a very useful addition to the `math' module. The gating issue is probably C compiler support. The most important non-C99 C compiler for Python is probably MS Visual Studio. And that one appears to support it: https://msdn.microsoft.com/en-us/library/mt720715.aspx So +1 on the proposal. Stephan 2017-01-15 18:25 GMT+01:00 Juraj Sukop : > Hello! > > Fused multiply-add (henceforth FMA) is an operation which calculates the > product of two numbers and then the sum of the product and a third number > with just one floating-point rounding. More concretely: > > r = x*y + z > > The value of `r` is the same as if the RHS was calculated with infinite > precision and the rounded to a 32-bit single-precision or 64-bit > double-precision floating-point number [1]. 
> > Even though one FMA CPU instruction might be calculated faster than the > two separate instructions for multiply and add, its main advantage comes > from the increased precision of numerical computations that involve the > accumulation of products. Examples which benefit from using FMA are: dot > product [2], compensated arithmetic [3], polynomial evaluation [4], matrix > multiplication, Newton's method and many more [5]. > > C99 includes `fma` function to `math.h` [6] and emulates the calculation > if the FMA instruction is not present on the host CPU [7]. PEP 7 states > that "Python versions greater than or equal to 3.6 use C89 with several > select C99 features" and that "Future C99 features may be added to this > list in the future depending on compiler support" [8]. > > This proposal is then about adding new `fma` function with the following > signature to `math` module: > > math.fma(x, y, z) > > '''Return a float representing the result of the operation `x*y + z` with > single rounding error, as defined by the platform C library. The result is > the same as if the operation was carried with infinite precision and > rounded to a floating-point number.''' > > There is a simple module for Python 3 demonstrating the fused multiply-add > operation which was build with simple `python3 setup.py build` under Linux > [9]. > > Any feedback is greatly appreciated! > > Juraj Sukop > > [1] https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation > [2] S. Graillat, P. Langlois, N. Louvet. Accurate dot products with FMA. > 2006 > [3] S. Graillat, Accurate Floating Point Product and Exponentiation. 2007. > [4] S. Graillat, P. Langlois, N. Louvet. Improving the compensated Horner > scheme with a Fused Multiply and Add. 2006 > [5] J.-M. Muller, N. Brisebarre, F. de Dinechin, C.-P. Jeannerod, V. > Lef?vre, G. Melquiond, N. Revol, D. Stehl?, S. Torres. Handbook of > Floating-Point Arithmetic. 2010. Chapter 5 > [6] ISO/IEC 9899:TC3, "7.12.13.1 The fma functions", Committee Draft - > Septermber 7, 2007 > [7] https://git.musl-libc.org/cgit/musl/tree/src/math/fma.c > [8] https://www.python.org/dev/peps/pep-0007/ > [9] https://github.com/sukop/fma > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sun Jan 15 14:10:30 2017 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 16 Jan 2017 06:10:30 +1100 Subject: [Python-ideas] Fused multiply-add (FMA) In-Reply-To: References: Message-ID: On Mon, Jan 16, 2017 at 4:25 AM, Juraj Sukop wrote: > There is a simple module for Python 3 demonstrating the fused multiply-add > operation which was build with simple `python3 setup.py build` under Linux > [9]. > > Any feedback is greatly appreciated! +1. Just tried it out, and apart from dropping a pretty little SystemError when I fat-finger the args wrong (a trivial matter of adding more argument checking), it looks good. Are there any possible consequences (not counting performance) of the fall-back? I don't understand all the code in what you linked to, but I think what's happening is that it goes to great lengths to avoid intermediate rounding, so the end result is always going to be the same. If that's the case, yeah, definite +1 on the proposal. 
ChrisA From ethan at stoneleaf.us Sun Jan 15 15:26:42 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 15 Jan 2017 12:26:42 -0800 Subject: [Python-ideas] [PEP-0541] On issues with reclaiming namespaces in PyPi In-Reply-To: References: Message-ID: <587BDB02.2050901@stoneleaf.us> On 01/15/2017 08:40 AM, Chris Rose wrote: > I want to address one gap in the PEP regarding reclaiming abandoned names. This should be on the DistUtils sig where the PEP is being discussed. -- ~Ethan~ From dickinsm at gmail.com Mon Jan 16 02:41:29 2017 From: dickinsm at gmail.com (Mark Dickinson) Date: Mon, 16 Jan 2017 07:41:29 +0000 Subject: [Python-ideas] Fused multiply-add (FMA) In-Reply-To: References: Message-ID: On Sun, Jan 15, 2017 at 5:25 PM, Juraj Sukop wrote: > This proposal is then about adding new `fma` function with the following > signature to `math` module: > > math.fma(x, y, z) Sounds good to me. Please could you open an issue on the bug tracker (http://bugs.python.org)? Thanks, Mark From victor.stinner at gmail.com Mon Jan 16 03:45:47 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 16 Jan 2017 09:45:47 +0100 Subject: [Python-ideas] Fused multiply-add (FMA) In-Reply-To: References: Message-ID: 2017-01-15 18:25 GMT+01:00 Juraj Sukop : > C99 includes `fma` function to `math.h` [6] and emulates the calculation if > the FMA instruction is not present on the host CPU [7]. If even the libc function has a fallback on x*y followed by +z, it's fine to add such function to the Python stdlib. It means that Python can do the same if the libc lacks a fma() function. In the math module, the trend is more to implement missing functions or add special code to workaround bugs or limitations of libc functions. Victor From stephanh42 at gmail.com Mon Jan 16 05:01:23 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Mon, 16 Jan 2017 11:01:23 +0100 Subject: [Python-ideas] Fused multiply-add (FMA) In-Reply-To: References: Message-ID: Hi Victor, The fallback implementations in the various libc take care to preserve the correct rounding behaviour. Let me stress that *fused* multiply-add means the specific rounding behaviour as defined in the standard IEEE-754 2008 (i.e. with guaranteed *no* intermediate rounding). So the following would not be a valid FMA fallback double bad_fma(double x, double y, double z) { return x*y + z; } Now in practice, people want FMA for two reasons. 1. They need the additional precision. 2. They want the performance of a hardware FMA instruction. Now, admittedly, the second category would be satisfied with the bad_fma fallback. However, I don't think 2. is a very compelling reason for fma *in pure Python code*, since the performance advantage would probably be dwarfed by interpreter overhead. So I would estimate that approx. 100% of the target audience of math.fma would want to use it for the increased accuracy. So providing a fallback which does not, in fact, give that accuracy would not make people happy. Upshot: if we want to provide a software fallback in the Python code, we need to do something slow and complicated like musl does. Possibly by actually using the musl code. Either that, or we rely on the Python-external libc implementation always. Stephan 2017-01-16 9:45 GMT+01:00 Victor Stinner : > 2017-01-15 18:25 GMT+01:00 Juraj Sukop : > > C99 includes `fma` function to `math.h` [6] and emulates the calculation > if > > the FMA instruction is not present on the host CPU [7]. 
> > If even the libc function has a fallback on x*y followed by +z, it's > fine to add such function to the Python stdlib. It means that Python > can do the same if the libc lacks a fma() function. In the math > module, the trend is more to implement missing functions or add > special code to workaround bugs or limitations of libc functions. > > Victor > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Mon Jan 16 06:04:48 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 16 Jan 2017 22:04:48 +1100 Subject: [Python-ideas] Fused multiply-add (FMA) In-Reply-To: References: Message-ID: <20170116110447.GV3887@ando.pearwood.info> On Mon, Jan 16, 2017 at 11:01:23AM +0100, Stephan Houben wrote: [...] > So the following would not be a valid FMA fallback > > double bad_fma(double x, double y, double z) { > return x*y + z; > } [...] > Upshot: if we want to provide a software fallback in the Python code, we > need to do something slow and complicated like musl does. I don't know about complicated. I think this is pretty simple: from fractions import Fraction def fma(x, y, z): # Return x*y + z with only a single rounding. return float(Fraction(x)*Fraction(y) + Fraction(z)) When speed is not the number one priority and accuracy is important, its hard to beat the fractions module. -- Steve From stephanh42 at gmail.com Mon Jan 16 10:02:38 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Mon, 16 Jan 2017 16:02:38 +0100 Subject: [Python-ideas] Fused multiply-add (FMA) In-Reply-To: <20170116110447.GV3887@ando.pearwood.info> References: <20170116110447.GV3887@ando.pearwood.info> Message-ID: Hi Steve, Very good! Here is a version which also handles the nan's, infinities, negative zeros properly. =============== import math from fractions import Fraction def fma2(x, y, z): if math.isfinite(x) and math.isfinite(y) and math.isfinite(z): result = float(Fraction(x)*Fraction(y) + Fraction(z)) if not result and not z: result = math.copysign(result, x*y+z) else: result = x * y + z assert not math.isfinite(result) return result =========================== Stephan 2017-01-16 12:04 GMT+01:00 Steven D'Aprano : > On Mon, Jan 16, 2017 at 11:01:23AM +0100, Stephan Houben wrote: > > [...] > > So the following would not be a valid FMA fallback > > > > double bad_fma(double x, double y, double z) { > > return x*y + z; > > } > [...] > > Upshot: if we want to provide a software fallback in the Python code, we > > need to do something slow and complicated like musl does. > > I don't know about complicated. I think this is pretty simple: > > from fractions import Fraction > > def fma(x, y, z): > # Return x*y + z with only a single rounding. > return float(Fraction(x)*Fraction(y) + Fraction(z)) > > > When speed is not the number one priority and accuracy is important, > its hard to beat the fractions module. > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gvanrossum at gmail.com Mon Jan 16 13:06:33 2017 From: gvanrossum at gmail.com (Guido van Rossum) Date: Mon, 16 Jan 2017 10:06:33 -0800 Subject: [Python-ideas] Fused multiply-add (FMA) In-Reply-To: References: <20170116110447.GV3887@ando.pearwood.info> Message-ID: Does numpy support this? --Guido (mobile) On Jan 16, 2017 7:27 AM, "Stephan Houben" wrote: > Hi Steve, > > Very good! > Here is a version which also handles the nan's, infinities, > negative zeros properly. > > =============== > import math > from fractions import Fraction > > def fma2(x, y, z): > if math.isfinite(x) and math.isfinite(y) and math.isfinite(z): > result = float(Fraction(x)*Fraction(y) + Fraction(z)) > if not result and not z: > result = math.copysign(result, x*y+z) > else: > result = x * y + z > assert not math.isfinite(result) > return result > =========================== > > Stephan > > > 2017-01-16 12:04 GMT+01:00 Steven D'Aprano : > >> On Mon, Jan 16, 2017 at 11:01:23AM +0100, Stephan Houben wrote: >> >> [...] >> > So the following would not be a valid FMA fallback >> > >> > double bad_fma(double x, double y, double z) { >> > return x*y + z; >> > } >> [...] >> > Upshot: if we want to provide a software fallback in the Python code, we >> > need to do something slow and complicated like musl does. >> >> I don't know about complicated. I think this is pretty simple: >> >> from fractions import Fraction >> >> def fma(x, y, z): >> # Return x*y + z with only a single rounding. >> return float(Fraction(x)*Fraction(y) + Fraction(z)) >> >> >> When speed is not the number one priority and accuracy is important, >> its hard to beat the fractions module. >> >> >> -- >> Steve >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mathieu.tortuyaux at gmail.com Mon Jan 16 13:29:12 2017 From: mathieu.tortuyaux at gmail.com (Mathieu TORTUYAUX) Date: Mon, 16 Jan 2017 10:29:12 -0800 (PST) Subject: [Python-ideas] Python dependancies In-Reply-To: References: Message-ID: <54648f4a-74aa-420d-80f4-7752decbc1f7@googlegroups.com> Thank you everyone for those feedbacks ! So I made a Django version to check if dependencies are up-to-date, using pip lib and get_outdated method. :) Le dimanche 15 janvier 2017 00:25:26 UTC-5, Mathieu TORTUYAUX a ?crit : > > Hello everyone, > > I'm used to work with python and contribute to open-source projects. And > now, many projects need to run with dependancies. So I wondering, if it > could be a good idea to integrate a sniffer into Python to detecte if > project's dependancies are up to date. > And each time Python project is run developer will be aware if > dependancies are up to date. > > > I think isn't the first time that this idea is submitted. So I am looking > forward your feedbacks ! > > Mathieu Tortuyaux > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Screenshot from 2017-01-15 23-09-31.png Type: image/png Size: 30718 bytes Desc: not available URL: From mertz at gnosis.cx Mon Jan 16 13:44:06 2017 From: mertz at gnosis.cx (David Mertz) Date: Mon, 16 Jan 2017 10:44:06 -0800 Subject: [Python-ideas] Fused multiply-add (FMA) In-Reply-To: References: <20170116110447.GV3887@ando.pearwood.info> Message-ID: My understanding is that NumPy does NOT currently support a direct FMA operation "natively." However, higher-level routines like `numpy.linalg.solve` that are linked to MKL or BLAS DO take advantage of FMA within the underlying libraries. On Mon, Jan 16, 2017 at 10:06 AM, Guido van Rossum wrote: > Does numpy support this? > > --Guido (mobile) > > On Jan 16, 2017 7:27 AM, "Stephan Houben" wrote: > >> Hi Steve, >> >> Very good! >> Here is a version which also handles the nan's, infinities, >> negative zeros properly. >> >> =============== >> import math >> from fractions import Fraction >> >> def fma2(x, y, z): >> if math.isfinite(x) and math.isfinite(y) and math.isfinite(z): >> result = float(Fraction(x)*Fraction(y) + Fraction(z)) >> if not result and not z: >> result = math.copysign(result, x*y+z) >> else: >> result = x * y + z >> assert not math.isfinite(result) >> return result >> =========================== >> >> Stephan >> >> >> 2017-01-16 12:04 GMT+01:00 Steven D'Aprano : >> >>> On Mon, Jan 16, 2017 at 11:01:23AM +0100, Stephan Houben wrote: >>> >>> [...] >>> > So the following would not be a valid FMA fallback >>> > >>> > double bad_fma(double x, double y, double z) { >>> > return x*y + z; >>> > } >>> [...] >>> > Upshot: if we want to provide a software fallback in the Python code, >>> we >>> > need to do something slow and complicated like musl does. >>> >>> I don't know about complicated. I think this is pretty simple: >>> >>> from fractions import Fraction >>> >>> def fma(x, y, z): >>> # Return x*y + z with only a single rounding. >>> return float(Fraction(x)*Fraction(y) + Fraction(z)) >>> >>> >>> When speed is not the number one priority and accuracy is important, >>> its hard to beat the fractions module. >>> >>> >>> -- >>> Steve >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg at krypto.org Mon Jan 16 14:28:22 2017 From: greg at krypto.org (Gregory P. 
Smith) Date: Mon, 16 Jan 2017 19:28:22 +0000 Subject: [Python-ideas] Fused multiply-add (FMA) In-Reply-To: References: <20170116110447.GV3887@ando.pearwood.info> Message-ID: Is there a good reason not to detect single expression multiply adds and just emit a new FMA bytecode? Is our goal for floats to strictly match the result of the same operations coded in unoptimized C using doubles? Or can we be more precise on occasion? I guess a similar question may be asked of all C compilers as they too could emit an FMA instruction on such expressions... If they don't do it by default, that suggests we match them and not do it either. Regardless +1 on adding math.fma() either way as it is an expression of precise intent. -gps On Mon, Jan 16, 2017, 10:44 AM David Mertz wrote: > My understanding is that NumPy does NOT currently support a direct FMA > operation "natively." However, higher-level routines like > `numpy.linalg.solve` that are linked to MKL or BLAS DO take advantage of > FMA within the underlying libraries. > > On Mon, Jan 16, 2017 at 10:06 AM, Guido van Rossum > wrote: > > Does numpy support this? > > --Guido (mobile) > > On Jan 16, 2017 7:27 AM, "Stephan Houben" wrote: > > Hi Steve, > > Very good! > Here is a version which also handles the nan's, infinities, > negative zeros properly. > > =============== > import math > from fractions import Fraction > > def fma2(x, y, z): > if math.isfinite(x) and math.isfinite(y) and math.isfinite(z): > result = float(Fraction(x)*Fraction(y) + Fraction(z)) > if not result and not z: > result = math.copysign(result, x*y+z) > else: > result = x * y + z > assert not math.isfinite(result) > return result > =========================== > > Stephan > > > 2017-01-16 12:04 GMT+01:00 Steven D'Aprano : > > On Mon, Jan 16, 2017 at 11:01:23AM +0100, Stephan Houben wrote: > > [...] > > So the following would not be a valid FMA fallback > > > > double bad_fma(double x, double y, double z) { > > return x*y + z; > > } > [...] > > Upshot: if we want to provide a software fallback in the Python code, we > > need to do something slow and complicated like musl does. > > I don't know about complicated. I think this is pretty simple: > > from fractions import Fraction > > def fma(x, y, z): > # Return x*y + z with only a single rounding. > return float(Fraction(x)*Fraction(y) + Fraction(z)) > > > When speed is not the number one priority and accuracy is important, > its hard to beat the fractions module. > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. 
> _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rhodri at kynesim.co.uk Mon Jan 16 14:32:19 2017 From: rhodri at kynesim.co.uk (Rhodri James) Date: Mon, 16 Jan 2017 19:32:19 +0000 Subject: [Python-ideas] Python dependancies In-Reply-To: <54648f4a-74aa-420d-80f4-7752decbc1f7@googlegroups.com> References: <54648f4a-74aa-420d-80f4-7752decbc1f7@googlegroups.com> Message-ID: On 16/01/17 18:29, Mathieu TORTUYAUX wrote: > Thank you everyone for those feedbacks ! > > So I made a Django version to check if dependencies are up-to-date, using > pip lib and get_outdated method. :) Mathieu, please do not attach images to posts to this list/newsgroup. It is text-only, and some of the gateways will helpfully strip off the attachment. Please cut and paste error messages instead. For the benefit of those watching in black and white :-), here's the text in that image: $ python manage.py runserver Performing system checks... System check identified no issues (0 silenced). 1 package(s) is (are) not up-to-date Package Version Latest Type ---------- ------- ------ ----- hypothesis 3.5.0 3.6.1 sdist January 16, 2017 - 04:08:57 Django version 1.11, using settings 'test_dep.settings' Starting development server at http://127.0.0.1:8000/ Quit the server with CONTROL-C -- Rhodri James *-* Kynesim Ltd From srkunze at mail.de Mon Jan 16 15:21:00 2017 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 16 Jan 2017 21:21:00 +0100 Subject: [Python-ideas] Fused multiply-add (FMA) In-Reply-To: References: <20170116110447.GV3887@ando.pearwood.info> Message-ID: <255409c9-2dd6-a1eb-518b-8424595a507d@mail.de> On 16.01.2017 20:28, Gregory P. Smith wrote: > > Is there a good reason not to detect single expression multiply adds > and just emit a new FMA bytecode? > Same question here. From stephanh42 at gmail.com Tue Jan 17 04:21:09 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Tue, 17 Jan 2017 10:21:09 +0100 Subject: [Python-ideas] Fused multiply-add (FMA) In-Reply-To: References: <20170116110447.GV3887@ando.pearwood.info> Message-ID: Hi Gregory, 2017-01-16 20:28 GMT+01:00 Gregory P. Smith : > Is there a good reason not to detect single expression multiply adds and > just emit a new FMA bytecode? > Yes ;-) See below. > Is our goal for floats to strictly match the result of the same operations > coded in unoptimized C using doubles? > I think it should be. This determinism is a feature, i.e. it is of value to some, although not to everybody. The cost of this determinism if a possible loss of performance, but as I already mentioned in an earlier mail, I do not believe this cost would be observable in pure Python code. And anyway, people who care about numerical performance to that extent are all using Numpy. > Or can we be more precise on occasion? > Being more precise on occasion is only valuable if the occasion can be predicted/controlled by the programmer. (In this I assume you are not proposing that x*y+z would be guaranteed to produce an FMA on *all* platforms, even those lacking a hardware FMA. That would be very expensive.) Generally speaking, there are two reasons why people may *not* want an FMA operation. 1. They need their results to be reproducible across compilers/platforms. (the most common reason) 2. 
The correctness of their algorithm depends on the intermediate rounding step being done. As an example of the second, take for example the cross product of two 2D vectors: def cross(a, b): return a[0]*b[1] - b[0] * a[1] In exact mathematics, this operation has the property that cross(a, b) == -cross(b,a). In the current Python implementation, this property is preserved. Synthesising an FMA would destroy it. I guess a similar question may be asked of all C compilers as they too > could emit an FMA instruction on such expressions... If they don't do it by > default, that suggests we match them and not do it either. > C99 has defined #pragma's to let the programmer indicate if they care about the strict FP model or not. So in C99 I can express the following three options: 1. I need an FMA, give it to me even if it needs to be emulated expensively in software: fma(x, y, z) 2. I do NOT want an FMA, please do intermediate rounding: #pragma STDC FP_CONTRACT OFF x*y + z 3. I don't care if you do intermediate rounding or not, just give me what is fastest: #pragma STDC FP_CONTRACT ON x*y + z Note that a conforming compiler can simply ignore FP_CONTRACT as long as it never generates an FMA for "x*y+z". This is what GCC does in -std mode. It's what I would recommend for Python. Regardless +1 on adding math.fma() either way as it is an expression of > precise intent. > Yep. Stephan > -gps > > On Mon, Jan 16, 2017, 10:44 AM David Mertz wrote: > >> My understanding is that NumPy does NOT currently support a direct FMA >> operation "natively." However, higher-level routines like >> `numpy.linalg.solve` that are linked to MKL or BLAS DO take advantage of >> FMA within the underlying libraries. >> >> On Mon, Jan 16, 2017 at 10:06 AM, Guido van Rossum >> wrote: >> >> Does numpy support this? >> >> --Guido (mobile) >> >> On Jan 16, 2017 7:27 AM, "Stephan Houben" wrote: >> >> Hi Steve, >> >> Very good! >> Here is a version which also handles the nan's, infinities, >> negative zeros properly. >> >> =============== >> import math >> from fractions import Fraction >> >> def fma2(x, y, z): >> if math.isfinite(x) and math.isfinite(y) and math.isfinite(z): >> result = float(Fraction(x)*Fraction(y) + Fraction(z)) >> if not result and not z: >> result = math.copysign(result, x*y+z) >> else: >> result = x * y + z >> assert not math.isfinite(result) >> return result >> =========================== >> >> Stephan >> >> >> 2017-01-16 12:04 GMT+01:00 Steven D'Aprano : >> >> On Mon, Jan 16, 2017 at 11:01:23AM +0100, Stephan Houben wrote: >> >> [...] >> > So the following would not be a valid FMA fallback >> > >> > double bad_fma(double x, double y, double z) { >> > return x*y + z; >> > } >> [...] >> > Upshot: if we want to provide a software fallback in the Python code, we >> > need to do something slow and complicated like musl does. >> >> I don't know about complicated. I think this is pretty simple: >> >> from fractions import Fraction >> >> def fma(x, y, z): >> # Return x*y + z with only a single rounding. >> return float(Fraction(x)*Fraction(y) + Fraction(z)) >> >> >> When speed is not the number one priority and accuracy is important, >> its hard to beat the fractions module. 
>> >> >> -- >> Steve >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> >> >> -- >> Keeping medicines from the bloodstreams of the sick; food >> from the bellies of the hungry; books from the hands of the >> uneducated; technology from the underdeveloped; and putting >> advocates of freedom in prisons. Intellectual property is >> to the 21st century what the slave trade was to the 16th. >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From xavier.combelle at gmail.com Tue Jan 17 10:04:29 2017 From: xavier.combelle at gmail.com (Xavier Combelle) Date: Tue, 17 Jan 2017 16:04:29 +0100 Subject: [Python-ideas] Fused multiply-add (FMA) In-Reply-To: References: <20170116110447.GV3887@ando.pearwood.info> Message-ID: <72b9a44d-939e-2748-9e44-2b49295a4ce7@gmail.com> > Generally speaking, there are two reasons why people may *not* want an > FMA operation. > 1. They need their results to be reproducible across > compilers/platforms. (the most common reason) > The reproducibility of floating point calculation is very hard to reach a good survey of the problem is https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/ it mention the fma problem but it only a part of a biggest picture -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephanh42 at gmail.com Tue Jan 17 10:48:17 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Tue, 17 Jan 2017 16:48:17 +0100 Subject: [Python-ideas] Fused multiply-add (FMA) In-Reply-To: <72b9a44d-939e-2748-9e44-2b49295a4ce7@gmail.com> References: <20170116110447.GV3887@ando.pearwood.info> <72b9a44d-939e-2748-9e44-2b49295a4ce7@gmail.com> Message-ID: Hi Xavier, In this bright age of IEEE-754 compatible CPUs, it is certainly possible to achieve reproducible FP. I worked for a company whose software produced bit-identical results on various CPUs (x86, Sparc, Itanium) and OSes (Linux, Solaris, Windows). The trick is to closely RTFM for your CPU and compiler, in particular all those nice appendices related to "FPU control words" and "FP consistency models". For example, if the author of that article had done so, he might have learned about the "precision control" field of the x87 status register, which you can set so that all intermediate operations are always represented as 64-bits doubles. So no double roundings from double-extended precision. 
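That determinism is also easy to poke at from Python itself. Reusing the Fraction-based emulation of a correctly-rounded fma from earlier in the thread (only a sketch, and certainly not fast), one can search for inputs where a contracted cross product stops being antisymmetric, while the plainly-rounded version never does:

import random
from fractions import Fraction

def fma(x, y, z):
    # correctly rounded x*y + z, emulated exactly via Fraction
    return float(Fraction(x) * Fraction(y) + Fraction(z))

def cross(a, b):
    # current Python semantics: round both products, then the subtraction
    return a[0]*b[1] - b[0]*a[1]

def cross_contracted(a, b):
    # one way a compiler-synthesised FMA could evaluate the same expression
    return fma(a[0], b[1], -(b[0]*a[1]))

random.seed(12345)
for _ in range(100000):
    a = (random.uniform(-1, 1), random.uniform(-1, 1))
    b = (random.uniform(-1, 1), random.uniform(-1, 1))
    assert cross(a, b) == -cross(b, a)        # antisymmetry always holds
    if cross_contracted(a, b) != -cross_contracted(b, a):
        print("contraction breaks antisymmetry for", a, b)
        break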
(Incidentally, the x87-internal double-extended precision is another fine example where being "more precise on occasion" usually does not help.) Frankly not very impressed with that article. I could go in detail but that's off-topic, and I will try to fight the "somebody is *wrong* on the Internet" urge. Stephan 2017-01-17 16:04 GMT+01:00 Xavier Combelle : > > Generally speaking, there are two reasons why people may *not* want an FMA > operation. > 1. They need their results to be reproducible across compilers/platforms. > (the most common reason) > > The reproducibility of floating point calculation is very hard to reach a > good survey of the problem is https://randomascii.wordpress. > com/2013/07/16/floating-point-determinism/ it mention the fma problem but > it only a part of a biggest picture > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg at krypto.org Tue Jan 17 12:16:04 2017 From: greg at krypto.org (Gregory P. Smith) Date: Tue, 17 Jan 2017 17:16:04 +0000 Subject: [Python-ideas] Fused multiply-add (FMA) In-Reply-To: References: <20170116110447.GV3887@ando.pearwood.info> <72b9a44d-939e-2748-9e44-2b49295a4ce7@gmail.com> Message-ID: Makes sense, thanks! math.fma() it is. :) On Tue, Jan 17, 2017, 7:48 AM Stephan Houben wrote: > Hi Xavier, > > In this bright age of IEEE-754 compatible CPUs, > it is certainly possible to achieve reproducible FP. > I worked for a company whose software produced bit-identical > results on various CPUs (x86, Sparc, Itanium) and OSes (Linux, Solaris, > Windows). > > The trick is to closely RTFM for your CPU and compiler, in particular all > those nice > appendices related to "FPU control words" and "FP consistency models". > > For example, if the author of that article had done so, he might have > learned > about the "precision control" field of the x87 status register, which you > can set > so that all intermediate operations are always represented as 64-bits > doubles. > So no double roundings from double-extended precision. > > (Incidentally, the x87-internal double-extended precision is another fine > example where > being "more precise on occasion" usually does not help.) > > Frankly not very impressed with that article. > I could go in detail but that's off-topic, and I will try to fight > the "somebody is *wrong* on the Internet" urge. > > Stephan > > 2017-01-17 16:04 GMT+01:00 Xavier Combelle : > > > Generally speaking, there are two reasons why people may *not* want an FMA > operation. > 1. They need their results to be reproducible across compilers/platforms. 
> (the most common reason) > > The reproducibility of floating point calculation is very hard to reach a > good survey of the problem is > https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/ > it mention the fma problem but it only a part of a biggest picture > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From xavier.combelle at gmail.com Tue Jan 17 13:12:34 2017 From: xavier.combelle at gmail.com (Xavier Combelle) Date: Tue, 17 Jan 2017 19:12:34 +0100 Subject: [Python-ideas] Fused multiply-add (FMA) In-Reply-To: References: <20170116110447.GV3887@ando.pearwood.info> <72b9a44d-939e-2748-9e44-2b49295a4ce7@gmail.com> Message-ID: <203dbd4f-8baa-c03a-2f19-8ac5563bc52a@gmail.com> I never said it was impossible, just very hard. Le 17/01/2017 ? 16:48, Stephan Houben a ?crit : > Hi Xavier, > > In this bright age of IEEE-754 compatible CPUs, > it is certainly possible to achieve reproducible FP. > I worked for a company whose software produced bit-identical > results on various CPUs (x86, Sparc, Itanium) and OSes (Linux, > Solaris, Windows). > > The trick is to closely RTFM for your CPU and compiler, in particular > all those nice > appendices related to "FPU control words" and "FP consistency models". > > For example, if the author of that article had done so, he might have > learned > about the "precision control" field of the x87 status register, which > you can set > so that all intermediate operations are always represented as 64-bits > doubles. > So no double roundings from double-extended precision. > > (Incidentally, the x87-internal double-extended precision is another > fine example where > being "more precise on occasion" usually does not help.) > > Frankly not very impressed with that article. > I could go in detail but that's off-topic, and I will try to fight > the "somebody is *wrong* on the Internet" urge. > > Stephan > > 2017-01-17 16:04 GMT+01:00 Xavier Combelle >: > > >> Generally speaking, there are two reasons why people may *not* >> want an FMA operation. >> 1. They need their results to be reproducible across >> compilers/platforms. (the most common reason) >> > The reproducibility of floating point calculation is very hard to > reach a good survey of the problem is > https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/ > > it mention the fma problem but it only a part of a biggest picture > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cs at zip.com.au Wed Jan 18 00:21:04 2017 From: cs at zip.com.au (Cameron Simpson) Date: Wed, 18 Jan 2017 16:21:04 +1100 Subject: [Python-ideas] proposal: "python -m foo" should bind sys.modules['foo'] In-Reply-To: References: Message-ID: <20170118052104.GA86569@cskk.homeip.net> Trying to get back to speed with PEP-499... 
On 06Aug2015 13:26, Nick Coghlan wrote: >On 6 August 2015 at 10:07, Cameron Simpson wrote: >> I suspect "How Reloading Will Work" would need to track both module.__name__ >> and module.__spec__.name to reattach the module to both entires in >> sys.modules. > >Conveniently, the fact that reloading rewrites the global namespace of >the existing module, rather than creating the new module, means that >the dual references won't create any new problems relating to multiple >references - we already hit those issues due to the fact that modules >refer directly to each from their module namespaces. [...] >> Also, where do I find to source for runpy to preruse? > >It's a standard library module: >https://hg.python.org/cpython/file/default/Lib/runpy.py > >"_run_module_as_main" is the module level function that powers the "-m" switch. > >Actually *implementing* this change should be as simple as changing the line: > > main_globals = sys.modules["__main__"].__dict__ > >to instead be: > > main_module = sys.modules["__main__"] > sys.modules[mod_spec.name] = main_module > main_globals = main_module.__dict__ I'd just like to check that my thinking is correct here. The above looks very easy, but Joseph Jevnik pointed out that packages already do this correctly (and slightly differently, as __main__ is the main module and __init__ is what is in sys.modules): https://bitbucket.org/cameron_simpson/pep-0499/commits/3efcd9b54e238a1ff7f5c5df805df13 I'm about to try this: [~/s/cpython-pep499(hg)]fleet*> hg diff diff --git a/Lib/runpy.py b/Lib/runpy.py --- a/Lib/runpy.py +++ b/Lib/runpy.py @@ -186,7 +186,10 @@ def _run_module_as_main(mod_name, alter_ except _Error as exc: msg = "%s: %s" % (sys.executable, exc) sys.exit(msg) - main_globals = sys.modules["__main__"].__dict__ + main_module = sys.modules["__main__"] + if not main_module.is_package(mod_spec.name): + sys.modules[mod_spec.name] = main_module + main_globals = main_module.__dict__ if alter_argv: sys.argv[0] = mod_spec.origin return _run_code(code, main_globals, None, locally. Does this seem sound? Cheers, Cameron Simpson From elizabeth at interlinked.me Wed Jan 18 05:24:39 2017 From: elizabeth at interlinked.me (Elizabeth Myers) Date: Wed, 18 Jan 2017 04:24:39 -0600 Subject: [Python-ideas] Ideas for improving the struct module Message-ID: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> Hello, I've noticed a lot of binary protocols require variable length bytestrings (with or without a null terminator), but it is not easy to unpack these in Python without first reading the desired length, or reading bytes until a null terminator is reached. I've noticed the netstruct library (https://github.com/stendec/netstruct) has a format specifier, $, which assumes the previous type to pack/unpack is the string's length. This is an interesting idea in of itself, but doesn't handle the null-terminated string chase. I know $ is similar to pascal strings, but sometimes you need more than 255 characters :p. For null-terminated strings, it may be simpler to have a specifier for those. I propose 0, but this point can be bikeshedded over endlessly if desired ;) (I thought about using n/N but they're :P). It's worth noting that (maybe one of?) Perl's equivalent to the struct module, whose name escapes me atm, has a module which can handle this case. I can't remember if it handled variable length or zero-terminated though; maybe it did both. Perl is more or less my 10th language. 
:p This pain point is an annoyance imo and would greatly simplify a lot of code if implemented, or something like it. I'd be happy to take a look at implementing it if the idea is received sufficiently warmly. -- Elizabeth From xdegaye at gmail.com Wed Jan 18 10:51:21 2017 From: xdegaye at gmail.com (Xavier de Gaye) Date: Wed, 18 Jan 2017 16:51:21 +0100 Subject: [Python-ideas] PEP 538: Coercing the legacy C locale to C.UTF-8 In-Reply-To: References: Message-ID: > On Android, the locale settings are of limited relevance (due to most > applications running in the UTF-16-LE based Dalvik environment) and there's > limited value in preserving backwards compatibility with other locale aware > C/C++ components in the same process (since it's a relatively new target > platform for CPython), so CPython bypasses the operating system provided APIs > and hardcodes the use of UTF-8 (similar to its behaviour on Apple platforms). FWIW the default locale seems to be UTF-8 for java applications, the public abstract class Charset Android documentation [1] says for the defaultCharset() method: "Android note: The Android platform default is always UTF-8." and wide character functions in the NDK use the UTF-8 encoding whatever the locale set by setlocale(), see the test run by Chi Hsuan Yen in [2]. Xavier [1] https://developer.android.com/reference/java/nio/charset/Charset.html [2] http://bugs.python.org/issue26928#msg281110 From spitz.dan.l at gmail.com Wed Jan 18 12:08:02 2017 From: spitz.dan.l at gmail.com (Daniel Spitz) Date: Wed, 18 Jan 2017 17:08:02 +0000 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> References: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> Message-ID: +1 on the idea of supporting variable-length strings with the length encoded in the preceding packed element! Several months ago I was trying to write a parser and writer of PostgreSQL's COPY ... WITH BINARY format. I started out trying to implement it in pure python using the struct module. Due to the existence of variable-length strings encoded in precisely the way you mention, it was not possible to parse an entire row of data without invoking any pure-python-level logic. This made the implementation infeasibly slow. I had to switch to using cython to get it done fast enough (implementation is here: https://github.com/spitz-dan-l/postgres-binary-parser). I believe that with this single change ($, or whatever format specifier one wishes to use), assuming it were implemented efficiently in c, I could have avoided using cython and gotten a satisfactory level of performance with the struct module and python/numpy's already-performant bytestring manipulation faculties. -Dan Spitz On Wed, Jan 18, 2017 at 5:32 AM Elizabeth Myers wrote: > Hello, > > I've noticed a lot of binary protocols require variable length > bytestrings (with or without a null terminator), but it is not easy to > unpack these in Python without first reading the desired length, or > reading bytes until a null terminator is reached. > > I've noticed the netstruct library > (https://github.com/stendec/netstruct) has a format specifier, $, which > assumes the previous type to pack/unpack is the string's length. This is > an interesting idea in of itself, but doesn't handle the null-terminated > string chase. I know $ is similar to pascal strings, but sometimes you > need more than 255 characters :p. > > For null-terminated strings, it may be simpler to have a specifier for > those. 
I propose 0, but this point can be bikeshedded over endlessly if > desired ;) (I thought about using n/N but they're :P). > > It's worth noting that (maybe one of?) Perl's equivalent to the struct > module, whose name escapes me atm, has a module which can handle this > case. I can't remember if it handled variable length or zero-terminated > though; maybe it did both. Perl is more or less my 10th language. :p > > This pain point is an annoyance imo and would greatly simplify a lot of > code if implemented, or something like it. I'd be happy to take a look > at implementing it if the idea is received sufficiently warmly. > > -- > Elizabeth > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Wed Jan 18 20:27:20 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 19 Jan 2017 12:27:20 +1100 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> References: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> Message-ID: <20170119012719.GB7345@ando.pearwood.info> On Wed, Jan 18, 2017 at 04:24:39AM -0600, Elizabeth Myers wrote: > Hello, > > I've noticed a lot of binary protocols require variable length > bytestrings (with or without a null terminator), but it is not easy to > unpack these in Python without first reading the desired length, or > reading bytes until a null terminator is reached. This sounds like a fairly straight-forward feature request for the struct module, which probably could go straight to the bug tracker. Unfortunately I can't *quite* work out what the feature request is :-) If you're asking for struct to support Pascal strings, with a single byte (0...255) for the length, it already does with format code "p". I was going to suggest P for "large" Pascal string, with the length given by *two* bytes rather than one (0...65535), but P is already in use. Are you proposing the "$" format code from netstruct? That would be interesting, as it would allow format codes: B$ standard Pascal string, like p I$ Pascal string with a two-byte length L$ Pascal string with a four-byte length 4294967295 bytes should be enough for anyone :-) Another common format is "ASCIIZ", or a one-byte Pascal string including a null terminator. People actually use this: http://stackoverflow.com/questions/11850950/unpacking-a-struct-ending-with-an-asciiz-string Which just leaves C-style null terminated strings. c/n/N are all already in use; I guess that C (for C-string) or S (for c-String) are possibilities. All of these seem like perfectly reasonable formats for the struct module to support. They're all in use. struct already supports variable-width formats. I think its just a matter of raising one or more feature requests, and then doing the work. I guess this is just my long-winded way of saying +1. -- Steve From dickinsm at gmail.com Thu Jan 19 03:31:03 2017 From: dickinsm at gmail.com (Mark Dickinson) Date: Thu, 19 Jan 2017 08:31:03 +0000 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: <20170119012719.GB7345@ando.pearwood.info> References: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> <20170119012719.GB7345@ando.pearwood.info> Message-ID: On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano wrote: > [...] 
struct already supports > variable-width formats. Unfortunately, that's not really true: the Pascal strings it supports are in some sense variable length, but are stored in a fixed-width field. The internals of the struct module rely on each field starting at a fixed offset, computable directly from the format string. I don't think variable-length fields would be a good fit for the current design of the struct module. For the OPs use-case, I'd suggest a library that sits on top of the struct module, rather than an expansion to the struct module itself. -- Mark From rhodri at kynesim.co.uk Thu Jan 19 06:58:04 2017 From: rhodri at kynesim.co.uk (Rhodri James) Date: Thu, 19 Jan 2017 11:58:04 +0000 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: References: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> <20170119012719.GB7345@ando.pearwood.info> Message-ID: <6c517754-8f89-74e5-ebe0-57d8bc81004d@kynesim.co.uk> On 19/01/17 08:31, Mark Dickinson wrote: > On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano wrote: >> [...] struct already supports >> variable-width formats. > > Unfortunately, that's not really true: the Pascal strings it supports > are in some sense variable length, but are stored in a fixed-width > field. The internals of the struct module rely on each field starting > at a fixed offset, computable directly from the format string. I don't > think variable-length fields would be a good fit for the current > design of the struct module. > > For the OPs use-case, I'd suggest a library that sits on top of the > struct module, rather than an expansion to the struct module itself. Unfortunately as the OP explained, this makes the struct module a poor fit for protocol decoding, even as a base layer for something. It's one of the things I use python for quite frequently, and I always end up rolling my own and discarding struct entirely. -- Rhodri James *-* Kynesim Ltd From elizabeth at interlinked.me Thu Jan 19 07:47:40 2017 From: elizabeth at interlinked.me (Elizabeth Myers) Date: Thu, 19 Jan 2017 06:47:40 -0600 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: <6c517754-8f89-74e5-ebe0-57d8bc81004d@kynesim.co.uk> References: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> <20170119012719.GB7345@ando.pearwood.info> <6c517754-8f89-74e5-ebe0-57d8bc81004d@kynesim.co.uk> Message-ID: <281a3c57-25e7-f71c-bc2f-bda10c880c2f@interlinked.me> On 19/01/17 05:58, Rhodri James wrote: > On 19/01/17 08:31, Mark Dickinson wrote: >> On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano >> wrote: >>> [...] struct already supports >>> variable-width formats. >> >> Unfortunately, that's not really true: the Pascal strings it supports >> are in some sense variable length, but are stored in a fixed-width >> field. The internals of the struct module rely on each field starting >> at a fixed offset, computable directly from the format string. I don't >> think variable-length fields would be a good fit for the current >> design of the struct module. >> >> For the OPs use-case, I'd suggest a library that sits on top of the >> struct module, rather than an expansion to the struct module itself. > > Unfortunately as the OP explained, this makes the struct module a poor > fit for protocol decoding, even as a base layer for something. It's one > of the things I use python for quite frequently, and I always end up > rolling my own and discarding struct entirely. 
> Yes, for variable-length fields the struct module is worse than useless: it actually reduces clarity a little. Consider: >>> test_bytes = b'\x00\x00\x00\x0chello world!' With this, you can do: >>> length = int.from_bytes(test_bytes[:4], 'big') >>> string = test_bytes[4:length] or you can do: >>> length = struct.unpack_from('!I', test_bytes)[0] >>> string = struct.unpack_from('{}s'.format(length), test_bytes, 4)[0] Which looks more readable without consulting the docs? ;) Building anything on top of the struct library like this would lead to worse-looking code for minimal gains in efficiency. To quote Jamie Zawinksi, it is like building a bookshelf out of mashed potatoes as it stands. If we had an extension similar to netstruct: >>> length, string = struct.unpack('!I$', test_bytes) MUCH improved readability, and also less verbose. :) From elizabeth at interlinked.me Thu Jan 19 13:08:39 2017 From: elizabeth at interlinked.me (Elizabeth Myers) Date: Thu, 19 Jan 2017 12:08:39 -0600 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: <281a3c57-25e7-f71c-bc2f-bda10c880c2f@interlinked.me> References: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> <20170119012719.GB7345@ando.pearwood.info> <6c517754-8f89-74e5-ebe0-57d8bc81004d@kynesim.co.uk> <281a3c57-25e7-f71c-bc2f-bda10c880c2f@interlinked.me> Message-ID: <3c707925-5d67-fd26-80d0-72138b9ba4a5@interlinked.me> On 19/01/17 06:47, Elizabeth Myers wrote: > On 19/01/17 05:58, Rhodri James wrote: >> On 19/01/17 08:31, Mark Dickinson wrote: >>> On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano >>> wrote: >>>> [...] struct already supports >>>> variable-width formats. >>> >>> Unfortunately, that's not really true: the Pascal strings it supports >>> are in some sense variable length, but are stored in a fixed-width >>> field. The internals of the struct module rely on each field starting >>> at a fixed offset, computable directly from the format string. I don't >>> think variable-length fields would be a good fit for the current >>> design of the struct module. >>> >>> For the OPs use-case, I'd suggest a library that sits on top of the >>> struct module, rather than an expansion to the struct module itself. >> >> Unfortunately as the OP explained, this makes the struct module a poor >> fit for protocol decoding, even as a base layer for something. It's one >> of the things I use python for quite frequently, and I always end up >> rolling my own and discarding struct entirely. >> > > Yes, for variable-length fields the struct module is worse than useless: > it actually reduces clarity a little. Consider: > >>>> test_bytes = b'\x00\x00\x00\x0chello world!' > > With this, you can do: > >>>> length = int.from_bytes(test_bytes[:4], 'big') >>>> string = test_bytes[4:length] > > or you can do: > >>>> length = struct.unpack_from('!I', test_bytes)[0] >>>> string = struct.unpack_from('{}s'.format(length), test_bytes, 4)[0] > > Which looks more readable without consulting the docs? ;) > > Building anything on top of the struct library like this would lead to > worse-looking code for minimal gains in efficiency. To quote Jamie > Zawinksi, it is like building a bookshelf out of mashed potatoes as it > stands. > > If we had an extension similar to netstruct: > >>>> length, string = struct.unpack('!I$', test_bytes) > > MUCH improved readability, and also less verbose. 
:) I also didn't mention that when you are unpacking iteratively (e.g., you have multiple strings), the code becomes a bit more hairy: >>> test_bytes = b'\x00\x05hello\x00\x07goodbye\x00\x04test' >>> offset = 0 >>> while offset < len(test_bytes): ... length = struct.unpack_from('!H', test_bytes, offset)[0] ... offset += 2 ... string = struct.unpack_from('{}s'.format(length), test_bytes, offset)[0] ... offset += length It actually gets a lot worse when you have to unpack a set of strings in a context-sensitive manner. You have to be sure to update the offset constantly so you can always unpack strings appropriately. Yuck! It's worth mentioning that a few years ago, a coworker and I found ourselves needing variable length strings in the context of a binary protocol (DHCP), and wound up abandoning the struct module entirely because it was unsuitable. My co-worker said the same thing I did: "it's like building a bookshelf out of mashed potatoes." I do understand it might require a possible major rewrite or major changes the struct module, but in the long run, I think it's worth it (especially because the struct module is not all that big in scope). As it stands, the struct module simply is not suited for protocols where you have variable-length strings, and in my experience, that is the vast majority of modern binary protocols on the Internet. -- Elizabeth From rosuav at gmail.com Thu Jan 19 13:16:28 2017 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 20 Jan 2017 05:16:28 +1100 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: <3c707925-5d67-fd26-80d0-72138b9ba4a5@interlinked.me> References: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> <20170119012719.GB7345@ando.pearwood.info> <6c517754-8f89-74e5-ebe0-57d8bc81004d@kynesim.co.uk> <281a3c57-25e7-f71c-bc2f-bda10c880c2f@interlinked.me> <3c707925-5d67-fd26-80d0-72138b9ba4a5@interlinked.me> Message-ID: On Fri, Jan 20, 2017 at 5:08 AM, Elizabeth Myers wrote: > I do understand it might require a possible major rewrite or major > changes the struct module, but in the long run, I think it's worth it > (especially because the struct module is not all that big in scope). As > it stands, the struct module simply is not suited for protocols where > you have variable-length strings, and in my experience, that is the vast > majority of modern binary protocols on the Internet. > To be fair, the name "struct" implies a C-style structure, which _does_ have a fixed size, or at least fixed offsets for its members (the last member can be variable-sized). A quick search of PyPI shows up a struct-variant specifically designed for network protocols: https://pypi.python.org/pypi/netstruct/1.1.2 It even uses the dollar sign as you describe. So perhaps what you're looking for is this module coming into the stdlib? ChrisA From jsbueno at python.org.br Thu Jan 19 13:17:44 2017 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Thu, 19 Jan 2017 16:17:44 -0200 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: <3c707925-5d67-fd26-80d0-72138b9ba4a5@interlinked.me> References: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> <20170119012719.GB7345@ando.pearwood.info> <6c517754-8f89-74e5-ebe0-57d8bc81004d@kynesim.co.uk> <281a3c57-25e7-f71c-bc2f-bda10c880c2f@interlinked.me> <3c707925-5d67-fd26-80d0-72138b9ba4a5@interlinked.me> Message-ID: I am for upgrading struct to these, if possible. 
But besides my +1, I am writting in to remember folks thatthere is another "struct" model in the stdlib: ctypes.Structure - For reading a lot of records with the same structure it is much more handy than struct, since it gives one a suitable Python object on instantiation. However, it also can't handle variable lenght fields automatically. But maybe, the improvement could be made on that side, or another package altogether taht works more like it than current "struct". On 19 January 2017 at 16:08, Elizabeth Myers wrote: > On 19/01/17 06:47, Elizabeth Myers wrote: >> On 19/01/17 05:58, Rhodri James wrote: >>> On 19/01/17 08:31, Mark Dickinson wrote: >>>> On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano >>>> wrote: >>>>> [...] struct already supports >>>>> variable-width formats. >>>> >>>> Unfortunately, that's not really true: the Pascal strings it supports >>>> are in some sense variable length, but are stored in a fixed-width >>>> field. The internals of the struct module rely on each field starting >>>> at a fixed offset, computable directly from the format string. I don't >>>> think variable-length fields would be a good fit for the current >>>> design of the struct module. >>>> >>>> For the OPs use-case, I'd suggest a library that sits on top of the >>>> struct module, rather than an expansion to the struct module itself. >>> >>> Unfortunately as the OP explained, this makes the struct module a poor >>> fit for protocol decoding, even as a base layer for something. It's one >>> of the things I use python for quite frequently, and I always end up >>> rolling my own and discarding struct entirely. >>> >> >> Yes, for variable-length fields the struct module is worse than useless: >> it actually reduces clarity a little. Consider: >> >>>>> test_bytes = b'\x00\x00\x00\x0chello world!' >> >> With this, you can do: >> >>>>> length = int.from_bytes(test_bytes[:4], 'big') >>>>> string = test_bytes[4:length] >> >> or you can do: >> >>>>> length = struct.unpack_from('!I', test_bytes)[0] >>>>> string = struct.unpack_from('{}s'.format(length), test_bytes, 4)[0] >> >> Which looks more readable without consulting the docs? ;) >> >> Building anything on top of the struct library like this would lead to >> worse-looking code for minimal gains in efficiency. To quote Jamie >> Zawinksi, it is like building a bookshelf out of mashed potatoes as it >> stands. >> >> If we had an extension similar to netstruct: >> >>>>> length, string = struct.unpack('!I$', test_bytes) >> >> MUCH improved readability, and also less verbose. :) > > I also didn't mention that when you are unpacking iteratively (e.g., you > have multiple strings), the code becomes a bit more hairy: > >>>> test_bytes = b'\x00\x05hello\x00\x07goodbye\x00\x04test' >>>> offset = 0 >>>> while offset < len(test_bytes): > ... length = struct.unpack_from('!H', test_bytes, offset)[0] > ... offset += 2 > ... string = struct.unpack_from('{}s'.format(length), test_bytes, > offset)[0] > ... offset += length > > It actually gets a lot worse when you have to unpack a set of strings in > a context-sensitive manner. You have to be sure to update the offset > constantly so you can always unpack strings appropriately. Yuck! > > It's worth mentioning that a few years ago, a coworker and I found > ourselves needing variable length strings in the context of a binary > protocol (DHCP), and wound up abandoning the struct module entirely > because it was unsuitable. My co-worker said the same thing I did: "it's > like building a bookshelf out of mashed potatoes." 
> > I do understand it might require a possible major rewrite or major > changes the struct module, but in the long run, I think it's worth it > (especially because the struct module is not all that big in scope). As > it stands, the struct module simply is not suited for protocols where > you have variable-length strings, and in my experience, that is the vast > majority of modern binary protocols on the Internet. > > -- > Elizabeth > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From prometheus235 at gmail.com Thu Jan 19 13:41:46 2017 From: prometheus235 at gmail.com (Nick Timkovich) Date: Thu, 19 Jan 2017 12:41:46 -0600 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: References: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> <20170119012719.GB7345@ando.pearwood.info> <6c517754-8f89-74e5-ebe0-57d8bc81004d@kynesim.co.uk> <281a3c57-25e7-f71c-bc2f-bda10c880c2f@interlinked.me> <3c707925-5d67-fd26-80d0-72138b9ba4a5@interlinked.me> Message-ID: ctypes.Structure is *literally* the interface to the C struct that as Chris mentions has fixed offsets for all members. I don't think that should (can?) be altered. In file formats (beyond net protocols) the string size + variable length string motif comes up often and I am frequently re-implementing the two-line read-an-int + read-{}.format-bytes. On Thu, Jan 19, 2017 at 12:17 PM, Joao S. O. Bueno wrote: > I am for upgrading struct to these, if possible. > > But besides my +1, I am writting in to remember folks thatthere is another > "struct" model in the stdlib: > > ctypes.Structure - > > For reading a lot of records with the same structure it is much more handy > than > struct, since it gives one a suitable Python object on instantiation. > > However, it also can't handle variable lenght fields automatically. > > But maybe, the improvement could be made on that side, or another package > altogether taht works more like it than current "struct". > > > > On 19 January 2017 at 16:08, Elizabeth Myers > wrote: > > On 19/01/17 06:47, Elizabeth Myers wrote: > >> On 19/01/17 05:58, Rhodri James wrote: > >>> On 19/01/17 08:31, Mark Dickinson wrote: > >>>> On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano > > >>>> wrote: > >>>>> [...] struct already supports > >>>>> variable-width formats. > >>>> > >>>> Unfortunately, that's not really true: the Pascal strings it supports > >>>> are in some sense variable length, but are stored in a fixed-width > >>>> field. The internals of the struct module rely on each field starting > >>>> at a fixed offset, computable directly from the format string. I don't > >>>> think variable-length fields would be a good fit for the current > >>>> design of the struct module. > >>>> > >>>> For the OPs use-case, I'd suggest a library that sits on top of the > >>>> struct module, rather than an expansion to the struct module itself. > >>> > >>> Unfortunately as the OP explained, this makes the struct module a poor > >>> fit for protocol decoding, even as a base layer for something. It's > one > >>> of the things I use python for quite frequently, and I always end up > >>> rolling my own and discarding struct entirely. > >>> > >> > >> Yes, for variable-length fields the struct module is worse than useless: > >> it actually reduces clarity a little. Consider: > >> > >>>>> test_bytes = b'\x00\x00\x00\x0chello world!' 
> >> > >> With this, you can do: > >> > >>>>> length = int.from_bytes(test_bytes[:4], 'big') > >>>>> string = test_bytes[4:length] > >> > >> or you can do: > >> > >>>>> length = struct.unpack_from('!I', test_bytes)[0] > >>>>> string = struct.unpack_from('{}s'.format(length), test_bytes, 4)[0] > >> > >> Which looks more readable without consulting the docs? ;) > >> > >> Building anything on top of the struct library like this would lead to > >> worse-looking code for minimal gains in efficiency. To quote Jamie > >> Zawinksi, it is like building a bookshelf out of mashed potatoes as it > >> stands. > >> > >> If we had an extension similar to netstruct: > >> > >>>>> length, string = struct.unpack('!I$', test_bytes) > >> > >> MUCH improved readability, and also less verbose. :) > > > > I also didn't mention that when you are unpacking iteratively (e.g., you > > have multiple strings), the code becomes a bit more hairy: > > > >>>> test_bytes = b'\x00\x05hello\x00\x07goodbye\x00\x04test' > >>>> offset = 0 > >>>> while offset < len(test_bytes): > > ... length = struct.unpack_from('!H', test_bytes, offset)[0] > > ... offset += 2 > > ... string = struct.unpack_from('{}s'.format(length), test_bytes, > > offset)[0] > > ... offset += length > > > > It actually gets a lot worse when you have to unpack a set of strings in > > a context-sensitive manner. You have to be sure to update the offset > > constantly so you can always unpack strings appropriately. Yuck! > > > > It's worth mentioning that a few years ago, a coworker and I found > > ourselves needing variable length strings in the context of a binary > > protocol (DHCP), and wound up abandoning the struct module entirely > > because it was unsuitable. My co-worker said the same thing I did: "it's > > like building a bookshelf out of mashed potatoes." > > > > I do understand it might require a possible major rewrite or major > > changes the struct module, but in the long run, I think it's worth it > > (especially because the struct module is not all that big in scope). As > > it stands, the struct module simply is not suited for protocols where > > you have variable-length strings, and in my experience, that is the vast > > majority of modern binary protocols on the Internet. > > > > -- > > Elizabeth > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Jan 19 13:50:26 2017 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 19 Jan 2017 10:50:26 -0800 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: References: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> <20170119012719.GB7345@ando.pearwood.info> <6c517754-8f89-74e5-ebe0-57d8bc81004d@kynesim.co.uk> <281a3c57-25e7-f71c-bc2f-bda10c880c2f@interlinked.me> <3c707925-5d67-fd26-80d0-72138b9ba4a5@interlinked.me> Message-ID: I haven't had a chance to use it myself yet, but I've heard good things about https://construct.readthedocs.io/en/latest/ It's certainly far more comprehensive than struct for this and other problems. 
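(Since ULEB128-style prefixes come up just below: decoding one takes only a few lines of plain Python. A minimal sketch, with uleb128_decode being a name invented purely for illustration:

def uleb128_decode(buf, offset=0):
    # Unsigned LEB128: 7 payload bits per byte, least significant group
    # first, high bit set on every byte except the last.
    result, shift = 0, 0
    while True:
        byte = buf[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7

# 300 encodes as 0xAC 0x02; the second element is the new offset.
assert uleb128_decode(b'\xac\x02') == (300, 2)

The interesting question is not the decoding itself but whether such variable-width prefixes can fit the struct module's fixed-offset design at all.)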
As usual, there's some tension between adding stuff to the stdlib versus using more specialized third-party packages. The existence of packages like construct doesn't automatically mean that we should stop improving the stdlib, but OTOH not every useful thing can or should be in the stdlib. Personally, I find myself parsing uleb128-prefixed strings more often than u4-prefixed strings. On Jan 19, 2017 10:42 AM, "Nick Timkovich" wrote: > ctypes.Structure is *literally* the interface to the C struct that as > Chris mentions has fixed offsets for all members. I don't think that should > (can?) be altered. > > In file formats (beyond net protocols) the string size + variable length > string motif comes up often and I am frequently re-implementing the > two-line read-an-int + read-{}.format-bytes. > > On Thu, Jan 19, 2017 at 12:17 PM, Joao S. O. Bueno > wrote: > >> I am for upgrading struct to these, if possible. >> >> But besides my +1, I am writting in to remember folks thatthere is >> another >> "struct" model in the stdlib: >> >> ctypes.Structure - >> >> For reading a lot of records with the same structure it is much more >> handy than >> struct, since it gives one a suitable Python object on instantiation. >> >> However, it also can't handle variable lenght fields automatically. >> >> But maybe, the improvement could be made on that side, or another package >> altogether taht works more like it than current "struct". >> >> >> >> On 19 January 2017 at 16:08, Elizabeth Myers >> wrote: >> > On 19/01/17 06:47, Elizabeth Myers wrote: >> >> On 19/01/17 05:58, Rhodri James wrote: >> >>> On 19/01/17 08:31, Mark Dickinson wrote: >> >>>> On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano < >> steve at pearwood.info> >> >>>> wrote: >> >>>>> [...] struct already supports >> >>>>> variable-width formats. >> >>>> >> >>>> Unfortunately, that's not really true: the Pascal strings it supports >> >>>> are in some sense variable length, but are stored in a fixed-width >> >>>> field. The internals of the struct module rely on each field starting >> >>>> at a fixed offset, computable directly from the format string. I >> don't >> >>>> think variable-length fields would be a good fit for the current >> >>>> design of the struct module. >> >>>> >> >>>> For the OPs use-case, I'd suggest a library that sits on top of the >> >>>> struct module, rather than an expansion to the struct module itself. >> >>> >> >>> Unfortunately as the OP explained, this makes the struct module a poor >> >>> fit for protocol decoding, even as a base layer for something. It's >> one >> >>> of the things I use python for quite frequently, and I always end up >> >>> rolling my own and discarding struct entirely. >> >>> >> >> >> >> Yes, for variable-length fields the struct module is worse than >> useless: >> >> it actually reduces clarity a little. Consider: >> >> >> >>>>> test_bytes = b'\x00\x00\x00\x0chello world!' >> >> >> >> With this, you can do: >> >> >> >>>>> length = int.from_bytes(test_bytes[:4], 'big') >> >>>>> string = test_bytes[4:length] >> >> >> >> or you can do: >> >> >> >>>>> length = struct.unpack_from('!I', test_bytes)[0] >> >>>>> string = struct.unpack_from('{}s'.format(length), test_bytes, 4)[0] >> >> >> >> Which looks more readable without consulting the docs? ;) >> >> >> >> Building anything on top of the struct library like this would lead to >> >> worse-looking code for minimal gains in efficiency. To quote Jamie >> >> Zawinksi, it is like building a bookshelf out of mashed potatoes as it >> >> stands. 
>> >> >> >> If we had an extension similar to netstruct: >> >> >> >>>>> length, string = struct.unpack('!I$', test_bytes) >> >> >> >> MUCH improved readability, and also less verbose. :) >> > >> > I also didn't mention that when you are unpacking iteratively (e.g., you >> > have multiple strings), the code becomes a bit more hairy: >> > >> >>>> test_bytes = b'\x00\x05hello\x00\x07goodbye\x00\x04test' >> >>>> offset = 0 >> >>>> while offset < len(test_bytes): >> > ... length = struct.unpack_from('!H', test_bytes, offset)[0] >> > ... offset += 2 >> > ... string = struct.unpack_from('{}s'.format(length), test_bytes, >> > offset)[0] >> > ... offset += length >> > >> > It actually gets a lot worse when you have to unpack a set of strings in >> > a context-sensitive manner. You have to be sure to update the offset >> > constantly so you can always unpack strings appropriately. Yuck! >> > >> > It's worth mentioning that a few years ago, a coworker and I found >> > ourselves needing variable length strings in the context of a binary >> > protocol (DHCP), and wound up abandoning the struct module entirely >> > because it was unsuitable. My co-worker said the same thing I did: "it's >> > like building a bookshelf out of mashed potatoes." >> > >> > I do understand it might require a possible major rewrite or major >> > changes the struct module, but in the long run, I think it's worth it >> > (especially because the struct module is not all that big in scope). As >> > it stands, the struct module simply is not suited for protocols where >> > you have variable-length strings, and in my experience, that is the vast >> > majority of modern binary protocols on the Internet. >> > >> > -- >> > Elizabeth >> > _______________________________________________ >> > Python-ideas mailing list >> > Python-ideas at python.org >> > https://mail.python.org/mailman/listinfo/python-ideas >> > Code of Conduct: http://python.org/psf/codeofconduct/ >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From prometheus235 at gmail.com Thu Jan 19 14:20:08 2017 From: prometheus235 at gmail.com (Nick Timkovich) Date: Thu, 19 Jan 2017 13:20:08 -0600 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: References: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> <20170119012719.GB7345@ando.pearwood.info> <6c517754-8f89-74e5-ebe0-57d8bc81004d@kynesim.co.uk> <281a3c57-25e7-f71c-bc2f-bda10c880c2f@interlinked.me> <3c707925-5d67-fd26-80d0-72138b9ba4a5@interlinked.me> Message-ID: Construct has radical API changes and should remain apart. It feels to me like a straw-man to introduce a large library to the discussion as justification for it being too-specialized. This proposal to me seems much more modest: add another format character (or two) to the existing set of a dozen or so that will be packed/unpacked just like the others. It also has demonstrable use in various formats/protocols. 
On Thu, Jan 19, 2017 at 12:50 PM, Nathaniel Smith wrote: > I haven't had a chance to use it myself yet, but I've heard good things > about > > https://construct.readthedocs.io/en/latest/ > > It's certainly far more comprehensive than struct for this and other > problems. > > As usual, there's some tension between adding stuff to the stdlib versus > using more specialized third-party packages. The existence of packages like > construct doesn't automatically mean that we should stop improving the > stdlib, but OTOH not every useful thing can or should be in the stdlib. > > Personally, I find myself parsing uleb128-prefixed strings more often than > u4-prefixed strings. > > On Jan 19, 2017 10:42 AM, "Nick Timkovich" > wrote: > >> ctypes.Structure is *literally* the interface to the C struct that as >> Chris mentions has fixed offsets for all members. I don't think that should >> (can?) be altered. >> >> In file formats (beyond net protocols) the string size + variable length >> string motif comes up often and I am frequently re-implementing the >> two-line read-an-int + read-{}.format-bytes. >> >> On Thu, Jan 19, 2017 at 12:17 PM, Joao S. O. Bueno > > wrote: >> >>> I am for upgrading struct to these, if possible. >>> >>> But besides my +1, I am writting in to remember folks thatthere is >>> another >>> "struct" model in the stdlib: >>> >>> ctypes.Structure - >>> >>> For reading a lot of records with the same structure it is much more >>> handy than >>> struct, since it gives one a suitable Python object on instantiation. >>> >>> However, it also can't handle variable lenght fields automatically. >>> >>> But maybe, the improvement could be made on that side, or another package >>> altogether taht works more like it than current "struct". >>> >>> >>> >>> On 19 January 2017 at 16:08, Elizabeth Myers >>> wrote: >>> > On 19/01/17 06:47, Elizabeth Myers wrote: >>> >> On 19/01/17 05:58, Rhodri James wrote: >>> >>> On 19/01/17 08:31, Mark Dickinson wrote: >>> >>>> On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano < >>> steve at pearwood.info> >>> >>>> wrote: >>> >>>>> [...] struct already supports >>> >>>>> variable-width formats. >>> >>>> >>> >>>> Unfortunately, that's not really true: the Pascal strings it >>> supports >>> >>>> are in some sense variable length, but are stored in a fixed-width >>> >>>> field. The internals of the struct module rely on each field >>> starting >>> >>>> at a fixed offset, computable directly from the format string. I >>> don't >>> >>>> think variable-length fields would be a good fit for the current >>> >>>> design of the struct module. >>> >>>> >>> >>>> For the OPs use-case, I'd suggest a library that sits on top of the >>> >>>> struct module, rather than an expansion to the struct module itself. >>> >>> >>> >>> Unfortunately as the OP explained, this makes the struct module a >>> poor >>> >>> fit for protocol decoding, even as a base layer for something. It's >>> one >>> >>> of the things I use python for quite frequently, and I always end up >>> >>> rolling my own and discarding struct entirely. >>> >>> >>> >> >>> >> Yes, for variable-length fields the struct module is worse than >>> useless: >>> >> it actually reduces clarity a little. Consider: >>> >> >>> >>>>> test_bytes = b'\x00\x00\x00\x0chello world!' 
>>> >> >>> >> With this, you can do: >>> >> >>> >>>>> length = int.from_bytes(test_bytes[:4], 'big') >>> >>>>> string = test_bytes[4:length] >>> >> >>> >> or you can do: >>> >> >>> >>>>> length = struct.unpack_from('!I', test_bytes)[0] >>> >>>>> string = struct.unpack_from('{}s'.format(length), test_bytes, >>> 4)[0] >>> >> >>> >> Which looks more readable without consulting the docs? ;) >>> >> >>> >> Building anything on top of the struct library like this would lead to >>> >> worse-looking code for minimal gains in efficiency. To quote Jamie >>> >> Zawinksi, it is like building a bookshelf out of mashed potatoes as it >>> >> stands. >>> >> >>> >> If we had an extension similar to netstruct: >>> >> >>> >>>>> length, string = struct.unpack('!I$', test_bytes) >>> >> >>> >> MUCH improved readability, and also less verbose. :) >>> > >>> > I also didn't mention that when you are unpacking iteratively (e.g., >>> you >>> > have multiple strings), the code becomes a bit more hairy: >>> > >>> >>>> test_bytes = b'\x00\x05hello\x00\x07goodbye\x00\x04test' >>> >>>> offset = 0 >>> >>>> while offset < len(test_bytes): >>> > ... length = struct.unpack_from('!H', test_bytes, offset)[0] >>> > ... offset += 2 >>> > ... string = struct.unpack_from('{}s'.format(length), test_bytes, >>> > offset)[0] >>> > ... offset += length >>> > >>> > It actually gets a lot worse when you have to unpack a set of strings >>> in >>> > a context-sensitive manner. You have to be sure to update the offset >>> > constantly so you can always unpack strings appropriately. Yuck! >>> > >>> > It's worth mentioning that a few years ago, a coworker and I found >>> > ourselves needing variable length strings in the context of a binary >>> > protocol (DHCP), and wound up abandoning the struct module entirely >>> > because it was unsuitable. My co-worker said the same thing I did: >>> "it's >>> > like building a bookshelf out of mashed potatoes." >>> > >>> > I do understand it might require a possible major rewrite or major >>> > changes the struct module, but in the long run, I think it's worth it >>> > (especially because the struct module is not all that big in scope). As >>> > it stands, the struct module simply is not suited for protocols where >>> > you have variable-length strings, and in my experience, that is the >>> vast >>> > majority of modern binary protocols on the Internet. >>> > >>> > -- >>> > Elizabeth >>> > _______________________________________________ >>> > Python-ideas mailing list >>> > Python-ideas at python.org >>> > https://mail.python.org/mailman/listinfo/python-ideas >>> > Code of Conduct: http://python.org/psf/codeofconduct/ >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From python at mrabarnett.plus.com Thu Jan 19 15:18:23 2017 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 19 Jan 2017 20:18:23 +0000 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: <281a3c57-25e7-f71c-bc2f-bda10c880c2f@interlinked.me> References: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> <20170119012719.GB7345@ando.pearwood.info> <6c517754-8f89-74e5-ebe0-57d8bc81004d@kynesim.co.uk> <281a3c57-25e7-f71c-bc2f-bda10c880c2f@interlinked.me> Message-ID: <45867e80-26ed-1b29-85f0-aa4d65768ea2@mrabarnett.plus.com> On 2017-01-19 12:47, Elizabeth Myers wrote: > On 19/01/17 05:58, Rhodri James wrote: >> On 19/01/17 08:31, Mark Dickinson wrote: >>> On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano >>> wrote: >>>> [...] struct already supports >>>> variable-width formats. >>> >>> Unfortunately, that's not really true: the Pascal strings it supports >>> are in some sense variable length, but are stored in a fixed-width >>> field. The internals of the struct module rely on each field starting >>> at a fixed offset, computable directly from the format string. I don't >>> think variable-length fields would be a good fit for the current >>> design of the struct module. >>> >>> For the OPs use-case, I'd suggest a library that sits on top of the >>> struct module, rather than an expansion to the struct module itself. >> >> Unfortunately as the OP explained, this makes the struct module a poor >> fit for protocol decoding, even as a base layer for something. It's one >> of the things I use python for quite frequently, and I always end up >> rolling my own and discarding struct entirely. >> > > Yes, for variable-length fields the struct module is worse than useless: > it actually reduces clarity a little. Consider: > >>>> test_bytes = b'\x00\x00\x00\x0chello world!' > > With this, you can do: > >>>> length = int.from_bytes(test_bytes[:4], 'big') >>>> string = test_bytes[4:length] > Shouldn't that be: string = test_bytes[4:4+length] > or you can do: > >>>> length = struct.unpack_from('!I', test_bytes)[0] >>>> string = struct.unpack_from('{}s'.format(length), test_bytes, 4)[0] > > Which looks more readable without consulting the docs? ;) > Which is more likely to be correct? :-) > Building anything on top of the struct library like this would lead to > worse-looking code for minimal gains in efficiency. To quote Jamie > Zawinksi, it is like building a bookshelf out of mashed potatoes as it > stands. > > If we had an extension similar to netstruct: > >>>> length, string = struct.unpack('!I$', test_bytes) > > MUCH improved readability, and also less verbose. :) > From yselivanov.ml at gmail.com Thu Jan 19 16:04:42 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 19 Jan 2017 16:04:42 -0500 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> References: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> Message-ID: This is a neat idea, but this will only work for parsing framed binary protocols. For example, if you protocol prefixes all packets with a length field, you can write an efficient read buffer and use your proposal to decode all of message's fields in one shot. Which is good. Not all protocols use framing though. For instance, your proposal won't help to write Thrift or Postgres protocols parsers. Overall, I'm not sure that this is worth the hassle. 
With proposal: data, = struct.unpack('!H$', buf) buf = buf[2+len(data):] with the current struct module: len, = struct.unpack('!H', buf) data = buf[2:2+len] buf = buf[2+len:] Another thing: struct.calcsize won't work with structs that use variable length fields. Yury On 2017-01-18 5:24 AM, Elizabeth Myers wrote: > Hello, > > I've noticed a lot of binary protocols require variable length > bytestrings (with or without a null terminator), but it is not easy to > unpack these in Python without first reading the desired length, or > reading bytes until a null terminator is reached. > > I've noticed the netstruct library > (https://github.com/stendec/netstruct) has a format specifier, $, which > assumes the previous type to pack/unpack is the string's length. This is > an interesting idea in of itself, but doesn't handle the null-terminated > string chase. I know $ is similar to pascal strings, but sometimes you > need more than 255 characters :p. > > For null-terminated strings, it may be simpler to have a specifier for > those. I propose 0, but this point can be bikeshedded over endlessly if > desired ;) (I thought about using n/N but they're :P). > > It's worth noting that (maybe one of?) Perl's equivalent to the struct > module, whose name escapes me atm, has a module which can handle this > case. I can't remember if it handled variable length or zero-terminated > though; maybe it did both. Perl is more or less my 10th language. :p > > This pain point is an annoyance imo and would greatly simplify a lot of > code if implemented, or something like it. I'd be happy to take a look > at implementing it if the idea is received sufficiently warmly. > > -- > Elizabeth > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From ethan at stoneleaf.us Thu Jan 19 16:27:28 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 19 Jan 2017 13:27:28 -0800 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> References: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> Message-ID: <58812F40.9000005@stoneleaf.us> There is now an issue for this: http://bugs.python.org/issue29328 -- ~Ethan~ From steve at pearwood.info Thu Jan 19 19:30:38 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 20 Jan 2017 11:30:38 +1100 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: References: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> <20170119012719.GB7345@ando.pearwood.info> Message-ID: <20170120003037.GC7345@ando.pearwood.info> On Thu, Jan 19, 2017 at 08:31:03AM +0000, Mark Dickinson wrote: > On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano wrote: > > [...] struct already supports > > variable-width formats. > > Unfortunately, that's not really true: the Pascal strings it supports > are in some sense variable length, but are stored in a fixed-width > field. The internals of the struct module rely on each field starting > at a fixed offset, computable directly from the format string. I don't > think variable-length fields would be a good fit for the current > design of the struct module. I know nothing and care even less (is caring a negative amount possible?) about the internal implementation of the struct module. Since Elizabeth is volunteering to do the work to make it work, will it be accepted? 
Subject to the usual code quality reviews, contributor agreement, etc. Are there objections to the *idea* of adding support for null terminated strings to the struct module? Does it require a PEP just to add one more format code? (Maybe it will, if the format code requires a complete re-write of the entire module.) It seems to me that if Elizabeth is willing to do the work, and somebody to review it, this would be a welcome addition to the module. It would require at least one API change: struct.calcsize won't work for formats containing null-terminated strings. But that's a minor matter. -- Steve From steve at pearwood.info Thu Jan 19 19:38:52 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 20 Jan 2017 11:38:52 +1100 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: References: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> <20170119012719.GB7345@ando.pearwood.info> <6c517754-8f89-74e5-ebe0-57d8bc81004d@kynesim.co.uk> <281a3c57-25e7-f71c-bc2f-bda10c880c2f@interlinked.me> <3c707925-5d67-fd26-80d0-72138b9ba4a5@interlinked.me> Message-ID: <20170120003852.GD7345@ando.pearwood.info> On Fri, Jan 20, 2017 at 05:16:28AM +1100, Chris Angelico wrote: > To be fair, the name "struct" implies a C-style structure, which > _does_ have a fixed size, or at least fixed offsets for its members Ah, the old "everyone thinks in C terms" fallacy raises its ugly head agan :-) The name doesn't imply any such thing to me, or those who haven't been raised on C. It implies the word "structure", which has no implication of being fixed-width. The docs for the struct module describes it as: struct ? Interpret bytes as packed binary data which applies equally to the fixed- and variable-width case. The fact that we can sensibly talk about "fixed-width" and "variable-width" structs without confusion, shows that the concept is bigger than the C data-type. (Even if the most common use will probably remain C-style fixed-width structs.) Python is not C, and we shouldn't be limited by what C does. If we wanted C, we would use C. -- Steve From rosuav at gmail.com Thu Jan 19 19:53:46 2017 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 20 Jan 2017 11:53:46 +1100 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: <20170120003852.GD7345@ando.pearwood.info> References: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> <20170119012719.GB7345@ando.pearwood.info> <6c517754-8f89-74e5-ebe0-57d8bc81004d@kynesim.co.uk> <281a3c57-25e7-f71c-bc2f-bda10c880c2f@interlinked.me> <3c707925-5d67-fd26-80d0-72138b9ba4a5@interlinked.me> <20170120003852.GD7345@ando.pearwood.info> Message-ID: On Fri, Jan 20, 2017 at 11:38 AM, Steven D'Aprano wrote: > On Fri, Jan 20, 2017 at 05:16:28AM +1100, Chris Angelico wrote: > >> To be fair, the name "struct" implies a C-style structure, which >> _does_ have a fixed size, or at least fixed offsets for its members > > > Ah, the old "everyone thinks in C terms" fallacy raises its ugly head > agan :-) > > The name doesn't imply any such thing to me, or those who haven't been > raised on C. It implies the word "structure", which has no implication > of being fixed-width. Fair point. Objection retracted - and it was only minor anyway. This would be a handy feature to add. +1. 
ChrisA From gvanrossum at gmail.com Thu Jan 19 21:34:49 2017 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu, 19 Jan 2017 18:34:49 -0800 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: References: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> <20170119012719.GB7345@ando.pearwood.info> <6c517754-8f89-74e5-ebe0-57d8bc81004d@kynesim.co.uk> <281a3c57-25e7-f71c-bc2f-bda10c880c2f@interlinked.me> <3c707925-5d67-fd26-80d0-72138b9ba4a5@interlinked.me> <20170120003852.GD7345@ando.pearwood.info> Message-ID: Nevertheless the C meaning *is* the etymology of the module name. :-) --Guido (mobile) On Jan 19, 2017 16:54, "Chris Angelico" wrote: > On Fri, Jan 20, 2017 at 11:38 AM, Steven D'Aprano > wrote: > > On Fri, Jan 20, 2017 at 05:16:28AM +1100, Chris Angelico wrote: > > > >> To be fair, the name "struct" implies a C-style structure, which > >> _does_ have a fixed size, or at least fixed offsets for its members > > > > > > Ah, the old "everyone thinks in C terms" fallacy raises its ugly head > > agan :-) > > > > The name doesn't imply any such thing to me, or those who haven't been > > raised on C. It implies the word "structure", which has no implication > > of being fixed-width. > > Fair point. Objection retracted - and it was only minor anyway. This > would be a handy feature to add. +1. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cs at zip.com.au Thu Jan 19 21:40:25 2017 From: cs at zip.com.au (Cameron Simpson) Date: Fri, 20 Jan 2017 13:40:25 +1100 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: <3c707925-5d67-fd26-80d0-72138b9ba4a5@interlinked.me> References: <3c707925-5d67-fd26-80d0-72138b9ba4a5@interlinked.me> Message-ID: <20170120024025.GA90080@cskk.homeip.net> On 19Jan2017 12:08, Elizabeth Myers wrote: >I also didn't mention that when you are unpacking iteratively (e.g., you >have multiple strings), the code becomes a bit more hairy: > >>>> test_bytes = b'\x00\x05hello\x00\x07goodbye\x00\x04test' >>>> offset = 0 >>>> while offset < len(test_bytes): >... length = struct.unpack_from('!H', test_bytes, offset)[0] >... offset += 2 >... string = struct.unpack_from('{}s'.format(length), test_bytes, >offset)[0] >... offset += length > >It actually gets a lot worse when you have to unpack a set of strings in >a context-sensitive manner. You have to be sure to update the offset >constantly so you can always unpack strings appropriately. Yuck! Whenever I'm doing iterative stuff like this, either variable length binary or lexical stuff, I always end up with a bunch of functions which can be called like this: datalen, offset = get_bs(chunk, offset=offset) The notable thing here is just that they return the data and the new offset, which makes updating the offset impossible to forget, and also makes the calling code more succinct, like the internal call to get_bs() below: such as this decoder for a length encoded field: def get_bsdata(chunk, offset=0): ''' Fetch a length-prefixed data chunk. Decodes an unsigned value from a bytes at the specified `offset` (default 0), and collects that many following bytes. Return those following bytes and the new offset. 
''' ##is_bytes(chunk) offset0 = offset datalen, offset = get_bs(chunk, offset=offset) data = chunk[offset:offset+datalen] ##is_bytes(data) if len(data) != datalen: raise ValueError("bsdata(chunk, offset=%d): insufficient data: expected %d bytes, got %d bytes" % (offset0, datalen, len(data))) offset += datalen return data, offset Cheers, Cameron Simpson From cs at zip.com.au Thu Jan 19 21:54:31 2017 From: cs at zip.com.au (Cameron Simpson) Date: Fri, 20 Jan 2017 13:54:31 +1100 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: References: Message-ID: <20170120025431.GA82264@cskk.homeip.net> On 19Jan2017 16:04, Yury Selivanov wrote: >This is a neat idea, but this will only work for parsing framed >binary protocols. For example, if you protocol prefixes all packets >with a length field, you can write an efficient read buffer and >use your proposal to decode all of message's fields in one shot. >Which is good. > >Not all protocols use framing though. For instance, your proposal >won't help to write Thrift or Postgres protocols parsers. Sure, but a lot of things fit the proposal. Seems a win: both simple and useful. >Overall, I'm not sure that this is worth the hassle. With proposal: > > data, = struct.unpack('!H$', buf) > buf = buf[2+len(data):] > >with the current struct module: > > len, = struct.unpack('!H', buf) > data = buf[2:2+len] > buf = buf[2+len:] > >Another thing: struct.calcsize won't work with structs that use >variable length fields. True, but it would be enough for it to raise an exception of some kind. It won't break any in play code, and it will prevent accidents for users of new variable sizes formats. We've all got things we wish struct might cover (I have a few, but strangely the top of the list is nonsemantic: I wish it let me put meaningless whitespace inside the format for readability). +1 on the proposal from me. Oh: subject to one proviso: reading a struct will need to return how many bytes of input data were scanned, not merely returning the decoded values. Cheers, Cameron Simpson From dickinsm at gmail.com Fri Jan 20 02:43:29 2017 From: dickinsm at gmail.com (Mark Dickinson) Date: Fri, 20 Jan 2017 07:43:29 +0000 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: <20170120003037.GC7345@ando.pearwood.info> References: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> <20170119012719.GB7345@ando.pearwood.info> <20170120003037.GC7345@ando.pearwood.info> Message-ID: On Fri, Jan 20, 2017 at 12:30 AM, Steven D'Aprano wrote: > Does it require a PEP just to add one more > format code? (Maybe it will, if the format code requires a complete > re-write of the entire module.) Yes, I think a PEP would be useful in this case. The proposed change *would* entail some fairly substantial changes to the design of the module (I encourage you to take a look at the source to appreciate what's involved), and if we're going to that level of effort it's probably worth stepping back and seeing whether those changes are compatible with other proposed directions for the struct module, and whether it makes sense to do more than add that one format code. That level of change probably isn't worth it "just to add one more format code", but might be worth it if it allows other possible expansions of the struct module functionality. There are also performance considerations to look at, behaviour of alignment to consider, and other details. 
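
Cameron's get_bsdata() above depends on a get_bs() helper that isn't shown in the thread. Purely for illustration, here is a self-contained version of the same (data, new_offset) idiom, with get_bs() assumed to be a plain 2-byte big-endian unsigned length; the real encoding in his code may well differ.

    import struct

    def get_bs(chunk, offset=0):
        # Assumed stand-in: a 2-byte big-endian unsigned length prefix.
        (value,) = struct.unpack_from('!H', chunk, offset)
        return value, offset + 2

    def get_bsdata(chunk, offset=0):
        # Fetch a length-prefixed chunk; return (data, new_offset).
        offset0 = offset
        datalen, offset = get_bs(chunk, offset=offset)
        data = chunk[offset:offset + datalen]
        if len(data) != datalen:
            raise ValueError(
                "bsdata(chunk, offset=%d): insufficient data:"
                " expected %d bytes, got %d bytes"
                % (offset0, datalen, len(data)))
        return data, offset + datalen

    data, offset = get_bsdata(b'\x00\x05hello\x00\x07goodbye')
    assert (data, offset) == (b'hello', 7)
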
-- Mark From thomas at kluyver.me.uk Fri Jan 20 07:46:19 2017 From: thomas at kluyver.me.uk (Thomas Kluyver) Date: Fri, 20 Jan 2017 12:46:19 +0000 Subject: [Python-ideas] Context manager to temporarily set signal handlers Message-ID: <1484916379.4033524.853975624.1BE29C9D@webmail.messagingengine.com> Not uncommonly, I want to do something like this in code: import signal # Install my own signal handler prev_hup = signal.signal(signal.SIGHUP, my_handler) prev_term = signal.signal(signal.SIGTERM, my_handler) try: do_something_else() finally: # Restore previous signal handlers signal.signal(signal.SIGHUP, prev_hup) signal.signal(signal.SIGTERM, prev_term) This works if the existing signal handler is a Python function, or the special values SIG_IGN (ignore) or SIG_DFL (default). However, it breaks if code has set a signal handler in C: this is not returned, and there is no way in Python to reinstate a C-level signal handler once we've replaced it from Python. I propose two possible solutions: 1. The high-level approach: a context manager which can temporarily set one or more signal handlers. If this was implemented in C, it could restore C-level as well as Python-level signal handlers. 2. A lower level approach: signal() and getsignal() would gain the ability to return an opaque object which refers to a C-level signal handler. The only use for this would be to pass it back to signal.signal() to set it as a signal handler again. The context manager from (1) could then be implemented in Python. Crosslinking http://bugs.python.org/issue13285 Thomas From elizabeth at interlinked.me Fri Jan 20 11:34:30 2017 From: elizabeth at interlinked.me (Elizabeth Myers) Date: Fri, 20 Jan 2017 10:34:30 -0600 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: References: <3fed1808-58a6-2f9a-487f-db865c5b3c56@interlinked.me> Message-ID: On 19/01/17 15:04, Yury Selivanov wrote: > This is a neat idea, but this will only work for parsing framed > binary protocols. For example, if you protocol prefixes all packets > with a length field, you can write an efficient read buffer and > use your proposal to decode all of message's fields in one shot. > Which is good. > > Not all protocols use framing though. For instance, your proposal > won't help to write Thrift or Postgres protocols parsers. It won't help them, no, but it will help others who have to do similar tasks, or help people build things on top of the struct module. > > Overall, I'm not sure that this is worth the hassle. With proposal: > > data, = struct.unpack('!H$', buf) > buf = buf[2+len(data):] > > with the current struct module: > > len, = struct.unpack('!H', buf) > data = buf[2:2+len] > buf = buf[2+len:] I find such a construction is not really needed most of the time if I'm dealing with repeated frames. I could just use struct.iter_unpack. It's not useful in all cases, but as it stands, neither is the present struct module. Just because it is not useful to everyone does not mean it is not useful to others, perhaps immensely so. The existence of third party libraries that implement a portion of my rather modest proposal I think already justifies its existence. > > Another thing: struct.calcsize won't work with structs that use > variable length fields. Should probably raise an error if the format has a variable-length string in it. If you're using variable-length strings, you probably aren't a consumer of struct.calcsize anyway. 
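
A minimal pure-Python sketch of the context-manager idea in Thomas's message above (his option 1); as he points out, only handlers visible from Python can be saved and restored this way, so the C-level case still needs support in the signal module itself.

    import signal
    from contextlib import contextmanager

    @contextmanager
    def temporary_signal_handlers(handler, *signums):
        # Install `handler` for each signal, restoring the previous
        # Python-visible handlers on exit.
        previous = {num: signal.signal(num, handler) for num in signums}
        try:
            yield
        finally:
            for num, old in previous.items():
                if old is not None:          # None means it was set in C and
                    signal.signal(num, old)  # cannot be reinstated from here

    # Usage:
    # with temporary_signal_handlers(my_handler, signal.SIGHUP, signal.SIGTERM):
    #     do_something_else()
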
> > Yury > > > On 2017-01-18 5:24 AM, Elizabeth Myers wrote: >> Hello, >> >> I've noticed a lot of binary protocols require variable length >> bytestrings (with or without a null terminator), but it is not easy to >> unpack these in Python without first reading the desired length, or >> reading bytes until a null terminator is reached. >> >> I've noticed the netstruct library >> (https://github.com/stendec/netstruct) has a format specifier, $, which >> assumes the previous type to pack/unpack is the string's length. This is >> an interesting idea in of itself, but doesn't handle the null-terminated >> string chase. I know $ is similar to pascal strings, but sometimes you >> need more than 255 characters :p. >> >> For null-terminated strings, it may be simpler to have a specifier for >> those. I propose 0, but this point can be bikeshedded over endlessly if >> desired ;) (I thought about using n/N but they're :P). >> >> It's worth noting that (maybe one of?) Perl's equivalent to the struct >> module, whose name escapes me atm, has a module which can handle this >> case. I can't remember if it handled variable length or zero-terminated >> though; maybe it did both. Perl is more or less my 10th language. :p >> >> This pain point is an annoyance imo and would greatly simplify a lot of >> code if implemented, or something like it. I'd be happy to take a look >> at implementing it if the idea is received sufficiently warmly. >> >> -- >> Elizabeth >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From elizabeth at interlinked.me Fri Jan 20 11:42:28 2017 From: elizabeth at interlinked.me (Elizabeth Myers) Date: Fri, 20 Jan 2017 10:42:28 -0600 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: <20170120024025.GA90080@cskk.homeip.net> References: <3c707925-5d67-fd26-80d0-72138b9ba4a5@interlinked.me> <20170120024025.GA90080@cskk.homeip.net> Message-ID: On 19/01/17 20:40, Cameron Simpson wrote: > On 19Jan2017 12:08, Elizabeth Myers wrote: >> I also didn't mention that when you are unpacking iteratively (e.g., you >> have multiple strings), the code becomes a bit more hairy: >> >>>>> test_bytes = b'\x00\x05hello\x00\x07goodbye\x00\x04test' >>>>> offset = 0 >>>>> while offset < len(test_bytes): >> ... length = struct.unpack_from('!H', test_bytes, offset)[0] >> ... offset += 2 >> ... string = struct.unpack_from('{}s'.format(length), test_bytes, >> offset)[0] >> ... offset += length >> >> It actually gets a lot worse when you have to unpack a set of strings in >> a context-sensitive manner. You have to be sure to update the offset >> constantly so you can always unpack strings appropriately. Yuck! 
> > Whenever I'm doing iterative stuff like this, either variable length > binary or lexical stuff, I always end up with a bunch of functions which > can be called like this: > > datalen, offset = get_bs(chunk, offset=offset) > > The notable thing here is just that they return the data and the new > offset, which makes updating the offset impossible to forget, and also > makes the calling code more succinct, like the internal call to get_bs() > below: > > such as this decoder for a length encoded field: > > def get_bsdata(chunk, offset=0): > ''' Fetch a length-prefixed data chunk. > Decodes an unsigned value from a bytes at the specified `offset` > (default 0), and collects that many following bytes. > Return those following bytes and the new offset. > ''' > ##is_bytes(chunk) > offset0 = offset > datalen, offset = get_bs(chunk, offset=offset) > data = chunk[offset:offset+datalen] > ##is_bytes(data) > if len(data) != datalen: > raise ValueError("bsdata(chunk, offset=%d): insufficient data: > expected %d bytes, got %d bytes" > % (offset0, datalen, len(data))) > offset += datalen > return data, offset Gotta be honest, this seems less elegant than just adding something like what netstruct does to the struct module. It's also way more verbose. Perhaps some kind of higher level module could be built on struct at some point, maybe in stdlib, maybe not (construct imo is not that lib for previous raised objections). > > Cheers, > Cameron Simpson > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From elizabeth at interlinked.me Fri Jan 20 11:47:25 2017 From: elizabeth at interlinked.me (Elizabeth Myers) Date: Fri, 20 Jan 2017 10:47:25 -0600 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: <20170120025431.GA82264@cskk.homeip.net> References: <20170120025431.GA82264@cskk.homeip.net> Message-ID: <844362e2-6573-21d8-ca16-5eab86ff33b3@interlinked.me> On 19/01/17 20:54, Cameron Simpson wrote: > On 19Jan2017 16:04, Yury Selivanov wrote: >> This is a neat idea, but this will only work for parsing framed >> binary protocols. For example, if you protocol prefixes all packets >> with a length field, you can write an efficient read buffer and >> use your proposal to decode all of message's fields in one shot. >> Which is good. >> >> Not all protocols use framing though. For instance, your proposal >> won't help to write Thrift or Postgres protocols parsers. > > Sure, but a lot of things fit the proposal. Seems a win: both simple and > useful. > >> Overall, I'm not sure that this is worth the hassle. With proposal: >> >> data, = struct.unpack('!H$', buf) >> buf = buf[2+len(data):] >> >> with the current struct module: >> >> len, = struct.unpack('!H', buf) >> data = buf[2:2+len] >> buf = buf[2+len:] >> >> Another thing: struct.calcsize won't work with structs that use >> variable length fields. > > True, but it would be enough for it to raise an exception of some kind. > It won't break any in play code, and it will prevent accidents for users > of new variable sizes formats. > > We've all got things we wish struct might cover (I have a few, but > strangely the top of the list is nonsemantic: I wish it let me put > meaningless whitespace inside the format for readability). > > +1 on the proposal from me. 
> > Oh: subject to one proviso: reading a struct will need to return how > many bytes of input data were scanned, not merely returning the decoded > values. This is a little difficult without breaking backwards compatibility, but, it is not difficult to compute the lengths yourself. That said, calcsize could require an extra parameter if given a format string with variable-length specifiers in it, e.g.: struct.calcsize("z", (b'test')) Would return 5 (zero-length terminator), so you don't have to compute it yourself. Also, I filed a bug, and proposed use of Z and z. > > Cheers, > Cameron Simpson > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From elizabeth at interlinked.me Fri Jan 20 11:51:35 2017 From: elizabeth at interlinked.me (Elizabeth Myers) Date: Fri, 20 Jan 2017 10:51:35 -0600 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: <844362e2-6573-21d8-ca16-5eab86ff33b3@interlinked.me> References: <20170120025431.GA82264@cskk.homeip.net> <844362e2-6573-21d8-ca16-5eab86ff33b3@interlinked.me> Message-ID: On 20/01/17 10:47, Elizabeth Myers wrote: > On 19/01/17 20:54, Cameron Simpson wrote: >> On 19Jan2017 16:04, Yury Selivanov wrote: >>> This is a neat idea, but this will only work for parsing framed >>> binary protocols. For example, if you protocol prefixes all packets >>> with a length field, you can write an efficient read buffer and >>> use your proposal to decode all of message's fields in one shot. >>> Which is good. >>> >>> Not all protocols use framing though. For instance, your proposal >>> won't help to write Thrift or Postgres protocols parsers. >> >> Sure, but a lot of things fit the proposal. Seems a win: both simple and >> useful. >> >>> Overall, I'm not sure that this is worth the hassle. With proposal: >>> >>> data, = struct.unpack('!H$', buf) >>> buf = buf[2+len(data):] >>> >>> with the current struct module: >>> >>> len, = struct.unpack('!H', buf) >>> data = buf[2:2+len] >>> buf = buf[2+len:] >>> >>> Another thing: struct.calcsize won't work with structs that use >>> variable length fields. >> >> True, but it would be enough for it to raise an exception of some kind. >> It won't break any in play code, and it will prevent accidents for users >> of new variable sizes formats. >> >> We've all got things we wish struct might cover (I have a few, but >> strangely the top of the list is nonsemantic: I wish it let me put >> meaningless whitespace inside the format for readability). >> >> +1 on the proposal from me. >> >> Oh: subject to one proviso: reading a struct will need to return how >> many bytes of input data were scanned, not merely returning the decoded >> values. > > This is a little difficult without breaking backwards compatibility, > but, it is not difficult to compute the lengths yourself. That said, > calcsize could require an extra parameter if given a format string with > variable-length specifiers in it, e.g.: > > struct.calcsize("z", (b'test')) > > Would return 5 (zero-length terminator), so you don't have to compute it > yourself. > > Also, I filed a bug, and proposed use of Z and z. > Should I write up a PEP about this? I am not sure if it's justified or not. It's 3 changes (calcsize and two format specifiers), but it might be useful to codify it. 
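
A rough illustration of the calcsize-with-sample-data idea above; calcsize_with_data() is an invented name, only the proposed 'z' (NUL-terminated bytes) code is treated specially, and the sample values are consumed in order, one per 'z' encountered.

    import struct

    def calcsize_with_data(fmt, *values):
        # Hypothetical helper: size variable-length 'z' items from sample
        # values, and defer everything else to struct.calcsize().
        values = list(values)
        size = 0
        fixed = ''
        for code in fmt:
            if code == 'z':
                size += len(values.pop(0)) + 1   # payload plus NUL terminator
            else:
                fixed += code
        return size + struct.calcsize(fixed)

    assert calcsize_with_data('z', b'test') == 5
    assert calcsize_with_data('!Hz', b'test') == 7
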
From p.f.moore at gmail.com Fri Jan 20 11:59:45 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 20 Jan 2017 16:59:45 +0000 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: References: <20170120025431.GA82264@cskk.homeip.net> <844362e2-6573-21d8-ca16-5eab86ff33b3@interlinked.me> Message-ID: On 20 January 2017 at 16:51, Elizabeth Myers wrote: > Should I write up a PEP about this? I am not sure if it's justified or > not. It's 3 changes (calcsize and two format specifiers), but it might > be useful to codify it. It feels a bit minor to need a PEP, but having said that did you pick up on the comment about needing to return the number of bytes consumed? str = struct.unpack('z', b'test\0xxx') How do we know where the unpack got to, so that we can continue parsing from there? It seems a bit wasteful to have to scan the string twice to use calcsize for this... A PEP (or at least, a PEP-style design document) might capture the answer to questions like this. OTOH, the tracker discussion could easily be enough - can you put a reference to the bug report here? Paul From njs at pobox.com Fri Jan 20 12:13:38 2017 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 20 Jan 2017 09:13:38 -0800 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: References: <20170120025431.GA82264@cskk.homeip.net> <844362e2-6573-21d8-ca16-5eab86ff33b3@interlinked.me> Message-ID: On Jan 20, 2017 09:00, "Paul Moore" wrote: On 20 January 2017 at 16:51, Elizabeth Myers wrote: > Should I write up a PEP about this? I am not sure if it's justified or > not. It's 3 changes (calcsize and two format specifiers), but it might > be useful to codify it. It feels a bit minor to need a PEP, but having said that did you pick up on the comment about needing to return the number of bytes consumed? str = struct.unpack('z', b'test\0xxx') How do we know where the unpack got to, so that we can continue parsing from there? It seems a bit wasteful to have to scan the string twice to use calcsize for this... unpack() is OK, because it already has the rule that it raises an error if it doesn't exactly consume the buffer. But I agree that if we do this then we'd really want versions of unpack_from and pack_into that return the new offset. (Further arguments that calcsize is insufficient: it doesn't work for potential other variable length items, e.g. if we added uleb128 support; it quickly becomes awkward if you have multiple strings; in practice I think everyone who needs this would just end up writing a wrapper that calls calcsize and returns the new offset anyway, so should just provide that up front.) For pack_into this is also easy, since currently it always returns None, so if it started returning an integer no one would notice (and it'd be kinda handy in its own right, honestly). unpack_from is the tricky one, because it already has a return value and this isn't it. Ideally it would have worked this way from the beginning, but too late for that now... I guess the obvious solution would be to come up with a new function that's otherwise identical to unpack_from but returns a (values, offset) tuple. What to call this, though, I don't know :-). unpack_at? unpack_next? (Hinting that this is the natural primitive you'd use to implement unpack_iter.) -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsbueno at python.org.br Fri Jan 20 13:09:19 2017 From: jsbueno at python.org.br (Joao S. O. 
Bueno) Date: Fri, 20 Jan 2017 16:09:19 -0200 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: References: <20170120025431.GA82264@cskk.homeip.net> <844362e2-6573-21d8-ca16-5eab86ff33b3@interlinked.me> Message-ID: On 20 January 2017 at 15:13, Nathaniel Smith wrote: > On Jan 20, 2017 09:00, "Paul Moore" wrote: > > On 20 January 2017 at 16:51, Elizabeth Myers > wrote: >> Should I write up a PEP about this? I am not sure if it's justified or >> not. It's 3 changes (calcsize and two format specifiers), but it might >> be useful to codify it. > > It feels a bit minor to need a PEP, but having said that did you pick > up on the comment about needing to return the number of bytes > consumed? > > str = struct.unpack('z', b'test\0xxx') > > How do we know where the unpack got to, so that we can continue > parsing from there? It seems a bit wasteful to have to scan the string > twice to use calcsize for this... > > > unpack() is OK, because it already has the rule that it raises an error if > it doesn't exactly consume the buffer. But I agree that if we do this then > we'd really want versions of unpack_from and pack_into that return the new > offset. (Further arguments that calcsize is insufficient: it doesn't work > for potential other variable length items, e.g. if we added uleb128 support; > it quickly becomes awkward if you have multiple strings; in practice I think > everyone who needs this would just end up writing a wrapper that calls > calcsize and returns the new offset anyway, so should just provide that up > front.) > > For pack_into this is also easy, since currently it always returns None, so > if it started returning an integer no one would notice (and it'd be kinda > handy in its own right, honestly). > > unpack_from is the tricky one, because it already has a return value and > this isn't it. Ideally it would have worked this way from the beginning, but > too late for that now... I guess the obvious solution would be to come up > with a new function that's otherwise identical to unpack_from but returns a > (values, offset) tuple. What to call this, though, I don't know :-). > unpack_at? unpack_next? (Hinting that this is the natural primitive you'd > use to implement unpack_iter.) > Yes - maybe a PEP. Then we could also, for example, add the suggestion of whitespace on the struct description string - which is nice. And we could things of: unpack methods returns a specialized object- not a tuple, which has attributes with the extra information. So, instead of a, str = struct.unpack("IB$", data) people who want the length can do: tmp = struct.unpack("IB$", data) do_things_with_len(tmp.tell) a, str = tmp The struct "object" could allow other things as well. Since we are at it, maybe a 0 copy version, that would return items from their implace buffer positions. But, ok, maybe most of this should just go in a third party package - anyway, a PEP could be open for more improvements than the variable-lenght fields proposed. (The idea of having attributes with extra information about size, for example - I think that is better than having: size, (a, str) = struct.unpack2(... 
) ) js -><- > -n > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From ethan at stoneleaf.us Fri Jan 20 13:15:54 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 20 Jan 2017 10:15:54 -0800 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: References: <20170120025431.GA82264@cskk.homeip.net> <844362e2-6573-21d8-ca16-5eab86ff33b3@interlinked.me> Message-ID: <588253DA.8040802@stoneleaf.us> On 01/20/2017 10:09 AM, Joao S. O. Bueno wrote: > On 20 January 2017 at 16:51, Elizabeth Myers wrote: >> Should I write up a PEP about this? I am not sure if it's justified or >> not. It's 3 changes (calcsize and two format specifiers), but it might >> be useful to codify it. > > Yes - maybe a PEP. I agree, especially if the change, simple as it is, requires a lot of rewrite. In that case someone (ELizabeth?) should collect ideas for other improvements and shepherd it through the PEP process. -- ~Ethan~ From gvanrossum at gmail.com Fri Jan 20 13:18:34 2017 From: gvanrossum at gmail.com (Guido van Rossum) Date: Fri, 20 Jan 2017 10:18:34 -0800 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: <588253DA.8040802@stoneleaf.us> References: <20170120025431.GA82264@cskk.homeip.net> <844362e2-6573-21d8-ca16-5eab86ff33b3@interlinked.me> <588253DA.8040802@stoneleaf.us> Message-ID: I'd be wary of making a grab-bag of small improvements, it encourages bikeshedding. --Guido (mobile) On Jan 20, 2017 10:16 AM, "Ethan Furman" wrote: > On 01/20/2017 10:09 AM, Joao S. O. Bueno wrote: > >> On 20 January 2017 at 16:51, Elizabeth Myers wrote: >> > > Should I write up a PEP about this? I am not sure if it's justified or >>> not. It's 3 changes (calcsize and two format specifiers), but it might >>> be useful to codify it. >>> >> >> Yes - maybe a PEP. >> > > I agree, especially if the change, simple as it is, requires a lot of > rewrite. In that case someone (ELizabeth?) should collect ideas for other > improvements and shepherd it through the PEP process. > > -- > ~Ethan~ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Fri Jan 20 15:39:08 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 20 Jan 2017 20:39:08 +0000 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: References: <20170120025431.GA82264@cskk.homeip.net> <844362e2-6573-21d8-ca16-5eab86ff33b3@interlinked.me> <588253DA.8040802@stoneleaf.us> Message-ID: On 20 January 2017 at 18:18, Guido van Rossum wrote: > I'd be wary of making a grab-bag of small improvements, it encourages > bikeshedding. Agreed. Plus the bikeshedding and debating risks draining Elizabeth's motivation. 
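
As a rough sketch of the "specialized object" idea above (names invented, and fixed-width formats only, since those are all struct can size today): the result still unpacks like a tuple, but also records how much of the buffer it consumed.

    import struct

    class UnpackResult(tuple):
        # Tuple of decoded values that also remembers the bytes consumed.
        def __new__(cls, values, consumed):
            self = super().__new__(cls, values)
            self.consumed = consumed
            return self

    def unpack_with_info(fmt, buf, offset=0):
        s = struct.Struct(fmt)
        return UnpackResult(s.unpack_from(buf, offset), s.size)

    result = unpack_with_info('!HH', b'\x00\x01\x00\x02trailing')
    a, b = result
    assert (a, b, result.consumed) == (1, 2, 4)
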
Paul From elizabeth at interlinked.me Fri Jan 20 15:47:50 2017 From: elizabeth at interlinked.me (Elizabeth Myers) Date: Fri, 20 Jan 2017 14:47:50 -0600 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: References: <20170120025431.GA82264@cskk.homeip.net> <844362e2-6573-21d8-ca16-5eab86ff33b3@interlinked.me> Message-ID: On 20/01/17 10:59, Paul Moore wrote: > On 20 January 2017 at 16:51, Elizabeth Myers wrote: >> Should I write up a PEP about this? I am not sure if it's justified or >> not. It's 3 changes (calcsize and two format specifiers), but it might >> be useful to codify it. > > It feels a bit minor to need a PEP, but having said that did you pick > up on the comment about needing to return the number of bytes > consumed? > > str = struct.unpack('z', b'test\0xxx') > > How do we know where the unpack got to, so that we can continue > parsing from there? It seems a bit wasteful to have to scan the string > twice to use calcsize for this... > > A PEP (or at least, a PEP-style design document) might capture the > answer to questions like this. OTOH, the tracker discussion could > easily be enough - can you put a reference to the bug report here? > > Paul > Two things: 1) struct.unpack and struct.unpack_from should remain backwards-compatible. I don't want to return extra values from it like (length unpacked, (data...)) for that reason. If the calcsize solution feels a bit weird (it isn't much less efficient, because strings store their length with them, so it's constant-time), there could also be new functions that *do* return the length if you need it. To me though, this feels like a use case for struct.iter_unpack. 2) I want to avoid making a weird incongruity, where only variable-length strings return the length actually parsed. This also doesn't really help with length calculations unless you're doing calcsize without the variable-length specifiers, then adding it on. It's just more of an annoyance. On 20/01/17 12:18, Guido van Rossum wrote: > I'd be wary of making a grab-bag of small improvements, it encourages > bikeshedding. > > --Guido (mobile) Definitely would prefer to avoid a bikeshed here, though other improvements to the struct module are certainly welcome! (Though about a better interface, I made a neat little prototype module for an object-oriented interface to struct, but I want to clean it up before I release it to the world... but I'm not sure I want to include it in the standard library, that's for another day and another proposal :p). -- Elizabeth From p.f.moore at gmail.com Fri Jan 20 15:56:13 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 20 Jan 2017 20:56:13 +0000 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: References: <20170120025431.GA82264@cskk.homeip.net> <844362e2-6573-21d8-ca16-5eab86ff33b3@interlinked.me> Message-ID: On 20 January 2017 at 20:47, Elizabeth Myers wrote: > Two things: > > 1) struct.unpack and struct.unpack_from should remain > backwards-compatible. I don't want to return extra values from it like > (length unpacked, (data...)) for that reason. If the calcsize solution > feels a bit weird (it isn't much less efficient, because strings store > their length with them, so it's constant-time), there could also be new > functions that *do* return the length if you need it. To me though, this > feels like a use case for struct.iter_unpack. > > 2) I want to avoid making a weird incongruity, where only > variable-length strings return the length actually parsed. 
This also > doesn't really help with length calculations unless you're doing > calcsize without the variable-length specifiers, then adding it on. It's > just more of an annoyance. Fair points, both. And you've clearly thought the issues through, so I'm +1 on your decision. You have the actual use case, and I'm just theorising, so I'm happy to defer the decision to you. Paul From cs at zip.com.au Fri Jan 20 17:46:38 2017 From: cs at zip.com.au (Cameron Simpson) Date: Sat, 21 Jan 2017 09:46:38 +1100 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: References: Message-ID: <20170120224638.GA70981@cskk.homeip.net> On 20Jan2017 14:47, Elizabeth Myers wrote: >1) struct.unpack and struct.unpack_from should remain >backwards-compatible. I don't want to return extra values from it like >(length unpacked, (data...)) for that reason. Fully agree with this. >If the calcsize solution >feels a bit weird (it isn't much less efficient, because strings store >their length with them, so it's constant-time), there could also be new >functions that *do* return the length if you need it. To me though, this >feels like a use case for struct.iter_unpack. Often, maybe, but there are still going to be protocols that the new format doesn't support, where the performant thing to do (in pure Python) is to scan what you can with struct and "hand scan" the special bits with special code. Consider, for example, a format like MP4/ISO14496, where there's a regular block structure (which is somewhat struct parsable) that can contain embedded arbitraily weird information. Or the flipside where struct parsable data are embedded in a format not supported by struct. The mixed situation is where you need to know where the parse got up to. Calling calcsize or its variable size equivalent after a parse seems needlessly repetetive of the parse work. For myself, I would want there to be some kind of call that returned the parse and the length scanned, with the historic interface preserved for the fixed size formats or for users not needing the length. >2) I want to avoid making a weird incongruity, where only >variable-length strings return the length actually parsed. Fully agree. Arguing for two API calls: the current one and one that also returns the scan length. Cheers, Cameron Simpson From njs at pobox.com Fri Jan 20 18:24:16 2017 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 20 Jan 2017 15:24:16 -0800 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: References: <20170120025431.GA82264@cskk.homeip.net> <844362e2-6573-21d8-ca16-5eab86ff33b3@interlinked.me> Message-ID: On Jan 20, 2017 12:48 PM, "Elizabeth Myers" wrote: On 20/01/17 10:59, Paul Moore wrote: > On 20 January 2017 at 16:51, Elizabeth Myers wrote: >> Should I write up a PEP about this? I am not sure if it's justified or >> not. It's 3 changes (calcsize and two format specifiers), but it might >> be useful to codify it. > > It feels a bit minor to need a PEP, but having said that did you pick > up on the comment about needing to return the number of bytes > consumed? > > str = struct.unpack('z', b'test\0xxx') > > How do we know where the unpack got to, so that we can continue > parsing from there? It seems a bit wasteful to have to scan the string > twice to use calcsize for this... > > A PEP (or at least, a PEP-style design document) might capture the > answer to questions like this. OTOH, the tracker discussion could > easily be enough - can you put a reference to the bug report here? 
> > Paul > Two things: 1) struct.unpack and struct.unpack_from should remain backwards-compatible. I don't want to return extra values from it like (length unpacked, (data...)) for that reason. If the calcsize solution feels a bit weird (it isn't much less efficient, because strings store their length with them, so it's constant-time), there could also be new functions that *do* return the length if you need it. To me though, this feels like a use case for struct.iter_unpack. iter_unpack is strictly less powerful - you can easily and efficiently implement iter_unpack using unpack_from_with_offset (probably not it's real name, but you get the idea). The reverse is not true. And: val, offset = somefunc(buffer, offset) is *the* idiomatic signature for functions for unpacking complex binary formats. I've seen it reinvented independently at least 4 times in real projects. (It turns out that implementing sleb128 encoding in Python is sufficiently frustrating that you end up making lots of attempts to find someone anyone who has already done it. Or at least, I did :-).) Here's an example of this idiom used to parse Mach-O binding tables, which iter_unpack definitely can't do: https://github.com/njsmith/machomachomangler/blob/master/ machomachomangler/macho.py#L374-L429 Actually this example is a bit extreme since the format is *all* variable-width stuff, but it gives the idea. There are also lots of formats that have a mix of struct-style fixed width and variable width fields in a complicated pattern, e.g.: https://zs.readthedocs.io/en/latest/format.html#layout-details Definitely would prefer to avoid a bikeshed here, though other improvements to the struct module are certainly welcome! It doesn't necessarily have to be part of the same change, but if struct is gaining the infrastructure to support variable-width layouts then adding uleb128/sleb128 format specifiers would make a lot of sense. Implementing them in pure Python is difficult (all the standard "how to en/decode u/sleb128" documentation assumes you're working with C-style modulo integers) and slow, and they turn up all over the place: both of those links above, in Google protobufs, as a primitive in the .Net equivalent of the struct module [1], etc. -n [1] https://msdn.microsoft.com/en-us/library/system.io.binarywriter.write7bitencodedint.aspx -------------- next part -------------- An HTML attachment was scrubbed... URL: From elizabeth at interlinked.me Fri Jan 20 18:26:31 2017 From: elizabeth at interlinked.me (Elizabeth Myers) Date: Fri, 20 Jan 2017 17:26:31 -0600 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: <20170120224638.GA70981@cskk.homeip.net> References: <20170120224638.GA70981@cskk.homeip.net> Message-ID: <9ac00934-04f3-9142-b02b-b9b04f3859d8@interlinked.me> On 20/01/17 16:46, Cameron Simpson wrote: > On 20Jan2017 14:47, Elizabeth Myers wrote: >> 1) struct.unpack and struct.unpack_from should remain >> backwards-compatible. I don't want to return extra values from it like >> (length unpacked, (data...)) for that reason. > > Fully agree with this. > >> If the calcsize solution >> feels a bit weird (it isn't much less efficient, because strings store >> their length with them, so it's constant-time), there could also be new >> functions that *do* return the length if you need it. To me though, this >> feels like a use case for struct.iter_unpack. 
> > Often, maybe, but there are still going to be protocols that the new > format doesn't support, where the performant thing to do (in pure > Python) is to scan what you can with struct and "hand scan" the special > bits with special code. > Consider, for example, a format like MP4/ISO14496, where there's a > regular block structure (which is somewhat struct parsable) that can > contain embedded arbitraily weird information. Or the flipside where > struct parsable data are embedded in a format not supported by struct. > > The mixed situation is where you need to know where the parse got up > to. Calling calcsize or its variable size equivalent after a parse > seems needlessly repetetive of the parse work. > > For myself, I would want there to be some kind of call that returned the > parse and the length scanned, with the historic interface preserved for > the fixed size formats or for users not needing the length. > >> 2) I want to avoid making a weird incongruity, where only >> variable-length strings return the length actually parsed. > > Fully agree. Arguing for two API calls: the current one and one that > also returns the scan length. > > Cheers, > Cameron Simpson Some of the responses on the bug are discouraging... mostly seems to boil down to people just not wanting to expand the struct module or discourage its use. Everyone is a critic. I didn't know adding two format specifiers was going to be this controversial. You'd think I proposed adding braces or something :/. I'm hesitant to go forward on this until the bug has a resolution. From elizabeth at interlinked.me Fri Jan 20 18:37:33 2017 From: elizabeth at interlinked.me (Elizabeth Myers) Date: Fri, 20 Jan 2017 17:37:33 -0600 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: <9ac00934-04f3-9142-b02b-b9b04f3859d8@interlinked.me> References: <20170120224638.GA70981@cskk.homeip.net> <9ac00934-04f3-9142-b02b-b9b04f3859d8@interlinked.me> Message-ID: <3ed65893-8861-e81f-d90c-d6f9211db6bd@interlinked.me> On 20/01/17 17:26, Elizabeth Myers wrote: > On 20/01/17 16:46, Cameron Simpson wrote: >> On 20Jan2017 14:47, Elizabeth Myers wrote: >>> 1) struct.unpack and struct.unpack_from should remain >>> backwards-compatible. I don't want to return extra values from it like >>> (length unpacked, (data...)) for that reason. >> >> Fully agree with this. >> >>> If the calcsize solution >>> feels a bit weird (it isn't much less efficient, because strings store >>> their length with them, so it's constant-time), there could also be new >>> functions that *do* return the length if you need it. To me though, this >>> feels like a use case for struct.iter_unpack. >> >> Often, maybe, but there are still going to be protocols that the new >> format doesn't support, where the performant thing to do (in pure >> Python) is to scan what you can with struct and "hand scan" the special >> bits with special code. >> Consider, for example, a format like MP4/ISO14496, where there's a >> regular block structure (which is somewhat struct parsable) that can >> contain embedded arbitraily weird information. Or the flipside where >> struct parsable data are embedded in a format not supported by struct. >> >> The mixed situation is where you need to know where the parse got up >> to. Calling calcsize or its variable size equivalent after a parse >> seems needlessly repetetive of the parse work. 
>> >> For myself, I would want there to be some kind of call that returned the >> parse and the length scanned, with the historic interface preserved for >> the fixed size formats or for users not needing the length. >> >>> 2) I want to avoid making a weird incongruity, where only >>> variable-length strings return the length actually parsed. >> >> Fully agree. Arguing for two API calls: the current one and one that >> also returns the scan length. >> >> Cheers, >> Cameron Simpson > > Some of the responses on the bug are discouraging... mostly seems to > boil down to people just not wanting to expand the struct module or > discourage its use. Everyone is a critic. I didn't know adding two > format specifiers was going to be this controversial. You'd think I > proposed adding braces or something :/. > > I'm hesitant to go forward on this until the bug has a resolution. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > Also, btw, adding 128-bit length specifiers sounds like a good idea in theory, but the difficulty stems from the fact there's no real native 128-bit type that's portable. I don't know much about how python handles big ints internally, either, but I could learn. I was looking into implementing this already, and it appears it should be possible by teaching the module that "not all data is fixed length" and allowing functions to report back (via a Py_ssize_t *) how much data was actually unpacked/packed. But again, waiting on that bug to have a resolution before I do anything. I don't want to waste hours of effort on something the developers ultimately decide they don't want and will just reject. -- Elizabeth From cs at zip.com.au Fri Jan 20 21:36:26 2017 From: cs at zip.com.au (Cameron Simpson) Date: Sat, 21 Jan 2017 13:36:26 +1100 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: <9ac00934-04f3-9142-b02b-b9b04f3859d8@interlinked.me> References: <9ac00934-04f3-9142-b02b-b9b04f3859d8@interlinked.me> Message-ID: <20170121023626.GA31200@cskk.homeip.net> On 20Jan2017 17:26, Elizabeth Myers wrote: >Some of the responses on the bug are discouraging... mostly seems to >boil down to people just not wanting to expand the struct module or >discourage its use. Everyone is a critic. I didn't know adding two >format specifiers was going to be this controversial. You'd think I >proposed adding braces or something :/. > >I'm hesitant to go forward on this until the bug has a resolution. Yes, they are, but I think they're being overly negative myself. The struct module _is_ explicitly targeted at C structs, and maybe its internals are quite rigid (I haven't looked). But as you say, bot NUL terminated strings and run length encoded strings are very common, and struct does not support them. Waiting for a bug resolution seems unrealistic to me; plenty of bugs don't get resolutions at all, and to resolve this someone needs to take ownership of the bug and decide on something, and that the opposing views don't carry enouygh weight. Why not write a PEP? If nothing else, even if it gets rejected (plenty of PEPs are rejected, and kept on record to preserve the arguments) it will be visible and on the record. And it will be a concrete proposal, not awash in bikeshedding. You can update the PEP to reflect the salient parts of the bikeshedding as it happens. 
Make it narrow focus, explicitly the variable length thing, just like your issue. List the arguments for this (real world use cases, perhaps example real world code now and how it would be with the new feature) and the arguments against. Describe the additional API (at the least it needs an additional calcsize-like function that will return the data length scanned). Make it clear that the current API will continue to work unchanged. Have you read the struct module? Do you think your additions would be very intrusive to it, or relatively simple? Will the present performance be likely to be the same with your additions (not necessarily to cost to parse the new formats, but the performance with any existing fixed length structs)? Cheers, Cameron Simpson From njs at pobox.com Fri Jan 20 22:39:15 2017 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 20 Jan 2017 19:39:15 -0800 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: <3ed65893-8861-e81f-d90c-d6f9211db6bd@interlinked.me> References: <20170120224638.GA70981@cskk.homeip.net> <9ac00934-04f3-9142-b02b-b9b04f3859d8@interlinked.me> <3ed65893-8861-e81f-d90c-d6f9211db6bd@interlinked.me> Message-ID: On Fri, Jan 20, 2017 at 3:37 PM, Elizabeth Myers wrote: [...] >> Some of the responses on the bug are discouraging... mostly seems to >> boil down to people just not wanting to expand the struct module or >> discourage its use. Everyone is a critic. I didn't know adding two >> format specifiers was going to be this controversial. You'd think I >> proposed adding braces or something :/. >> >> I'm hesitant to go forward on this until the bug has a resolution. >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > Also, btw, adding 128-bit length specifiers sounds like a good idea in > theory, but the difficulty stems from the fact there's no real native > 128-bit type that's portable. I don't know much about how python handles > big ints internally, either, but I could learn. The "b128" in "uleb128" is short for "base 128"; it refers to how each byte contains one 7-bit "digit" of the integer being encoded -- so just like decimal needs 1 digit for 0-9, 2 digits for 10 - 99 = (10**2 - 1), etc., uleb128 uses 1 byte for 0-127, 2 bytes for 128 - 16383 = (128**2 - 1), etc. In practice most implementations are written in C and use some kind of native fixed width integer as the in-memory representation, and just error out if asked to decode a uleb128 that doesn't fit. In Python I suppose we could support encoding and decoding arbitrary size integers if we really wanted, but I also doubt anyone would be bothered if we were restricted to "only" handling numbers between 0 and 2**64 :-). > I was looking into implementing this already, and it appears it should > be possible by teaching the module that "not all data is fixed length" > and allowing functions to report back (via a Py_ssize_t *) how much data > was actually unpacked/packed. But again, waiting on that bug to have a > resolution before I do anything. I don't want to waste hours of effort > on something the developers ultimately decide they don't want and will > just reject. That's not really how Python bugs work in practice. 
For better or worse (and it's very often both), CPython development generally follows a traditional open-source model in which new proposals are only accepted if they have a champion who's willing to run the gauntlet of first proposing them, and then keep pushing forward through the hail of criticism and bike-shedding from random kibbitzers. This is at least in part a test to see how dedicated/stubborn you are about this feature. If you stop posting, then what will happen is that everyone else stops posting too, and the bug will just sit there unresolved indefinitely until you get (more) frustrated and give up. On the one hand, this does tend to guarantee that accepted proposals are very high quality and solve some important issue (b/c the champion didn't *really care* about the issue then they wouldn't put up with this). On the other hand, it's often pretty hellish for the individuals involved, and probably drives away all kinds of helpful contributions. But maybe it helps to know it's not personal? Having social capital definitely helps, but well-known/experienced contributors get put through this wringer too; the main difference is that we do it with eyes open and have strategies for navigating the system (at least until we get burned out). Some of these strategies that you might find helpful (or not): - realize that it's really common for someone to be all like "this is TERRIBLE and should definitely not be merged because of which is a TOTAL SHOW-STOPPER", but then if you ignore the histrionics and move forward anyway, it often turns out that all that person *actually* wanted was to see a brief paragraph in your design summary that acknowledges that you are aware of the existence of , and once they see this they're happy. (See also: [1]) - speaking of which, it is often very helpful to write up a short document to summarize and organize the different ideas proposed, critiques raised, and what you conclude based on them! That's basically what a "PEP" is - just an email in a somewhat standard format that reviews all the important issues that were raised and then says what you conclude and why, and which eventually also ends up on the website as a record. If you decide to try this then there are some guidelines [2][3] and a sample PEP [4] to start with. (The guidelines make it sound much more formal and scary than it really is, though -- e.g. when they say "your submission may be AUTOMATICALLY REJECTED" then in my experience what they actually mean is you might get a reply back saying "hey fyi the formatter script barfed on your document because you were missing a quote so I fixed it for you".) This particular proposal is really on the lower boundary of needing a PEP and you might well be able to push it through without one, but it might be easier to go this way than not. - sift through the responses to pick the ones that seem actually useful to you, then close the browser tab and go off and implement what you actually think should be implemented and come back with a patch. 
This does a few things: (a) it helps get everyone on the same page and make the discussion much more concrete, which tends to eliminate a lot of idle criticism/bikeshedding, (b) it tends to attract higher-quality responses because it demonstrates you're serious and makes it look more like this is a thing that will actually happen (see again the "trial by combat" thing above), (c) many of the experts whose good opinion is important are attention-scattered volunteers who are bad at time management and prioritization (I include myself in this category!), so if you stick a patch in front of their faces then you can trick them into switching into code review mode instead of design critique mode. And it's much easier to respond to "your semicolon is in the wrong place" than "but what is the struct module really *for*, I mean, in its heart of hearts?", you know? - join the core-mentorship list [5] and ask for help there. Actually this should probably be the first suggestion on this list! it's a group of folks who're specifically volunteering to help people like you get through this process :-) Hope that helps, -n [1] http://www.ftrain.com/wwic.html [2] https://www.python.org/dev/peps/pep-0001/#submitting-a-pep [3] https://www.python.org/dev/peps/pep-0001/#what-belongs-in-a-successful-pep [4] https://github.com/python/peps/blob/master/pep-0012.txt [5] https://mail.python.org/mailman/listinfo/core-mentorship -- Nathaniel J. Smith -- https://vorpus.org From njs at pobox.com Fri Jan 20 22:51:04 2017 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 20 Jan 2017 19:51:04 -0800 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: References: <20170120224638.GA70981@cskk.homeip.net> <9ac00934-04f3-9142-b02b-b9b04f3859d8@interlinked.me> <3ed65893-8861-e81f-d90c-d6f9211db6bd@interlinked.me> Message-ID: On Fri, Jan 20, 2017 at 7:39 PM, Nathaniel Smith wrote: > [...] > Some of these strategies that you might find helpful (or not): Oh right, and of course just after I hit send I realized I forgot one of my favorites! - come up with a real chunk of code from a real project that would benefit from the change being proposed, and show what it looks like before/after the feature is added. This can be incredibly persuasive *but* it's *super important* that the code be as real as possible. The ideal is for it to solve a *concrete* *real-world* problem that can be described in a few sentences, and be drawn from a real code base that faces that problem. One of the biggest challenges for maintainers is figuring out how Python is actually used in the real world, because we all have very little visibility outside our own little bubbles, so people really appreciate this -- but at the same time, python-ideas is absolutely awash with people coming up with weird hypothetical situations where their pet idea would be just the ticket, so anything that comes across as cherry-picked like that tends to be heavily discounted. Sure, there *are* situations where the superpower of breathing underwater can help you fight crime, but... http://strongfemaleprotagonist.com/issue-6/page-63-3/ http://strongfemaleprotagonist.com/issue-6/page-64-3/ -n -- Nathaniel J. 
Smith -- https://vorpus.org From ncoghlan at gmail.com Sat Jan 21 07:14:58 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 21 Jan 2017 23:14:58 +1100 Subject: [Python-ideas] Ideas for improving the struct module In-Reply-To: References: <20170120224638.GA70981@cskk.homeip.net> <9ac00934-04f3-9142-b02b-b9b04f3859d8@interlinked.me> <3ed65893-8861-e81f-d90c-d6f9211db6bd@interlinked.me> Message-ID: On 21 January 2017 at 14:51, Nathaniel Smith wrote: > On Fri, Jan 20, 2017 at 7:39 PM, Nathaniel Smith wrote: >> [...] >> Some of these strategies that you might find helpful (or not): > > Oh right, and of course just after I hit send I realized I forgot one > of my favorites! > > - come up with a real chunk of code from a real project that would > benefit from the change being proposed, and show what it looks like > before/after the feature is added. This can be incredibly persuasive > *but* it's *super important* that the code be as real as possible. The > ideal is for it to solve a *concrete* *real-world* problem that can be > described in a few sentences, and be drawn from a real code base that > faces that problem. One of the biggest challenges for maintainers is > figuring out how Python is actually used in the real world, because we > all have very little visibility outside our own little bubbles, so > people really appreciate this -- but at the same time, python-ideas is > absolutely awash with people coming up with weird hypothetical > situations where their pet idea would be just the ticket, so anything > that comes across as cherry-picked like that tends to be heavily > discounted. In the specific case of this proposal, an interesting stress test of any design proposal would be to describe the layout of a CPython tuple in memory. If you trace through the struct and macro definition details in https://hg.python.org/cpython/file/tip/Include/object.h and https://hg.python.org/cpython/file/tip/Include/tupleobject.h you'll find that the last two fields in PyTupleObject are: Py_ssize_t ob_size; PyObject *ob_item[1]; So this is a C struct definition *in the CPython code base* that the struct module currently cannot describe (other PyVarObject definitions are similar to tuples, but don't necessarily guarantee that ob_size is the last field before the variable length section). Similarly, PyASCIIObject and PyCompactUnicodeObject append a data buffer to the preceding struct in order to include both a string's metadata and its contents into the same memory allocation. In that case, the buffer is also null-terminated in addition to having its length specified in the string metadata. So both of the proposals Elizabeth is making reflect ways that CPython itself uses C structs (matching the heritage of the module's name), even though the primary practical motivation and use case is common over-the-wire protocols. Cheers, Nick. P.S. "the reference interpreter does this" and "the standard library does this" can be particularly compelling sources of real world example code :) -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Jan 21 07:30:13 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 21 Jan 2017 23:30:13 +1100 Subject: [Python-ideas] PEP 538 (C locale coercion) now depends on PEP 540 (UTF-8 mode) Message-ID: Hi folks, I've pushed an update to PEP 538 that eliminates any fallback to the en_US.UTF-8 locale, and instead relies solely on PEP 540's bypassing of the platform's locale subsystem in cases where the default C locale is specified. 
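(For anyone who hasn't gone through the PEP yet, the coercion boils down to roughly the following Python-level sketch. Treat it purely as illustration: the real check runs in C during interpreter startup, and both the detection logic and the candidate locale names are simplified here.)

    import os

    def coerce_c_locale(env=os.environ):
        # Illustration only: if the environment asks for the legacy C/POSIX
        # locale (or doesn't name a locale at all), export a UTF-8 based
        # locale instead, so that CPython and other C/C++ code in the same
        # process end up agreeing on the text encoding.
        declared = env.get("LC_ALL") or env.get("LC_CTYPE") or env.get("LANG") or "C"
        if declared in ("C", "POSIX"):
            env["LC_CTYPE"] = "C.UTF-8"  # simplified; other spellings are considered too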
This revision means that the two PEPs can now be read as follows: * PEP 540 will make CPython's default behaviour significantly more user friendly when the OS claims that the system API encoding is ASCII, at the cost of making it inconsistent with other C/C++ components in the same process * PEP 538 is then a follow-on proposal to fix that inconsistency by coercing the C/C++ locale settings to better align with CPython's decision to ignore the nominal ASCII encoding I won't repost the whole thing here, but will instead provide relevant links and note a few particular highlights of the changes: * Full PEP: https://www.python.org/dev/peps/pep-0538/ * Diff: https://github.com/python/peps/commit/481573aa2722f515b38e30beaaffc5e1fb9bbfb4 Major changes: * en_US.UTF-8 fallback eliminated in favour of the PYTHONUTF8 feature in PEP 540 * "LC_CTYPE=UTF-8" fallback added with the aim of improving behaviour on *BSD systems * surrogateescape is enabled on standard streams by default (see behavioural examples in the PEP for rationale) * notes that the proposal makes it possible to fix an interaction bug with the GNU readline module on Android I haven't updated the reference implementation yet, as I'd prefer to see PEP 540 resolved and merged before doing that. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From songofacandy at gmail.com Sat Jan 21 22:06:38 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Sun, 22 Jan 2017 12:06:38 +0900 Subject: [Python-ideas] PEP 538 (C locale coercion) now depends on PEP 540 (UTF-8 mode) In-Reply-To: References: Message-ID: I love it. PEP 540 helps using UTF-8 in Python. PEP 538 helps using UTF-8 in C libraries within Python, too (if the system supports LC_CTYPE=UTF-8 or C.UTF-8). It seems the best solution we can do for now. From g.rodola at gmail.com Sun Jan 22 06:15:27 2017 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Sun, 22 Jan 2017 12:15:27 +0100 Subject: [Python-ideas] Context manager to temporarily set signal handlers In-Reply-To: <1484916379.4033524.853975624.1BE29C9D@webmail.messagingengine.com> References: <1484916379.4033524.853975624.1BE29C9D@webmail.messagingengine.com> Message-ID: I don't know if this is related (regarding functions registered in C) but one problem I often have is to always execute exit functions. I have come up with this: http://grodola.blogspot.com/2016/02/how-to-always-execute-exit-functions-in-py.html On Fri, Jan 20, 2017 at 1:46 PM, Thomas Kluyver wrote: > Not uncommonly, I want to do something like this in code: > > import signal > > # Install my own signal handler > prev_hup = signal.signal(signal.SIGHUP, my_handler) > prev_term = signal.signal(signal.SIGTERM, my_handler) > try: > do_something_else() > finally: > # Restore previous signal handlers > signal.signal(signal.SIGHUP, prev_hup) > signal.signal(signal.SIGTERM, prev_term) > > This works if the existing signal handler is a Python function, or the > special values SIG_IGN (ignore) or SIG_DFL (default). However, it breaks > if code has set a signal handler in C: this is not returned, and there > is no way in Python to reinstate a C-level signal handler once we've > replaced it from Python. > > I propose two possible solutions: > > 1. The high-level approach: a context manager which can temporarily set > one or more signal handlers. If this was implemented in C, it could > restore C-level as well as Python-level signal handlers. > > 2.
A lower level approach: signal() and getsignal() would gain the > ability to return an opaque object which refers to a C-level signal > handler. The only use for this would be to pass it back to > signal.signal() to set it as a signal handler again. The context manager > from (1) could then be implemented in Python. > > Crosslinking http://bugs.python.org/issue13285 > > Thomas > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Giampaolo - http://grodola.blogspot.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From fakedme+py at gmail.com Sun Jan 22 17:45:01 2017 From: fakedme+py at gmail.com (Soni L.) Date: Sun, 22 Jan 2017 20:45:01 -0200 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator Message-ID: I've been thinking of an Immutable Builder pattern and an operator to go with it. Since the builder would be immutable, this wouldn't work: long_name = mkbuilder() long_name.seta(a) long_name.setb(b) y = long_name.build() Instead, you'd need something more like this: long_name = mkbuilder() long_name = long_name.seta(a) long_name = long_name.setb(b) y = long_name.build() Or we could add an operator to simplify it: long_name = mkbuilder() long_name .= seta(a) long_name .= setb(b) y = long_name.build() (Yes, I'm aware you can x = mkbuilder().seta(a).setb(b), then y = x.build(). But that doesn't work if you wanna "fork" the builder. Some builders, like a builder for network connections of some sort, would work best if they were immutable/forkable.) From storchaka at gmail.com Sun Jan 22 17:54:58 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 23 Jan 2017 00:54:58 +0200 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: References: Message-ID: On 23.01.17 00:45, Soni L. wrote: > I've been thinking of an Immutable Builder pattern and an operator to go > with it. Since the builder would be immutable, this wouldn't work: > > long_name = mkbuilder() > long_name.seta(a) > long_name.setb(b) > y = long_name.build() I think the more pythonic way is: y = build(a=a, b=b) A Builder pattern is less used in Python due to the support of keyword arguments. From jsbueno at python.org.br Sun Jan 22 18:19:58 2017 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Sun, 22 Jan 2017 21:19:58 -0200 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: References: Message-ID: This is easy to do in Python, and we already have NamedTuples and other things. If you need such a builder anyway, this snippet can work - no need for special syntax: https://gist.github.com/jsbueno/b2b5f5c06caa915c451253bb4f171ee9 On 22 January 2017 at 20:54, Serhiy Storchaka wrote: > On 23.01.17 00:45, Soni L. wrote: >> >> I've been thinking of an Immutable Builder pattern and an operator to go >> with it. Since the builder would be immutable, this wouldn't work: >> >> long_name = mkbuilder() >> long_name.seta(a) >> long_name.setb(b) >> y = long_name.build() > > > I think the more pythonic way is: > > y = build(a=a, b=b) > > A Builder pattern is less used in Python due to the support of keyword > arguments. 
> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From fakedme+py at gmail.com Sun Jan 22 18:30:20 2017 From: fakedme+py at gmail.com (Soni L.) Date: Sun, 22 Jan 2017 21:30:20 -0200 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: References: Message-ID: <2127feaa-0a5b-8246-978f-6346dea554a2@gmail.com> On 22/01/17 08:54 PM, Serhiy Storchaka wrote: > On 23.01.17 00:45, Soni L. wrote: >> I've been thinking of an Immutable Builder pattern and an operator to go >> with it. Since the builder would be immutable, this wouldn't work: >> >> long_name = mkbuilder() >> long_name.seta(a) >> long_name.setb(b) >> y = long_name.build() > > I think the more pythonic way is: > > y = build(a=a, b=b) > > A Builder pattern is less used in Python due to the support of keyword > arguments. I guess you could do something like this, for an IRC bot builder: fnircbotbuilder = mkircbotbuilder(network="irc.freenode.net", port=6697, ssl=true) mainbot = mkircbotbuilder(parent=fnircbotbuilder, # ??? channels=["#bots"]).build() fndccbotbuilder = mkircbotbuilder(parent=fnircbotbuilder, dcc=true, channeldcc=true) dccbot = mkircbotbuilder(parent=fndccbotbuilder, channels=["#ctcp-s"]).build() otherircbotbuilder = mkircbotbuilder(parent=fndccbotbuilder, network="irc.subluminal.net") # because we want this whole network otherbot = mkircbotbuilder(parent=otherircbotbuilder, channels=["#programming"]).build() # to use DCC and channel DCC But this would be cleaner: botbuilder = mkircbotbuilder().network("irc.freenode.net").port(6697).ssl(true) mainbot = botbuilder.channels(["#bots"]).build() botbuilder .= dcc(true).channeldcc(true) dccbot = botbuilder.channels(["#ctcp-s"]).build() botbuilder .= network("irc.subluminal.net") otherbot = botbuilder.channels(["#programming"]).build() (I mean, "channels" could/should be a per-bot property instead of a per-builder property... but that doesn't affect the examples much.) From ethan at stoneleaf.us Sun Jan 22 18:41:26 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 22 Jan 2017 15:41:26 -0800 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: <2127feaa-0a5b-8246-978f-6346dea554a2@gmail.com> References: <2127feaa-0a5b-8246-978f-6346dea554a2@gmail.com> Message-ID: <58854326.7040502@stoneleaf.us> On 01/22/2017 03:30 PM, Soni L. wrote: > On 22/01/17 08:54 PM, Serhiy Storchaka wrote: >> On 23.01.17 00:45, Soni L. wrote: >>> I've been thinking of an Immutable Builder pattern and an operator to go >>> with it. Since the builder would be immutable, this wouldn't work: >>> >>> long_name = mkbuilder() >>> long_name.seta(a) >>> long_name.setb(b) >>> y = long_name.build() >> >> I think the more pythonic way is: >> >> y = build(a=a, b=b) >> >> A Builder pattern is less used in Python due to the support of keyword arguments. > > I guess you could do something like this, for an IRC bot builder: > > fnircbotbuilder = mkircbotbuilder(network="irc.freenode.net", port=6697, ssl=true) > mainbot = mkircbotbuilder(parent=fnircbotbuilder, # ??? 
> channels=["#bots"]).build() > fndccbotbuilder = mkircbotbuilder(parent=fnircbotbuilder, dcc=true, channeldcc=true) > dccbot = mkircbotbuilder(parent=fndccbotbuilder, channels=["#ctcp-s"]).build() > otherircbotbuilder = mkircbotbuilder(parent=fndccbotbuilder, network="irc.subluminal.net") # because we want this whole network > otherbot = mkircbotbuilder(parent=otherircbotbuilder, channels=["#programming"]).build() # to use DCC and channel DCC The following is just fine: fnircbotbuilder = mkircbotbuilder( network="irc.freenode.net", port=6697, ssl=true, ) mainbot = fnircbotbuilder(channels=["#bots"]).build() fndccbotbuilder = fnircbotbuilder(dcc=true, channeldcc=true) dccbot = fndccbotbuilder(channels=["#ctcp-s"]).build() otherircbotbuilder = fndccbotbuilder(network="irc.subluminal.net") otherbot = otherircbotbuilder(channels=["#programming"]).build() -- ~Ethan~ From wes.turner at gmail.com Sun Jan 22 18:52:14 2017 From: wes.turner at gmail.com (Wes Turner) Date: Sun, 22 Jan 2017 17:52:14 -0600 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: References: Message-ID: Have you looked at pyrsistent for immutable/functional/persistent/copy-on-write data structures in Python? https://github.com/tobgu/pyrsistent/ (freeze() / thaw()) ... e.g. List and Dict NamedTuple values are not immutable (because append() and update() still work) On Sunday, January 22, 2017, Soni L. wrote: > I've been thinking of an Immutable Builder pattern and an operator to go > with it. Since the builder would be immutable, this wouldn't work: > > long_name = mkbuilder() > long_name.seta(a) > long_name.setb(b) > y = long_name.build() > > Instead, you'd need something more like this: > > long_name = mkbuilder() > long_name = long_name.seta(a) > long_name = long_name.setb(b) > y = long_name.build() > > Or we could add an operator to simplify it: > > long_name = mkbuilder() > long_name .= seta(a) > long_name .= setb(b) > y = long_name.build() > > (Yes, I'm aware you can x = mkbuilder().seta(a).setb(b), then y = > x.build(). But that doesn't work if you wanna "fork" the builder. Some > builders, like a builder for network connections of some sort, would work > best if they were immutable/forkable.) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Sun Jan 22 19:03:09 2017 From: wes.turner at gmail.com (Wes Turner) Date: Sun, 22 Jan 2017 18:03:09 -0600 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: References: Message-ID: On Sunday, January 22, 2017, Wes Turner wrote: > Have you looked at pyrsistent for immutable/functional/persistent/copy-on-write > data structures in Python? > > https://github.com/tobgu/pyrsistent/ > > (freeze() / thaw()) > > ... e.g. List and Dict NamedTuple values are not immutable (because > append() and update() still work) > fn.py also has immutables: https://github.com/kachayev/fn.py/blob/master/README.rst#persistent-data-structures > > On Sunday, January 22, 2017, Soni L. > wrote: > >> I've been thinking of an Immutable Builder pattern and an operator to go >> with it. 
Since the builder would be immutable, this wouldn't work: >> >> long_name = mkbuilder() >> long_name.seta(a) >> long_name.setb(b) >> y = long_name.build() >> >> Instead, you'd need something more like this: >> >> long_name = mkbuilder() >> long_name = long_name.seta(a) >> long_name = long_name.setb(b) >> y = long_name.build() >> >> Or we could add an operator to simplify it: >> >> long_name = mkbuilder() >> long_name .= seta(a) >> long_name .= setb(b) >> y = long_name.build() >> >> (Yes, I'm aware you can x = mkbuilder().seta(a).setb(b), then y = >> x.build(). But that doesn't work if you wanna "fork" the builder. Some >> builders, like a builder for network connections of some sort, would work >> best if they were immutable/forkable.) >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fakedme+py at gmail.com Sun Jan 22 21:49:19 2017 From: fakedme+py at gmail.com (Soni L.) Date: Mon, 23 Jan 2017 00:49:19 -0200 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: References: Message-ID: <5ff82f14-b0b8-9838-4924-9d73c6366f9a@gmail.com> On 22/01/17 10:03 PM, Wes Turner wrote: > > > On Sunday, January 22, 2017, Wes Turner > wrote: > > Have you looked at pyrsistent for > immutable/functional/persistent/copy-on-write data structures in > Python? > > https://github.com/tobgu/pyrsistent/ > > > (freeze() / thaw()) > > ... e.g. List and Dict NamedTuple values are not immutable > (because append() and update() still work) > > > fn.py also has immutables: > https://github.com/kachayev/fn.py/blob/master/README.rst#persistent-data-structures You seem to be thinking of "immutable object builder". Not "the builder itself is immutable and operations on it create new builders". > > On Sunday, January 22, 2017, Soni L. > wrote: > > I've been thinking of an Immutable Builder pattern and an > operator to go with it. Since the builder would be immutable, > this wouldn't work: > > long_name = mkbuilder() > long_name.seta(a) > long_name.setb(b) > y = long_name.build() > > Instead, you'd need something more like this: > > long_name = mkbuilder() > long_name = long_name.seta(a) > long_name = long_name.setb(b) > y = long_name.build() > > Or we could add an operator to simplify it: > > long_name = mkbuilder() > long_name .= seta(a) > long_name .= setb(b) > y = long_name.build() > > (Yes, I'm aware you can x = mkbuilder().seta(a).setb(b), then > y = x.build(). But that doesn't work if you wanna "fork" the > builder. Some builders, like a builder for network connections > of some sort, would work best if they were immutable/forkable.) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Sun Jan 22 23:09:34 2017 From: wes.turner at gmail.com (Wes Turner) Date: Sun, 22 Jan 2017 22:09:34 -0600 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: <5ff82f14-b0b8-9838-4924-9d73c6366f9a@gmail.com> References: <5ff82f14-b0b8-9838-4924-9d73c6366f9a@gmail.com> Message-ID: On Sunday, January 22, 2017, Soni L. 
wrote: > > > On 22/01/17 10:03 PM, Wes Turner wrote: > > > > On Sunday, January 22, 2017, Wes Turner > wrote: > >> Have you looked at pyrsistent for immutable/functional/persistent/copy-on-write >> data structures in Python? >> >> https://github.com/tobgu/pyrsistent/ >> >> (freeze() / thaw()) >> >> ... e.g. List and Dict NamedTuple values are not immutable (because >> append() and update() still work) >> > > fn.py also has immutables: > https://github.com/kachayev/fn.py/blob/master/README.rst# > persistent-data-structures > > > You seem to be thinking of "immutable object builder". Not "the builder > itself is immutable and operations on it create new builders". > My mistake. Something like @optionable and/or @curried from fn.py in conjunction with PClass from pyrsistent may accomplish what you describe? > > > > >> >> On Sunday, January 22, 2017, Soni L. wrote: >> >>> I've been thinking of an Immutable Builder pattern and an operator to go >>> with it. Since the builder would be immutable, this wouldn't work: >>> >>> long_name = mkbuilder() >>> long_name.seta(a) >>> long_name.setb(b) >>> y = long_name.build() >>> >>> Instead, you'd need something more like this: >>> >>> long_name = mkbuilder() >>> long_name = long_name.seta(a) >>> long_name = long_name.setb(b) >>> y = long_name.build() >>> >>> Or we could add an operator to simplify it: >>> >>> long_name = mkbuilder() >>> long_name .= seta(a) >>> long_name .= setb(b) >>> y = long_name.build() >>> >>> (Yes, I'm aware you can x = mkbuilder().seta(a).setb(b), then y = >>> x.build(). But that doesn't work if you wanna "fork" the builder. Some >>> builders, like a builder for network connections of some sort, would work >>> best if they were immutable/forkable.) >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From cory at lukasa.co.uk Mon Jan 23 04:32:02 2017 From: cory at lukasa.co.uk (Cory Benfield) Date: Mon, 23 Jan 2017 09:32:02 +0000 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: References: Message-ID: <8671EBCA-148F-41C8-A592-46EF653B9CAE@lukasa.co.uk> > On 22 Jan 2017, at 22:45, Soni L. wrote: > > This pattern is present in the cryptography module already with things like their x509.CertificateBuilder: https://cryptography.io/en/latest/x509/reference/#cryptography.x509.CertificateBuilder . My 2c, but I find that code perfectly readable and legible. I don't think a dot-equals operator would be needed. Cory -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Mon Jan 23 04:54:55 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 23 Jan 2017 09:54:55 +0000 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: References: Message-ID: On 22 January 2017 at 22:45, Soni L. wrote: > I've been thinking of an Immutable Builder pattern and an operator to go > with it.
Since the builder would be immutable, this wouldn't work: > > long_name = mkbuilder() > long_name.seta(a) > long_name.setb(b) > y = long_name.build() > > Instead, you'd need something more like this: > > long_name = mkbuilder() > long_name = long_name.seta(a) > long_name = long_name.setb(b) > y = long_name.build() > > Or we could add an operator to simplify it: > > long_name = mkbuilder() > long_name .= seta(a) > long_name .= setb(b) > y = long_name.build() > > (Yes, I'm aware you can x = mkbuilder().seta(a).setb(b), then y = x.build(). > But that doesn't work if you wanna "fork" the builder. Some builders, like a > builder for network connections of some sort, would work best if they were > immutable/forkable.) I don't think the .= operator adds enough to be worth it. If the problem you see is the duplication of long_name in those lines (it's difficult to be sure without a real example) then you can use a temporary: bld = mkbuilder() bld = bld.seta(a) bld = bld.setb(b) long_name = bld y = long_name.build() For your real example: On 22 January 2017 at 23:30, Soni L. wrote: > fnircbotbuilder = mkircbotbuilder(network="irc.freenode.net", port=6697, > ssl=true) > mainbot = mkircbotbuilder(parent=fnircbotbuilder, # ??? > channels=["#bots"]).build() > fndccbotbuilder = mkircbotbuilder(parent=fnircbotbuilder, dcc=true, > channeldcc=true) > dccbot = mkircbotbuilder(parent=fndccbotbuilder, > channels=["#ctcp-s"]).build() > otherircbotbuilder = mkircbotbuilder(parent=fndccbotbuilder, > network="irc.subluminal.net") # because we want this whole network > otherbot = mkircbotbuilder(parent=otherircbotbuilder, > channels=["#programming"]).build() > > But this would be cleaner: > > botbuilder = > mkircbotbuilder().network("irc.freenode.net").port(6697).ssl(true) > mainbot = botbuilder.channels(["#bots"]).build() > botbuilder .= dcc(true).channeldcc(true) > dccbot = botbuilder.channels(["#ctcp-s"]).build() > botbuilder .= network("irc.subluminal.net") > otherbot = botbuilder.channels(["#programming"]).build() I don't find the second example appreciably cleaner than the first.
But a bit of reformatting looks better to me: # First create builders for the bots fnircbotbuilder = mkircbotbuilder( network="irc.freenode.net", port=6697, ssl=true) fndccbotbuilder = mkircbotbuilder( parent=fnircbotbuilder, dcc=true, channeldcc=true) otherircbotbuilder = mkircbotbuilder( parent=fndccbotbuilder, network="irc.subluminal.net") # Now create the actual bots mainbot = mkircbotbuilder( parent=fnircbotbuilder, channels=["#bots"]).build() dccbot = mkircbotbuilder( parent=fndccbotbuilder, channels=["#ctcp-s"]).build() otherbot = mkircbotbuilder( parent=otherircbotbuilder, channels=["#programming"]).build() And some API redesign (make the builders classes, and the parent relationship becomes subclassing, and maybe make channels an argument to build() so that you don't need fresh builders for each of the actual bots, and you don't even need the "builder" in the name at this point) makes the whole thing look far cleaner (to me, at least): class FNIRCBot(IRCBot): network="irc.freenode.net" port=6697 ssl=True class FNDCCBot(FNIRCBot): dcc=True channeldcc=True class OtherIRCBot(IRCBot): network="irc.subluminal.net" mainbot = FNIRCBot(channels=["#bots"]) dccbot = FNDCCBot(channels=["#ctcp-s"]) otherbot = OtherIRCBot(channels=["#programming"]) Paul From storchaka at gmail.com Mon Jan 23 06:45:18 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 23 Jan 2017 13:45:18 +0200 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: <2127feaa-0a5b-8246-978f-6346dea554a2@gmail.com> References: <2127feaa-0a5b-8246-978f-6346dea554a2@gmail.com> Message-ID: On 23.01.17 01:30, Soni L. wrote: > On 22/01/17 08:54 PM, Serhiy Storchaka wrote: >> On 23.01.17 00:45, Soni L. wrote: >>> I've been thinking of an Immutable Builder pattern and an operator to go >>> with it. Since the builder would be immutable, this wouldn't work: >>> >>> long_name = mkbuilder() >>> long_name.seta(a) >>> long_name.setb(b) >>> y = long_name.build() >> >> I think the more pythonic way is: >> >> y = build(a=a, b=b) >> >> A Builder pattern is less used in Python due to the support of keyword >> arguments. > > I guess you could do something like this, for an IRC bot builder: > > fnircbotbuilder = mkircbotbuilder(network="irc.freenode.net", port=6697, > ssl=true) > mainbot = mkircbotbuilder(parent=fnircbotbuilder, # ??? > channels=["#bots"]).build() > fndccbotbuilder = mkircbotbuilder(parent=fnircbotbuilder, dcc=true, > channeldcc=true) > dccbot = mkircbotbuilder(parent=fndccbotbuilder, > channels=["#ctcp-s"]).build() > otherircbotbuilder = mkircbotbuilder(parent=fndccbotbuilder, > network="irc.subluminal.net") # because we want this whole network > otherbot = mkircbotbuilder(parent=otherircbotbuilder, > channels=["#programming"]).build() # to use DCC and channel DCC > > But this would be cleaner: > > botbuilder = > mkircbotbuilder().network("irc.freenode.net").port(6697).ssl(true) > mainbot = botbuilder.channels(["#bots"]).build() > botbuilder .= dcc(true).channeldcc(true) > dccbot = botbuilder.channels(["#ctcp-s"]).build() > botbuilder .= network("irc.subluminal.net") > otherbot = botbuilder.channels(["#programming"]).build() In Python you can save common options in a dict and pass them as var-keyword argument. Or use functools.partial. In any case you don't need a builder class with the build method and a number of configuring methods. It can be just a function with optional keyword parameters. 
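For example (a rough sketch only -- make_bot below is a hypothetical stand-in for whatever the builder would eventually construct, and it just returns a dict so the snippet is self-contained):

    from functools import partial

    def make_bot(network, port=6667, ssl=False, dcc=False,
                 channeldcc=False, channels=()):
        # Hypothetical "build" step: simply bundle the configuration up.
        return {"network": network, "port": port, "ssl": ssl,
                "dcc": dcc, "channeldcc": channeldcc,
                "channels": list(channels)}

    # Common options are captured once; "forking" the configuration is
    # just another functools.partial().
    fn_bot = partial(make_bot, network="irc.freenode.net", port=6697, ssl=True)
    fn_dcc_bot = partial(fn_bot, dcc=True, channeldcc=True)

    mainbot = fn_bot(channels=["#bots"])
    dccbot = fn_dcc_bot(channels=["#ctcp-s"])
    # Keyword arguments given at call time override the ones stored in the
    # partial, so even the network can be swapped without a new "builder".
    otherbot = fn_dcc_bot(network="irc.subluminal.net", channels=["#programming"])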
A Builder pattern is often used in languages that don't support passing arguments by keyword and partial functions. Python rarely needs the purposed class implementing a Builder pattern. Actually a Builder pattern is built-in in the language as a part of syntax. From fakedme+py at gmail.com Mon Jan 23 08:05:12 2017 From: fakedme+py at gmail.com (Soni L.) Date: Mon, 23 Jan 2017 11:05:12 -0200 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: References: <2127feaa-0a5b-8246-978f-6346dea554a2@gmail.com> Message-ID: On 23/01/17 09:45 AM, Serhiy Storchaka wrote: > On 23.01.17 01:30, Soni L. wrote: >> On 22/01/17 08:54 PM, Serhiy Storchaka wrote: >>> On 23.01.17 00:45, Soni L. wrote: >>>> I've been thinking of an Immutable Builder pattern and an operator >>>> to go >>>> with it. Since the builder would be immutable, this wouldn't work: >>>> >>>> long_name = mkbuilder() >>>> long_name.seta(a) >>>> long_name.setb(b) >>>> y = long_name.build() >>> >>> I think the more pythonic way is: >>> >>> y = build(a=a, b=b) >>> >>> A Builder pattern is less used in Python due to the support of keyword >>> arguments. >> >> I guess you could do something like this, for an IRC bot builder: >> >> fnircbotbuilder = mkircbotbuilder(network="irc.freenode.net", port=6697, >> ssl=true) >> mainbot = mkircbotbuilder(parent=fnircbotbuilder, # ??? >> channels=["#bots"]).build() >> fndccbotbuilder = mkircbotbuilder(parent=fnircbotbuilder, dcc=true, >> channeldcc=true) >> dccbot = mkircbotbuilder(parent=fndccbotbuilder, >> channels=["#ctcp-s"]).build() >> otherircbotbuilder = mkircbotbuilder(parent=fndccbotbuilder, >> network="irc.subluminal.net") # because we want this whole network >> otherbot = mkircbotbuilder(parent=otherircbotbuilder, >> channels=["#programming"]).build() # to use DCC and channel DCC >> >> But this would be cleaner: >> >> botbuilder = >> mkircbotbuilder().network("irc.freenode.net").port(6697).ssl(true) >> mainbot = botbuilder.channels(["#bots"]).build() >> botbuilder .= dcc(true).channeldcc(true) >> dccbot = botbuilder.channels(["#ctcp-s"]).build() >> botbuilder .= network("irc.subluminal.net") >> otherbot = botbuilder.channels(["#programming"]).build() > > In Python you can save common options in a dict and pass them as > var-keyword argument. Or use functools.partial. In any case you don't > need a builder class with the build method and a number of configuring > methods. It can be just a function with optional keyword parameters. > > A Builder pattern is often used in languages that don't support > passing arguments by keyword and partial functions. Python rarely > needs the purposed class implementing a Builder pattern. Actually a > Builder pattern is built-in in the language as a part of syntax. > Yeah but the dotequals operator has many other benefits: long_name .= __call__ # cast to callable long_name .= wrapped # unwrap etc And it also looks neat. From mal at egenix.com Mon Jan 23 08:18:18 2017 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 23 Jan 2017 14:18:18 +0100 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: References: <2127feaa-0a5b-8246-978f-6346dea554a2@gmail.com> Message-ID: <0654a27e-7200-c468-d4eb-17bef13b61d2@egenix.com> On 23.01.2017 14:05, Soni L. wrote: > > > On 23/01/17 09:45 AM, Serhiy Storchaka wrote: >> On 23.01.17 01:30, Soni L. wrote: >>> On 22/01/17 08:54 PM, Serhiy Storchaka wrote: >>>> On 23.01.17 00:45, Soni L. 
wrote: >>>>> I've been thinking of an Immutable Builder pattern and an operator >>>>> to go >>>>> with it. Since the builder would be immutable, this wouldn't work: >>>>> >>>>> long_name = mkbuilder() >>>>> long_name.seta(a) >>>>> long_name.setb(b) >>>>> y = long_name.build() >>>> >>>> I think the more pythonic way is: >>>> >>>> y = build(a=a, b=b) >>>> >>>> A Builder pattern is less used in Python due to the support of keyword >>>> arguments. >>> >>> I guess you could do something like this, for an IRC bot builder: >>> >>> fnircbotbuilder = mkircbotbuilder(network="irc.freenode.net", port=6697, >>> ssl=true) >>> mainbot = mkircbotbuilder(parent=fnircbotbuilder, # ??? >>> channels=["#bots"]).build() >>> fndccbotbuilder = mkircbotbuilder(parent=fnircbotbuilder, dcc=true, >>> channeldcc=true) >>> dccbot = mkircbotbuilder(parent=fndccbotbuilder, >>> channels=["#ctcp-s"]).build() >>> otherircbotbuilder = mkircbotbuilder(parent=fndccbotbuilder, >>> network="irc.subluminal.net") # because we want this whole network >>> otherbot = mkircbotbuilder(parent=otherircbotbuilder, >>> channels=["#programming"]).build() # to use DCC and channel DCC >>> >>> But this would be cleaner: >>> >>> botbuilder = >>> mkircbotbuilder().network("irc.freenode.net").port(6697).ssl(true) >>> mainbot = botbuilder.channels(["#bots"]).build() >>> botbuilder .= dcc(true).channeldcc(true) >>> dccbot = botbuilder.channels(["#ctcp-s"]).build() >>> botbuilder .= network("irc.subluminal.net") >>> otherbot = botbuilder.channels(["#programming"]).build() >> >> In Python you can save common options in a dict and pass them as >> var-keyword argument. Or use functools.partial. In any case you don't >> need a builder class with the build method and a number of configuring >> methods. It can be just a function with optional keyword parameters. >> >> A Builder pattern is often used in languages that don't support >> passing arguments by keyword and partial functions. Python rarely >> needs the purposed class implementing a Builder pattern. Actually a >> Builder pattern is built-in in the language as a part of syntax. >> > Yeah but the dotequals operator has many other benefits: > > long_name .= __call__ # cast to callable > long_name .= wrapped # unwrap > etc > > And it also looks neat. I don't see this as being a particularly intuitive way of writing such rather uncommon constructs. The syntax is not clear (what if you have an expression on the RHS) and it doesn't save you much in writing (if long_name is too long simply rebind it under a shorter name for the purpose of the code block). Also note that rebinding different objects to the same name in the same block is often poor style and can easily lead to hard to track bugs. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jan 23 2017) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From p.f.moore at gmail.com Mon Jan 23 08:26:49 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 23 Jan 2017 13:26:49 +0000 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: References: <2127feaa-0a5b-8246-978f-6346dea554a2@gmail.com> Message-ID: On 23 January 2017 at 13:05, Soni L. wrote: > Yeah but the dotequals operator has many other benefits: > > long_name .= __call__ # cast to callable > long_name .= wrapped # unwrap > etc Those don't seem particularly clear to me. > And it also looks neat. Well, we have to agree to differ on that one. Also, the semantics of the proposed operation are very odd. If I understand your proposal a .= b(c) doesn't evaluate b(c) (It can't, as b is a method of a and doesn't make sense on its own), but rather combines the LHS and RHS with a dot - so it's defined in terms of rewriting the input rather than as an operation on the subexpressions. There's no other operator in Python that I'm aware of that works like this. What grammar would you allow for the RHS? So far you've shown LHS .= METHOD(ARGS) LHS .= ATTRIBUTE Clearly, LHS .= EXPR makes no sense in general (consider a .= 1+1). On the other hand, what about LHS .= ATTRIBUTE[INDEX] ? I'm guessing you'd want that allowed? Frankly, I don't think the benefits are even close to justifying the complexity. Paul From fakedme+py at gmail.com Mon Jan 23 08:33:26 2017 From: fakedme+py at gmail.com (Soni L.) Date: Mon, 23 Jan 2017 11:33:26 -0200 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: References: <2127feaa-0a5b-8246-978f-6346dea554a2@gmail.com> <0654a27e-7200-c468-d4eb-17bef13b61d2@egenix.com> Message-ID: ... I need a better email client. *double-checks I got everything right this time...* On 23/01/17 11:30 AM, Soni L. wrote: > Sorry, I replied to this wrong. Not used to this mailing list. > > On 23/01/17 11:28 AM, Soni L. wrote: >> >> >> On 23/01/17 11:18 AM, M.-A. Lemburg wrote: >>> On 23.01.2017 14:05, Soni L. wrote: >>>> >>>> On 23/01/17 09:45 AM, Serhiy Storchaka wrote: >>>>> On 23.01.17 01:30, Soni L. wrote: >>>>>> On 22/01/17 08:54 PM, Serhiy Storchaka wrote: >>>>>>> On 23.01.17 00:45, Soni L. wrote: >>>>>>>> I've been thinking of an Immutable Builder pattern and an operator >>>>>>>> to go >>>>>>>> with it. Since the builder would be immutable, this wouldn't work: >>>>>>>> >>>>>>>> long_name = mkbuilder() >>>>>>>> long_name.seta(a) >>>>>>>> long_name.setb(b) >>>>>>>> y = long_name.build() >>>>>>> I think the more pythonic way is: >>>>>>> >>>>>>> y = build(a=a, b=b) >>>>>>> >>>>>>> A Builder pattern is less used in Python due to the support of >>>>>>> keyword >>>>>>> arguments. >>>>>> I guess you could do something like this, for an IRC bot builder: >>>>>> >>>>>> fnircbotbuilder = mkircbotbuilder(network="irc.freenode.net", >>>>>> port=6697, >>>>>> ssl=true) >>>>>> mainbot = mkircbotbuilder(parent=fnircbotbuilder, # ??? 
>>>>>> channels=["#bots"]).build() >>>>>> fndccbotbuilder = mkircbotbuilder(parent=fnircbotbuilder, dcc=true, >>>>>> channeldcc=true) >>>>>> dccbot = mkircbotbuilder(parent=fndccbotbuilder, >>>>>> channels=["#ctcp-s"]).build() >>>>>> otherircbotbuilder = mkircbotbuilder(parent=fndccbotbuilder, >>>>>> network="irc.subluminal.net") # because we want this whole network >>>>>> otherbot = mkircbotbuilder(parent=otherircbotbuilder, >>>>>> channels=["#programming"]).build() # to use DCC and channel DCC >>>>>> >>>>>> But this would be cleaner: >>>>>> >>>>>> botbuilder = >>>>>> mkircbotbuilder().network("irc.freenode.net").port(6697).ssl(true) >>>>>> mainbot = botbuilder.channels(["#bots"]).build() >>>>>> botbuilder .= dcc(true).channeldcc(true) >>>>>> dccbot = botbuilder.channels(["#ctcp-s"]).build() >>>>>> botbuilder .= network("irc.subluminal.net") >>>>>> otherbot = botbuilder.channels(["#programming"]).build() >>>>> In Python you can save common options in a dict and pass them as >>>>> var-keyword argument. Or use functools.partial. In any case you don't >>>>> need a builder class with the build method and a number of >>>>> configuring >>>>> methods. It can be just a function with optional keyword parameters. >>>>> >>>>> A Builder pattern is often used in languages that don't support >>>>> passing arguments by keyword and partial functions. Python rarely >>>>> needs the purposed class implementing a Builder pattern. Actually a >>>>> Builder pattern is built-in in the language as a part of syntax. >>>>> >>>> Yeah but the dotequals operator has many other benefits: >>>> >>>> long_name .= __call__ # cast to callable >>>> long_name .= wrapped # unwrap >>>> etc >>>> >>>> And it also looks neat. >>> I don't see this an being a particular intuitive way of writing >>> such rather uncommon constructs. >>> >>> The syntax is not clear (what if you have an expression on the RHS) >>> and it doesn't save you much in writing (if long_name is too long >>> simply rebind it under a shorter name for the purpose of the code >>> block). >> >> It's literally sugar for repeating the name and moving the dot to the >> right. I think it's clearer than most other compound operators in >> that it doesn't affect precedence rules. >> >> `x += y`, for any code `y`, is equivalent to `x = x + (y)`, not `x = >> x + y`. >> >> `x .= y`, for any code `y`, is equivalent to `x = x . y`, not `x = x >> . (y)`. >> >>> >>> Also note that rebinding different objects to the same name >>> in the same block is often poor style and can easily lead to >>> hard to track bugs. >>> >> >> Rebinding different objects to the same name in rapid succession is >> fine. > From hervinhioslash at gmail.com Mon Jan 23 08:38:40 2017 From: hervinhioslash at gmail.com (=?UTF-8?B?SGVydsOpICJLeWxlIiBNVVRPTUJP?=) Date: Mon, 23 Jan 2017 14:38:40 +0100 Subject: [Python-ideas] Python-ideas Digest, Vol 122, Issue 81 In-Reply-To: References: Message-ID: Pleasing to see and somehow elegant. I believe .= is a good idea. On Jan 23, 2017 14:18, wrote: > Send Python-ideas mailing list submissions to > python-ideas at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/python-ideas > or, via email, send a message with subject or body 'help' to > python-ideas-request at python.org > > You can reach the person managing the list at > python-ideas-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Python-ideas digest..." > > > Today's Topics: > > 1. 
Re: "Immutable Builder" Pattern and Operator (Cory Benfield) > 2. Re: "Immutable Builder" Pattern and Operator (Paul Moore) > 3. Re: "Immutable Builder" Pattern and Operator (Serhiy Storchaka) > 4. Re: "Immutable Builder" Pattern and Operator (Soni L.) > 5. Re: "Immutable Builder" Pattern and Operator (M.-A. Lemburg) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 23 Jan 2017 09:32:02 +0000 > From: Cory Benfield > To: "Soni L." > Cc: python-ideas at python.org > Subject: Re: [Python-ideas] "Immutable Builder" Pattern and Operator > Message-ID: <8671EBCA-148F-41C8-A592-46EF653B9CAE at lukasa.co.uk> > Content-Type: text/plain; charset="utf-8" > > > > On 22 Jan 2017, at 22:45, Soni L. wrote: > > > > > > This pattern is present in the cryptography module already with things > like their x509.CertificateBuilder: https://cryptography.io/en/ > latest/x509/reference/#cryptography.x509.CertificateBuilder < > https://cryptography.io/en/latest/x509/reference/#cryptography.x509. > CertificateBuilder>. > > My 2c, but I find that code perfectly readable and legible. I don?t think > a dot-equals operator would be needed. > > Cory > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: attachments/20170123/c4e7d09f/attachment-0001.html> > > ------------------------------ > > Message: 2 > Date: Mon, 23 Jan 2017 09:54:55 +0000 > From: Paul Moore > To: "Soni L." > Cc: Python-Ideas > Subject: Re: [Python-ideas] "Immutable Builder" Pattern and Operator > Message-ID: > gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On 22 January 2017 at 22:45, Soni L. wrote: > > I've been thinking of an Immutable Builder pattern and an operator to go > > with it. Since the builder would be immutable, this wouldn't work: > > > > long_name = mkbuilder() > > long_name.seta(a) > > long_name.setb(b) > > y = long_name.build() > > > > Instead, you'd need something more like this: > > > > long_name = mkbuilder() > > long_name = long_name.seta(a) > > long_name = long_name.setb(b) > > y = long_name.build() > > > > Or we could add an operator to simplify it: > > > > long_name = mkbuilder() > > long_name .= seta(a) > > long_name .= setb(b) > > y = long_name.build() > > > > (Yes, I'm aware you can x = mkbuilder().seta(a).setb(b), then y = > x.build(). > > But that doesn't work if you wanna "fork" the builder. Some builders, > like a > > builder for network connections of some sort, would work best if they > were > > immutable/forkable.) > > I don't think the .= operator adds enough to be worth it. If the > problem you see is the duplication of long_name in those lines (it's > difficult to be sure without a real example) then you can use a > temporary: > > b = mkbuilder() > b = b.seta(a) > b = b.setb(b) > long_name = b > y = long_name.build() > > For your real example: > > On 22 January 2017 at 23:30, Soni L. wrote: > > fnircbotbuilder = mkircbotbuilder(network="irc.freenode.net", port=6697, > > ssl=true) > > mainbot = mkircbotbuilder(parent=fnircbotbuilder, # ??? 
> > channels=["#bots"]).build() > > fndccbotbuilder = mkircbotbuilder(parent=fnircbotbuilder, dcc=true, > > channeldcc=true) > > dccbot = mkircbotbuilder(parent=fndccbotbuilder, > > channels=["#ctcp-s"]).build() > > otherircbotbuilder = mkircbotbuilder(parent=fndccbotbuilder, > > network="irc.subluminal.net") # because we want this whole network > > otherbot = mkircbotbuilder(parent=otherircbotbuilder, > > channels=["#programming"]).build() # to use DCC and channel DCC > > > > But this would be cleaner: > > > > botbuilder = > > mkircbotbuilder().network("irc.freenode.net").port(6697).ssl(true) > > mainbot = botbuilder.channels(["#bots"]).build() > > botbuilder .= dcc(true).channeldcc(true) > > dccbot = botbuilder.channels(["#ctcp-s"]).build() > > botbuilder .= network("irc.subluminal.net") > > otherbot = botbuilder.channels(["#programming"]).build() > > I don't find the second example appreciably cleaner than the first. > But a bit of reformatting looks better to me: > > # First create builders for the bots > fnircbotbuilder = mkircbotbuilder( > network="irc.freenode.net", > port=6697, > ssl=true) > fndccbotbuilder = mkircbotbuilder( > parent=fnircbotbuilder, > dcc=true, > channeldcc=true) > otherircbotbuilder = mkircbotbuilder( > parent=fndccbotbuilder, > network="irc.subluminal.net") > > # Now create the actual bots > mainbot = mkircbotbuilder( > parent=fnircbotbuilder, > channels=["#bots"]).build() > dccbot = mkircbotbuilder( > parent=fndccbotbuilder, > channels=["#ctcp-s"]).build() > otherbot = mkircbotbuilder( > parent=otherircbotbuilder, > channels=["#programming"]).build() > > And some API redesign (make the builders classes, and the parent > relationship becomes subclassing, and maybe make channels an argument > to build() so that you don't need fresh builders for each of the > actual bots, and you don't even need the "builder" in the name at this > point) makes the whole thing look far cleaner (to me, at least): > > class FNIRCBot(IRCBot): > network="irc.freenode.net" > port=6697 > ssl=True > class FNDCCBot(FNIRCBot): > dcc=True > channeldcc=True > class OtherIRCBot(IRCBot): > network="irc.subluminal.net" > > mainbot = FNIRCBot(channels=["#bots"]) > dccbot = FNDCCBot(channels=["#ctcp-s"]) > otherbot = OtherIRCBot(channels=["#programming"]) > > Paul > > > ------------------------------ > > Message: 3 > Date: Mon, 23 Jan 2017 13:45:18 +0200 > From: Serhiy Storchaka > To: python-ideas at python.org > Subject: Re: [Python-ideas] "Immutable Builder" Pattern and Operator > Message-ID: > Content-Type: text/plain; charset=windows-1252; format=flowed > > On 23.01.17 01:30, Soni L. wrote: > > On 22/01/17 08:54 PM, Serhiy Storchaka wrote: > >> On 23.01.17 00:45, Soni L. wrote: > >>> I've been thinking of an Immutable Builder pattern and an operator to > go > >>> with it. Since the builder would be immutable, this wouldn't work: > >>> > >>> long_name = mkbuilder() > >>> long_name.seta(a) > >>> long_name.setb(b) > >>> y = long_name.build() > >> > >> I think the more pythonic way is: > >> > >> y = build(a=a, b=b) > >> > >> A Builder pattern is less used in Python due to the support of keyword > >> arguments. > > > > I guess you could do something like this, for an IRC bot builder: > > > > fnircbotbuilder = mkircbotbuilder(network="irc.freenode.net", port=6697, > > ssl=true) > > mainbot = mkircbotbuilder(parent=fnircbotbuilder, # ??? 
> > channels=["#bots"]).build() > > fndccbotbuilder = mkircbotbuilder(parent=fnircbotbuilder, dcc=true, > > channeldcc=true) > > dccbot = mkircbotbuilder(parent=fndccbotbuilder, > > channels=["#ctcp-s"]).build() > > otherircbotbuilder = mkircbotbuilder(parent=fndccbotbuilder, > > network="irc.subluminal.net") # because we want this whole network > > otherbot = mkircbotbuilder(parent=otherircbotbuilder, > > channels=["#programming"]).build() # to use DCC and channel DCC > > > > But this would be cleaner: > > > > botbuilder = > > mkircbotbuilder().network("irc.freenode.net").port(6697).ssl(true) > > mainbot = botbuilder.channels(["#bots"]).build() > > botbuilder .= dcc(true).channeldcc(true) > > dccbot = botbuilder.channels(["#ctcp-s"]).build() > > botbuilder .= network("irc.subluminal.net") > > otherbot = botbuilder.channels(["#programming"]).build() > > In Python you can save common options in a dict and pass them as > var-keyword argument. Or use functools.partial. In any case you don't > need a builder class with the build method and a number of configuring > methods. It can be just a function with optional keyword parameters. > > A Builder pattern is often used in languages that don't support passing > arguments by keyword and partial functions. Python rarely needs the > purposed class implementing a Builder pattern. Actually a Builder > pattern is built-in in the language as a part of syntax. > > > > > ------------------------------ > > Message: 4 > Date: Mon, 23 Jan 2017 11:05:12 -0200 > From: "Soni L." > To: python-ideas at python.org > Subject: Re: [Python-ideas] "Immutable Builder" Pattern and Operator > Message-ID: > Content-Type: text/plain; charset=windows-1252; format=flowed > > > > On 23/01/17 09:45 AM, Serhiy Storchaka wrote: > > On 23.01.17 01:30, Soni L. wrote: > >> On 22/01/17 08:54 PM, Serhiy Storchaka wrote: > >>> On 23.01.17 00:45, Soni L. wrote: > >>>> I've been thinking of an Immutable Builder pattern and an operator > >>>> to go > >>>> with it. Since the builder would be immutable, this wouldn't work: > >>>> > >>>> long_name = mkbuilder() > >>>> long_name.seta(a) > >>>> long_name.setb(b) > >>>> y = long_name.build() > >>> > >>> I think the more pythonic way is: > >>> > >>> y = build(a=a, b=b) > >>> > >>> A Builder pattern is less used in Python due to the support of keyword > >>> arguments. > >> > >> I guess you could do something like this, for an IRC bot builder: > >> > >> fnircbotbuilder = mkircbotbuilder(network="irc.freenode.net", > port=6697, > >> ssl=true) > >> mainbot = mkircbotbuilder(parent=fnircbotbuilder, # ??? 
> >> channels=["#bots"]).build() > >> fndccbotbuilder = mkircbotbuilder(parent=fnircbotbuilder, dcc=true, > >> channeldcc=true) > >> dccbot = mkircbotbuilder(parent=fndccbotbuilder, > >> channels=["#ctcp-s"]).build() > >> otherircbotbuilder = mkircbotbuilder(parent=fndccbotbuilder, > >> network="irc.subluminal.net") # because we want this whole network > >> otherbot = mkircbotbuilder(parent=otherircbotbuilder, > >> channels=["#programming"]).build() # to use DCC and channel DCC > >> > >> But this would be cleaner: > >> > >> botbuilder = > >> mkircbotbuilder().network("irc.freenode.net").port(6697).ssl(true) > >> mainbot = botbuilder.channels(["#bots"]).build() > >> botbuilder .= dcc(true).channeldcc(true) > >> dccbot = botbuilder.channels(["#ctcp-s"]).build() > >> botbuilder .= network("irc.subluminal.net") > >> otherbot = botbuilder.channels(["#programming"]).build() > > > > In Python you can save common options in a dict and pass them as > > var-keyword argument. Or use functools.partial. In any case you don't > > need a builder class with the build method and a number of configuring > > methods. It can be just a function with optional keyword parameters. > > > > A Builder pattern is often used in languages that don't support > > passing arguments by keyword and partial functions. Python rarely > > needs the purposed class implementing a Builder pattern. Actually a > > Builder pattern is built-in in the language as a part of syntax. > > > Yeah but the dotequals operator has many other benefits: > > long_name .= __call__ # cast to callable > long_name .= wrapped # unwrap > etc > > And it also looks neat. > > > ------------------------------ > > Message: 5 > Date: Mon, 23 Jan 2017 14:18:18 +0100 > From: "M.-A. Lemburg" > To: "Soni L." , python-ideas at python.org > Subject: Re: [Python-ideas] "Immutable Builder" Pattern and Operator > Message-ID: <0654a27e-7200-c468-d4eb-17bef13b61d2 at egenix.com> > Content-Type: text/plain; charset=windows-1252 > > On 23.01.2017 14:05, Soni L. wrote: > > > > > > On 23/01/17 09:45 AM, Serhiy Storchaka wrote: > >> On 23.01.17 01:30, Soni L. wrote: > >>> On 22/01/17 08:54 PM, Serhiy Storchaka wrote: > >>>> On 23.01.17 00:45, Soni L. wrote: > >>>>> I've been thinking of an Immutable Builder pattern and an operator > >>>>> to go > >>>>> with it. Since the builder would be immutable, this wouldn't work: > >>>>> > >>>>> long_name = mkbuilder() > >>>>> long_name.seta(a) > >>>>> long_name.setb(b) > >>>>> y = long_name.build() > >>>> > >>>> I think the more pythonic way is: > >>>> > >>>> y = build(a=a, b=b) > >>>> > >>>> A Builder pattern is less used in Python due to the support of keyword > >>>> arguments. > >>> > >>> I guess you could do something like this, for an IRC bot builder: > >>> > >>> fnircbotbuilder = mkircbotbuilder(network="irc.freenode.net", > port=6697, > >>> ssl=true) > >>> mainbot = mkircbotbuilder(parent=fnircbotbuilder, # ??? 
> >>> channels=["#bots"]).build() > >>> fndccbotbuilder = mkircbotbuilder(parent=fnircbotbuilder, dcc=true, > >>> channeldcc=true) > >>> dccbot = mkircbotbuilder(parent=fndccbotbuilder, > >>> channels=["#ctcp-s"]).build() > >>> otherircbotbuilder = mkircbotbuilder(parent=fndccbotbuilder, > >>> network="irc.subluminal.net") # because we want this whole network > >>> otherbot = mkircbotbuilder(parent=otherircbotbuilder, > >>> channels=["#programming"]).build() # to use DCC and channel DCC > >>> > >>> But this would be cleaner: > >>> > >>> botbuilder = > >>> mkircbotbuilder().network("irc.freenode.net").port(6697).ssl(true) > >>> mainbot = botbuilder.channels(["#bots"]).build() > >>> botbuilder .= dcc(true).channeldcc(true) > >>> dccbot = botbuilder.channels(["#ctcp-s"]).build() > >>> botbuilder .= network("irc.subluminal.net") > >>> otherbot = botbuilder.channels(["#programming"]).build() > >> > >> In Python you can save common options in a dict and pass them as > >> var-keyword argument. Or use functools.partial. In any case you don't > >> need a builder class with the build method and a number of configuring > >> methods. It can be just a function with optional keyword parameters. > >> > >> A Builder pattern is often used in languages that don't support > >> passing arguments by keyword and partial functions. Python rarely > >> needs the purposed class implementing a Builder pattern. Actually a > >> Builder pattern is built-in in the language as a part of syntax. > >> > > Yeah but the dotequals operator has many other benefits: > > > > long_name .= __call__ # cast to callable > > long_name .= wrapped # unwrap > > etc > > > > And it also looks neat. > > I don't see this an being a particular intuitive way of writing > such rather uncommon constructs. > > The syntax is not clear (what if you have an expression on the RHS) > and it doesn't save you much in writing (if long_name is too long > simply rebind it under a shorter name for the purpose of the code > block). > > Also note that rebinding different objects to the same name > in the same block is often poor style and can easily lead to > hard to track bugs. > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Experts (#1, Jan 23 2017) > >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ > >>> Python Database Interfaces ... http://products.egenix.com/ > >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ > ________________________________________________________________________ > > ::: We implement business ideas - efficiently in both time and costs ::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > http://www.malemburg.com/ > > > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > > ------------------------------ > > End of Python-ideas Digest, Vol 122, Issue 81 > ********************************************* > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From hervinhioslash at gmail.com  Mon Jan 23 08:47:21 2017
From: hervinhioslash at gmail.com (=?UTF-8?B?SGVydsOpICJLeWxlIiBNVVRPTUJP?=)
Date: Mon, 23 Jan 2017 14:47:21 +0100
Subject: [Python-ideas] Immutable Builder" Pattern and Operator
In-Reply-To: 
References: 
Message-ID: 

Paul Moore is clearly right when he says that "a .= 1+1" doesn't make
sense. It means nothing understandable, although "a .= s(e)" can mean
something. As a matter of fact, "a .= EXPR" is bound to succeed only in a
very small set of cases.

On Jan 23, 2017 14:39, wrote:

Today's Topics:

   1. Re: "Immutable Builder" Pattern and Operator (Paul Moore)
   2. Re: "Immutable Builder" Pattern and Operator (Soni L.)
   3. Re: Python-ideas Digest, Vol 122, Issue 81 (Hervé Kyle MUTOMBO)

----------------------------------------------------------------------

Message: 1
Date: Mon, 23 Jan 2017 13:26:49 +0000
From: Paul Moore
To: "Soni L."
Cc: Python-Ideas
Subject: Re: [Python-ideas] "Immutable Builder" Pattern and Operator
Message-ID:
Content-Type: text/plain; charset=UTF-8

On 23 January 2017 at 13:05, Soni L. wrote:
> Yeah but the dotequals operator has many other benefits:
>
> long_name .= __call__ # cast to callable
> long_name .= wrapped # unwrap
> etc

Those don't seem particularly clear to me.

> And it also looks neat.

Well, we have to agree to differ on that one.

Also, the semantics of the proposed operation are very odd. If I
understand your proposal

    a .= b(c)

doesn't evaluate b(c) (It can't, as b is a method of a and doesn't make
sense on its own), but rather combines the LHS and RHS with a dot - so
it's defined in terms of rewriting the input rather than as an operation
on the subexpressions. There's no other operator in Python that I'm
aware of that works like this.

What grammar would you allow for the RHS? So far you've shown

    LHS .= METHOD(ARGS)
    LHS .= ATTRIBUTE

Clearly, LHS .= EXPR makes no sense in general (consider a .= 1+1). On
the other hand, what about LHS .= ATTRIBUTE[INDEX] ? I'm guessing you'd
want that allowed?

Frankly, I don't think the benefits are even close to justifying the
complexity.

Paul
------------------------------

Message: 2
Date: Mon, 23 Jan 2017 11:33:26 -0200
From: "Soni L."
To: "M.-A. Lemburg"
Cc: "python-ideas at python.org"
Subject: Re: [Python-ideas] "Immutable Builder" Pattern and Operator
Message-ID:
Content-Type: text/plain; charset=windows-1252; format=flowed

[snip]

------------------------------

Message: 3
Date: Mon, 23 Jan 2017 14:38:40 +0100
From: Hervé "Kyle" MUTOMBO
To: python-ideas at python.org
Subject: Re: [Python-ideas] Python-ideas Digest, Vol 122, Issue 81
Message-ID:
Content-Type: text/plain; charset="utf-8"

Pleasing to see and somehow elegant. I believe .= is a good idea.

[snip]

------------------------------

-------------- next part --------------
An HTML attachment was scrubbed...
URL: ------------------------------ Subject: Digest Footer _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas ------------------------------ End of Python-ideas Digest, Vol 122, Issue 82 ********************************************* -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Mon Jan 23 08:54:19 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 24 Jan 2017 00:54:19 +1100 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: <5ff82f14-b0b8-9838-4924-9d73c6366f9a@gmail.com> References: <5ff82f14-b0b8-9838-4924-9d73c6366f9a@gmail.com> Message-ID: <20170123135418.GF7345@ando.pearwood.info> On Mon, Jan 23, 2017 at 12:49:19AM -0200, Soni L. wrote: [...] > You seem to be thinking of "immutable object builder". Not "the builder > itself is immutable and operations on it create new builders". Why would you make a builder class immutable? That's not a rhetorical question -- I'm genuinely surprised that you're not only using the builder design pattern (there are usually better solutions in Python) but more so that you're making the builder itself immutable. In any case, it seems that this is such a narrow and unusual use-case that it wouldn't be wise to give it special syntax. Especially when there are other, potentially far more useful, uses for the .= syntax, e.g. Julia's syntactic loop fusion: http://julialang.org/blog/2017/01/moredots -- Steve From fakedme+py at gmail.com Mon Jan 23 09:12:24 2017 From: fakedme+py at gmail.com (Soni L.) Date: Mon, 23 Jan 2017 12:12:24 -0200 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: <20170123135418.GF7345@ando.pearwood.info> References: <5ff82f14-b0b8-9838-4924-9d73c6366f9a@gmail.com> <20170123135418.GF7345@ando.pearwood.info> Message-ID: <356673bc-4f42-4426-7554-071361b40b32@gmail.com> On 23/01/17 11:54 AM, Steven D'Aprano wrote: > On Mon, Jan 23, 2017 at 12:49:19AM -0200, Soni L. wrote: > [...] >> You seem to be thinking of "immutable object builder". Not "the builder >> itself is immutable and operations on it create new builders". > Why would you make a builder class immutable? Builders for network connections where you don't wanna start with a fresh builder every time. > > That's not a rhetorical question -- I'm genuinely surprised that you're > not only using the builder design pattern (there are usually better > solutions in Python) but more so that you're making the builder itself > immutable. > > In any case, it seems that this is such a narrow and unusual use-case > that it wouldn't be wise to give it special syntax. Especially when > there are other, potentially far more useful, uses for the .= syntax, > e.g. Julia's syntactic loop fusion: > > http://julialang.org/blog/2017/01/moredots > It is far more useful only because it's not just a syntax sugar. It's more like a completely new, standalone operator. Which, IMO, makes it more confusing. I propose `x .= y` -> `x = x . y`, for any `y`. You propose `x .= y` -> `operate_on(x).with(lambda: y)` This feels like you're arguing for loops are more useful than multiplication, and thus we shouldn't have multiplication. 
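
For reference, the forkable "immutable builder" being discussed above can
already be written without new syntax. The sketch below is only an
illustration, using collections.namedtuple and its _replace method; the
field list and the with_options/build names are invented for the example
and are not taken from any message in this thread.

from collections import namedtuple

class IRCBotBuilder(namedtuple("IRCBotBuilder",
                               "network port ssl dcc channeldcc channels")):
    """Immutable builder: every setter returns a new builder."""

    def with_options(self, **changes):
        # _replace copies the tuple with the given fields updated and
        # leaves the original builder untouched -- i.e. it forks.
        return self._replace(**changes)

    def build(self):
        # Stand-in for constructing the real bot object.
        return dict(self._asdict())

base = IRCBotBuilder(network=None, port=6667, ssl=False,
                     dcc=False, channeldcc=False, channels=())

fn = base.with_options(network="irc.freenode.net", port=6697, ssl=True)
mainbot = fn.with_options(channels=("#bots",)).build()

fndcc = fn.with_options(dcc=True, channeldcc=True)        # fork of fn
dccbot = fndcc.with_options(channels=("#ctcp-s",)).build()

other = fndcc.with_options(network="irc.subluminal.net")  # fork of fndcc
otherbot = other.with_options(channels=("#programming",)).build()

Each intermediate builder (fn, fndcc, other) stays usable as an
independent starting point, which is the forking behaviour the ".="
proposal was meant to make cheaper to spell; functools.partial or
pyrsistent's PClass.set cover much the same ground.
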
> > From random832 at fastmail.com Mon Jan 23 09:25:48 2017 From: random832 at fastmail.com (Random832) Date: Mon, 23 Jan 2017 09:25:48 -0500 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: <356673bc-4f42-4426-7554-071361b40b32@gmail.com> References: <5ff82f14-b0b8-9838-4924-9d73c6366f9a@gmail.com> <20170123135418.GF7345@ando.pearwood.info> <356673bc-4f42-4426-7554-071361b40b32@gmail.com> Message-ID: <1485181548.715706.856562624.0C7A3F03@webmail.messagingengine.com> On Mon, Jan 23, 2017, at 09:12, Soni L. wrote: > Builders for network connections where you don't wanna start with a > fresh builder every time. Maybe you need a builder builder. Or, more seriously, a way to differentiate the things in the 'builder' from the things that are going to be different with each connection, and pass them separately in the connection's constructor. Or a clone method. > It is far more useful only because it's not just a syntax sugar. It's > more like a completely new, standalone operator. > Which, IMO, makes it more confusing. > > I propose `x .= y` -> `x = x . y`, for any `y`. I think you're underestimating the extent to which the fact that "y" isn't a real expression will cause confusion. From p.f.moore at gmail.com Mon Jan 23 09:53:25 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 23 Jan 2017 14:53:25 +0000 Subject: [Python-ideas] Immutable Builder" Pattern and Operator In-Reply-To: References: Message-ID: On 23 January 2017 at 13:47, Herv? "Kyle" MUTOMBO wrote: > Paul Moore is clearly right when He says that this "a .= 1+1" doesn't make > sense. It means nothing understandable although in "a .= s(e)" can mean > something. As a matter of fact "a .= EXPR" is bound to succeed only in a > very small set of cases. By responding to a digest you make it very hard to see what you're replying to. Could you get the messages as individual ones, and reply quoting the context properly, please? I'm not sure how to interpret your above comment in the light of your other comment > Pleasing to see and somehow elegant. I believe .= is a good idea. as I'm arguing pretty strongly that .= is *not* a good idea, because there are all sorts of ill-defined cases that haven't been clearly explained in a way that matches the rest of Python's grammar and semantics. Paul From wes.turner at gmail.com Mon Jan 23 10:48:25 2017 From: wes.turner at gmail.com (Wes Turner) Date: Mon, 23 Jan 2017 09:48:25 -0600 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: References: <5ff82f14-b0b8-9838-4924-9d73c6366f9a@gmail.com> Message-ID: On Sunday, January 22, 2017, Wes Turner wrote: > > > On Sunday, January 22, 2017, Soni L. > wrote: > >> >> >> On 22/01/17 10:03 PM, Wes Turner wrote: >> >> >> >> On Sunday, January 22, 2017, Wes Turner wrote: >> >>> Have you looked at pyrsistent for immutable/functional/persistent/copy-on-write >>> data structures in Python? >>> >>> https://github.com/tobgu/pyrsistent/ >>> >>> (freeze() / thaw()) >>> >>> ... e.g. List and Dict NamedTuple values are not immutable (because >>> append() and update() still work) >>> >> >> fn.py also has immutables: >> https://github.com/kachayev/fn.py/blob/master/README.rst#per >> sistent-data-structures >> >> >> You seem to be thinking of "immutable object builder". Not "the builder >> itself is immutable and operations on it create new builders". >> > > My mistake. > Something like @optionable and/or @curried from fn.py in conjunction with > PClass from pyrsistent may accomplish what you describe? 
> From http://pyrsistent.readthedocs.io/en/latest/api.html#pyrsistent.PClass.set : "Set a field in the instance. Returns a new instance with the updated value. The original instance remains unmodified. Accepts key-value pairs or single string representing the field name and a value." > >> >> >> >> >>> >>> On Sunday, January 22, 2017, Soni L. wrote: >>> >>>> I've been thinking of an Immutable Builder pattern and an operator to >>>> go with it. Since the builder would be immutable, this wouldn't work: >>>> >>>> long_name = mkbuilder() >>>> long_name.seta(a) >>>> long_name.setb(b) >>>> y = long_name.build() >>>> >>>> Instead, you'd need something more like this: >>>> >>>> long_name = mkbuilder() >>>> long_name = long_name.seta(a) >>>> long_name = long_name.setb(b) >>>> y = long_name.build() >>>> >>>> Or we could add an operator to simplify it: >>>> >>>> long_name = mkbuilder() >>>> long_name .= seta(a) >>>> long_name .= setb(b) >>>> y = long_name.build() >>>> >>>> (Yes, I'm aware you can x = mkbuilder().seta(a).setb(b), then y = >>>> x.build(). But that doesn't work if you wanna "fork" the builder. Some >>>> builders, like a builder for network connections of some sort, would work >>>> best if they were immutable/forkable.) >>>> _______________________________________________ >>>> Python-ideas mailing list >>>> Python-ideas at python.org >>>> https://mail.python.org/mailman/listinfo/python-ideas >>>> Code of Conduct: http://python.org/psf/codeofconduct/ >>>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Jan 23 10:51:58 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 23 Jan 2017 07:51:58 -0800 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: References: <2127feaa-0a5b-8246-978f-6346dea554a2@gmail.com> <0654a27e-7200-c468-d4eb-17bef13b61d2@egenix.com> Message-ID: <5886269E.1080507@stoneleaf.us> On 01/23/2017 05:33 AM, Soni L. wrote: > It's literally sugar for repeating the name and moving the dot to the right. > I think it's clearer than most other compound operators in that it doesn't > affect precedence rules. > > `x += y`, for any code `y`, is equivalent to `x = x + (y)`, not `x = x + y`. > > `x .= y`, for any code `y`, is equivalent to `x = x . y`, not `x = x . (y)`. This is not an improvement. -- ~Ethan~ From gerald.britton at gmail.com Mon Jan 23 10:52:25 2017 From: gerald.britton at gmail.com (Gerald Britton) Date: Mon, 23 Jan 2017 10:52:25 -0500 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: References: Message-ID: [snip] >I propose `x .= y` -> `x = x . y`, for any `y`. [snip] I think you mean "any y that is a member of x" Also, note that this syntax means that x will be rebound to the result of calling x.y, whatever that is (frequently, None, for mutating methods) In general, you can't count on methods to return references to their instances, even though it's handy for fluent coding, so this side effect may be unexpected to some That's a problem with your original example: >long_name = mkbuilder() >long_name = long_name.seta(a) >long_name = long_name.setb(b) >y = long_name.build() What do the methods seta and setb return? If they don't return "self" you've got a problem. I think. FWIW why can't you just write: x.y or for your example: long_name.seta(a) ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From fakedme+py at gmail.com Mon Jan 23 11:07:50 2017 From: fakedme+py at gmail.com (Soni L.) 
Date: Mon, 23 Jan 2017 14:07:50 -0200 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: References: Message-ID: <9bec65d9-a1ae-a0b7-2f4a-cb7db48e362f@gmail.com> On 23/01/17 01:52 PM, Gerald Britton wrote: > > > > [snip] > > >I propose `x .= y` -> `x = x . y`, for any `y`. > > [snip] > > I think you mean "any y that is a member of x" > Since it desugars into `x = x.y`, you can literally use anything for `y`. x .= __call__().whatever().unwrap() * 3 is equivalent to x = x.__call__().whatever().unwrap() * 3 and x .= 1 is equivalent to x = x.1 which is equivalent to SyntaxError: invalid syntax > Also, note that this syntax means that x will be rebound to the > result of calling x.y, whatever that is (frequently, None, for > mutating methods) > > In general, you can't count on methods to return references to > their instances, even though it's handy for fluent coding, so this > side effect may be unexpected to some > This is why it's for use with **immutable** objects. > That's a problem with your original example: > > >long_name = mkbuilder() > > >long_name = long_name.seta(a) > > >long_name = long_name.setb(b) > > >y = long_name.build() > > What do the methods seta and setb return? If they don't return > "self" you've got a problem. I think. > They don't return self. Ever. The value bound to long_name is immutable, like an integer. They return a new instance. > FWIW why can't you just write: > > x.y > > or for your example: > > long_name.seta(a) > > ? > > See the IRC bot builder example, it should be more clear. (It's about forking the builder.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Jan 23 11:23:14 2017 From: guido at python.org (Guido van Rossum) Date: Mon, 23 Jan 2017 08:23:14 -0800 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: <9bec65d9-a1ae-a0b7-2f4a-cb7db48e362f@gmail.com> References: <9bec65d9-a1ae-a0b7-2f4a-cb7db48e362f@gmail.com> Message-ID: On Mon, Jan 23, 2017 at 8:07 AM, Soni L. wrote: > > > Since it desugars into `x = x.y`, you can literally use anything for `y`. > > x .= __call__().whatever().unwrap() * 3 > > is equivalent to > > x = x.__call__().whatever().unwrap() * 3 > > and > > x .= 1 > > is equivalent to > > x = x.1 > > which is equivalent to > > SyntaxError: invalid syntax > And that's exactly the problem. Users would be greatly confused because what's to the right of `.=` is *not* an expression, it's something more restricted (in particular it must start with a plain identifier). This makes the `.=` operator a very different beast from `=`, `+=` and friends. I assume you think that's fine, but given your cavalier attitude about `x .= 1` my feeling is that you don't have a lot of experience designing and implementing language features. That is okay, you are learning it here. But perhaps you should take the hint from the large number of people here who have gently tried to explain to you that while this is a good idea, it's not a great idea, and there's no sufficiently important use case to make up for the confusion (indicated above) that it will inevitably cause. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gerald.britton at gmail.com Mon Jan 23 11:56:51 2017 From: gerald.britton at gmail.com (Gerald Britton) Date: Mon, 23 Jan 2017 11:56:51 -0500 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: <9bec65d9-a1ae-a0b7-2f4a-cb7db48e362f@gmail.com> References: <9bec65d9-a1ae-a0b7-2f4a-cb7db48e362f@gmail.com> Message-ID: On Jan 23, 2017 11:07 AM, "Soni L." wrote: On 23/01/17 01:52 PM, Gerald Britton wrote: [snip] >I propose `x .= y` -> `x = x . y`, for any `y`. [snip] I think you mean "any y that is a member of x" Since it desugars into `x = x.y`, you can literally use anything for `y`. x .= __call__().whatever().unwrap() * 3 is equivalent to x = x.__call__().whatever().unwrap() * 3 and x .= 1 is equivalent to x = x.1 which is equivalent to SyntaxError: invalid syntax Also, note that this syntax means that x will be rebound to the result of calling x.y, whatever that is (frequently, None, for mutating methods) In general, you can't count on methods to return references to their instances, even though it's handy for fluent coding, so this side effect may be unexpected to some This is why it's for use with **immutable** objects. That's a problem with your original example: >long_name = mkbuilder() >long_name = long_name.seta(a) >long_name = long_name.setb(b) >y = long_name.build() What do the methods seta and setb return? If they don't return "self" you've got a problem. I think. They don't return self. Ever. The value bound to long_name is immutable, like an integer. They return a new instance. Then long_name isn't immutable. It changes with every line. That can lead to nasty bugs if you count on its immutability. Easy to see. Just print long_name after each call. FWIW why can't you just write: x.y or for your example: long_name.seta(a) ? See the IRC bot builder example, it should be more clear. (It's about forking the builder.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From fakedme+py at gmail.com Mon Jan 23 12:54:05 2017 From: fakedme+py at gmail.com (Soni L.) Date: Mon, 23 Jan 2017 15:54:05 -0200 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: References: <9bec65d9-a1ae-a0b7-2f4a-cb7db48e362f@gmail.com> Message-ID: On 23/01/17 02:56 PM, Gerald Britton wrote: > > > On Jan 23, 2017 11:07 AM, "Soni L." > wrote: > > > > On 23/01/17 01:52 PM, Gerald Britton wrote: >> >> >> >> [snip] >> >> >I propose `x .= y` -> `x = x . y`, for any `y`. >> >> [snip] >> >> I think you mean "any y that is a member of x" >> > > Since it desugars into `x = x.y`, you can literally use anything > for `y`. > > x .= __call__().whatever().unwrap() * 3 > > is equivalent to > > x = x.__call__().whatever().unwrap() * 3 > > and > > x .= 1 > > is equivalent to > > x = x.1 > > which is equivalent to > > SyntaxError: invalid syntax > > >> Also, note that this syntax means that x will be rebound to >> the result of calling x.y, whatever that is (frequently, >> None, for mutating methods) >> >> In general, you can't count on methods to return references >> to their instances, even though it's handy for fluent coding, >> so this side effect may be unexpected to some >> > > This is why it's for use with **immutable** objects. > > >> That's a problem with your original example: >> >> >long_name = mkbuilder() >> >> >long_name = long_name.seta(a) >> >> >long_name = long_name.setb(b) >> >> >y = long_name.build() >> >> What do the methods seta and setb return? If they don't >> return "self" you've got a problem. I think. 
>> > > They don't return self. Ever. The value bound to long_name is > immutable, like an integer. They return a new instance. > > > Then long_name isn't immutable. It changes with every line. That can > lead to nasty bugs if you count on its immutability. > > Easy to see. Just print long_name after each call. You're mixing up value immutability with name immutability. The name isn't immutable, but: long_name = mkbuilder() x = long_name long_name .= seta("a") y = long_name long_name .= setb("b") z = long_name print(x) # a = None, b = None print(y) # a = "a", b = None print(z) # a = "a", b = "b" print(x is y) # False print(x is z) # False print(y is z) # False print(long_name is z) # True See also: long_name = 1 x = long_name long_name += 1 y = long_name long_name += 1 z = long_name print(x) # 1 print(y) # 2 print(z) # 3 print(x is y) # False print(x is z) # False print(y is z) # False print(long_name is z) # True > > > >> FWIW why can't you just write: >> >> x.y >> >> or for your example: >> >> long_name.seta(a) >> >> ? >> >> > > See the IRC bot builder example, it should be more clear. (It's > about forking the builder.) > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gerald.britton at gmail.com Mon Jan 23 13:27:03 2017 From: gerald.britton at gmail.com (Gerald Britton) Date: Mon, 23 Jan 2017 13:27:03 -0500 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: <39e9f6150fbf4503875edd46fad43600@CO2PR27MB010.066d.mgd.msft.net> References: <39e9f6150fbf4503875edd46fad43600@CO2PR27MB010.066d.mgd.msft.net> Message-ID: On Jan 23, 2017 1:12 PM, "Britton, Gerald" wrote: On 23/01/17 02:56 PM, Gerald Britton wrote: > > >* On Jan 23, 2017 11:07 AM, "Soni L." * >* 2Bpy at gmail.com >> wrote:* > > > >* On 23/01/17 01:52 PM, Gerald Britton wrote:* >> >> >> >>* [snip]* >> >>* >I propose `x .= y` -> `x = x . y`, for any `y`.* >> >>* [snip]* >> >>* I think you mean "any y that is a member of x"* >> > >* Since it desugars into `x = x.y`, you can literally use anything* >* for `y`.* > >* x .= __call__().whatever().unwrap() * 3* > >* is equivalent to* > >* x = x.__call__().whatever().unwrap() * 3* > >* and* > >* x .= 1* > >* is equivalent to* > >* x = x.1* > >* which is equivalent to* > >* SyntaxError: invalid syntax* > > >>* Also, note that this syntax means that x will be rebound to* >>* the result of calling x.y, whatever that is (frequently,* >>* None, for mutating methods)* >> >>* In general, you can't count on methods to return references* >>* to their instances, even though it's handy for fluent coding,* >>* so this side effect may be unexpected to some* >> > >* This is why it's for use with **immutable** objects.* > > >>* That's a problem with your original example:* >> >>* >long_name = mkbuilder()* >> >>* >long_name = long_name.seta(a)* >> >>* >long_name = long_name.setb(b)* >> >>* >y = long_name.build()* >> >>* What do the methods seta and setb return? If they don't* >>* return "self" you've got a problem. I think.* >> > >* They don't return self. Ever. The value bound to long_name is* >* immutable, like an integer. They return a new instance.* > > >* Then long_name isn't immutable. It changes with every line. That can * >* lead to nasty bugs if you count on its immutability.* > >* Easy to see. Just print long_name after each call.* You're mixing up value immutability with name immutability. The name isn't immutable, but: Er...No. I'm not confused at all, unless you define immutability in a new way.. 
you said that the "value bound to long_name is immutable." It's not. Your example below proves it. An immutable object is one whose state cannot be modified once set. That's not happening here. The state of the object bound to long_name, which is a pointer to an instance of you class, changes with each line. long_name = mkbuilder() x = long_name long_name .= seta("a") y = long_name long_name .= setb("b") z = long_name print(x) # a = None, b = None print(y) # a = "a", b = None print(z) # a = "a", b = "b" print(x is y) # False print(x is z) # False print(y is z) # False print(long_name is z) # True See also: long_name = 1 x = long_name long_name += 1 y = long_name long_name += 1 z = long_name print(x) # 1 print(y) # 2 print(z) # 3 print(x is y) # False print(x is z) # False print(y is z) # False print(long_name is z) # True > > > >>* FWIW why can't you just write:* >> >>* x.y* >> >>* or for your example:* >> >>* long_name.seta(a)* >> >>* ?* >> >> > >* See the IRC bot builder example, it should be more clear. (It's* >* about forking the builder.)* > > If you wish to unsubscribe from receiving commercial electronic messages from TD Bank Group, please click here or go to the following web address: www.td.com/tdoptout Si vous souhaitez vous d?sabonner des messages ?lectroniques de nature commerciale envoy?s par Groupe Banque TD veuillez cliquer ici ou vous rendre ? l'adresse www.td.com/tddesab NOTICE: Confidential message which may be privileged. Unauthorized use/disclosure prohibited. If received in error, please go to www.td.com/legal for instructions. AVIS : Message confidentiel dont le contenu peut ?tre privil?gi?. Utilisation/divulgation interdites sans permission. Si re?u par erreur, pri?re d'aller au www.td.com/francais/avis_juridique pour des instructions. -------------- next part -------------- An HTML attachment was scrubbed... URL: From fakedme+py at gmail.com Mon Jan 23 13:32:12 2017 From: fakedme+py at gmail.com (Soni L.) Date: Mon, 23 Jan 2017 16:32:12 -0200 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: References: <39e9f6150fbf4503875edd46fad43600@CO2PR27MB010.066d.mgd.msft.net> Message-ID: <457a6966-b2cb-3c96-7e44-09bbe9efde01@gmail.com> On 23/01/17 04:27 PM, Gerald Britton wrote: > > > On Jan 23, 2017 1:12 PM, "Britton, Gerald" > wrote: > > On 23/01/17 02:56 PM, Gerald Britton wrote: > > >// > > >// > > >/On Jan 23, 2017 11:07 AM, "Soni L." / > > >/2Bpy at gmail.com > >> wrote:/ > > >// > > >// > > >// > > >/ On 23/01/17 01:52 PM, Gerald Britton wrote:/ > > >>// > > >>// > > >>// > > >>/ [snip]/ > > >>// > > >>/ >I propose `x .= y` -> `x = x . 
y`, for any `y`./ > > >>// > > >>/ [snip]/ > > >>// > > >>/ I think you mean "any y that is a member of x"/ > > >>// > > >// > > >/ Since it desugars into `x = x.y`, you can literally use > anything/ > > >/ for `y`./ > > >// > > >/ x .= __call__().whatever().unwrap() * 3/ > > >// > > >/ is equivalent to/ > > >// > > >/ x = x.__call__().whatever().unwrap() * 3/ > > >// > > >/ and/ > > >// > > >/ x .= 1/ > > >// > > >/ is equivalent to/ > > >// > > >/ x = x.1/ > > >// > > >/ which is equivalent to/ > > >// > > >/ SyntaxError: invalid syntax/ > > >// > > >// > > >>/ Also, note that this syntax means that x will be rebound to/ > > >>/ the result of calling x.y, whatever that is (frequently,/ > > >>/ None, for mutating methods)/ > > >>// > > >>/ In general, you can't count on methods to return references/ > > >>/ to their instances, even though it's handy for fluent > coding,/ > > >>/ so this side effect may be unexpected to some/ > > >>// > > >// > > >/ This is why it's for use with **immutable** objects./ > > >// > > >// > > >>/ That's a problem with your original example:/ > > >>// > > >>/ >long_name = mkbuilder()/ > > >>// > > >>/ >long_name = long_name.seta(a)/ > > >>// > > >>/ >long_name = long_name.setb(b)/ > > >>// > > >>/ >y = long_name.build()/ > > >>// > > >>/ What do the methods seta and setb return? If they don't/ > > >>/ return "self" you've got a problem. I think./ > > >>// > > >// > > >/ They don't return self. Ever. The value bound to long_name is/ > > >/ immutable, like an integer. They return a new instance./ > > >// > > >// > > >/Then long_name isn't immutable. It changes with every line. That > can / > > >/lead to nasty bugs if you count on its immutability./ > > >// > > >/Easy to see. Just print long_name after each call./ > > You're mixing up value immutability with name immutability. The name > > isn't immutable, but: > > Er...No. I'm not confused at all, unless you define immutability in a > new way.. you said that the "value bound to long_name is immutable." > It's not. Your example below proves it. > An immutable object is one whose state cannot be modified once set. > That's not happening here. The state of the object bound to > long_name, which is a pointer to an instance of you class, changes > with each line. Python has pointers now?! The value pointed to by the value bound to long_name is immutable. Can you stop being so pedantic? >.< > > long_name = mkbuilder() > > x = long_name > > long_name .= seta("a") > > y = long_name > > long_name .= setb("b") > > z = long_name > > print(x) # a = None, b = None > > print(y) # a = "a", b = None > > print(z) # a = "a", b = "b" > > print(x is y) # False > > print(x is z) # False > > print(y is z) # False > > print(long_name is z) # True > > See also: > > long_name = 1 > > x = long_name > > long_name += 1 > > y = long_name > > long_name += 1 > > z = long_name > > print(x) # 1 > > print(y) # 2 > > print(z) # 3 > > print(x is y) # False > > print(x is z) # False > > print(y is z) # False > > print(long_name is z) # True > > >// > > >// > > >// > > >>/ FWIW why can't you just write:/ > > >>// > > >>/ x.y/ > > >>// > > >>/ or for your example:/ > > >>// > > >>/ long_name.seta(a)/ > > >>// > > >>/ ?/ > > >>// > > >>// > > >// > > >/ See the IRC bot builder example, it should be more clear. 
(It's/ > > >/ about forking the builder.)/ > > >// > > >// > > If you wish to unsubscribe from receiving commercial electronic > messages from TD Bank Group, please click here > or go to the following web address: > www.td.com/tdoptout Si vous souhaitez > vous d?sabonner des messages ?lectroniques de nature commerciale > envoy?s par Groupe Banque TD veuillez cliquer ici > ou vous rendre ? l'adresse > www.td.com/tddesab > > NOTICE: Confidential message which may be privileged. Unauthorized > use/disclosure prohibited. If received in error, please go to > www.td.com/legal for instructions. AVIS > : Message confidentiel dont le contenu peut ?tre privil?gi?. > Utilisation/divulgation interdites sans permission. Si re?u par > erreur, pri?re d'aller au www.td.com/francais/avis_juridique > pour des instructions. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From abrault at mapgears.com Mon Jan 23 13:27:24 2017 From: abrault at mapgears.com (Alexandre Brault) Date: Mon, 23 Jan 2017 13:27:24 -0500 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: References: <9bec65d9-a1ae-a0b7-2f4a-cb7db48e362f@gmail.com> Message-ID: On 2017-01-23 12:54 PM, Soni L. wrote: > On 23/01/17 02:56 PM, Gerald Britton wrote: >> On Jan 23, 2017 11:07 AM, "Soni L." > > wrote: >> >> On 23/01/17 01:52 PM, Gerald Britton wrote: >>> >>> [snip] >>> >>> >>> >>> >I propose `x .= y` -> `x = x . y`, for any `y`. >>> >>> >>> >>> [snip] >>> >>> >>> >>> I think you mean "any y that is a member of x" >>> >> >> Since it desugars into `x = x.y`, you can literally use anything >> for `y`. >> >> x .= __call__().whatever().unwrap() * 3 >> >> is equivalent to >> >> x = x.__call__().whatever().unwrap() * 3 >> >> and >> >> x .= 1 >> >> is equivalent to >> >> x = x.1 >> >> which is equivalent to >> >> SyntaxError: invalid syntax >> >> >>> >>> >>> Also, note that this syntax means that x will be rebound to >>> the result of calling x.y, whatever that is (frequently, >>> None, for mutating methods) >>> >>> In general, you can't count on methods to return references >>> to their instances, even though it's handy for fluent >>> coding, so this side effect may be unexpected to some >>> >> >> This is why it's for use with **immutable** objects. >> >> >>> >>> >>> That's a problem with your original example: >>> >>> >>> >>> >long_name = mkbuilder() >>> >>> >long_name = long_name.seta(a) >>> >>> >long_name = long_name.setb(b) >>> >>> >y = long_name.build() >>> >>> >>> >>> What do the methods seta and setb return? If they don't >>> return "self" you've got a problem. I think. >>> >> >> They don't return self. Ever. The value bound to long_name is >> immutable, like an integer. They return a new instance. >> >> >> Then long_name isn't immutable. It changes with every line. That can >> lead to nasty bugs if you count on its immutability. >> >> Easy to see. Just print long_name after each call. > > You're mixing up value immutability with name immutability. 
The name > isn't immutable, but: > > long_name = mkbuilder() > x = long_name > long_name .= seta("a") > y = long_name > long_name .= setb("b") > z = long_name > print(x) # a = None, b = None > print(y) # a = "a", b = None > print(z) # a = "a", b = "b" > print(x is y) # False > print(x is z) # False > print(y is z) # False > print(long_name is z) # True This is a really contrived example that doesn't need a special syntax. Consider instead: >>> x = mkbuilder() >>> y = x.seta('a') >>> z = y.setb('b') >>> long_name = z All your prints work as well, this is much easier to mentally parse (notwithstanding the unpythonesque builder pattern) and doesn't require the addition of a new syntax that people more knowledgeable than me have already explained is not a good idea --Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcrmatos at gmail.com Mon Jan 23 13:43:52 2017 From: jcrmatos at gmail.com (=?UTF-8?Q?Jo=c3=a3o_Matos?=) Date: Mon, 23 Jan 2017 18:43:52 +0000 Subject: [Python-ideas] globals should accept parenteses for extending beyond 1 line Message-ID: <0147af7a-fcfd-3379-a1d9-f7f1ec1d1369@gmail.com> Hello, I would like to suggest that globals should follow the existing rule (followed by the import statement, the if statement and in other places) for extending beyond 1 line using parentheses. Like this: globals (var_1, var_2, var_3) instead of what must be done now, which is: globals var_1, var_2 \ var_3 Best regards, JM From guido at python.org Mon Jan 23 14:14:07 2017 From: guido at python.org (Guido van Rossum) Date: Mon, 23 Jan 2017 11:14:07 -0800 Subject: [Python-ideas] globals should accept parenteses for extending beyond 1 line In-Reply-To: <0147af7a-fcfd-3379-a1d9-f7f1ec1d1369@gmail.com> References: <0147af7a-fcfd-3379-a1d9-f7f1ec1d1369@gmail.com> Message-ID: You can just write global foo, bar global baz, bletch On Mon, Jan 23, 2017 at 10:43 AM, Jo?o Matos wrote: > Hello, > > I would like to suggest that globals should follow the existing rule > (followed by the import statement, the if statement and in other places) > for extending beyond 1 line using parentheses. > Like this: > globals (var_1, var_2, > var_3) > > instead of what must be done now, which is: > globals var_1, var_2 \ > var_3 > > > Best regards, > > JM > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcrmatos at gmail.com Mon Jan 23 14:22:38 2017 From: jcrmatos at gmail.com (=?UTF-8?Q?Jo=c3=a3o_Matos?=) Date: Mon, 23 Jan 2017 19:22:38 +0000 Subject: [Python-ideas] globals should accept parenteses for extending beyond 1 line In-Reply-To: References: <0147af7a-fcfd-3379-a1d9-f7f1ec1d1369@gmail.com> Message-ID: Hello, To me that makes no sense. If for the import statement the rule is to use the parentheses and not repeating the import statement, why should it be different with global? Best regards, JM On 23-01-2017 19:14, Guido van Rossum wrote: > You can just write > global foo, bar > global baz, bletch > > On Mon, Jan 23, 2017 at 10:43 AM, Jo?o Matos > wrote: > > Hello, > > I would like to suggest that globals should follow the existing > rule (followed by the import statement, the if statement and in > other places) for extending beyond 1 line using parentheses. 
> Like this: > globals (var_1, var_2, > var_3) > > instead of what must be done now, which is: > globals var_1, var_2 \ > var_3 > > > Best regards, > > JM > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > > -- > --Guido van Rossum (python.org/~guido ) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ned at nedbatchelder.com Mon Jan 23 14:24:15 2017 From: ned at nedbatchelder.com (Ned Batchelder) Date: Mon, 23 Jan 2017 14:24:15 -0500 Subject: [Python-ideas] "Immutable Builder" Pattern and Operator In-Reply-To: References: <39e9f6150fbf4503875edd46fad43600@CO2PR27MB010.066d.mgd.msft.net> Message-ID: <8d345f41-9a8d-d1ff-cecb-85a6e1e86548@nedbatchelder.com> On 1/23/17 1:27 PM, Gerald Britton wrote: > > > On Jan 23, 2017 1:12 PM, "Britton, Gerald" > wrote: > > > > > You're mixing up value immutability with name immutability. The name > > isn't immutable, but: > > > Er...No. I'm not confused at all, unless you define immutability in a > new way.. you said that the "value bound to long_name is immutable." > It's not. Your example below proves it. > > An immutable object is one whose state cannot be modified once set. > That's not happening here. The state of the object bound to > long_name, which is a pointer to an instance of you class, changes > with each line. In Python, names refer to values. Values can be immutable. Ints, strings, and tuples are examples of immutable values. They cannot be mutated. In Python, names cannot be immutable. It is always possible to make an existing name refer to a new value. --Ned. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Mon Jan 23 14:25:45 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 23 Jan 2017 14:25:45 -0500 Subject: [Python-ideas] globals should accept parenteses for extending beyond 1 line In-Reply-To: <0147af7a-fcfd-3379-a1d9-f7f1ec1d1369@gmail.com> References: <0147af7a-fcfd-3379-a1d9-f7f1ec1d1369@gmail.com> Message-ID: On 1/23/2017 1:43 PM, Jo?o Matos wrote: > Hello, > > I would like to suggest that globals should follow the existing rule > (followed by the import statement, the if statement and in other places) > for extending beyond 1 line using parentheses. > Like this: > globals (var_1, var_2, > var_3) > > instead of what must be done now, which is: > globals var_1, var_2 \ > var_3 The declaration keyword is 'global'; 'globals' is the built-in function. In any case global var_1, var_2 global var_3 works fine. There is no connection between the names and, unlike with import, no operational efficiency is gained by mashing the statements together. This issue should be rare. The global statement is only needed when one is rebinding global names within a function*. If a function rebinds 10 different global names, the design should probably be re-examined. * 'global' at class scope seems useless. a = 0 class C: a = 1 has the same effect as a = 0 a = 1 class C: pass -- Terry Jan Reedy From jcrmatos at gmail.com Mon Jan 23 14:37:44 2017 From: jcrmatos at gmail.com (=?UTF-8?Q?Jo=c3=a3o_Matos?=) Date: Mon, 23 Jan 2017 19:37:44 +0000 Subject: [Python-ideas] globals should accept parenteses for extending beyond 1 line In-Reply-To: References: <0147af7a-fcfd-3379-a1d9-f7f1ec1d1369@gmail.com> Message-ID: Hello, You are correct, my mistake. 
I should have written global and not globals. The purpose of using parentheses on the import statement is not (in my view) for operational efficiency but for appearance/cleaness. The same applies to using it to global. One does not need to have 10 global vars. It may have to do with var name length and the 79 max line length. This is an example from my one of my programs: global existing_graph, expected_duration_in_sec, file_size, \ file_mtime, no_change_counter Anyway, the use of global being rare is of no concern. The point of my suggestion is standardization. My opinion is that a standard language is easier to learn (and teach) than one that has different syntax for the same issue, depending on the statement. In short, if the recommended multi-line use for import is import (a, b, c) instead of import a, b, \ c Then the same should apply to global. Best regards, JM On 23-01-2017 19:25, Terry Reedy wrote: > On 1/23/2017 1:43 PM, Jo?o Matos wrote: >> Hello, >> >> I would like to suggest that globals should follow the existing rule >> (followed by the import statement, the if statement and in other places) >> for extending beyond 1 line using parentheses. >> Like this: >> globals (var_1, var_2, >> var_3) >> >> instead of what must be done now, which is: >> globals var_1, var_2 \ >> var_3 > > The declaration keyword is 'global'; 'globals' is the built-in > function. In any case > > global var_1, var_2 > global var_3 > > works fine. There is no connection between the names and, unlike with > import, no operational efficiency is gained by mashing the statements > together. > > This issue should be rare. The global statement is only needed when > one is rebinding global names within a function*. If a function > rebinds 10 different global names, the design should probably be > re-examined. > > * 'global' at class scope seems useless. > > a = 0 > class C: > a = 1 > > has the same effect as > a = 0 > a = 1 > class C: pass > From stephanh42 at gmail.com Mon Jan 23 14:51:04 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Mon, 23 Jan 2017 20:51:04 +0100 Subject: [Python-ideas] globals should accept parenteses for extending beyond 1 line In-Reply-To: References: <0147af7a-fcfd-3379-a1d9-f7f1ec1d1369@gmail.com> Message-ID: For what it's worth, I normally just do: global a global b But I've never needed more than two. I think if you need more, then there is a serious style issue. That it looks syntactically ugly is a feature. Perhaps we should deprecate the comma in global ;-) . Stephan Op 23 jan. 2017 8:38 p.m. schreef "Jo?o Matos" : > Hello, > > You are correct, my mistake. I should have written global and not globals. > > The purpose of using parentheses on the import statement is not (in my > view) for operational efficiency but for appearance/cleaness. > The same applies to using it to global. > > One does not need to have 10 global vars. It may have to do with var name > length and the 79 max line length. > > This is an example from my one of my programs: > global existing_graph, expected_duration_in_sec, file_size, \ > file_mtime, no_change_counter > > Anyway, the use of global being rare is of no concern. The point of my > suggestion is standardization. > My opinion is that a standard language is easier to learn (and teach) than > one that has different syntax for the same issue, depending on the > statement. > > In short, if the recommended multi-line use for import is > > import (a, b, > c) > > instead of > > import a, b, \ > c > > Then the same should apply to global. 
> > > Best regards, > > JM > > > > > On 23-01-2017 19:25, Terry Reedy wrote: > >> On 1/23/2017 1:43 PM, Jo?o Matos wrote: >> >>> Hello, >>> >>> I would like to suggest that globals should follow the existing rule >>> (followed by the import statement, the if statement and in other places) >>> for extending beyond 1 line using parentheses. >>> Like this: >>> globals (var_1, var_2, >>> var_3) >>> >>> instead of what must be done now, which is: >>> globals var_1, var_2 \ >>> var_3 >>> >> >> The declaration keyword is 'global'; 'globals' is the built-in function. >> In any case >> >> global var_1, var_2 >> global var_3 >> >> works fine. There is no connection between the names and, unlike with >> import, no operational efficiency is gained by mashing the statements >> together. >> >> This issue should be rare. The global statement is only needed when one >> is rebinding global names within a function*. If a function rebinds 10 >> different global names, the design should probably be re-examined. >> >> * 'global' at class scope seems useless. >> >> a = 0 >> class C: >> a = 1 >> >> has the same effect as >> a = 0 >> a = 1 >> class C: pass >> >> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Mon Jan 23 14:53:20 2017 From: brett at python.org (Brett Cannon) Date: Mon, 23 Jan 2017 19:53:20 +0000 Subject: [Python-ideas] globals should accept parenteses for extending beyond 1 line In-Reply-To: References: <0147af7a-fcfd-3379-a1d9-f7f1ec1d1369@gmail.com> Message-ID: Actually multi-line import doesn't work: File ".\Untitled.py", line 1 import (tokenize, ^ SyntaxError: invalid syntax I think you're getting this mixed up with parentheses being allowed in `from ... import (...)` syntax. So unless there is another single-word keyword that allows multi-line arguments using parentheses I don't think there's an inconsistency here. Plus, as Guido pointed out, the current syntax isn't preventing you from doing something you can already do. So if you want to add parentheses support to global, nonlocal, and import, you can propose a patch, but it's not a priority to solve without someone providing a solution since it doesn't open up anything new for something people don't use on a regular basis. On Mon, 23 Jan 2017 at 11:39 Jo?o Matos wrote: > Hello, > > You are correct, my mistake. I should have written global and not globals. > > The purpose of using parentheses on the import statement is not (in my > view) for operational efficiency but for appearance/cleaness. > The same applies to using it to global. > > One does not need to have 10 global vars. It may have to do with var > name length and the 79 max line length. > > This is an example from my one of my programs: > global existing_graph, expected_duration_in_sec, file_size, \ > file_mtime, no_change_counter > > Anyway, the use of global being rare is of no concern. The point of my > suggestion is standardization. > My opinion is that a standard language is easier to learn (and teach) > than one that has different syntax for the same issue, depending on the > statement. > > In short, if the recommended multi-line use for import is > > import (a, b, > c) > > instead of > > import a, b, \ > c > > Then the same should apply to global. 
> > > Best regards, > > JM > > > > > On 23-01-2017 19:25, Terry Reedy wrote: > > On 1/23/2017 1:43 PM, Jo?o Matos wrote: > >> Hello, > >> > >> I would like to suggest that globals should follow the existing rule > >> (followed by the import statement, the if statement and in other places) > >> for extending beyond 1 line using parentheses. > >> Like this: > >> globals (var_1, var_2, > >> var_3) > >> > >> instead of what must be done now, which is: > >> globals var_1, var_2 \ > >> var_3 > > > > The declaration keyword is 'global'; 'globals' is the built-in > > function. In any case > > > > global var_1, var_2 > > global var_3 > > > > works fine. There is no connection between the names and, unlike with > > import, no operational efficiency is gained by mashing the statements > > together. > > > > This issue should be rare. The global statement is only needed when > > one is rebinding global names within a function*. If a function > > rebinds 10 different global names, the design should probably be > > re-examined. > > > > * 'global' at class scope seems useless. > > > > a = 0 > > class C: > > a = 1 > > > > has the same effect as > > a = 0 > > a = 1 > > class C: pass > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From prometheus235 at gmail.com Mon Jan 23 15:09:12 2017 From: prometheus235 at gmail.com (Nick Timkovich) Date: Mon, 23 Jan 2017 14:09:12 -0600 Subject: [Python-ideas] globals should accept parenteses for extending beyond 1 line In-Reply-To: References: <0147af7a-fcfd-3379-a1d9-f7f1ec1d1369@gmail.com> Message-ID: Related and probably more common is the need for the line-continuation operator for long/multiple context managers with "with". I assume that's come up before, but was it also just a low priority rather than any technical reason? On Mon, Jan 23, 2017 at 1:53 PM, Brett Cannon wrote: > Actually multi-line import doesn't work: > > File ".\Untitled.py", line 1 > import (tokenize, > ^ > SyntaxError: invalid syntax > > I think you're getting this mixed up with parentheses being allowed in > `from ... import (...)` syntax. So unless there is another single-word > keyword that allows multi-line arguments using parentheses I don't think > there's an inconsistency here. > > Plus, as Guido pointed out, the current syntax isn't preventing you from > doing something you can already do. So if you want to add parentheses > support to global, nonlocal, and import, you can propose a patch, but it's > not a priority to solve without someone providing a solution since it > doesn't open up anything new for something people don't use on a regular > basis. > > > On Mon, 23 Jan 2017 at 11:39 Jo?o Matos wrote: > >> Hello, >> >> You are correct, my mistake. I should have written global and not globals. >> >> The purpose of using parentheses on the import statement is not (in my >> view) for operational efficiency but for appearance/cleaness. >> The same applies to using it to global. >> >> One does not need to have 10 global vars. It may have to do with var >> name length and the 79 max line length. >> >> This is an example from my one of my programs: >> global existing_graph, expected_duration_in_sec, file_size, \ >> file_mtime, no_change_counter >> >> Anyway, the use of global being rare is of no concern. 
The point of my >> suggestion is standardization. >> My opinion is that a standard language is easier to learn (and teach) >> than one that has different syntax for the same issue, depending on the >> statement. >> >> In short, if the recommended multi-line use for import is >> >> import (a, b, >> c) >> >> instead of >> >> import a, b, \ >> c >> >> Then the same should apply to global. >> >> >> Best regards, >> >> JM >> >> >> >> >> On 23-01-2017 19:25, Terry Reedy wrote: >> > On 1/23/2017 1:43 PM, Jo?o Matos wrote: >> >> Hello, >> >> >> >> I would like to suggest that globals should follow the existing rule >> >> (followed by the import statement, the if statement and in other >> places) >> >> for extending beyond 1 line using parentheses. >> >> Like this: >> >> globals (var_1, var_2, >> >> var_3) >> >> >> >> instead of what must be done now, which is: >> >> globals var_1, var_2 \ >> >> var_3 >> > >> > The declaration keyword is 'global'; 'globals' is the built-in >> > function. In any case >> > >> > global var_1, var_2 >> > global var_3 >> > >> > works fine. There is no connection between the names and, unlike with >> > import, no operational efficiency is gained by mashing the statements >> > together. >> > >> > This issue should be rare. The global statement is only needed when >> > one is rebinding global names within a function*. If a function >> > rebinds 10 different global names, the design should probably be >> > re-examined. >> > >> > * 'global' at class scope seems useless. >> > >> > a = 0 >> > class C: >> > a = 1 >> > >> > has the same effect as >> > a = 0 >> > a = 1 >> > class C: pass >> > >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Mon Jan 23 15:24:53 2017 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 24 Jan 2017 07:24:53 +1100 Subject: [Python-ideas] globals should accept parenteses for extending beyond 1 line In-Reply-To: References: <0147af7a-fcfd-3379-a1d9-f7f1ec1d1369@gmail.com> Message-ID: On Tue, Jan 24, 2017 at 6:37 AM, Jo?o Matos wrote: > One does not need to have 10 global vars. It may have to do with var name > length and the 79 max line length. > > This is an example from my one of my programs: > global existing_graph, expected_duration_in_sec, file_size, \ > file_mtime, no_change_counter > I think you're already running into serious design concerns here. Why are file_size and file_mtime global? Perhaps a better design would involve a class, where this function would become a method, and those globals become "self.file_size" and "self.file_mtime". Then you can have a single global instance of that class for now, but if ever you need two of them, it's trivially easy. You encapsulate all of this global state into a coherent package. But if you MUST use globals, I would split the lines according to purpose: global existing_graph, expected_duration # "in_sec" is unnecessary global file_size, file_mtime global no_change_counter # also probably needs a new name That way, you're unlikely to run into the 80-char limit. 
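For concreteness, here is a rough sketch of the class-based version (the class, method and attribute names are invented here, not taken from the real program):

    import os

    class FileWatcher:
        """Bundles what used to be module-level globals."""

        def __init__(self):
            self.existing_graph = None
            self.expected_duration_in_sec = 0
            self.file_size = 0
            self.file_mtime = 0.0
            self.no_change_counter = 0

        def update_from(self, path):
            # Mutates attributes on self instead of rebinding globals,
            # so no 'global' declarations are needed anywhere.
            info = os.stat(path)
            self.file_size = info.st_size
            self.file_mtime = info.st_mtime

    watcher = FileWatcher()  # the single "global" instance, for now
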
ChrisA From jcrmatos at gmail.com Mon Jan 23 16:22:09 2017 From: jcrmatos at gmail.com (=?UTF-8?Q?Jo=c3=a3o_Matos?=) Date: Mon, 23 Jan 2017 21:22:09 +0000 Subject: [Python-ideas] globals should accept parenteses for extending beyond 1 line In-Reply-To: References: <0147af7a-fcfd-3379-a1d9-f7f1ec1d1369@gmail.com> Message-ID: Hello, I understand. Python sources are very large. Any pointers to which file defines the global statement syntax? Best regards, JM On 23-01-2017 19:53, Brett Cannon wrote: > Actually multi-line import doesn't work: > > File ".\Untitled.py", line 1 > import (tokenize, > ^ > SyntaxError: invalid syntax > > I think you're getting this mixed up with parentheses being allowed in > `from ... import (...)` syntax. So unless there is another single-word > keyword that allows multi-line arguments using parentheses I don't > think there's an inconsistency here. > > Plus, as Guido pointed out, the current syntax isn't preventing you > from doing something you can already do. So if you want to add > parentheses support to global, nonlocal, and import, you can propose a > patch, but it's not a priority to solve without someone providing a > solution since it doesn't open up anything new for something people > don't use on a regular basis. > > > On Mon, 23 Jan 2017 at 11:39 Jo?o Matos > wrote: > > Hello, > > You are correct, my mistake. I should have written global and not > globals. > > The purpose of using parentheses on the import statement is not (in my > view) for operational efficiency but for appearance/cleaness. > The same applies to using it to global. > > One does not need to have 10 global vars. It may have to do with var > name length and the 79 max line length. > > This is an example from my one of my programs: > global existing_graph, expected_duration_in_sec, file_size, \ > file_mtime, no_change_counter > > Anyway, the use of global being rare is of no concern. The point of my > suggestion is standardization. > My opinion is that a standard language is easier to learn (and teach) > than one that has different syntax for the same issue, depending > on the > statement. > > In short, if the recommended multi-line use for import is > > import (a, b, > c) > > instead of > > import a, b, \ > c > > Then the same should apply to global. > > > Best regards, > > JM > > > > > On 23-01-2017 19:25, Terry Reedy wrote: > > On 1/23/2017 1:43 PM, Jo?o Matos wrote: > >> Hello, > >> > >> I would like to suggest that globals should follow the existing > rule > >> (followed by the import statement, the if statement and in > other places) > >> for extending beyond 1 line using parentheses. > >> Like this: > >> globals (var_1, var_2, > >> var_3) > >> > >> instead of what must be done now, which is: > >> globals var_1, var_2 \ > >> var_3 > > > > The declaration keyword is 'global'; 'globals' is the built-in > > function. In any case > > > > global var_1, var_2 > > global var_3 > > > > works fine. There is no connection between the names and, > unlike with > > import, no operational efficiency is gained by mashing the > statements > > together. > > > > This issue should be rare. The global statement is only needed when > > one is rebinding global names within a function*. If a function > > rebinds 10 different global names, the design should probably be > > re-examined. > > > > * 'global' at class scope seems useless. 
> > > > a = 0 > > class C: > > a = 1 > > > > has the same effect as > > a = 0 > > a = 1 > > class C: pass > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcrmatos at gmail.com Mon Jan 23 16:18:54 2017 From: jcrmatos at gmail.com (=?UTF-8?Q?Jo=c3=a3o_Matos?=) Date: Mon, 23 Jan 2017 21:18:54 +0000 Subject: [Python-ideas] globals should accept parenteses for extending beyond 1 line In-Reply-To: References: <0147af7a-fcfd-3379-a1d9-f7f1ec1d1369@gmail.com> Message-ID: Hello, The subject of this topic is a suggestion about the language and not the programming paradigm/style. Why should I repeat global if I can use the line separation character \ (like I mentioned on my 1st email) or parentheses as I suggested? "global existing_graph, expected_duration # "in_sec" is unnecessary" No it is not unnecessary unless sec is the only unit you use (in this case we have several units of duration and thus it is necessary). Best regards, JM On 23-01-2017 20:24, Chris Angelico wrote: > On Tue, Jan 24, 2017 at 6:37 AM, Jo?o Matos wrote: >> One does not need to have 10 global vars. It may have to do with var name >> length and the 79 max line length. >> >> This is an example from my one of my programs: >> global existing_graph, expected_duration_in_sec, file_size, \ >> file_mtime, no_change_counter >> > I think you're already running into serious design concerns here. Why > are file_size and file_mtime global? Perhaps a better design would > involve a class, where this function would become a method, and those > globals become "self.file_size" and "self.file_mtime". Then you can > have a single global instance of that class for now, but if ever you > need two of them, it's trivially easy. You encapsulate all of this > global state into a coherent package. > > But if you MUST use globals, I would split the lines according to purpose: > > global existing_graph, expected_duration # "in_sec" is unnecessary > global file_size, file_mtime > global no_change_counter # also probably needs a new name > > That way, you're unlikely to run into the 80-char limit. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From python at mrabarnett.plus.com Mon Jan 23 16:29:51 2017 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 23 Jan 2017 21:29:51 +0000 Subject: [Python-ideas] globals should accept parenteses for extending beyond 1 line In-Reply-To: References: <0147af7a-fcfd-3379-a1d9-f7f1ec1d1369@gmail.com> Message-ID: On 2017-01-23 20:09, Nick Timkovich wrote: > Related and probably more common is the need for the line-continuation > operator for long/multiple context managers with "with". I assume that's > come up before, but was it also just a low priority rather than any > technical reason? > It has come up before, and there is a technical reason, namely the syntactic ambiguity when parsing. Not impossible to fix, but probably not worth the added complexity. 
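Roughly, the issue is that an opening parenthesis after "with" already begins an ordinary parenthesized expression, so the parser cannot tell a multi-line list of context managers from a single tuple-valued expression. A small illustration (file names made up):

    # Valid syntax today, but it means ONE context manager whose
    # expression is the tuple (open("a.txt"), open("b.txt")), and a
    # tuple has no __enter__/__exit__, so this fails at runtime.
    with (open("a.txt"), open("b.txt")):
        pass

    # The unambiguous spellings are a backslash continuation ...
    with open("a.txt") as fa, \
         open("b.txt") as fb:
        pass

    # ... or nesting the with statements.
    with open("a.txt") as fa:
        with open("b.txt") as fb:
            pass
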
> On Mon, Jan 23, 2017 at 1:53 PM, Brett Cannon > wrote: > > Actually multi-line import doesn't work: > > File ".\Untitled.py", line 1 > import (tokenize, > ^ > SyntaxError: invalid syntax > > I think you're getting this mixed up with parentheses being allowed > in `from ... import (...)` syntax. So unless there is another > single-word keyword that allows multi-line arguments using > parentheses I don't think there's an inconsistency here. > > Plus, as Guido pointed out, the current syntax isn't preventing you > from doing something you can already do. So if you want to add > parentheses support to global, nonlocal, and import, you can propose > a patch, but it's not a priority to solve without someone providing > a solution since it doesn't open up anything new for something > people don't use on a regular basis. > > > On Mon, 23 Jan 2017 at 11:39 Jo?o Matos > wrote: > > Hello, > > You are correct, my mistake. I should have written global and > not globals. > > The purpose of using parentheses on the import statement is not > (in my > view) for operational efficiency but for appearance/cleaness. > The same applies to using it to global. > > One does not need to have 10 global vars. It may have to do with var > name length and the 79 max line length. > > This is an example from my one of my programs: > global existing_graph, expected_duration_in_sec, file_size, \ > file_mtime, no_change_counter > > Anyway, the use of global being rare is of no concern. The point > of my > suggestion is standardization. > My opinion is that a standard language is easier to learn (and > teach) > than one that has different syntax for the same issue, depending > on the > statement. > > In short, if the recommended multi-line use for import is > > import (a, b, > c) > > instead of > > import a, b, \ > c > > Then the same should apply to global. > > > Best regards, > > JM > > > > > On 23-01-2017 19:25, Terry Reedy wrote: > > On 1/23/2017 1:43 PM, Jo?o Matos wrote: > >> Hello, > >> > >> I would like to suggest that globals should follow the > existing rule > >> (followed by the import statement, the if statement and in > other places) > >> for extending beyond 1 line using parentheses. > >> Like this: > >> globals (var_1, var_2, > >> var_3) > >> > >> instead of what must be done now, which is: > >> globals var_1, var_2 \ > >> var_3 > > > > The declaration keyword is 'global'; 'globals' is the built-in > > function. In any case > > > > global var_1, var_2 > > global var_3 > > > > works fine. There is no connection between the names and, > unlike with > > import, no operational efficiency is gained by mashing the > statements > > together. > > > > This issue should be rare. The global statement is only > needed when > > one is rebinding global names within a function*. If a function > > rebinds 10 different global names, the design should probably be > > re-examined. > > > > * 'global' at class scope seems useless. 
> > > > a = 0 > > class C: > > a = 1 > > > > has the same effect as > > a = 0 > > a = 1 > > class C: pass > > From ethan at stoneleaf.us Mon Jan 23 17:04:16 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 23 Jan 2017 14:04:16 -0800 Subject: [Python-ideas] globals should accept parenteses for extending beyond 1 line In-Reply-To: References: <0147af7a-fcfd-3379-a1d9-f7f1ec1d1369@gmail.com> Message-ID: <58867DE0.70107@stoneleaf.us> On 01/23/2017 01:18 PM, Jo?o Matos wrote: > Why should I repeat global if I can use the line separation character \ > (like I mentioned on my 1st email) or parentheses as I suggested? Because prefixing each line with global is more readable than either \ or ( )? At least to me. ;) -- ~Ethan~ From steve at pearwood.info Mon Jan 23 19:04:23 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 24 Jan 2017 11:04:23 +1100 Subject: [Python-ideas] globals should accept parenteses for extending beyond 1 line In-Reply-To: References: <0147af7a-fcfd-3379-a1d9-f7f1ec1d1369@gmail.com> Message-ID: <20170124000423.GH7345@ando.pearwood.info> On Mon, Jan 23, 2017 at 09:18:54PM +0000, Jo?o Matos wrote: > Why should I repeat global if I can use the line separation character \ > (like I mentioned on my 1st email) or parentheses as I suggested? That's the wrong question. The right question is, why should the Python language be made larger, more complicated, with more lines of code, to support parentheses in global declarations, when there are already two perfectly good alternatives? global spam global eggs global spam, \ eggs Even if it only adds one extra line of code to the Python interpreter, that's still a cost. But it will add more than that: it will require the implementation, tests and documentation. And all other Python interpretations will need to do the same: IronPython, Jython, PyPy, ?Py, Stackless, Nuitka and possibly more. And it is a new feature that people have to learn. Every new feature has a cost. Even if the cost is tiny, the question is, will the benefit be greater than the cost? Supporting parentheses in from...import statements has major benefit: from deep.package.name import (spam, eggs, cheese, foo, bar, baz, fe, fi, fo, fum) is a big improvement over: from deep.package.name import spam, eggs, cheese from deep.package.name import foo, bar, baz from deep.package.name import fe, fi, fo, fum for at least two reasons: better efficiency, and DRY (Don't Repeat Yourself) with the package name. But for globals, neither reason applies: global statements are a compile-time declaration, not an executable statement, so efficiency isn't relevant, and there is no significant DRY issue with repeating the keyword global itself. So the question here is not "why shouldn't we allow parentheses?" but "why should we allow parentheses?" If your answer is just "I think it looks nicer", you probably won't find a lot of people who agree, and even fewer people who agree enough to actually do the work of writing the patch, the tests and the documentation. So that comes down to the most important question of all: - are you volunteering to do the work yourself? If there are no strong objections to adding this feature, it might be easier to get a core developer to offer to review your work and check it in, than to get a core developer to add the feature themselves. I don't dislike this proposed feature. Nor do I like it. I would probably never use it: it is very rare for me to use global at all, and even rarer to use more than one or two globals. 
But if somebody else did the work, I wouldn't strongly object to it. -- Steve From encukou at gmail.com Tue Jan 24 04:41:49 2017 From: encukou at gmail.com (Petr Viktorin) Date: Tue, 24 Jan 2017 10:41:49 +0100 Subject: [Python-ideas] globals should accept parenteses for extending beyond 1 line In-Reply-To: References: <0147af7a-fcfd-3379-a1d9-f7f1ec1d1369@gmail.com> Message-ID: On 01/23/2017 10:22 PM, Jo?o Matos wrote: > Hello, > > I understand. > Python sources are very large. Any pointers to which file defines the > global statement syntax? Consider joining the core-mentorship list for questions like these: https://mail.python.org/mailman/listinfo/core-mentorship Anyway, Python's grammar is defined in the file Grammar/Grammar, and there's a mini-HOWTO at the top of that file. Good luck! From brett at python.org Tue Jan 24 15:30:36 2017 From: brett at python.org (Brett Cannon) Date: Tue, 24 Jan 2017 20:30:36 +0000 Subject: [Python-ideas] globals should accept parenteses for extending beyond 1 line In-Reply-To: References: <0147af7a-fcfd-3379-a1d9-f7f1ec1d1369@gmail.com> Message-ID: On Tue, 24 Jan 2017 at 01:42 Petr Viktorin wrote: > On 01/23/2017 10:22 PM, Jo?o Matos wrote: > > Hello, > > > > I understand. > > Python sources are very large. Any pointers to which file defines the > > global statement syntax? > > Consider joining the core-mentorship list for questions like these: > https://mail.python.org/mailman/listinfo/core-mentorship > > Anyway, Python's grammar is defined in the file Grammar/Grammar, and > there's a mini-HOWTO at the top of that file. Good luck! > There's also https://cpython-devguide.readthedocs.io/en/latest/grammar.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Tue Jan 24 15:30:34 2017 From: toddrjen at gmail.com (Todd) Date: Tue, 24 Jan 2017 15:30:34 -0500 Subject: [Python-ideas] pathlib suggestions Message-ID: I have been using pathlib, and I have come up with a few suggestions on what would make the module more useful for me (and hopefully others): First, for me, extensions are primarily useful as a single unit. So, practically speaking, the extension of "spam.tar.gz" isn't ".gz", it is ".tar.gz". So it would be nice to have some properties to make it easier to deal with the "complete" extension like this. There is a "suffixes" property, but it returns a list, which you then have to recombine manually. And as far as I can tell there is no method to return the name without any extension. And there is no method for replacing all the extensions at once. So although the names are tentative, perhaps there could be a "fullsuffix" property to return the extensions as a single string, a "nosuffix" extension to return the path without any extensions, and a "with_suffixes" method that replaces all the suffix and can accept multiple arguments (which would then be joined to create the extensions). Second, for methods like "rename" and "replace", it would be nice if there was an "exist_ok" argument that defaults to "True" to allow for safe renaming. Third, it would be nice if there was a "uid" and "gid" method for getting the numeric user and group IDs for a file, or alternatively a "numeric" argument for the "owner" and "group" methods. Fourth, for the "is_*" methods, it would be nice if there was a "strict" argument that would raise an exception if the file or directory doesn't exist. 
Finally, although not problem with the module per se, the example for the "parts" property should probably show at least one file with an extension, to make it clear how it deals with extensions (since the documentation is ambiguous in this regard). Thanks for your time. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Tue Jan 24 15:39:17 2017 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Tue, 24 Jan 2017 20:39:17 +0000 Subject: [Python-ideas] pathlib suggestions In-Reply-To: References: Message-ID: <2rdg3xv0jumnjtg8o92h9gm22-0@mailer.nylas.com> As another suggestion, I'd love an rmtree method analogous to shutil.rmtree. And maybe also a remove method, that basically does: if path.is_dir(): path.rmtree() else: path.unlink() \-- Ryan (????) Yoko Shimomura > ryo (supercell/EGOIST) > Hiroyuki Sawano >> everyone else On Jan 24 2017, at 2:32 pm, Todd wrote: > I have been using pathlib, and I have come up with a few suggestions on what would make the module more useful for me (and hopefully others): First, for me, extensions are primarily useful as a single unit.? So, practically speaking, the extension of "spam.tar.gz" isn't ".gz", it is ".tar.gz".? So it would be nice to have some properties to make it easier to deal with the "complete" extension like this.? There is a "suffixes" property, but it returns a list, which you then have to recombine manually.? And as far as I can tell there is no method to return the name without any extension. And there is no method for replacing all the extensions at once. So although the names are tentative, perhaps there could be a "fullsuffix" property to return the extensions as a single string, a "nosuffix" extension to return the path without any extensions, and a "with_suffixes" method that replaces all the suffix and can accept multiple arguments (which would then be joined to create the extensions). Second, for methods like "rename" and "replace", it would be nice if there was an "exist_ok" argument that defaults to "True" to allow for safe renaming. Third, it would be nice if there was a "uid" and "gid" method for getting the numeric user and group IDs for a file, or alternatively a "numeric" argument for the "owner" and "group" methods. Fourth, for the "is_*" methods, it would be nice if there was a "strict" argument that would raise an exception if the file or directory doesn't exist. Finally, although not problem with the module per se, the example for the "parts" property should probably show at least one file with an extension, to make it clear how it deals with extensions (since the documentation is ambiguous in this regard). Thanks for your time. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Tue Jan 24 16:27:59 2017 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 25 Jan 2017 08:27:59 +1100 Subject: [Python-ideas] pathlib suggestions In-Reply-To: References: Message-ID: On Wed, Jan 25, 2017 at 7:30 AM, Todd wrote: > First, for me, extensions are primarily useful as a single unit. So, > practically speaking, the extension of "spam.tar.gz" isn't ".gz", it is > ".tar.gz". So it would be nice to have some properties to make it easier to > deal with the "complete" extension like this. There is a "suffixes" > property, but it returns a list, which you then have to recombine manually. > And as far as I can tell there is no method to return the name without any > extension. 
And there is no method for replacing all the extensions at once. > > So although the names are tentative, perhaps there could be a "fullsuffix" > property to return the extensions as a single string, a "nosuffix" extension > to return the path without any extensions, and a "with_suffixes" method that > replaces all the suffix and can accept multiple arguments (which would then > be joined to create the extensions). +0. Not all files with multiple dots in them are actually using them to mean multiple file extensions. Every day I'm working with files that use dots to separate words in a title, or have section numbers ("4.2.5 Yada Yada Yada.md" does not have a base name of "4"), etc. Since there's no perfect way to pin these down, this needs to be a completely separate feature, and it'd only really be useful for some situations. So go ahead, if there's interest, but the current one shouldn't be deprecated or anything. ChrisA From toddrjen at gmail.com Tue Jan 24 17:02:14 2017 From: toddrjen at gmail.com (Todd) Date: Tue, 24 Jan 2017 17:02:14 -0500 Subject: [Python-ideas] pathlib suggestions In-Reply-To: References: Message-ID: On Tue, Jan 24, 2017 at 4:27 PM, Chris Angelico wrote: > On Wed, Jan 25, 2017 at 7:30 AM, Todd wrote: > > First, for me, extensions are primarily useful as a single unit. So, > > practically speaking, the extension of "spam.tar.gz" isn't ".gz", it is > > ".tar.gz". So it would be nice to have some properties to make it > easier to > > deal with the "complete" extension like this. There is a "suffixes" > > property, but it returns a list, which you then have to recombine > manually. > > And as far as I can tell there is no method to return the name without > any > > extension. And there is no method for replacing all the extensions at > once. > > > > So although the names are tentative, perhaps there could be a > "fullsuffix" > > property to return the extensions as a single string, a "nosuffix" > extension > > to return the path without any extensions, and a "with_suffixes" method > that > > replaces all the suffix and can accept multiple arguments (which would > then > > be joined to create the extensions). > > +0. Not all files with multiple dots in them are actually using them > to mean multiple file extensions. Every day I'm working with files > that use dots to separate words in a title, or have section numbers > ("4.2.5 Yada Yada Yada.md" does not have a base name of "4"), etc. > Since there's no perfect way to pin these down, this needs to be a > completely separate feature, and it'd only really be useful for some > situations. So go ahead, if there's interest, but the current one > shouldn't be deprecated or anything. > > ChrisA > Of course the current ones shouldn't be deprecated, I never suggested they should be. The whole point of using new method and property names was to avoid any conflict with the existing methods. And yes, it won't work in all situations. Which method or property you would use depends on your specific needs. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vamsi_ism at outlook.com Tue Jan 24 23:48:50 2017 From: vamsi_ism at outlook.com (Vamsi Krishna Avula) Date: Wed, 25 Jan 2017 04:48:50 +0000 Subject: [Python-ideas] pathlib suggestions In-Reply-To: References: , Message-ID: (I have a small question, I hope it's not off-topic for this thread.) What was the rationale behind an explicit `iterdir` method? Why not simply make the `Path` objects iterable? 
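In other words (a tiny sketch; the directory name is just a placeholder):

    from pathlib import Path

    p = Path('/tmp')

    # What the current API asks for: an explicit method call.
    for child in p.iterdir():
        print(child)

    # The alternative I am wondering about (not valid today, since
    # Path does not define __iter__):
    # for child in p:
    #     print(child)
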
________________________________________ From: Python-ideas on behalf of Todd Sent: Wednesday, January 25, 2017 3:32:14 AM To: python-ideas Subject: Re: [Python-ideas] pathlib suggestions On Tue, Jan 24, 2017 at 4:27 PM, Chris Angelico > wrote: On Wed, Jan 25, 2017 at 7:30 AM, Todd > wrote: > First, for me, extensions are primarily useful as a single unit. So, > practically speaking, the extension of "spam.tar.gz" isn't ".gz", it is > ".tar.gz". So it would be nice to have some properties to make it easier to > deal with the "complete" extension like this. There is a "suffixes" > property, but it returns a list, which you then have to recombine manually. > And as far as I can tell there is no method to return the name without any > extension. And there is no method for replacing all the extensions at once. > > So although the names are tentative, perhaps there could be a "fullsuffix" > property to return the extensions as a single string, a "nosuffix" extension > to return the path without any extensions, and a "with_suffixes" method that > replaces all the suffix and can accept multiple arguments (which would then > be joined to create the extensions). +0. Not all files with multiple dots in them are actually using them to mean multiple file extensions. Every day I'm working with files that use dots to separate words in a title, or have section numbers ("4.2.5 Yada Yada Yada.md" does not have a base name of "4"), etc. Since there's no perfect way to pin these down, this needs to be a completely separate feature, and it'd only really be useful for some situations. So go ahead, if there's interest, but the current one shouldn't be deprecated or anything. ChrisA Of course the current ones shouldn't be deprecated, I never suggested they should be. The whole point of using new method and property names was to avoid any conflict with the existing methods. And yes, it won't work in all situations. Which method or property you would use depends on your specific needs. From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Jan 25 00:25:03 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Wed, 25 Jan 2017 14:25:03 +0900 Subject: [Python-ideas] pathlib suggestions In-Reply-To: References: Message-ID: <22664.13999.900053.3166@turnbull.sk.tsukuba.ac.jp> I'm just going to let fly with the +1s and -1s, don't take them too seriously, they're basically impressionistic (I'm not a huge user of pathlib yet). Todd writes: > So although the names are tentative, perhaps there could be a "fullsuffix" > property to return the extensions as a single string, -0 '.'.join(p.suffixes) vs. p.fullsuffix? TOOWTDI says no. I also don't really see the use case. > a "nosuffix" extension to return the path without any extensions, +1 (subject to name bikeshedding) .suffixes itself is kinda useless without this, and you shouldn't have to roll your own Do you propose to return a Path or a str here? > and a "with_suffixes" method that replaces all the suffix and can > accept multiple arguments (which would then be joined to create the > extensions). Do you propose to return a Path or a str here? +1 for a Path, +0 for a str. > Second, for methods like "rename" and "replace", it would be nice if there > was an "exist_ok" argument that defaults to "True" to allow for safe > renaming. -1 I don't see how this is an improvement. 
If it would raise if exist_ok == False, then try: p.rename(another_p, exist_ok=False) except ExistNotOKError: take_evasive_action(p) doesn't seem like a big improvement over if p.exists(): take_evasive_action(p) else: p.rename(another_p) And if it doesn't raise, then the action just silently fails? Name bikeshedding: IIRC, if an argument is essentially always going to be one of a small number of literals, Guido strongly prefers a new method (eg, rename_safely). I will admit that the current API seems strange to me: on Unix, .rename and .replace are apparently the same, and both unsafe? I would prefer .rename Unix semantics (deprecated) .rename_safely replacement for .rename, raises if exists .replace silently replace Names to be bikeshedded per usual. > Third, it would be nice if there was a "uid" and "gid" method for getting > the numeric user and group IDs for a file, +1 > or alternatively a "numeric" argument for the "owner" and "group" > methods. -1 (see "Guido prefers" above) > Fourth, for the "is_*" methods, it would be nice if there was a "strict" > argument that would raise an exception if the file or directory doesn't > exist. -1 That seems weird in a library intended for the syntactic manipulation of uninterpreted paths (even though this is a semantic operation). TOOWTDI and EIBTI, as well. For backward compatibility, strict would have to default to False. > the example for the "parts" property should probably show at least > one file with an extension, +1 Steve From edk141 at gmail.com Wed Jan 25 07:09:36 2017 From: edk141 at gmail.com (Ed Kellett) Date: Wed, 25 Jan 2017 12:09:36 +0000 Subject: [Python-ideas] pathlib suggestions In-Reply-To: <22664.13999.900053.3166@turnbull.sk.tsukuba.ac.jp> References: <22664.13999.900053.3166@turnbull.sk.tsukuba.ac.jp> Message-ID: On Wed, 25 Jan 2017 at 05:26 Stephen J. Turnbull < turnbull.stephen.fw at u.tsukuba.ac.jp> wrote: > -1 I don't see how this is an improvement. If it would raise if > exist_ok == False, then > > try: > p.rename(another_p, exist_ok=False) > except ExistNotOKError: > take_evasive_action(p) > > doesn't seem like a big improvement over > > if p.exists(): > take_evasive_action(p) > else: > p.rename(another_p) > > And if it doesn't raise, then the action just silently fails? > The latter should be if another_p.exists(), and it can race with the creation of another_p?this is a textbook motivating example for EAFP. -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Wed Jan 25 10:04:08 2017 From: toddrjen at gmail.com (Todd) Date: Wed, 25 Jan 2017 10:04:08 -0500 Subject: [Python-ideas] pathlib suggestions In-Reply-To: <22664.13999.900053.3166@turnbull.sk.tsukuba.ac.jp> References: <22664.13999.900053.3166@turnbull.sk.tsukuba.ac.jp> Message-ID: On Wed, Jan 25, 2017 at 12:25 AM, Stephen J. Turnbull < turnbull.stephen.fw at u.tsukuba.ac.jp> wrote: > I'm just going to let fly with the +1s and -1s, don't take them too > seriously, they're basically impressionistic (I'm not a huge user of > pathlib yet). > > Todd writes: > > > So although the names are tentative, perhaps there could be a > "fullsuffix" > > property to return the extensions as a single string, > > -0 '.'.join(p.suffixes) vs. p.fullsuffix? TOOWTDI says no. I > also don't really see the use case. > > The whole point of pathlib is to provide convenience functions for common path-related operations. It is full of methods and properties that could be implemented other ways. 
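For what it's worth, here is roughly what the workaround looks like today, and what the proposed properties would have to do internally (just a sketch, with throwaway variable names; note that each entry in .suffixes already carries its leading dot, so it is ''.join rather than '.'.join):

    from pathlib import Path

    p = Path('spam.tar.gz')

    full_suffix = ''.join(p.suffixes)    # '.tar.gz'
    bare_name = p.name[:-len(full_suffix)] if full_suffix else p.name  # 'spam'

    # A hand-rolled stand-in for the proposed with_suffixes():
    repacked = p.with_name(bare_name + '.zip')    # Path('spam.zip')
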
Dealing with multi-part extensions, at least for me, is extremely common. A ".tar.gz" file is not the same as a ".tar.bz2" or a ".svg.gz". When I want to find a ".tar.gz" file, having to deal with the ".tar" and ".gz" parts separately is nothing but a nuisance. If I want to find and extract ".rar" files, I don't want ".part1.rar" files, ".part2.rar" files, and so on. So for me dealing with the extension as a single unit, rather than individual parts, is the most common approach. > > a "nosuffix" extension to return the path without any extensions, > > +1 (subject to name bikeshedding) .suffixes itself is kinda > useless without this, and you shouldn't have to roll your own > > Do you propose to return a Path or a str here? > I intend it to behave as much as possible like the existing "stem" property, so a string. > > > and a "with_suffixes" method that replaces all the suffix and can > > accept multiple arguments (which would then be joined to create the > > extensions). > > Do you propose to return a Path or a str here? +1 for a Path, +0 for > a str. > It is intended to behave as much as possible like the existing "with_suffix" method, so a Path. > > Second, for methods like "rename" and "replace", it would be nice if > there > > was an "exist_ok" argument that defaults to "True" to allow for safe > > renaming. > > -1 I don't see how this is an improvement. If it would raise if > exist_ok == False, then > > try: > p.rename(another_p, exist_ok=False) > except ExistNotOKError: > take_evasive_action(p) > > doesn't seem like a big improvement over > > if p.exists(): > take_evasive_action(p) > else: > p.rename(another_p) > > And if it doesn't raise, then the action just silently fails? > > As Ed said, this can lead to race conditions. Something could happen after you check "exists". Also, the "mkdir" method already has an "exist_ok" argument, and the "open" function has the "x" flag to raise an exception if the file already exists. It seems like a major omission to me that there are safe ways to make files and safe ways to make directories, but no safe way to move files or directories. > Name bikeshedding: IIRC, if an argument is essentially always going to > be one of a small number of literals, Guido strongly prefers a new > method (eg, rename_safely). > > File and directory handling is already full of flags like this. This argument was taken verbatim from the existing "mkdir" method for consistency. > > > Fourth, for the "is_*" methods, it would be nice if there was a "strict" > > argument that would raise an exception if the file or directory doesn't > > exist. > > -1 That seems weird in a library intended for the syntactic > manipulation of uninterpreted paths (even though this is a semantic > operation). TOOWTDI and EIBTI, as well. For backward compatibility, > strict would have to default to False. > > First, these methods only exist for "concrete" paths, which are explicitly intended for use in I/O operations. Second, as before, this argument is taken from another method. In this case, the "resolve" method has a "strict" argument. Any other approach suffers from the same race conditions as "rename" and "replace", and again it seems weird that resolving a path can be done safely but testing it can't be. And yes, the argument would have to default to "False". All of my suggestions are intended to be completely backwards-compatible. I don't see that as a problem, though. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From encukou at gmail.com Wed Jan 25 10:18:49 2017 From: encukou at gmail.com (Petr Viktorin) Date: Wed, 25 Jan 2017 16:18:49 +0100 Subject: [Python-ideas] pathlib suggestions In-Reply-To: References: <22664.13999.900053.3166@turnbull.sk.tsukuba.ac.jp> Message-ID: <983df9db-f05e-395c-797b-e148eaa84c48@gmail.com> On 01/25/2017 04:04 PM, Todd wrote: > On Wed, Jan 25, 2017 at 12:25 AM, Stephen J. Turnbull > > wrote: > > I'm just going to let fly with the +1s and -1s, don't take them too > seriously, they're basically impressionistic (I'm not a huge user of > pathlib yet). > > Todd writes: > > > So although the names are tentative, perhaps there could be a > "fullsuffix" > > property to return the extensions as a single string, > > -0 '.'.join(p.suffixes) vs. p.fullsuffix? TOOWTDI says no. I > also don't really see the use case. > > > The whole point of pathlib is to provide convenience functions for > common path-related operations. It is full of methods and properties > that could be implemented other ways. > > Dealing with multi-part extensions, at least for me, is extremely > common. A ".tar.gz" file is not the same as a ".tar.bz2" or a > ".svg.gz". When I want to find a ".tar.gz" file, having to deal with > the ".tar" and ".gz" parts separately is nothing but a nuisance. If I > want to find and extract ".rar" files, I don't want ".part1.rar" files, > ".part2.rar" files, and so on. So for me dealing with the extension as > a single unit, rather than individual parts, is the most common approach. But what if the .tar.gz file is called "spam-4.2.5-final.tar.gz"? Existing tools like glob and endswith() can deal with the ".tar.gz" extension reliably, but "fullsuffix" would, arguably, not give the answers you want. Perhaps more specialized tools would be useful, though, for example: repacked_path = original_path.replace_suffix(".tar.gz", ".zip") From toddrjen at gmail.com Wed Jan 25 10:33:47 2017 From: toddrjen at gmail.com (Todd) Date: Wed, 25 Jan 2017 10:33:47 -0500 Subject: [Python-ideas] pathlib suggestions In-Reply-To: <983df9db-f05e-395c-797b-e148eaa84c48@gmail.com> References: <22664.13999.900053.3166@turnbull.sk.tsukuba.ac.jp> <983df9db-f05e-395c-797b-e148eaa84c48@gmail.com> Message-ID: On Wed, Jan 25, 2017 at 10:18 AM, Petr Viktorin wrote: > On 01/25/2017 04:04 PM, Todd wrote: > >> On Wed, Jan 25, 2017 at 12:25 AM, Stephen J. Turnbull >> > > wrote: >> >> I'm just going to let fly with the +1s and -1s, don't take them too >> seriously, they're basically impressionistic (I'm not a huge user of >> pathlib yet). >> >> Todd writes: >> >> > So although the names are tentative, perhaps there could be a >> "fullsuffix" >> > property to return the extensions as a single string, >> >> -0 '.'.join(p.suffixes) vs. p.fullsuffix? TOOWTDI says no. I >> also don't really see the use case. >> >> >> The whole point of pathlib is to provide convenience functions for >> common path-related operations. It is full of methods and properties >> that could be implemented other ways. >> >> Dealing with multi-part extensions, at least for me, is extremely >> common. A ".tar.gz" file is not the same as a ".tar.bz2" or a >> ".svg.gz". When I want to find a ".tar.gz" file, having to deal with >> the ".tar" and ".gz" parts separately is nothing but a nuisance. If I >> want to find and extract ".rar" files, I don't want ".part1.rar" files, >> ".part2.rar" files, and so on. So for me dealing with the extension as >> a single unit, rather than individual parts, is the most common approach. 
>> > > But what if the .tar.gz file is called "spam-4.2.5-final.tar.gz"? > Existing tools like glob and endswith() can deal with the ".tar.gz" > extension reliably, but "fullsuffix" would, arguably, not give the answers > you want. > I wouldn't use it in that situation. The existing "suffix" and "stem" properties also only work reliably under certain situations. > > Perhaps more specialized tools would be useful, though, for example: > repacked_path = original_path.replace_suffix(".tar.gz", ".zip") > > That is helpful if I want to rename, not if I want to (for example) uncompress a file. -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas at kluyver.me.uk Wed Jan 25 10:45:31 2017 From: thomas at kluyver.me.uk (Thomas Kluyver) Date: Wed, 25 Jan 2017 15:45:31 +0000 Subject: [Python-ideas] pathlib suggestions In-Reply-To: References: <22664.13999.900053.3166@turnbull.sk.tsukuba.ac.jp> <983df9db-f05e-395c-797b-e148eaa84c48@gmail.com> Message-ID: <1485359131.2615445.859236800.7323E9E9@webmail.messagingengine.com> On Wed, Jan 25, 2017, at 03:33 PM, Todd wrote: > On Wed, Jan 25, 2017 at 10:18 AM, Petr Viktorin > wrote: >> But what if the .tar.gz file is called "spam-4.2.5-final.tar.gz"? >> Existing tools like glob and endswith() can deal with the ".tar.gz" >> extension reliably, but "fullsuffix" would, arguably, not give the >> answers you want. > > > I wouldn't use it in that situation. You might not, but it seems like an attractive nuisance. You can't reliably use it as a test for .tar.gz files, but it would be easy to think that you can and write buggy code using it. And I can't currently think of a general example where it would be useful. I thought about suggesting a 'hassuffix' method, but it doesn't pass the 'one way to do it' test when you can do: p.name.endswith('.tar.gz') -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephanh42 at gmail.com Wed Jan 25 10:45:53 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Wed, 25 Jan 2017 16:45:53 +0100 Subject: [Python-ideas] pathlib suggestions In-Reply-To: References: <22664.13999.900053.3166@turnbull.sk.tsukuba.ac.jp> <983df9db-f05e-395c-797b-e148eaa84c48@gmail.com> Message-ID: Hi all, It seems to me that the correct algorithm to get the "full suffix" is not to take everything after the FIRST dot, but rather to: 1. Recognize that the last suffix is one of the UNIX-style compression tools .Z, .gz, ,bz2, .xz, .lzma (at least) 2. Then add the next-to-last suffix. So we can then determine that the suffix of order.for.tar.ps.gz is .ps.gz and the basename is order.for.tar . However, I am not sure if we want to hard-code a list of such suffixes in the standard library. (Even though it could be user-extensible.) Stephan 2017-01-25 16:33 GMT+01:00 Todd : > On Wed, Jan 25, 2017 at 10:18 AM, Petr Viktorin wrote: > >> On 01/25/2017 04:04 PM, Todd wrote: >> >>> On Wed, Jan 25, 2017 at 12:25 AM, Stephen J. Turnbull >>> >> > wrote: >>> >>> I'm just going to let fly with the +1s and -1s, don't take them too >>> seriously, they're basically impressionistic (I'm not a huge user of >>> pathlib yet). >>> >>> Todd writes: >>> >>> > So although the names are tentative, perhaps there could be a >>> "fullsuffix" >>> > property to return the extensions as a single string, >>> >>> -0 '.'.join(p.suffixes) vs. p.fullsuffix? TOOWTDI says no. I >>> also don't really see the use case. 
>>> >>> >>> The whole point of pathlib is to provide convenience functions for >>> common path-related operations. It is full of methods and properties >>> that could be implemented other ways. >>> >>> Dealing with multi-part extensions, at least for me, is extremely >>> common. A ".tar.gz" file is not the same as a ".tar.bz2" or a >>> ".svg.gz". When I want to find a ".tar.gz" file, having to deal with >>> the ".tar" and ".gz" parts separately is nothing but a nuisance. If I >>> want to find and extract ".rar" files, I don't want ".part1.rar" files, >>> ".part2.rar" files, and so on. So for me dealing with the extension as >>> a single unit, rather than individual parts, is the most common >>> approach. >>> >> >> But what if the .tar.gz file is called "spam-4.2.5-final.tar.gz"? >> Existing tools like glob and endswith() can deal with the ".tar.gz" >> extension reliably, but "fullsuffix" would, arguably, not give the answers >> you want. >> > > > I wouldn't use it in that situation. The existing "suffix" and "stem" > properties also only work reliably under certain situations. > > >> >> Perhaps more specialized tools would be useful, though, for example: >> repacked_path = original_path.replace_suffix(".tar.gz", ".zip") >> >> > That is helpful if I want to rename, not if I want to (for example) > uncompress a file. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Wed Jan 25 10:54:58 2017 From: toddrjen at gmail.com (Todd) Date: Wed, 25 Jan 2017 10:54:58 -0500 Subject: [Python-ideas] pathlib suggestions In-Reply-To: References: <22664.13999.900053.3166@turnbull.sk.tsukuba.ac.jp> <983df9db-f05e-395c-797b-e148eaa84c48@gmail.com> Message-ID: On Wed, Jan 25, 2017 at 10:45 AM, Stephan Houben wrote: > Hi all, > > It seems to me that the correct algorithm to get the "full suffix" is not > to take everything after the FIRST dot, > but rather to: > 1. Recognize that the last suffix is one of the UNIX-style compression > tools .Z, .gz, ,bz2, .xz, .lzma (at least) > 2. Then add the next-to-last suffix. > > So we can then determine that the suffix of > order.for.tar.ps.gz > is .ps.gz and the basename is order.for.tar . > > However, I am not sure if we want to hard-code a list of such suffixes in > the standard library. > (Even though it could be user-extensible.) > > Stephan > Those are just examples that I encounter a lot, there can be other cases where multiple extensions are used. > > 2017-01-25 16:33 GMT+01:00 Todd : > >> On Wed, Jan 25, 2017 at 10:18 AM, Petr Viktorin >> wrote: >> >>> On 01/25/2017 04:04 PM, Todd wrote: >>> >>>> On Wed, Jan 25, 2017 at 12:25 AM, Stephen J. Turnbull >>>> >>> > wrote: >>>> >>>> I'm just going to let fly with the +1s and -1s, don't take them too >>>> seriously, they're basically impressionistic (I'm not a huge user of >>>> pathlib yet). >>>> >>>> Todd writes: >>>> >>>> > So although the names are tentative, perhaps there could be a >>>> "fullsuffix" >>>> > property to return the extensions as a single string, >>>> >>>> -0 '.'.join(p.suffixes) vs. p.fullsuffix? TOOWTDI says no. I >>>> also don't really see the use case. >>>> >>>> >>>> The whole point of pathlib is to provide convenience functions for >>>> common path-related operations. 
It is full of methods and properties >>>> that could be implemented other ways. >>>> >>>> Dealing with multi-part extensions, at least for me, is extremely >>>> common. A ".tar.gz" file is not the same as a ".tar.bz2" or a >>>> ".svg.gz". When I want to find a ".tar.gz" file, having to deal with >>>> the ".tar" and ".gz" parts separately is nothing but a nuisance. If I >>>> want to find and extract ".rar" files, I don't want ".part1.rar" files, >>>> ".part2.rar" files, and so on. So for me dealing with the extension as >>>> a single unit, rather than individual parts, is the most common >>>> approach. >>>> >>> >>> But what if the .tar.gz file is called "spam-4.2.5-final.tar.gz"? >>> Existing tools like glob and endswith() can deal with the ".tar.gz" >>> extension reliably, but "fullsuffix" would, arguably, not give the answers >>> you want. >>> >> >> >> I wouldn't use it in that situation. The existing "suffix" and "stem" >> properties also only work reliably under certain situations. >> >> >>> >>> Perhaps more specialized tools would be useful, though, for example: >>> repacked_path = original_path.replace_suffix(".tar.gz", ".zip") >>> >>> >> That is helpful if I want to rename, not if I want to (for example) >> uncompress a file. >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Wed Jan 25 10:58:29 2017 From: toddrjen at gmail.com (Todd) Date: Wed, 25 Jan 2017 10:58:29 -0500 Subject: [Python-ideas] pathlib suggestions In-Reply-To: <1485359131.2615445.859236800.7323E9E9@webmail.messagingengine.com> References: <22664.13999.900053.3166@turnbull.sk.tsukuba.ac.jp> <983df9db-f05e-395c-797b-e148eaa84c48@gmail.com> <1485359131.2615445.859236800.7323E9E9@webmail.messagingengine.com> Message-ID: On Wed, Jan 25, 2017 at 10:45 AM, Thomas Kluyver wrote: > On Wed, Jan 25, 2017, at 03:33 PM, Todd wrote: > > On Wed, Jan 25, 2017 at 10:18 AM, Petr Viktorin wrote: > > But what if the .tar.gz file is called "spam-4.2.5-final.tar.gz"? > Existing tools like glob and endswith() can deal with the ".tar.gz" > extension reliably, but "fullsuffix" would, arguably, not give the answers > you want. > > > > I wouldn't use it in that situation. > > > You might not, but it seems like an attractive nuisance. You can't > reliably use it as a test for .tar.gz files, but it would be easy to think > that you can and write buggy code using it. And I can't currently think of > a general example where it would be useful. > >From my perspective at least, those arguments apply just as well to the existing "suffix" and "stem" properties. > > I thought about suggesting a 'hassuffix' method, but it doesn't pass the > 'one way to do it' test when you can do: > > p.name.endswith('.tar.gz') > Then why is there a "match" method? It doesn't seem like the "one way to do it test" is being used for pathlib, nor do I think it really applies for a module whose whole point is to provide convenience tools. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From thomas at kluyver.me.uk Wed Jan 25 11:04:00 2017 From: thomas at kluyver.me.uk (Thomas Kluyver) Date: Wed, 25 Jan 2017 16:04:00 +0000 Subject: [Python-ideas] pathlib suggestions In-Reply-To: References: <22664.13999.900053.3166@turnbull.sk.tsukuba.ac.jp> <983df9db-f05e-395c-797b-e148eaa84c48@gmail.com> Message-ID: <1485360240.2621183.859261064.4B85D6A6@webmail.messagingengine.com> On Wed, Jan 25, 2017, at 03:54 PM, Todd wrote: > Those [.tar.foo] are just examples that I encounter a lot, there can > be other cases where multiple extensions are used. The real issue is that there's no definition of what an extension is. You can have dots anywhere in a filename, and it's not at all unusual for them to be used before the bit we recognise as the extension. Almost every package on PyPI has files named like 'pip-9.0.1.tar.gz', but '.0.1.tar.gz' clearly doesn't make any sense as an extension. Without a good definition of what the 'full extension' is, we can't have code to find it. Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Wed Jan 25 11:10:21 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 25 Jan 2017 16:10:21 +0000 Subject: [Python-ideas] pathlib suggestions In-Reply-To: <1485360240.2621183.859261064.4B85D6A6@webmail.messagingengine.com> References: <22664.13999.900053.3166@turnbull.sk.tsukuba.ac.jp> <983df9db-f05e-395c-797b-e148eaa84c48@gmail.com> <1485360240.2621183.859261064.4B85D6A6@webmail.messagingengine.com> Message-ID: On 25 January 2017 at 16:04, Thomas Kluyver wrote: > On Wed, Jan 25, 2017, at 03:54 PM, Todd wrote: > > Those [.tar.foo] are just examples that I encounter a lot, there can be > other cases where multiple extensions are used. > > > The real issue is that there's no definition of what an extension is. You > can have dots anywhere in a filename, and it's not at all unusual for them > to be used before the bit we recognise as the extension. Almost every > package on PyPI has files named like 'pip-9.0.1.tar.gz', but '.0.1.tar.gz' > clearly doesn't make any sense as an extension. Without a good definition of > what the 'full extension' is, we can't have code to find it. More precisely, we *can* have code to find it, but it's of necessity application-specific, and so not a good fit for a general library like the stdlib. One of the design principles for code in the stdlib is "does it solve a sufficiently general problem?" In this case, there's a general problem, which is "give me back what I think of as the suffix in this case" - but the proposed method doesn't solve that problem (because of the cases already quoted). Conversely, the problem which the proposed solution *does* solve ("give me the part of the filename after the first dot") isn't general enough to warrant going into the stdlib, because it's too often not what people actually want. Paul From thomas at kluyver.me.uk Wed Jan 25 11:13:17 2017 From: thomas at kluyver.me.uk (Thomas Kluyver) Date: Wed, 25 Jan 2017 16:13:17 +0000 Subject: [Python-ideas] pathlib suggestions In-Reply-To: References: <22664.13999.900053.3166@turnbull.sk.tsukuba.ac.jp> <983df9db-f05e-395c-797b-e148eaa84c48@gmail.com> <1485359131.2615445.859236800.7323E9E9@webmail.messagingengine.com> Message-ID: <1485360797.2623340.859271624.3186830D@webmail.messagingengine.com> On Wed, Jan 25, 2017, at 03:58 PM, Todd wrote: > On Wed, Jan 25, 2017 at 10:45 AM, Thomas Kluyver > wrote: >> __You might not, but it seems like an attractive nuisance. 
You can't >> reliably use it as a test for .tar.gz files, but it would be easy to >> think that you can and write buggy code using it. And I can't >> currently think of a general example where it would be useful. > > From my perspective at least, those arguments apply just as well to > the existing "suffix" and "stem" properties. To some extent it does. But the convention of looking at a single extension is common enough that there's a stronger case for providing easy access to that. > I thought about suggesting a 'hassuffix' method, but it doesn't pass > the 'one way to do it' test when you can do: >> >> p.name.endswith('.tar.gz') > Then why is there a "match" method? It doesn't seem like the "one > way to do it test" is being used for pathlib, nor do I think it > really applies for a module whose whole point is to provide > convenience tools. Everything is trade-offs: if you can justify why a new thing is useful enough, that can override the 'one way to do it' consideration. That's why we now have four kinds of string formatting. But I don't think 'X got away with it so we should allow Y too' is a compelling argument. -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Wed Jan 25 11:15:40 2017 From: toddrjen at gmail.com (Todd) Date: Wed, 25 Jan 2017 11:15:40 -0500 Subject: [Python-ideas] pathlib suggestions In-Reply-To: <1485360240.2621183.859261064.4B85D6A6@webmail.messagingengine.com> References: <22664.13999.900053.3166@turnbull.sk.tsukuba.ac.jp> <983df9db-f05e-395c-797b-e148eaa84c48@gmail.com> <1485360240.2621183.859261064.4B85D6A6@webmail.messagingengine.com> Message-ID: On Wed, Jan 25, 2017 at 11:04 AM, Thomas Kluyver wrote: > On Wed, Jan 25, 2017, at 03:54 PM, Todd wrote: > > Those [.tar.foo] are just examples that I encounter a lot, there can be > other cases where multiple extensions are used. > > > The real issue is that there's no definition of what an extension is. You > can have dots anywhere in a filename, and it's not at all unusual for them > to be used before the bit we recognise as the extension. Almost every > package on PyPI has files named like 'pip-9.0.1.tar.gz', but '.0.1.tar.gz' > clearly doesn't make any sense as an extension. Without a good definition > of what the 'full extension' is, we can't have code to find it. > > Thomas > > Right, that is why we would have three properties 1. suffix: gets the part after the last period as a string, including the period (already exists), so "spam.tar.gz" -> ".gz" 2. fullsuffix: gets the part after the first period as a string, including the period (this is what I am proposing), so "spam.tar.gz" -> ".gz" 3. suffixes: gets the part after the first period as a list of strings split on the leading period, each including the leading period (already exists), so "spam.tar.gz" -> [".tar", ".gz"] "suffix" is only useful if you are sure only the part after the last period is useful, "fullsuffix" is only useful if you are sure the entire part after first period is useful, and "suffixes" is needed in more complicated situations. This is similar in principle to having "str.split", "str.rsplit", "str.partition", and "str.rpartition". pathlib currently has the equivalent of "str.split" (suffixes) and "str.rpartition" (suffix), but lacks the equivalent of "str.partition" (fullsuffix). -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From encukou at gmail.com Wed Jan 25 11:16:25 2017 From: encukou at gmail.com (Petr Viktorin) Date: Wed, 25 Jan 2017 17:16:25 +0100 Subject: [Python-ideas] pathlib suggestions In-Reply-To: References: <22664.13999.900053.3166@turnbull.sk.tsukuba.ac.jp> <983df9db-f05e-395c-797b-e148eaa84c48@gmail.com> Message-ID: On 01/25/2017 04:33 PM, Todd wrote: > On Wed, Jan 25, 2017 at 10:18 AM, Petr Viktorin > wrote: > > On 01/25/2017 04:04 PM, Todd wrote: > > On Wed, Jan 25, 2017 at 12:25 AM, Stephen J. Turnbull > > >> wrote: > > I'm just going to let fly with the +1s and -1s, don't take > them too > seriously, they're basically impressionistic (I'm not a huge > user of > pathlib yet). > > Todd writes: > > > So although the names are tentative, perhaps there could be a > "fullsuffix" > > property to return the extensions as a single string, > > -0 '.'.join(p.suffixes) vs. p.fullsuffix? TOOWTDI says > no. I > also don't really see the use case. > > > The whole point of pathlib is to provide convenience functions for > common path-related operations. It is full of methods and > properties > that could be implemented other ways. > > Dealing with multi-part extensions, at least for me, is extremely > common. A ".tar.gz" file is not the same as a ".tar.bz2" or a > ".svg.gz". When I want to find a ".tar.gz" file, having to deal > with > the ".tar" and ".gz" parts separately is nothing but a > nuisance. If I > want to find and extract ".rar" files, I don't want ".part1.rar" > files, > ".part2.rar" files, and so on. So for me dealing with the > extension as > a single unit, rather than individual parts, is the most common > approach. > > > But what if the .tar.gz file is called "spam-4.2.5-final.tar.gz"? > Existing tools like glob and endswith() can deal with the ".tar.gz" > extension reliably, but "fullsuffix" would, arguably, not give the > answers you want. > > > > I wouldn't use it in that situation. The existing "suffix" and "stem" > properties also only work reliably under certain situations. Which situations do you mean? It works quite fine with multiple suffixes: The suffix of "pip-9.0.1.tar.gz" is ".gz", and sure enough, you can reasonably expect it's a gz-compressed file. If you uncompress it and strip the extension, you'll end up with a "pip-9.0.1.tar", where the suffix is ".tar" -- and humans would be surprised if it wasn't a tar archive. The function can't determine what a particular human would think of as the full (or "real") suffix in a particular situation -- but I wouldn't call it unreliable. > Perhaps more specialized tools would be useful, though, for example: > repacked_path = original_path.replace_suffix(".tar.gz", ".zip") > > > That is helpful if I want to rename, not if I want to (for example) > uncompress a file. Something like this? uncompressed = original_path.replace_suffix(".tar.gz", "") From steve.dower at python.org Wed Jan 25 13:25:40 2017 From: steve.dower at python.org (Steve Dower) Date: Wed, 25 Jan 2017 10:25:40 -0800 Subject: [Python-ideas] pathlib suggestions In-Reply-To: References: <22664.13999.900053.3166@turnbull.sk.tsukuba.ac.jp> <983df9db-f05e-395c-797b-e148eaa84c48@gmail.com> Message-ID: <03bb962f-2f29-2f3a-58a1-b9eb713a3b53@python.org> On 25Jan2017 0816, Petr Viktorin wrote: > On 01/25/2017 04:33 PM, Todd wrote: >> But what if the .tar.gz file is called "spam-4.2.5-final.tar.gz"? >> Existing tools like glob and endswith() can deal with the ".tar.gz" >> extension reliably, but "fullsuffix" would, arguably, not give the >> answers you want. 
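Petr's replace_suffix() above is a proposal rather than an existing pathlib method; a rough sketch of what such a helper could look like as a standalone function, assuming exact, case-sensitive matching of the old (possibly multi-part) suffix:

from pathlib import PurePath

def replace_suffix(path, old, new):
    # Hypothetical helper: swap an exact trailing suffix (e.g. ".tar.gz") for a new one.
    path = PurePath(path)
    if not path.name.endswith(old):
        raise ValueError("%r does not end with %r" % (path.name, old))
    return path.with_name(path.name[:len(path.name) - len(old)] + new)

print(replace_suffix("spam-4.2.5-final.tar.gz", ".tar.gz", ".zip"))  # spam-4.2.5-final.zip
print(replace_suffix("pip-9.0.1.tar.gz", ".gz", ""))                 # pip-9.0.1.tar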
>> >> I wouldn't use it in that situation. The existing "suffix" and "stem" >> properties also only work reliably under certain situations. > > Which situations do you mean? It works quite fine with multiple suffixes: > The suffix of "pip-9.0.1.tar.gz" is ".gz", and sure enough, you can > reasonably expect it's a gz-compressed file. If you uncompress it and > strip the extension, you'll end up with a "pip-9.0.1.tar", where the > suffix is ".tar" -- and humans would be surprised if it wasn't a tar > archive. > It may be handy if suffixes was a reversed tuple of suffixes (or possibly a cumulative tuple): >>> Path('pip-9.0.1.tar.gz').suffixes ('.gz', '.tar', '.1', '.0') This has a nice benefit for comparisons: >>> targzs = [f for f in all_files if f.suffixes[:2] == ('.gz', '.tar')] It doesn't necessarily improve over .endswith(), but it has a slight convenience over .split() and arguably demonstrates intent more clearly. (Though my biggest issue with all of this is case-sensitivity, which probably means we need to add comparison functions to Path flavours in order to do this stuff properly.) The "cumulative tuple" version would be like this: >>> Path('pip-9.0.1.tar.gz').suffixes ('.gz', '.tar.gz', '.1.tar.gz', '.0.1.tar.gz') This doesn't compare as nicely, since now we would use f.suffixes[1] which will raise if there is only one suffix (likely). But it does return a value which cannot be easily recreated using other functions. Cheers, Steve From toddrjen at gmail.com Wed Jan 25 13:53:33 2017 From: toddrjen at gmail.com (Todd) Date: Wed, 25 Jan 2017 13:53:33 -0500 Subject: [Python-ideas] pathlib suggestions In-Reply-To: References: <22664.13999.900053.3166@turnbull.sk.tsukuba.ac.jp> <983df9db-f05e-395c-797b-e148eaa84c48@gmail.com> Message-ID: On Wed, Jan 25, 2017 at 11:16 AM, Petr Viktorin wrote: > On 01/25/2017 04:33 PM, Todd wrote: > >> On Wed, Jan 25, 2017 at 10:18 AM, Petr Viktorin > > wrote: >> >> On 01/25/2017 04:04 PM, Todd wrote: >> >> On Wed, Jan 25, 2017 at 12:25 AM, Stephen J. Turnbull >> > >> > >> >> wrote: >> >> I'm just going to let fly with the +1s and -1s, don't take >> them too >> seriously, they're basically impressionistic (I'm not a huge >> user of >> pathlib yet). >> >> Todd writes: >> >> > So although the names are tentative, perhaps there could >> be a >> "fullsuffix" >> > property to return the extensions as a single string, >> >> -0 '.'.join(p.suffixes) vs. p.fullsuffix? TOOWTDI says >> no. I >> also don't really see the use case. >> >> >> The whole point of pathlib is to provide convenience functions for >> common path-related operations. It is full of methods and >> properties >> that could be implemented other ways. >> >> Dealing with multi-part extensions, at least for me, is extremely >> common. A ".tar.gz" file is not the same as a ".tar.bz2" or a >> ".svg.gz". When I want to find a ".tar.gz" file, having to deal >> with >> the ".tar" and ".gz" parts separately is nothing but a >> nuisance. If I >> want to find and extract ".rar" files, I don't want ".part1.rar" >> files, >> ".part2.rar" files, and so on. So for me dealing with the >> extension as >> a single unit, rather than individual parts, is the most common >> approach. >> >> >> But what if the .tar.gz file is called "spam-4.2.5-final.tar.gz"? >> Existing tools like glob and endswith() can deal with the ".tar.gz" >> extension reliably, but "fullsuffix" would, arguably, not give the >> answers you want. >> >> >> >> I wouldn't use it in that situation. 
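Path.suffixes today returns an in-order list, so the reversed tuple Steve sketches above is hypothetical; the same comparison can be approximated with current pathlib along these lines (case sensitivity left aside, as he notes):

from pathlib import PurePath

def suffixes_reversed(path):
    # Emulates the proposed reversed-tuple view on top of today's in-order suffixes list.
    return tuple(reversed(PurePath(path).suffixes))

files = [PurePath(n) for n in ("pip-9.0.1.tar.gz", "notes.txt", "image.svg.gz")]
targzs = [f for f in files if suffixes_reversed(f)[:2] == (".gz", ".tar")]
print(targzs)  # [PurePosixPath('pip-9.0.1.tar.gz')] (PureWindowsPath on Windows)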
The existing "suffix" and "stem" >> properties also only work reliably under certain situations. >> > > Which situations do you mean? It works quite fine with multiple suffixes: > The suffix of "pip-9.0.1.tar.gz" is ".gz", and sure enough, you can > reasonably expect it's a gz-compressed file. If you uncompress it and strip > the extension, you'll end up with a "pip-9.0.1.tar", where the suffix is > ".tar" -- and humans would be surprised if it wasn't a tar archive. > > A ".tar.gz" is not the same as a ".svg.gz". The fact that they are both gzip-compressed is an implementation detail as far as most software I deal with is concerned. My unarchiver will extract a ".tar.gz" into a directory as if it was just a ".tar", while my image viewer will view a ".svg.gz" as a vector image as if it was just a ".svg". From a user-interaction standpoint, the ".gz" part is ignored. -------------- next part -------------- An HTML attachment was scrubbed... URL: From leewangzhong+python at gmail.com Wed Jan 25 15:40:05 2017 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Wed, 25 Jan 2017 15:40:05 -0500 Subject: [Python-ideas] pathlib suggestions In-Reply-To: References: <22664.13999.900053.3166@turnbull.sk.tsukuba.ac.jp> <983df9db-f05e-395c-797b-e148eaa84c48@gmail.com> Message-ID: > A ".tar.gz" is not the same as a ".svg.gz". The fact that they are both > gzip-compressed is an implementation detail as far as most software I deal > with is concerned. My unarchiver will extract a ".tar.gz" into a directory > as if it was just a ".tar", while my image viewer will view a ".svg.gz" as a > vector image as if it was just a ".svg". From a user-interaction > standpoint, the ".gz" part is ignored. Just to be sure we're on the same page: - A .tar file is an uncompressed bundle of files. - A .gz file is a compressed version of a single file. - Technically, there's no such thing as a .tar.gz file. "x.tar.gz" means that if you unwrap it with gunzip, you'll get a file called "x.tar", which you can then unpack with tar. "x.tar.gz" is not a tar file using the gzip compression. It's a gz file which unpacks to a tar file. Conceptually, your unarchiver does it in two separate steps. Similarly, "x.svg.gz" is a gz file which unpacks to an svg file. Your viewer just knows to unzip it before use. I don't wanna appear as a naysayer, so here's an alternative suggestion: A parameter for a collection of "extension suffixes". The function will try to eat extensions from the end until it finds one NOT on the list (or it runs out). The docs can recommend `('gz', 'xz', 'bz', 'bz2', ...)`. Maybe a later Python version can use that recommendation as the default. IMO, ".part1" is not a part of the extension. You'd usually have "x.part1.rar" and "x.part2.rar" in the same folder, and it makes more sense that there are two files with base names "x.part1" and "x.part2" than to have two different files with the same base name and an extension which just keeps them ordered. From flying-sheep at web.de Thu Jan 26 05:01:02 2017 From: flying-sheep at web.de (Philipp A.) Date: Thu, 26 Jan 2017 10:01:02 +0000 Subject: [Python-ideas] pathlib suggestions In-Reply-To: References: <22664.13999.900053.3166@turnbull.sk.tsukuba.ac.jp> <983df9db-f05e-395c-797b-e148eaa84c48@gmail.com> Message-ID: How about adding a new argument to with_suffix? 
Path.with_suffix(suffix: str, stripped: Union[int, str, Iterable[str]]=1) stripped would either receive an int (in which case it will greedily strip up to that many suffixes), or a (optionally compound) suffix which would be stripped if present verbatim, or an iterable of suffix strings, in which case it would strip all suffixes in the iterable as many times as available. Examples: Path('flop.pkg.tar.gz').with_suffix('') ? Path('flop.pkg.tar') # current behavior Path('flop.pkg.tar.gz').with_suffix('', 2) ? Path('flop.pkg') # you have to know what you?re doing. 3 would have stripped '.pkg' too Path('flop.pkg.tar.gz').with_suffix('', '.tar.gz') ? Path('flop.pkg') Path('flop.pkg.tar.gz').with_suffix('', '.gz.tar') ? Path('flop.pkg.tar.gz') # not stripped, the suffix doesn?t appear verbatim Path('flop.pkg.tar.gz.tar').with_suffix('', ['.gz', '.tar']) ? Path('flop.pkg') # all instances stripped. probably useless. Franklin? Lee schrieb am Mi., 25. Jan. 2017 um 21:44 Uhr: > > A ".tar.gz" is not the same as a ".svg.gz". The fact that they are both > > gzip-compressed is an implementation detail as far as most software I > deal > > with is concerned. My unarchiver will extract a ".tar.gz" into a > directory > > as if it was just a ".tar", while my image viewer will view a ".svg.gz" > as a > > vector image as if it was just a ".svg". From a user-interaction > > standpoint, the ".gz" part is ignored. > > Just to be sure we're on the same page: > - A .tar file is an uncompressed bundle of files. > - A .gz file is a compressed version of a single file. > - Technically, there's no such thing as a .tar.gz file. "x.tar.gz" > means that if you unwrap it with gunzip, you'll get a file called > "x.tar", which you can then unpack with tar. > > "x.tar.gz" is not a tar file using the gzip compression. It's a gz > file which unpacks to a tar file. Conceptually, your unarchiver does > it in two separate steps. > > Similarly, "x.svg.gz" is a gz file which unpacks to an svg file. Your > viewer just knows to unzip it before use. > > I don't wanna appear as a naysayer, so here's an alternative > suggestion: A parameter for a collection of "extension suffixes". The > function will try to eat extensions from the end until it finds one > NOT on the list (or it runs out). The docs can recommend `('gz', 'xz', > 'bz', 'bz2', ...)`. Maybe a later Python version can use that > recommendation as the default. > > IMO, ".part1" is not a part of the extension. You'd usually have > "x.part1.rar" and "x.part2.rar" in the same folder, and it makes more > sense that there are two files with base names "x.part1" and "x.part2" > than to have two different files with the same base name and an > extension which just keeps them ordered. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Thu Jan 26 11:02:55 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 26 Jan 2017 17:02:55 +0100 Subject: [Python-ideas] globals should accept parenteses for extending beyond 1 line In-Reply-To: References: <0147af7a-fcfd-3379-a1d9-f7f1ec1d1369@gmail.com> Message-ID: On 23 January 2017 at 22:29, MRAB wrote: > On 2017-01-23 20:09, Nick Timkovich wrote: >> >> Related and probably more common is the need for the line-continuation >> operator for long/multiple context managers with "with". I assume that's >> come up before, but was it also just a low priority rather than any >> technical reason? >> > It has come up before, and there is a technical reason, namely the syntactic > ambiguity when parsing. Not impossible to fix, but probably not worth the > added complexity. Right, it's the fact parentheses are already allowed there, but mean something quite different: >>> with (1, 2, 3): pass ... Traceback (most recent call last): File "", line 1, in AttributeError: __enter__ These days, I'd personally be in favour of changing the parsing of parentheses in that situation, as if we were going to add meaningful context management behaviour to tuples we would have done it by now, and having the name bindings next to their expressions is easier to read than having them all at the end: with (cm1() as a, cm2() as b, cm3() as c): ... Relative to tuples-as-context-managers, such an approach would also avoid reintroducing the old resource management problems that saw contextlib.nested removed and replaced with contextlib.ExitStack. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From victor.stinner at gmail.com Thu Jan 26 11:11:17 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 26 Jan 2017 17:11:17 +0100 Subject: [Python-ideas] Is it Python 3 yet? Message-ID: Hi, The download button of https://www.python.org/ currently gives the choice between Python 2.7 and 3.6. I read more and more articles saying that we reached a point where Python 3 became more popular than Python 2, Python 3 has now enough new features to convince developers, etc. Is it time to "hide" Python 2.7 from the default choice and only show Python 3.6 *by default*? For example, I expect a single big [DOWNLOAD] button which would start the download of Python 3.6 for my platform. If we cannot agree on hiding Python 2 by default, maybe we can at least replace the big [DOWNLOAD] button of Python 2 with a smaller button or replace it with a link to a different download page? Latest news: Django 2.0 and Pyramid 2.0 will simply drop Python 2 support. Victor From p.f.moore at gmail.com Thu Jan 26 11:21:30 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 26 Jan 2017 16:21:30 +0000 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: Message-ID: On 26 January 2017 at 16:11, Victor Stinner wrote: > The download button of https://www.python.org/ currently gives the > choice between Python 2.7 and 3.6. I read more and more articles > saying that we reached a point where Python 3 became more popular than > Python 2, Python 3 has now enough new features to convince developers, > etc. > > Is it time to "hide" Python 2.7 from the default choice and only show > Python 3.6 *by default*? > > For example, I expect a single big [DOWNLOAD] button which would start > the download of Python 3.6 for my platform. +1 On a similar note, I always get caught out by the fact that the Windows default download is the 32-bit version. 
Are we not yet at a point where a sufficient majority of users have 64-bit machines, and 32-bit should be seen as a "specialist" choice? Paul From rainventions at gmail.com Thu Jan 26 11:23:23 2017 From: rainventions at gmail.com (Ryan Birmingham) Date: Thu, 26 Jan 2017 11:23:23 -0500 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: Message-ID: It's certainly an interesting transition period. I'm not sure that the community is quite ready to just drop 2.7, but we could take a hint from angular 's solution to this issue and use small descriptions to guide more people to 3.6 rather than 2.7, then move to 2.7 being substantially smaller, then eventually to dropping 2.7. -Ryan Birmingham On 26 January 2017 at 11:11, Victor Stinner wrote: > Hi, > > The download button of https://www.python.org/ currently gives the > choice between Python 2.7 and 3.6. I read more and more articles > saying that we reached a point where Python 3 became more popular than > Python 2, Python 3 has now enough new features to convince developers, > etc. > > Is it time to "hide" Python 2.7 from the default choice and only show > Python 3.6 *by default*? > > For example, I expect a single big [DOWNLOAD] button which would start > the download of Python 3.6 for my platform. > > If we cannot agree on hiding Python 2 by default, maybe we can at > least replace the big [DOWNLOAD] button of Python 2 with a smaller > button or replace it with a link to a different download page? > > Latest news: Django 2.0 and Pyramid 2.0 will simply drop Python 2 support. > > Victor > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From prometheus235 at gmail.com Thu Jan 26 11:27:57 2017 From: prometheus235 at gmail.com (Nick Timkovich) Date: Thu, 26 Jan 2017 10:27:57 -0600 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: Message-ID: After Django 1.11 (alpha 1 out now, final in few months, LTS EOL 2020) was branched out from master on GH, it was pretty impressive & heartening to see massive commits against master that removed Python 2 compatibility from such a popular library. On Thu, Jan 26, 2017 at 10:11 AM, Victor Stinner wrote: > Hi, > > The download button of https://www.python.org/ currently gives the > choice between Python 2.7 and 3.6. I read more and more articles > saying that we reached a point where Python 3 became more popular than > Python 2, Python 3 has now enough new features to convince developers, > etc. > > Is it time to "hide" Python 2.7 from the default choice and only show > Python 3.6 *by default*? > > For example, I expect a single big [DOWNLOAD] button which would start > the download of Python 3.6 for my platform. > > If we cannot agree on hiding Python 2 by default, maybe we can at > least replace the big [DOWNLOAD] button of Python 2 with a smaller > button or replace it with a link to a different download page? > > Latest news: Django 2.0 and Pyramid 2.0 will simply drop Python 2 support. > > Victor > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Thu Jan 26 11:28:45 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 26 Jan 2017 17:28:45 +0100 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: Message-ID: On 26 January 2017 at 17:11, Victor Stinner wrote: > Hi, > > The download button of https://www.python.org/ currently gives the > choice between Python 2.7 and 3.6. I read more and more articles > saying that we reached a point where Python 3 became more popular than > Python 2, Python 3 has now enough new features to convince developers, > etc. > > Is it time to "hide" Python 2.7 from the default choice and only show > Python 3.6 *by default*? > > For example, I expect a single big [DOWNLOAD] button which would start > the download of Python 3.6 for my platform. As a related point, there's an open docs issues worth mentioning: http://bugs.python.org/issue26355 That RFE covers setting a "canonical" URL similar to what ReadTheDocs supports: http://docs.readthedocs.io/en/latest/canonical.html Such a change would have two purposes: - consolidating all the links for any given major version into the latest docs for that version in search engines - migrating the legacy deep links in search engine results to the qualified Python 2 URLs Georg has indicated he's fine with the change, so it's just a matter of someone finding the time to poke at the docs build config and how RTFD did it in order to see how to set that up, and then backporting it to *all* the 3.x and 2.x branches that use Sphinx for their docs (even the ones that are otherwise closed to updates). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From victor.stinner at gmail.com Thu Jan 26 11:28:42 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 26 Jan 2017 17:28:42 +0100 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: Message-ID: If you only want to vote +1 or -1 with no rationale, you may prefer to vote on my Twitter poll: https://twitter.com/VictorStinner/status/824654597235040257 Otherwise, please explain a little bit. Victor From ernest.moloko at gmail.com Thu Jan 26 11:33:01 2017 From: ernest.moloko at gmail.com (Lesego Moloko) Date: Thu, 26 Jan 2017 18:33:01 +0200 Subject: [Python-ideas] Python-ideas Digest, Vol 122, Issue 100 In-Reply-To: References: Message-ID: <558DCE98-3D23-4D18-958E-DCD3B63C9070@gmail.com> W Sent from Lesego's iPhone > On 26 Jan 2017, at 18:28, python-ideas-requests > @python.org wrote: > > Send Python-ideas mailing list submissions to > python-ideas at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/python-ideas > or, via email, send a message with subject or body 'help' to > python-ideas-request at python.org > > You can reach the person managing the list at > python-ideas-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Python-ideas digest..." > > > Today's Topics: > > 1. Re: globals should accept parenteses for extending beyond 1 > line (Nick Coghlan) > 2. Is it Python 3 yet? (Victor Stinner) > 3. Re: Is it Python 3 yet? (Paul Moore) > 4. Re: Is it Python 3 yet? (Ryan Birmingham) > 5. Re: Is it Python 3 yet? 
(Nick Timkovich) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 26 Jan 2017 17:02:55 +0100 > From: Nick Coghlan > To: MRAB > Cc: "python-ideas at python.org" > Subject: Re: [Python-ideas] globals should accept parenteses for > extending beyond 1 line > Message-ID: > > Content-Type: text/plain; charset=UTF-8 > >> On 23 January 2017 at 22:29, MRAB wrote: >>> On 2017-01-23 20:09, Nick Timkovich wrote: >>> >>> Related and probably more common is the need for the line-continuation >>> operator for long/multiple context managers with "with". I assume that's >>> come up before, but was it also just a low priority rather than any >>> technical reason? >>> >> It has come up before, and there is a technical reason, namely the syntactic >> ambiguity when parsing. Not impossible to fix, but probably not worth the >> added complexity. > > Right, it's the fact parentheses are already allowed there, but mean > something quite different: > >>>> with (1, 2, 3): pass > ... > Traceback (most recent call last): > File "", line 1, in > AttributeError: __enter__ > > These days, I'd personally be in favour of changing the parsing of > parentheses in that situation, as if we were going to add meaningful > context management behaviour to tuples we would have done it by now, > and having the name bindings next to their expressions is easier to > read than having them all at the end: > > with (cm1() as a, > cm2() as b, > cm3() as c): > ... > > Relative to tuples-as-context-managers, such an approach would also > avoid reintroducing the old resource management problems that saw > contextlib.nested removed and replaced with contextlib.ExitStack. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > > ------------------------------ > > Message: 2 > Date: Thu, 26 Jan 2017 17:11:17 +0100 > From: Victor Stinner > To: python-ideas > Subject: [Python-ideas] Is it Python 3 yet? > Message-ID: > > Content-Type: text/plain; charset=UTF-8 > > Hi, > > The download button of https://www.python.org/ currently gives the > choice between Python 2.7 and 3.6. I read more and more articles > saying that we reached a point where Python 3 became more popular than > Python 2, Python 3 has now enough new features to convince developers, > etc. > > Is it time to "hide" Python 2.7 from the default choice and only show > Python 3.6 *by default*? > > For example, I expect a single big [DOWNLOAD] button which would start > the download of Python 3.6 for my platform. > > If we cannot agree on hiding Python 2 by default, maybe we can at > least replace the big [DOWNLOAD] button of Python 2 with a smaller > button or replace it with a link to a different download page? > > Latest news: Django 2.0 and Pyramid 2.0 will simply drop Python 2 support. > > Victor > > > ------------------------------ > > Message: 3 > Date: Thu, 26 Jan 2017 16:21:30 +0000 > From: Paul Moore > To: Victor Stinner > Cc: python-ideas > Subject: Re: [Python-ideas] Is it Python 3 yet? > Message-ID: > > Content-Type: text/plain; charset=UTF-8 > >> On 26 January 2017 at 16:11, Victor Stinner wrote: >> The download button of https://www.python.org/ currently gives the >> choice between Python 2.7 and 3.6. I read more and more articles >> saying that we reached a point where Python 3 became more popular than >> Python 2, Python 3 has now enough new features to convince developers, >> etc. 
>> >> Is it time to "hide" Python 2.7 from the default choice and only show >> Python 3.6 *by default*? >> >> For example, I expect a single big [DOWNLOAD] button which would start >> the download of Python 3.6 for my platform. > > +1 > > On a similar note, I always get caught out by the fact that the > Windows default download is the 32-bit version. Are we not yet at a > point where a sufficient majority of users have 64-bit machines, and > 32-bit should be seen as a "specialist" choice? > > Paul > > > ------------------------------ > > Message: 4 > Date: Thu, 26 Jan 2017 11:23:23 -0500 > From: Ryan Birmingham > To: Victor Stinner > Cc: python-ideas > Subject: Re: [Python-ideas] Is it Python 3 yet? > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > It's certainly an interesting transition period. > I'm not sure that the community is quite ready to just drop 2.7, but we > could take a hint from angular 's solution to this > issue and use small descriptions to guide more people to 3.6 rather than > 2.7, then move to 2.7 being substantially smaller, then eventually to > dropping 2.7. > > -Ryan Birmingham > > On 26 January 2017 at 11:11, Victor Stinner > wrote: > >> Hi, >> >> The download button of https://www.python.org/ currently gives the >> choice between Python 2.7 and 3.6. I read more and more articles >> saying that we reached a point where Python 3 became more popular than >> Python 2, Python 3 has now enough new features to convince developers, >> etc. >> >> Is it time to "hide" Python 2.7 from the default choice and only show >> Python 3.6 *by default*? >> >> For example, I expect a single big [DOWNLOAD] button which would start >> the download of Python 3.6 for my platform. >> >> If we cannot agree on hiding Python 2 by default, maybe we can at >> least replace the big [DOWNLOAD] button of Python 2 with a smaller >> button or replace it with a link to a different download page? >> >> Latest news: Django 2.0 and Pyramid 2.0 will simply drop Python 2 support. >> >> Victor >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 5 > Date: Thu, 26 Jan 2017 10:27:57 -0600 > From: Nick Timkovich > To: Victor Stinner > Cc: python-ideas > Subject: Re: [Python-ideas] Is it Python 3 yet? > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > After Django 1.11 (alpha 1 out now, final in few months, LTS EOL 2020) was > branched out from master on GH, it was pretty impressive & heartening to > see massive commits against master that removed Python 2 compatibility from > such a popular library. > > On Thu, Jan 26, 2017 at 10:11 AM, Victor Stinner > wrote: > >> Hi, >> >> The download button of https://www.python.org/ currently gives the >> choice between Python 2.7 and 3.6. I read more and more articles >> saying that we reached a point where Python 3 became more popular than >> Python 2, Python 3 has now enough new features to convince developers, >> etc. >> >> Is it time to "hide" Python 2.7 from the default choice and only show >> Python 3.6 *by default*? >> >> For example, I expect a single big [DOWNLOAD] button which would start >> the download of Python 3.6 for my platform. 
>> >> If we cannot agree on hiding Python 2 by default, maybe we can at >> least replace the big [DOWNLOAD] button of Python 2 with a smaller >> button or replace it with a link to a different download page? >> >> Latest news: Django 2.0 and Pyramid 2.0 will simply drop Python 2 support. >> >> Victor >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > > ------------------------------ > > End of Python-ideas Digest, Vol 122, Issue 100 > ********************************************** From victor.stinner at gmail.com Thu Jan 26 11:32:47 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 26 Jan 2017 17:32:47 +0100 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: Message-ID: 2017-01-26 17:21 GMT+01:00 Paul Moore : > On a similar note, I always get caught out by the fact that the > Windows default download is the 32-bit version. Are we not yet at a > point where a sufficient majority of users have 64-bit machines, and > 32-bit should be seen as a "specialist" choice? Ah right, I got screwed recently :-) Who still have Windows 32-bit nowadays? Some Linux distributions even *dropped* 32-bit support. Victor From p.f.moore at gmail.com Thu Jan 26 11:38:07 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 26 Jan 2017 16:38:07 +0000 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: Message-ID: On 26 January 2017 at 16:11, Victor Stinner wrote: > Is it time to "hide" Python 2.7 from the default choice and only show > Python 3.6 *by default*? Actually, looking back at the "Download" dropdown for Windows, I see Python 3.6.0 Python 2.7.13 That's not really that bad (I recalled it being worse) - Python 3 is on the left, which I'd interpret as "first", but otherwise the choices are pretty equal. The problem is that because it's 32-bit, I never really look at these options, I always go straight to "Other downloads", which is in a *really* weird order - 3.5.3, then 3.5.3rc1, then 3.6.0, then 2.7.13, ... "Full list of downloads" isn't much better - 3.4.6, 3.5.3, 3.6.0, 2.7.13, 3.4.5, ... +1 on tidying up, and consistently showing an order 3.6.0, 2.7.13, "other older versions". No matter where people end up, they should always see 3.6 as the first option, with 2.7 clearly available as "the other one". +0 on de-emphasising 2.7 still further. I'm in favour, and I think that people who need 2.7 in practice probably need it for compatibility with other parts of their system and so should likely be getting it from their vendor/distro rather than python.org. +1 on making the main download for Windows 64-bit. Paul From srkunze at mail.de Thu Jan 26 12:00:45 2017 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 26 Jan 2017 18:00:45 +0100 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: Message-ID: <16576e20-568d-2a16-8a01-a73a82c33d3a@mail.de> A Big Yes! On 26.01.2017 17:11, Victor Stinner wrote: > Hi, > > The download button of https://www.python.org/ currently gives the > choice between Python 2.7 and 3.6. 
I read more and more articles > saying that we reached a point where Python 3 became more popular than > Python 2, Python 3 has now enough new features to convince developers, > etc. > > Is it time to "hide" Python 2.7 from the default choice and only show > Python 3.6 *by default*? > > For example, I expect a single big [DOWNLOAD] button which would start > the download of Python 3.6 for my platform. > > If we cannot agree on hiding Python 2 by default, maybe we can at > least replace the big [DOWNLOAD] button of Python 2 with a smaller > button or replace it with a link to a different download page? > > Latest news: Django 2.0 and Pyramid 2.0 will simply drop Python 2 support. > > Victor > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From berker.peksag at gmail.com Thu Jan 26 12:18:42 2017 From: berker.peksag at gmail.com (=?UTF-8?Q?Berker_Peksa=C4=9F?=) Date: Thu, 26 Jan 2017 20:18:42 +0300 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: Message-ID: On Thu, Jan 26, 2017 at 7:11 PM, Victor Stinner wrote: > Hi, > > The download button of https://www.python.org/ currently gives the > choice between Python 2.7 and 3.6. I read more and more articles > saying that we reached a point where Python 3 became more popular than > Python 2, Python 3 has now enough new features to convince developers, > etc. > > Is it time to "hide" Python 2.7 from the default choice and only show > Python 3.6 *by default*? > > For example, I expect a single big [DOWNLOAD] button which would start > the download of Python 3.6 for my platform. > > If we cannot agree on hiding Python 2 by default, maybe we can at > least replace the big [DOWNLOAD] button of Python 2 with a smaller > button or replace it with a link to a different download page? > > Latest news: Django 2.0 and Pyramid 2.0 will simply drop Python 2 support. +1 from me too. It should be easily implemented so let me know if there is a consensus :) --Berker From mertz at gnosis.cx Thu Jan 26 12:21:41 2017 From: mertz at gnosis.cx (David Mertz) Date: Thu, 26 Jan 2017 09:21:41 -0800 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: Message-ID: Big YES! On Jan 26, 2017 12:19 PM, "Berker Peksa?" wrote: > On Thu, Jan 26, 2017 at 7:11 PM, Victor Stinner > wrote: > > Hi, > > > > The download button of https://www.python.org/ currently gives the > > choice between Python 2.7 and 3.6. I read more and more articles > > saying that we reached a point where Python 3 became more popular than > > Python 2, Python 3 has now enough new features to convince developers, > > etc. > > > > Is it time to "hide" Python 2.7 from the default choice and only show > > Python 3.6 *by default*? > > > > For example, I expect a single big [DOWNLOAD] button which would start > > the download of Python 3.6 for my platform. > > > > If we cannot agree on hiding Python 2 by default, maybe we can at > > least replace the big [DOWNLOAD] button of Python 2 with a smaller > > button or replace it with a link to a different download page? > > > > Latest news: Django 2.0 and Pyramid 2.0 will simply drop Python 2 > support. > > +1 from me too. 
It should be easily implemented so let me know if > there is a consensus :) > > --Berker > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Thu Jan 26 12:49:01 2017 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 26 Jan 2017 18:49:01 +0100 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: Message-ID: <8bcd52e1-a7eb-81d1-4bb3-9b8f24c07e79@egenix.com> On 26.01.2017 17:11, Victor Stinner wrote: > Hi, > > The download button of https://www.python.org/ currently gives the > choice between Python 2.7 and 3.6. I read more and more articles > saying that we reached a point where Python 3 became more popular than > Python 2, Python 3 has now enough new features to convince developers, > etc. > > Is it time to "hide" Python 2.7 from the default choice and only show > Python 3.6 *by default*? > > For example, I expect a single big [DOWNLOAD] button which would start > the download of Python 3.6 for my platform. > > If we cannot agree on hiding Python 2 by default, maybe we can at > least replace the big [DOWNLOAD] button of Python 2 with a smaller > button or replace it with a link to a different download page? -1 on hiding Python 2.7. It's our LTS release, so something we should be proud of until it goes out of support. +1 on emphasizing the 3.6 button and de-emphasizing 2.7, e.g. by making the 3.6 button yellow and the 2.7 grey. Also +1 on what Paul suggested. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jan 26 2017) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From brett at python.org Thu Jan 26 13:13:38 2017 From: brett at python.org (Brett Cannon) Date: Thu, 26 Jan 2017 18:13:38 +0000 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: Message-ID: On Thu, 26 Jan 2017 at 08:39 Paul Moore wrote: > On 26 January 2017 at 16:11, Victor Stinner > wrote: > > Is it time to "hide" Python 2.7 from the default choice and only show > > Python 3.6 *by default*? > > Actually, looking back at the "Download" dropdown for Windows, I see > > Python 3.6.0 Python 2.7.13 > > That's not really that bad (I recalled it being worse) - Python 3 is > on the left, which I'd interpret as "first", but otherwise the choices > are pretty equal. > > The problem is that because it's 32-bit, I never really look at these > options, I always go straight to "Other downloads", which is in a > *really* weird order - 3.5.3, then 3.5.3rc1, then 3.6.0, then 2.7.13, > ... "Full list of downloads" isn't much better - 3.4.6, 3.5.3, 3.6.0, > 2.7.13, 3.4.5, ... > > +1 on tidying up, and consistently showing an order 3.6.0, 2.7.13, > "other older versions". 
No matter where people end up, they should > always see 3.6 as the first option, with 2.7 clearly available as "the > other one". > > +0 on de-emphasising 2.7 still further. I'm in favour, and I think > that people who need 2.7 in practice probably need it for > compatibility with other parts of their system and so should likely be > getting it from their vendor/distro rather than python.org. > > +1 on making the main download for Windows 64-bit. > +1 to what Paul proposes. -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Thu Jan 26 17:09:09 2017 From: random832 at fastmail.com (Random832) Date: Thu, 26 Jan 2017 17:09:09 -0500 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: Message-ID: <1485468549.2813607.860870640.2970243F@webmail.messagingengine.com> On Thu, Jan 26, 2017, at 11:21, Paul Moore wrote: > On a similar note, I always get caught out by the fact that the > Windows default download is the 32-bit version. Are we not yet at a > point where a sufficient majority of users have 64-bit machines, and > 32-bit should be seen as a "specialist" choice? I'm actually surprised it doesn't detect it, especially since it does detect Windows. (I bet fewer people have supported 32-bit windows versions than have Windows XP.) From mal at egenix.com Thu Jan 26 17:32:09 2017 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 26 Jan 2017 23:32:09 +0100 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: <1485468549.2813607.860870640.2970243F@webmail.messagingengine.com> References: <1485468549.2813607.860870640.2970243F@webmail.messagingengine.com> Message-ID: <880e6e87-0e45-0d47-28b7-01a2773a23b8@egenix.com> On 26.01.2017 23:09, Random832 wrote: > On Thu, Jan 26, 2017, at 11:21, Paul Moore wrote: >> On a similar note, I always get caught out by the fact that the >> Windows default download is the 32-bit version. Are we not yet at a >> point where a sufficient majority of users have 64-bit machines, and >> 32-bit should be seen as a "specialist" choice? > > I'm actually surprised it doesn't detect it, especially since it does > detect Windows. > > (I bet fewer people have supported 32-bit windows versions than have > Windows XP.) I think you have to differentiate a bit more between having a 64-bit OS and running 64-bit applications. Many applications on Windows are still 32-bit applications and unless you process large amounts of data, a 32-bit Python system is well worth using. In some cases, it's even needed, e.g. if you have to use an extension which links to a 32-bit library. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jan 26 2017) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From p.f.moore at gmail.com Thu Jan 26 17:49:44 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 26 Jan 2017 22:49:44 +0000 Subject: [Python-ideas] Is it Python 3 yet? 
In-Reply-To: <880e6e87-0e45-0d47-28b7-01a2773a23b8@egenix.com> References: <1485468549.2813607.860870640.2970243F@webmail.messagingengine.com> <880e6e87-0e45-0d47-28b7-01a2773a23b8@egenix.com> Message-ID: On 26 January 2017 at 22:32, M.-A. Lemburg wrote: > On 26.01.2017 23:09, Random832 wrote: >> On Thu, Jan 26, 2017, at 11:21, Paul Moore wrote: >>> On a similar note, I always get caught out by the fact that the >>> Windows default download is the 32-bit version. Are we not yet at a >>> point where a sufficient majority of users have 64-bit machines, and >>> 32-bit should be seen as a "specialist" choice? >> >> I'm actually surprised it doesn't detect it, especially since it does >> detect Windows. >> >> (I bet fewer people have supported 32-bit windows versions than have >> Windows XP.) > > I think you have to differentiate a bit more between having a > 64-bit OS and running 64-bit applications. > > Many applications on Windows are still 32-bit applications and > unless you process large amounts of data, a 32-bit Python > system is well worth using. In some cases, it's even needed, > e.g. if you have to use an extension which links to a 32-bit > library. I agree that there are use cases for a 32-bit Python. But for the *average* user, I'd argue in favour of a 64-bit build as the default download. Paul From eryksun at gmail.com Thu Jan 26 18:25:59 2017 From: eryksun at gmail.com (eryk sun) Date: Thu, 26 Jan 2017 23:25:59 +0000 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: <1485468549.2813607.860870640.2970243F@webmail.messagingengine.com> <880e6e87-0e45-0d47-28b7-01a2773a23b8@egenix.com> Message-ID: On Thu, Jan 26, 2017 at 10:49 PM, Paul Moore wrote: > On 26 January 2017 at 22:32, M.-A. Lemburg wrote: >> On 26.01.2017 23:09, Random832 wrote: >>> On Thu, Jan 26, 2017, at 11:21, Paul Moore wrote: >>>> On a similar note, I always get caught out by the fact that the >>>> Windows default download is the 32-bit version. Are we not yet at a >>>> point where a sufficient majority of users have 64-bit machines, and >>>> 32-bit should be seen as a "specialist" choice? >>> >>> I'm actually surprised it doesn't detect it, especially since it does >>> detect Windows. >>> >>> (I bet fewer people have supported 32-bit windows versions than have >>> Windows XP.) >> >> I think you have to differentiate a bit more between having a >> 64-bit OS and running 64-bit applications. >> >> Many applications on Windows are still 32-bit applications and >> unless you process large amounts of data, a 32-bit Python >> system is well worth using. In some cases, it's even needed, >> e.g. if you have to use an extension which links to a 32-bit >> library. > > I agree that there are use cases for a 32-bit Python. But for the > *average* user, I'd argue in favour of a 64-bit build as the default > download. Preferring the 64-bit version would be a friendlier experience for novices in general nowadays. I've had to explain WOW64 file-system redirection [1] and registry redirection [2] too many times to people who are using 32-bit Python on 64-bit Windows. I've seen people waste over a day on this silly problem. They can't imagine that Windows is basically lying to them. [1]: https://msdn.microsoft.com/en-us/library/aa384187 [2]: https://msdn.microsoft.com/en-us/library/aa384232 From rob.cliffe at btinternet.com Thu Jan 26 19:45:26 2017 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Fri, 27 Jan 2017 00:45:26 +0000 Subject: [Python-ideas] Is it Python 3 yet? 
In-Reply-To: <8bcd52e1-a7eb-81d1-4bb3-9b8f24c07e79@egenix.com> References: <8bcd52e1-a7eb-81d1-4bb3-9b8f24c07e79@egenix.com> Message-ID: On 26/01/2017 17:49, M.-A. Lemburg wrote: > > -1 on hiding Python 2.7. It's our LTS release, so something > we should be proud of until it goes out of support. > > +1 on emphasizing the 3.6 button and de-emphasizing 2.7, e.g. > by making the 3.6 button yellow and the 2.7 grey. > > Quite. Please, de-emphasize Python 2.7 if appropriate (it seems to be the consensus), but do not hide it. It is stable, and IMHO the most tried-and-tested version of Python. As a 2.7 user, I would like to *decide* when to upgrade to Python 3, not to be pressured into it by events. Best wishes, Rob Cliffe From njs at pobox.com Thu Jan 26 20:23:38 2017 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 26 Jan 2017 17:23:38 -0800 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: <880e6e87-0e45-0d47-28b7-01a2773a23b8@egenix.com> References: <1485468549.2813607.860870640.2970243F@webmail.messagingengine.com> <880e6e87-0e45-0d47-28b7-01a2773a23b8@egenix.com> Message-ID: On Thu, Jan 26, 2017 at 2:32 PM, M.-A. Lemburg wrote: > On 26.01.2017 23:09, Random832 wrote: >> On Thu, Jan 26, 2017, at 11:21, Paul Moore wrote: >>> On a similar note, I always get caught out by the fact that the >>> Windows default download is the 32-bit version. Are we not yet at a >>> point where a sufficient majority of users have 64-bit machines, and >>> 32-bit should be seen as a "specialist" choice? >> >> I'm actually surprised it doesn't detect it, especially since it does >> detect Windows. >> >> (I bet fewer people have supported 32-bit windows versions than have >> Windows XP.) > > I think you have to differentiate a bit more between having a > 64-bit OS and running 64-bit applications. > > Many applications on Windows are still 32-bit applications and > unless you process large amounts of data, a 32-bit Python > system is well worth using. In some cases, it's even needed, > e.g. if you have to use an extension which links to a 32-bit > library. It's also relatively common to need a 64-bit Python, e.g. if running programs that need more than 4 GiB of address space. (Data analysts run into this fairly often.) I don't know enough about Windows to have an informed opinion about how the trade-offs work out, but as an additional data point, it looks like in the last ~week of PyPI downloads, 32-bit windows wheels have been downloaded 379943 times, and 64-bit windows wheels have been downloaded 331933 times [1], so it's pretty evenly split 53% / 47%. -n [1] SELECT COUNT(*) AS downloads, REGEXP_EXTRACT(file.filename, r"(win32|win_amd64)\.whl") as windows_bitness, FROM TABLE_DATE_RANGE( [the-psf:pypi.downloads], TIMESTAMP("20170119"), TIMESTAMP("20170126") ) GROUP BY windows_bitness ORDER BY downloads DESC LIMIT 1000 -- Nathaniel J. Smith -- https://vorpus.org From python at mrabarnett.plus.com Thu Jan 26 20:26:22 2017 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 27 Jan 2017 01:26:22 +0000 Subject: [Python-ideas] globals should accept parenteses for extending beyond 1 line In-Reply-To: References: <0147af7a-fcfd-3379-a1d9-f7f1ec1d1369@gmail.com> Message-ID: <42f67207-e83b-589b-d1e8-bc3cf692caa6@mrabarnett.plus.com> On 2017-01-26 16:02, Nick Coghlan wrote: > On 23 January 2017 at 22:29, MRAB wrote: > > On 2017-01-23 20:09, Nick Timkovich wrote: > >> > >> Related and probably more common is the need for the line-continuation > >> operator for long/multiple context managers with "with". 
I assume that's > >> come up before, but was it also just a low priority rather than any > >> technical reason? > >> > > It has come up before, and there is a technical reason, namely the syntactic > > ambiguity when parsing. Not impossible to fix, but probably not worth the > > added complexity. > > Right, it's the fact parentheses are already allowed there, but mean > something quite different: > > >>> with (1, 2, 3): pass > ... > Traceback (most recent call last): > File "", line 1, in > AttributeError: __enter__ > > These days, I'd personally be in favour of changing the parsing of > parentheses in that situation, as if we were going to add meaningful > context management behaviour to tuples we would have done it by now, > and having the name bindings next to their expressions is easier to > read than having them all at the end: > > with (cm1() as a, > cm2() as b, > cm3() as c): > ... > > Relative to tuples-as-context-managers, such an approach would also > avoid reintroducing the old resource management problems that saw > contextlib.nested removed and replaced with contextlib.ExitStack. > Just because the 'with' is followed by a '(', It doesn't necessarily mean that it's a tuple. The 'as' is preceded by an expression, which could start with '('. OTOH, I can't remember ever seeing the expression start with '('; it's usually the name of a callable. From bussonniermatthias at gmail.com Thu Jan 26 21:46:08 2017 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Thu, 26 Jan 2017 18:46:08 -0800 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: <1485468549.2813607.860870640.2970243F@webmail.messagingengine.com> <880e6e87-0e45-0d47-28b7-01a2773a23b8@egenix.com> Message-ID: On Thu, Jan 26, 2017 at 5:23 PM, Nathaniel Smith wrote: > It's also relatively common to need a 64-bit Python, e.g. if running > programs that need more than 4 GiB of address space. (Data analysts > run into this fairly often.) > > I don't know enough about Windows to have an informed opinion about > how the trade-offs work out, but as an additional data point, it looks > like in the last ~week of PyPI downloads, 32-bit windows wheels have > been downloaded 379943 times, and 64-bit windows wheels have been > downloaded 331933 times [1], so it's pretty evenly split 53% / 47%. > How much of that is because of the default download on python.org ? Also % seem swapped depending on python2 vs Python3, and quite different. Python 3 190466 win_amd64 ~ 60% 275949 win32 Python 2 3139051 win32 ~ 87% 463554 win_amd64 -- M SELECT COUNT(*) AS downloads, REGEXP_EXTRACT(file.filename, r"(win32|win_amd64)\.whl") as windows_bitness, REGEXP_EXTRACT(details.python, r"(^\d)") as python FROM TABLE_DATE_RANGE( [the-psf:pypi.downloads], TIMESTAMP("20170119"), TIMESTAMP("20170126") ) WHERE REGEXP_EXTRACT(file.filename, r"(win32|win_amd64)\.whl") <> 'null' GROUP BY windows_bitness, python ORDER BY python DESC, downloads DESC LIMIT 1000 From njs at pobox.com Thu Jan 26 22:20:53 2017 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 26 Jan 2017 19:20:53 -0800 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: <1485468549.2813607.860870640.2970243F@webmail.messagingengine.com> <880e6e87-0e45-0d47-28b7-01a2773a23b8@egenix.com> Message-ID: On Thu, Jan 26, 2017 at 6:46 PM, Matthias Bussonnier wrote: > On Thu, Jan 26, 2017 at 5:23 PM, Nathaniel Smith wrote: > >> It's also relatively common to need a 64-bit Python, e.g. if running >> programs that need more than 4 GiB of address space. 
(Data analysts >> run into this fairly often.) >> >> I don't know enough about Windows to have an informed opinion about >> how the trade-offs work out, but as an additional data point, it looks >> like in the last ~week of PyPI downloads, 32-bit windows wheels have >> been downloaded 379943 times, and 64-bit windows wheels have been >> downloaded 331933 times [1], so it's pretty evenly split 53% / 47%. >> > > How much of that is because of the default download on python.org ? > > Also % seem swapped depending on python2 vs Python3, and quite different. > > Python 3 > 190466 win_amd64 ~ 60% > 275949 win32 Did you get something reversed here? > Python 2 > 3139051 win32 ~ 87% > 463554 win_amd64 I also don't know why your numbers are so much larger than mine... -n -- Nathaniel J. Smith -- https://vorpus.org From bussonniermatthias at gmail.com Thu Jan 26 22:49:58 2017 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Thu, 26 Jan 2017 19:49:58 -0800 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: <1485468549.2813607.860870640.2970243F@webmail.messagingengine.com> <880e6e87-0e45-0d47-28b7-01a2773a23b8@egenix.com> Message-ID: On Thu, Jan 26, 2017 at 7:20 PM, Nathaniel Smith wrote: > I also don't know why your numbers are so much larger than mine... That's because copy/pasting from the html table prepend the row number to the download count. >> Also % seem swapped depending on python2 vs Python3, and quite different. > Did you get something reversed here? Still small majority of 64bit on Py3, but large majority of 32 Bit download on py2. Python 3 90466 win_amd64 ~ 54% 75949 win32 Python 2 139051 win32 ~ 70% 63554 win_amd64 -- M From tjreedy at udel.edu Fri Jan 27 00:13:50 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 27 Jan 2017 00:13:50 -0500 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: <880e6e87-0e45-0d47-28b7-01a2773a23b8@egenix.com> References: <1485468549.2813607.860870640.2970243F@webmail.messagingengine.com> <880e6e87-0e45-0d47-28b7-01a2773a23b8@egenix.com> Message-ID: On 1/26/2017 5:32 PM, M.-A. Lemburg wrote: > Many applications on Windows are still 32-bit applications and > unless you process large amounts of data, a 32-bit Python > system is well worth using. In some cases, it's even needed, > e.g. if you have to use an extension which links to a 32-bit > library. I look through the list of a few hundred windows packages at http://www.lfd.uci.edu/~gohlke/pythonlibs/ The two packages that require CUDA 8 and CUDNN are 64-bit only. As far as I saw in a careful check, all other windows binaries are available in both 32- and 64-bit versions. The situation may be different on PyPI, but win64 will cover most thing likely to be used by a beginner. -- Terry Jan Reedy From denis.akhiyarov at gmail.com Fri Jan 27 01:22:21 2017 From: denis.akhiyarov at gmail.com (Denis Akhiyarov) Date: Thu, 26 Jan 2017 22:22:21 -0800 (PST) Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: <1485468549.2813607.860870640.2970243F@webmail.messagingengine.com> <880e6e87-0e45-0d47-28b7-01a2773a23b8@egenix.com> Message-ID: The problem is not in Python packages, but when gluing Python with other Windows apps or libraries. From p.f.moore at gmail.com Fri Jan 27 03:45:45 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 27 Jan 2017 08:45:45 +0000 Subject: [Python-ideas] Is it Python 3 yet? 
In-Reply-To: References: <1485468549.2813607.860870640.2970243F@webmail.messagingengine.com> <880e6e87-0e45-0d47-28b7-01a2773a23b8@egenix.com> Message-ID: Resending because Google Groups handling of mailing lists is broken :-( Sorry to anyone who gets double posts. On 27 January 2017 at 08:39, Paul Moore wrote: > On 27 January 2017 at 06:22, Denis Akhiyarov wrote: >> The problem is not in Python packages, but when gluing Python with other Windows apps or libraries. > > I would argue that anyone doing that is capable of looking for the > version they need. The proposal is simply to make the 64-bit version > what we offer by default, not to remove the 32-bit versions or even to > make them less prominent anywhere other than on the front page. > > Paul From mal at egenix.com Fri Jan 27 04:07:17 2017 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 27 Jan 2017 10:07:17 +0100 Subject: [Python-ideas] Default Python Windows version on python.org (was: Is it Python 3 yet?) In-Reply-To: References: <1485468549.2813607.860870640.2970243F@webmail.messagingengine.com> <880e6e87-0e45-0d47-28b7-01a2773a23b8@egenix.com> Message-ID: <6d88544f-089c-32b9-fcc7-e0bf2f73721e@egenix.com> On 27.01.2017 06:13, Terry Reedy wrote: > On 1/26/2017 5:32 PM, M.-A. Lemburg wrote: > >> Many applications on Windows are still 32-bit applications and >> unless you process large amounts of data, a 32-bit Python >> system is well worth using. In some cases, it's even needed, >> e.g. if you have to use an extension which links to a 32-bit >> library. > > I look through the list of a few hundred windows packages at > http://www.lfd.uci.edu/~gohlke/pythonlibs/ > > The two packages that require CUDA 8 and CUDNN are 64-bit only. As far > as I saw in a careful check, all other windows binaries are available in > both 32- and 64-bit versions. The situation may be different on PyPI, > but win64 will cover most thing likely to be used by a beginner. 32-bit vs. 64-bit is a still very much a conscious choice on Windows x64, and so whether or not a beginner chose to install 3rd party libs as 32-bit or 64-bit version is not something we can really tell from looking at the browser info. It would probably be better to make the choice for Python a conscious one as well by offering both alternatives or at least make it clear that the default is e.g. x64. Some cases where you'd prefer 32-bit over 64-bit: - MS Office: https://support.office.com/en-us/article/Choose-the-64-bit-or-32-bit-version-of-Office-2dee7807-8f95-4d0c-b5fe-6c6f49b8d261 - LibreOffice: https://ask.libreoffice.org/en/question/55819/version-5-choose-32-bit-or-64-bit/ - Anything to do with media codecs - Anything that still supports older Windows versions (vendors often don't ship 64-bit variants due to this) You just have to compare the number of entries in your "Programs" dir with the "Programs (x86)" dir to see how common 32-bit applications are today. It's also possible that an application of library installs both 32-bit and 64-bit variants. You can then run into issues when configuring these. The ODBC manager on Windows x64 is a prominent example: there are actually two versions of this, one for 32-bit drivers and one for 64-bit drivers - using distinct configurations. 32-bit apps only see the drivers configured with the 32-bit manager, 64-bit apps only the ones configured with the 64-bit variant. 
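To make that distinction concrete: the bitness of the interpreter can be checked from Python itself, independently of what the OS is. A minimal illustration (not tied to any particular installer):

    import struct
    import sys

    # Pointer size of the running interpreter: 4 bytes on a 32-bit build,
    # 8 bytes on a 64-bit build.
    print("%d-bit Python build" % (struct.calcsize("P") * 8))

    # Equivalent documented check: sys.maxsize is 2**31 - 1 on a 32-bit build.
    print("64-bit build" if sys.maxsize > 2**32 else "32-bit build")

This reports only the Python build that is running; a 32-bit build runs happily on 64-bit Windows under WOW64, which is exactly the situation that produces the file-system and registry redirection surprises mentioned earlier in the thread.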
Anyway, I agree that defaulting to x64 is the way forward, and defaulting to x64 for Python on Windows x64 is a good approach, but making the default choice clear to the beginner is probably just as needed to at least give them a hint at what the cause of their problems could be. They have to make the same choice with many other applications as well, so it's not like they've never seen this before. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jan 27 2017) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From stephanh42 at gmail.com Fri Jan 27 04:32:26 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Fri, 27 Jan 2017 10:32:26 +0100 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: <1485468549.2813607.860870640.2970243F@webmail.messagingengine.com> <880e6e87-0e45-0d47-28b7-01a2773a23b8@egenix.com> Message-ID: Hi all, FWIW, I got the following statement from here: https://github.com/numpy/numpy/wiki/Numerical-software-on-Windows "Standard numpy and scipy binary releases on Windows use pre-compiled ATLAS libraries and are 32-bit only because of the difficulty of compiling ATLAS on 64-bit Windows. " Might want to double-check with the numpy folks; it would be too bad if numpy wouldn't work on the preferred Windows Python. Stephan 2017-01-27 9:45 GMT+01:00 Paul Moore : > Resending because Google Groups handling of mailing lists is broken > :-( Sorry to anyone who gets double posts. > > On 27 January 2017 at 08:39, Paul Moore wrote: > > On 27 January 2017 at 06:22, Denis Akhiyarov > wrote: > >> The problem is not in Python packages, but when gluing Python with > other Windows apps or libraries. > > > > I would argue that anyone doing that is capable of looking for the > > version they need. The proposal is simply to make the 64-bit version > > what we offer by default, not to remove the 32-bit versions or even to > > make them less prominent anywhere other than on the front page. > > > > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Jan 27 04:38:49 2017 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 27 Jan 2017 01:38:49 -0800 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: <1485468549.2813607.860870640.2970243F@webmail.messagingengine.com> <880e6e87-0e45-0d47-28b7-01a2773a23b8@egenix.com> Message-ID: On Fri, Jan 27, 2017 at 1:32 AM, Stephan Houben wrote: > Hi all, > > FWIW, I got the following statement from here: > > https://github.com/numpy/numpy/wiki/Numerical-software-on-Windows > > "Standard numpy and scipy binary releases on Windows use pre-compiled ATLAS > libraries and are 32-bit only because of the difficulty of compiling ATLAS > on 64-bit Windows. 
" > > Might want to double-check with the numpy folks; it would > be too bad if numpy wouldn't work on the preferred Windows Python. That's out of date -- official numpy releases have switched from ATLAS to OpenBLAS (which requires some horrible frankencompiler system, but it seems to work for now...), and there are 32- and 64-bit Windows wheels up on PyPI: https://pypi.python.org/pypi/numpy/ 64-bit is definitely what I'd recommend as a default to someone wanting to use numpy, because when working with arrays it's too easy to hit the 32-bit address space limit. -n -- Nathaniel J. Smith -- https://vorpus.org From tjreedy at udel.edu Fri Jan 27 05:25:27 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 27 Jan 2017 05:25:27 -0500 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: <1485468549.2813607.860870640.2970243F@webmail.messagingengine.com> <880e6e87-0e45-0d47-28b7-01a2773a23b8@egenix.com> Message-ID: On 1/27/2017 4:38 AM, Nathaniel Smith wrote: > On Fri, Jan 27, 2017 at 1:32 AM, Stephan Houben wrote: >> Hi all, >> >> FWIW, I got the following statement from here: >> >> https://github.com/numpy/numpy/wiki/Numerical-software-on-Windows >> >> "Standard numpy and scipy binary releases on Windows use pre-compiled ATLAS >> libraries and are 32-bit only because of the difficulty of compiling ATLAS >> on 64-bit Windows. " >> >> Might want to double-check with the numpy folks; it would >> be too bad if numpy wouldn't work on the preferred Windows Python. > > That's out of date Would be nice if it were updated... -- official numpy releases have switched from ATLAS > to OpenBLAS (which requires some horrible frankencompiler system, but > it seems to work for now...), and there are 32- and 64-bit Windows > wheels up on PyPI: https://pypi.python.org/pypi/numpy/ and from NumPy, a fundamental package needed for scientific computing with Python. Numpy+MKL is linked to the Intel? Math Kernel Library and includes required DLLs in the numpy.core directory. numpy?1.11.3+mkl?cp27?cp27m?win32.whl numpy?1.11.3+mkl?cp27?cp27m?win_amd64.whl etc. All the several packages that require numpy also come in both versions. > 64-bit is definitely what I'd recommend as a default to someone > wanting to use numpy, because when working with arrays it's too easy > to hit the 32-bit address space limit. > > -n > -- Terry Jan Reedy From anthony at xtfx.me Fri Jan 27 12:54:56 2017 From: anthony at xtfx.me (C Anthony Risinger) Date: Fri, 27 Jan 2017 11:54:56 -0600 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: <1485468549.2813607.860870640.2970243F@webmail.messagingengine.com> <880e6e87-0e45-0d47-28b7-01a2773a23b8@egenix.com> Message-ID: So I realize this is subjective and just a personal experience, but over the last 3-5 years I've really watched Python usage and popularity decline in the "hearts and minds" of my peers, across a few different companies I work with. At my current gig we don't even use Python anymore for tools that will be distributed to an end user; we only use Python for internal tooling. With a still difficult distribution/compatibility story, I've watched dozens of instances where people choose something else, usually Node or Golang. The primary uses here are api and microservice-type applications, developer tooling, and CLI apps. Even recent additions like `async` keyword are causing more problems because it's not a useful general-purpose concurrency primitive eg. like a goroutine or greenlets. 
I know the scientific community is a big and important part of the Python ecosystem, but I honestly believe other parts of Python are suffering from any dragging of feet at this point. Python 3 has been out nearly a decade, and I think it would be super for the community to take a bold stance (is it still bold 9 years later?) and really stand behind Python 3, prominently, almost actively working to diminish Python 2. I've been hearing and reading about both for a long time, and honestly I'd love one of them to go away! I don't even care which :-) On Fri, Jan 27, 2017 at 4:25 AM, Terry Reedy wrote: > On 1/27/2017 4:38 AM, Nathaniel Smith wrote: > >> On Fri, Jan 27, 2017 at 1:32 AM, Stephan Houben < >> stephanh42-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org> wrote: >> >>> Hi all, >>> >>> FWIW, I got the following statement from here: >>> >>> https://github.com/numpy/numpy/wiki/Numerical-software-on-Windows >>> >>> "Standard numpy and scipy binary releases on Windows use pre-compiled >>> ATLAS >>> libraries and are 32-bit only because of the difficulty of compiling >>> ATLAS >>> on 64-bit Windows. " >>> >>> Might want to double-check with the numpy folks; it would >>> be too bad if numpy wouldn't work on the preferred Windows Python. >>> >> >> That's out of date >> > > Would be nice if it were updated... > > -- official numpy releases have switched from ATLAS > >> to OpenBLAS (which requires some horrible frankencompiler system, but >> it seems to work for now...), and there are 32- and 64-bit Windows >> wheels up on PyPI: https://pypi.python.org/pypi/numpy/ >> > > and from > > NumPy, a fundamental package needed for scientific computing with Python. > Numpy+MKL is linked to the Intel? Math Kernel Library and includes > required DLLs in the numpy.core directory. > > numpy?1.11.3+mkl?cp27?cp27m?win32.whl > numpy?1.11.3+mkl?cp27?cp27m?win_amd64.whl > etc. > > All the several packages that require numpy also come in both versions. > > 64-bit is definitely what I'd recommend as a default to someone >> wanting to use numpy, because when working with arrays it's too easy >> to hit the 32-bit address space limit. >> >> -n >> >> > > -- > Terry Jan Reedy > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- C Anthony -------------- next part -------------- An HTML attachment was scrubbed... URL: From brentbrinkley at gmail.com Fri Jan 27 16:07:27 2017 From: brentbrinkley at gmail.com (Brent Brinkley) Date: Fri, 27 Jan 2017 16:07:27 -0500 Subject: [Python-ideas] A more readable way to nest functions Message-ID: HI Everyone, I?m relatively new to the world of python but in my short time here I?ve fallen in love with how readable this language is. One issue that I?ve seen in a lot of languages struggle with is nested function calls. Parenthesis when nested inherently create readability issues. I stumbled upon what I believe is an elegant solution within the elm platform in their use of the backward pipe operator <|. Current Ex. Suggested Structure This aligns with the Zen of Python in the following ways Simple is better than complex Flat is better than nested Sparse is better than dense Readability counts Practicality beats purity Ways it may conflict Explicit is better than implicit Special cases aren't special enough to break the rules Just curious to see what the rest of the community thinks ? 
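(The "Current Ex." and "Suggested Structure" examples above were sent as screenshots, which did not survive in this text archive. Judging from Ethan Furman's quotation later in the thread, the comparison was roughly the following sketch; the nested "current" form is inferred, and the backward-pipe line is the proposed Elm-style syntax, not valid Python today.)

    def another_func(greeting):
        return greeting + ", world"

    def some_func(text):
        return text.upper()

    # current, nested call style
    print(some_func(another_func("Hello")))

    # suggested backward-pipe style (proposed syntax only)
    #   print() <| some_func() <| another_func("Hello")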
Best Regards, Brent -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- [two screenshot attachments ("Current Ex." and "Suggested Structure") were scrubbed from the archive] From jmcs at jsantos.eu Fri Jan 27 16:19:34 2017 From: jmcs at jsantos.eu (João Santos) Date: Fri, 27 Jan 2017 21:19:34 +0000 Subject: [Python-ideas] A more readable way to nest functions In-Reply-To: References: Message-ID: Hi, This would break apart as soon as one of the left functions takes more than one parameter. Best regards, João Santos On Fri, 27 Jan 2017, 22:08 Brent Brinkley, wrote: > HI Everyone, > > I'm relatively new to the world of python but in my short time here I've > fallen in love with how readable this language is. One issue that I've seen > in a lot of languages struggle with is nested function calls. Parenthesis > when nested inherently create readability issues. I stumbled upon what I > believe is an elegant solution within the elm platform in their use of the > backward pipe operator <|. > > > Current Ex. > > > Suggested Structure > > This aligns with the Zen of Python in the following ways > > > - Simple is better than complex > - Flat is better than nested > - Sparse is better than dense > - Readability counts > - Practicality beats purity > > > Ways it may conflict > > > - Explicit is better than implicit > - Special cases aren't special enough to break the rules > > > Just curious to see what the rest of the community thinks ? > > Best Regards, > > Brent > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- [the quoted screenshot attachments were scrubbed from the archive] From ethan at stoneleaf.us Fri Jan 27 16:28:54 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 27 Jan 2017 13:28:54 -0800 Subject: [Python-ideas] A more readable way to nest functions In-Reply-To: References: Message-ID: <588BBB96.4080906@stoneleaf.us> On 01/27/2017 01:07 PM, Brent Brinkley wrote: > I'm relatively new to the world of python Welcome! > but in my short time here I've > fallen in love with how readable this language is. One issue that I've > seen in a lot of languages struggle with is nested function calls. > Parenthesis when nested inherently create readability issues. I stumbled > upon what I believe is an elegant solution within the elm platform in > their use of the backward pipe operator <|.
Please use text -- it save responders from having to reenter the non-text content> > Suggested structure: > > print() <| some_func() <| another_func("Hello") My first question is what does this look like when print() and some_func() have other parameters? In other words, what would this look like? print('hello', name, some_func('whatsit', another_func('good-bye')), sep=' .-. ') Currently, I would format that as: print( 'hello', name, some_func( 'whatsit', another_func( 'good-bye') ), ), sep=' .-. ', ) Okay, maybe a few more new-lines than such a short example requires, but that's the idea. -- ~Ethan~ From random832 at fastmail.com Fri Jan 27 17:50:44 2017 From: random832 at fastmail.com (Random832) Date: Fri, 27 Jan 2017 17:50:44 -0500 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: References: <1485468549.2813607.860870640.2970243F@webmail.messagingengine.com> <880e6e87-0e45-0d47-28b7-01a2773a23b8@egenix.com> Message-ID: <1485557444.4118792.862047608.6A62527A@webmail.messagingengine.com> On Fri, Jan 27, 2017, at 12:54, C Anthony Risinger wrote: > I know the scientific community is a big and important part of the > Python ecosystem, but I honestly believe other parts of Python are > suffering from any dragging of feet at this point. Python 3 has been > out nearly a decade, and I think it would be super for the community > to take a bold stance (is it still bold 9 years later?) and really > stand behind Python 3, prominently, almost actively working to > diminish Python 2. This particular subthread is regarding whether to make a 64-bit version of python 2 and/or 3 (whatever is done regarding the other question) the default download button for users coming from Win64 browsers. At least, the bits you're responding to are talking about 32-bit libraries rather than Python 2. From anthony at xtfx.me Fri Jan 27 21:11:22 2017 From: anthony at xtfx.me (C Anthony Risinger) Date: Fri, 27 Jan 2017 20:11:22 -0600 Subject: [Python-ideas] Is it Python 3 yet? In-Reply-To: <1485557444.4118792.862047608.6A62527A@webmail.messagingengine.com> References: <1485468549.2813607.860870640.2970243F@webmail.messagingengine.com> <880e6e87-0e45-0d47-28b7-01a2773a23b8@egenix.com> <1485557444.4118792.862047608.6A62527A@webmail.messagingengine.com> Message-ID: On Jan 27, 2017 4:51 PM, "Random832" wrote: On Fri, Jan 27, 2017, at 12:54, C Anthony Risinger wrote: > I know the scientific community is a big and important part of the > Python ecosystem, but I honestly believe other parts of Python are > suffering from any dragging of feet at this point. Python 3 has been > out nearly a decade, and I think it would be super for the community > to take a bold stance (is it still bold 9 years later?) and really > stand behind Python 3, prominently, almost actively working to > diminish Python 2. This particular subthread is regarding whether to make a 64-bit version of python 2 and/or 3 (whatever is done regarding the other question) the default download button for users coming from Win64 browsers. At least, the bits you're responding to are talking about 32-bit libraries rather than Python 2. Yeah, I guess I was trying to push against any further stagnation, of any kind, on forward-facing questions like 32/64 bit and 2/3 version. I hesitated to say anything because I don't feel I'm adding much concrete or even useful information to the conversation, but it's something that's been building internally for a long time while observing the overarching tone and outcomes of Python threads. 
I can't articulate it well, or even fully isolate the reasons for it. All I really know is how I feel when peers ask me about Python or the reading I get when others speak about their experience using it. Python is absolutely one of my favorite languages to write, yet I find myself recommending against it, and watching others do the same. Python comes with caveats and detailed explanations out the gate and people simply perceive higher barriers and more chores. I don't have any truly constructive input so I'll stop here; I only wanted to voice that in my tiny tiny bubble, I'm watching market share diminish, it's unfortunate, and I'm not sure what to do about it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From anthony at xtfx.me Sat Jan 28 02:20:11 2017 From: anthony at xtfx.me (C Anthony Risinger) Date: Sat, 28 Jan 2017 01:20:11 -0600 Subject: [Python-ideas] A more readable way to nest functions In-Reply-To: <588BBB96.4080906@stoneleaf.us> References: <588BBB96.4080906@stoneleaf.us> Message-ID: On Fri, Jan 27, 2017 at 3:28 PM, Ethan Furman wrote: > On 01/27/2017 01:07 PM, Brent Brinkley wrote: > >> Suggested structure: >> >> print() <| some_func() <| another_func("Hello") >> > > My first question is what does this look like when print() and some_func() > have other parameters? In other words, what would this look like? > > print('hello', name, some_func('whatsit', another_func('good-bye')), > sep=' .-. ') The Elixir pipe operator looks pretty close to the suggested style, but the argument order is reversed: another_func('good-bye') |> some_func('whatsit') |> print('hello', name, sep=' .-. ') This isn't exactly equivalent to the example though because the result of each call is passed as the first argument to the next function. I think it looks nice when it's the right fit, but it's limited to the first argument. -- C Anthony -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Sat Jan 28 06:26:33 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 28 Jan 2017 11:26:33 +0000 Subject: [Python-ideas] Using Python for end user applications Message-ID: On 28 January 2017 at 02:11, C Anthony Risinger wrote: > I can't articulate it well, or even fully isolate the reasons for it. All I > really know is how I feel when peers ask me about Python or the reading I > get when others speak about their experience using it. Python is absolutely > one of my favorite languages to write, yet I find myself recommending > against it, and watching others do the same. Python comes with caveats and > detailed explanations out the gate and people simply perceive higher > barriers and more chores. Picking up on this and the comment you made in the original post > With a still difficult distribution/compatibility story, I've watched dozens of instances > where people choose something else, usually Node or Golang. Can you explain why you recommend against Python, in a bit more detail? If you are an enthusiastic Python user, but you are steering people away from Python, then it would be worth understanding why. As you mention end user applications and distribution, one of my first questions would be what platform you work on. Following on from that, what sort of end user applications are you looking at? If we're talking here about games for iOS, then that's a much different situation than GUI apps for Windows or command line tools for Linux.
My personal feeling is that Python happily works in the "Command line tools for Linux" area (except possibly with regard to C extensions where the plethora of Linux ABIs makes things hard). But other areas less so. I've been having good experiences making standalone applications with the new Windows "embedded" distributions, but that is relatively new, and still has a lot of rough edges. I'm working on a project to bundle a working zipapp with the embedded distribution to make a standalone exe - would having something like that make any difference in your environment? So I think it would be good to understand precisely where and why you feel that you need to recommend Go or Node over Python. It's possible that we have to accept that your situation is simply not a use case that Python is well suited for, but equally it may be that there's something we can do. Paul From edk141 at gmail.com Sat Jan 28 07:41:24 2017 From: edk141 at gmail.com (Ed Kellett) Date: Sat, 28 Jan 2017 12:41:24 +0000 Subject: [Python-ideas] A more readable way to nest functions In-Reply-To: <588BBB96.4080906@stoneleaf.us> References: <588BBB96.4080906@stoneleaf.us> Message-ID: On Fri, 27 Jan 2017 at 21:29 Ethan Furman wrote: On 01/27/2017 01:07 PM, Brent Brinkley wrote: > Suggested structure: > > print() <| some_func() <| another_func("Hello") My first question is what does this look like when print() and some_func() have other parameters? In other words, what would this look like? print('hello', name, some_func('whatsit', another_func('good-bye')), sep=' .-. ') This idea doesn't solve the general problem well, but I'm not convinced that it needs to; that can be addressed by making partial function application syntax nicer. Although I think it's probably fairly useful anyway. FWIW, I'd spell it without the (), so it's simply a right-associative binary operator on expressions, (a -> b, a) -> b, rather than magic syntax. print XYZ some_func XYZ another_func("Hello") -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Jan 28 07:55:50 2017 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 28 Jan 2017 23:55:50 +1100 Subject: [Python-ideas] A more readable way to nest functions In-Reply-To: References: <588BBB96.4080906@stoneleaf.us> Message-ID: On Sat, Jan 28, 2017 at 11:41 PM, Ed Kellett wrote: > FWIW, I'd spell it without the (), so it's simply a right-associative binary > operator on expressions, (a -> b, a) -> b, rather than magic syntax. > > print XYZ some_func XYZ another_func("Hello") I'm not entirely sure I understand your example; are you using "XYZ" as an operator, or a token (another parameter)? I think probably the former, but you may mean this differently. In any case, it's a new syntax that does exactly the same thing that we can already do, just with more restrictions, and arguably more readably. That means it has to have a SIGNIFICANT advantage over the current syntax - a pretty high bar. I don't see that it's cleared that bar; it is, at best, a small and incremental change. ChrisA From z+py+pyideas at m0g.net Sat Jan 28 09:16:27 2017 From: z+py+pyideas at m0g.net (zmo) Date: Sat, 28 Jan 2017 15:16:27 +0100 Subject: [Python-ideas] A more readable way to nest functions In-Reply-To: References: <588BBB96.4080906@stoneleaf.us> Message-ID: <20170128141627.4i3xeyqgh27duqv5@BuGz.eclipse.m0g.net> Hi list o/ This idea sounds fun, so as a thought experiment why not imagine one way of integrating it in what I believe would be pythonic enough. 
On Sat, Jan 28, 2017 at 12:41:24PM +0000, Ed Kellett wrote: > FWIW, I'd spell it without the (), so it's simply a right-associative > binary operator on expressions, (a -> b, a) -> b, rather than magic syntax. > print XYZ some_func XYZ another_func("Hello") I agree this would look a bit more elegant. To focus on the feature of that operator, instead of how to write it, I'll use XYZ instead of <| in this post. So, considering it's decided that the RHS is in charge of filling up all the arguments of the LHS, how to deal with positional and keyword arguments without introducing new syntax? Should it be by returning a tuple of positional iterable and keyword dict? i.e.: def fn_a(*args, **kwarg): print("args: {}, kwarg: {}".format(args, kwarg)) def fn_b(): return (1,2,3), {'a':1, 'b':2, 'c':3} fn_a XYZ fn_b() but then if we pass only positional would the following be ok? def fn_b(): return (1,2,3) or should it look like this one, it being a tuple, but with the second part being empty: def fn_b(): return (1,2,3), so to avoid confusing if we want to pass a dict as second positional argument of fn_a(): def fn_b(): return (1, {'a': 2}), anyway, I guess it's pretty safe to assume that if fn_b() returns a scalar, it'll be easy to assume it's just a single positional argument. That being said, then if the chosen syntax is like the following: > print XYZ some_func XYZ another_func("Hello") and given we decide to apply the rules I'm suggesting above, why not make this function dumb simple, it being: * "take the RHS scalar or tuple, and apply it as arguments to the LHS" Meaning that the above could also be written as: print XYZ some_func XYZ another_func XYZ "Hello" Then the basic operator definition could be done with a dunder looking like: def __application__(self, other): if isinstance(other, Iterable): if (len(other) == 2 and isinstance(other[0], tuple) and isinstance(other[1], dict)): return self(*other[0], **other[1]) elif (len(other) == 1 and isinstance(other[0], tuple): return self(*other[0]) return self(other) In practice, such a scheme would make it possible to have: print XYZ (("Hello World",), {"file": sys.stderr}) Another thing I'm wondering, should the whole syntax be an expression? I believe it should, so it fits in python3 logic of everything ? except control statements ? is an expression: print(fn_a XYZ fn_b(), file=sys.stderr) But the danger is that it might lead to very long lines: print XYZ my_first_function XYZ my_second_function XYZ my_third_function XYZ my_fourth_function leading to either continuing spaces or wrapping in parenthesis: print XYZ my_first_function \ XYZ my_second_function \ XYZ my_third_function \ XYZ my_fourth_function (print XYZ my_first_function XYZ my_second_function XYZ my_third_function XYZ my_fourth_function) but it would then be avoiding the silly stack of closing parenthesis: print(my_first_function( my_second_function( my_third_function( my_fourth_function())))) All in all, it can be a nice syntactic sugar to have which could make it more flexible working with higher order functions, but it with the way I'm suggesting to comply with python's arguments handling, it offers little advantages when the RHS is not filling LHS arguments: >>> print(all(map(lambda x: x>2, filter(lambda x: isinstance(x, int), range(0,3))))) True vs >>> print XYZ all XYZ map XYZ (lambda x: x>2, filter(lambda x: isinstance(x, int), range(0,3))), True Here, applying map onto all onto print offers a great readability, but for passing arguments to map, not so much. 
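(Python has no __application__ hook and XYZ is not real syntax, but the argument-passing convention sketched above can be approximated today with a small wrapper class and an existing operator. A runnable sketch under those assumptions -- the name Applicable and the choice of << are arbitrary, and it tests for tuples rather than arbitrary iterables so that strings stay single arguments:)

    import sys

    class Applicable:
        """Wrap a callable so that `wrapped << rhs` applies rhs as its arguments."""
        def __init__(self, func):
            self.func = func

        def __lshift__(self, other):
            # ((1, 2, 3), {'a': 1}) -> positional and keyword arguments
            if (isinstance(other, tuple) and len(other) == 2
                    and isinstance(other[0], tuple) and isinstance(other[1], dict)):
                return self.func(*other[0], **other[1])
            # ((1, 2, 3),) -> positional arguments only
            if (isinstance(other, tuple) and len(other) == 1
                    and isinstance(other[0], tuple)):
                return self.func(*other[0])
            # anything else -> a single positional argument
            return self.func(other)

    Print = Applicable(print)
    Print << (("Hello World",), {"file": sys.stderr})  # prints to stderr
    Print << "just one positional argument"

Since << associates left-to-right, this only demonstrates the argument-passing rules, not the right-associative chaining the proposed operator would need.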
So the question end up being: is application of *all* arguments of a function from return value of another function a common enough pattern to justify a new syntax that would make it better *only* then? Or maybe instead of passing a tuple of parameters could we stack parameters up with the XYZ operator up until a callable is reached, so that: >>> print XYZ all XYZ map XYZ lambda x: x>2 XYZ filter XYZ lambda x: isinstance(x, int) XYZ range(0,3) But then how can it be told that we want: `(lambda x: isinstance(x), range(0,3)` to be fed to `filter`, and not `range(0,3)` to be fed to `lambda x: isinstance(x, int)`? But then it would be just another way to introduce currying as a language feature with an operator, so we should then just discuss on how to add currying as a language syntax "by the book", but I'm pretty sure that's a topic already discussed before I joined this list ;-) that was my ?0.02 Cheers, -- zmo From edk141 at gmail.com Sat Jan 28 11:37:09 2017 From: edk141 at gmail.com (Ed Kellett) Date: Sat, 28 Jan 2017 16:37:09 +0000 Subject: [Python-ideas] A more readable way to nest functions In-Reply-To: <20170128141627.4i3xeyqgh27duqv5@BuGz.eclipse.m0g.net> References: <588BBB96.4080906@stoneleaf.us> <20170128141627.4i3xeyqgh27duqv5@BuGz.eclipse.m0g.net> Message-ID: On Sat, 28 Jan 2017 at 14:27 zmo via Python-ideas wrote: > I agree this would look a bit more elegant. To focus on the feature of > that operator, instead of how to write it, I'll use XYZ instead of <| in > this post. My thoughts exactly :) > So, considering it's decided that the RHS is in charge of filling up all > the arguments of the LHS, how to deal with positional and keyword > arguments without introducing new syntax? > My instinct is that we don't need to deal with that; that's what partial application is for. To be fair, I'd advocate better syntax for that, but it's another issue. > anyway, I guess it's pretty safe to assume that if fn_b() returns a > scalar, it'll be easy to assume it's just a single positional argument. > > > print XYZ some_func XYZ another_func("Hello") > > [...] > > Meaning that the above could also be written as: > > print XYZ some_func XYZ another_func XYZ "Hello" That looks good to me, but I think another_func("Hello") is the better one to recommend. I think it makes it slightly more obvious what is going on. > Then the basic operator definition could be done with a dunder > looking like: [...] I think the special-casiness here is unfortunate and would cause problems. a(b()) doesn't randomly pass kwargs to a if b happens to return a certain kind of thing. -------------- next part -------------- An HTML attachment was scrubbed... URL: From elazarg at gmail.com Sat Jan 28 12:06:30 2017 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Sat, 28 Jan 2017 17:06:30 +0000 Subject: [Python-ideas] A more readable way to nest functions In-Reply-To: References: <588BBB96.4080906@stoneleaf.us> <20170128141627.4i3xeyqgh27duqv5@BuGz.eclipse.m0g.net> Message-ID: The title is misleading : it should be "nesting function calls" Elazar ?????? ???, 28 ????' 2017, 18:38, ??? Ed Kellett ?: > On Sat, 28 Jan 2017 at 14:27 zmo via Python-ideas > wrote: > > I agree this would look a bit more elegant. To focus on the feature of > that operator, instead of how to write it, I'll use XYZ instead of <| in > this post. 
> > > My thoughts exactly :) > > > So, considering it's decided that the RHS is in charge of filling up all > the arguments of the LHS, how to deal with positional and keyword > arguments without introducing new syntax? > > > My instinct is that we don't need to deal with that; that's what partial > application is for. To be fair, I'd advocate better syntax for that, but > it's another issue. > > > anyway, I guess it's pretty safe to assume that if fn_b() returns a > scalar, it'll be easy to assume it's just a single positional argument. > > > print XYZ some_func XYZ another_func("Hello") > > [...] > > > > Meaning that the above could also be written as: > > print XYZ some_func XYZ another_func XYZ "Hello" > > > That looks good to me, but I think another_func("Hello") is the better one > to recommend. I think it makes it slightly more obvious what is going on. > > > Then the basic operator definition could be done with a dunder > > looking like: [...] > > > I think the special-casiness here is unfortunate and would cause problems. > a(b()) doesn't randomly pass kwargs to a if b happens to return a certain > kind of thing. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Jan 28 21:30:13 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 29 Jan 2017 13:30:13 +1100 Subject: [Python-ideas] A more readable way to nest functions In-Reply-To: <20170128141627.4i3xeyqgh27duqv5@BuGz.eclipse.m0g.net> References: <588BBB96.4080906@stoneleaf.us> <20170128141627.4i3xeyqgh27duqv5@BuGz.eclipse.m0g.net> Message-ID: <20170129023012.GL7345@ando.pearwood.info> On Sat, Jan 28, 2017 at 03:16:27PM +0100, zmo via Python-ideas wrote: > Hi list o/ > > This idea sounds fun, so as a thought experiment why not imagine one > way of integrating it in what I believe would be pythonic enough. This idea is sometimes called "the Collection Pipeline" design pattern, and is used in various command shells. Martin Fowler wrote about this design pattern here: https://martinfowler.com/articles/collection-pipeline/ and I wrote a recipe for it: https://code.activestate.com/recipes/580625-collection-pipeline-in-python/ with a working, although basic, implementation. The recipe shows that we don't need new syntax for this sort of feature. I'm rather partial to either the | or >> operators, both of which are rarely used except by ints. Nor does it need to be a built-in part of the language. It could be a third-party module, or a library module. I think that the most important feature of pipeline syntax is that we write the functions in the same order that they are applied, instead of backwards. Instead of: print(list(map(float, filter(lambda n: 20 < n < 30, data)))) where you have to read all the way to the right to find out what you are operating on, and then read backwards to the left in order to follow the execution order, a pipeline starts with the argument and then applies the functions in execution order: data | Filter(lambda n: 20 < n < 30) | Map(float) | List | Print (In principle, Python built-ins could support this sort of syntax so I could write filter, map, list, print rather than custom versions Filter, Map, etc. 
That would feel very natural to a language like Haskell, for example, where partial function application is a fundamental part of the language. But for Python that would be a *major* change, and not one I wish to propose. Easier to just have a separate, parallel set of pipeline functions, with an easy way to create new ones. A module is perfect for that.) Now we can see that these sorts of pipelines are best suited for a particular style of programming. It doesn't work so well for arbitrary function calls where the data arg could end up in any argument position: aardvark(1, 2, cheese('a', eggs(spam(arg), 'b')), 4) But I don't see that as a problem. This is not a replacement for regular function call syntax in its full generality, but a powerful design pattern for solving certain kinds of problems. > On Sat, Jan 28, 2017 at 12:41:24PM +0000, Ed Kellett wrote: > > FWIW, I'd spell it without the (), so it's simply a right-associative > > binary operator on expressions, (a -> b, a) -> b, rather than magic syntax. > > print XYZ some_func XYZ another_func("Hello") > > I agree this would look a bit more elegant. To focus on the feature of > that operator, instead of how to write it, I'll use XYZ instead of <| in > this post. > > So, considering it's decided that the RHS is in charge of filling up all > the arguments of the LHS, Is that actually decided? That seems to break the advantage of a pipeline: the left-to-right order. To understand your syntax, you have to read from the right backwards to the left: # print(list(map(float, filter(lambda n: 20 < n < 30, data)))) print XYZ list XYZ map(float) XYZ filter(lambda n: 20 < n < 30, data) That's actually longer than the current syntax. Actually, I don't think this would work using your idea. filter would need to pass on *all* of map's arguments, not just the data argument: filter(float, lambda n: 20 < n < 30, data,) # returns a tuple (float, FilterObject) which gives us: print XYZ list XYZ map XYZ filter(float, lambda n: 20 < n < 30, data) But of course filter doesn't actually have that syntax, so either we have a new, parallel series of functions including Filter(...) or we write something like: print XYZ list XYZ map XYZ lambda (f1, f2, arg): (f1, filter(f2, arg))(float, lambda n: 20 < n < 30, data) which is simply horrid. Maybe there could be a series of helper functions, but I don't think this idea is workable. See below. > how to deal with positional and keyword > arguments without introducing new syntax? Should it be by returning a > tuple of positional iterable and keyword dict? i.e.: > > def fn_a(*args, **kwarg): > print("args: {}, kwarg: {}".format(args, kwarg)) > > def fn_b(): > return (1,2,3), {'a':1, 'b':2, 'c':3} > > fn_a XYZ fn_b() The problem is that each function needs to know what arguments the *next* function expects. That means that the function on the right needs to have every argument used by the entire pipeline, and each function has to take the arguments it needs and pass on the rest. It also means that everything is very sensitive to the order that arguments are expected: def spam(func, data): ... def ham(argument, function): ... spam XYZ foo(bar, data) ham XYZ foo(bar, data) What should foo() return? [...] > In practice, such a scheme would make it possible to have: > > print XYZ (("Hello World",), {"file": sys.stderr}) In what way is this even close to an improvement over the existing function call syntax? 
print XYZ (("Hello World",), {"file": sys.stderr}) print("Hello World", file=sys.stderr) If "Hello World" wasn't a literal, but came from somewhere else: print XYZ ((greetings(),), {"file": sys.stderr}) print(greetings(), file=sys.stderr) so you're not even avoiding nested parentheses. > All in all, it can be a nice syntactic sugar to have which could make it > more flexible working with higher order functions, but it with the way > I'm suggesting to comply with python's arguments handling, it offers > little advantages when the RHS is not filling LHS arguments: > > >>> print(all(map(lambda x: x>2, filter(lambda x: isinstance(x, int), range(0,3))))) > True > > vs > > >>> print XYZ all XYZ map XYZ (lambda x: x>2, filter(lambda x: isinstance(x, int), range(0,3))), > True I think that "literal advantage" is being very kind. The best you can say is that you save two pairs of parentheses at the cost of three operators and moving arguments away from the functions that use them. > Here, applying map onto all onto print offers a great readability, I don't think so. At *best*, it is no better than what we already have: print XYZ all XYZ map XYZ ... print ( all ( map ( ... but moving the arguments away from where they are used makes it unspeakable. Consider: def double(values): for v in values: return 2*v print(max(map(float, double(range(5))))) How would I use your syntax? print XYZ max XYZ map float XYZ double XYZ range XYZ 5 doesn't work without new syntax, and print XYZ max XYZ map XYZ double XYZ range XYZ (float, 5) doesn't work without re-writing range and double to pass on unused arguments. I'd need partial application: from functools import partial print XYZ max XYZ partial(map, float) XYZ double XYZ range XYZ 5 which is now starting to look like a collection pipeline written out backwards: 5 | Range | Apply(double) | Map(float) | Max | Print where (again) the Capital letter functions will be pipe-compatible versions of the usual range, map, etc. They don't necessarily have to be prepared before hand: many could be a simple wrapper around the built-in: Max = Apply(max) There may be ways to avoid even that. A third-party library is a good place to experiment with these questions, this is in no way ready for the standard library, let alone a new operator. [...] > But then it would be just another way to introduce currying as a > language feature with an operator, so we should then just discuss on how > to add currying as a language syntax "by the book", but I'm pretty sure > that's a topic already discussed before I joined this list ;-) The easiest way to support currying, or at least some form of it, is: from functools import partial as p p(map, float) # curries map with a single argument float which is not quite the map(float) syntax Haskell programmers expect, but its not awful. -- Steve From z+py+pyideas at m0g.net Sun Jan 29 06:08:02 2017 From: z+py+pyideas at m0g.net (zmo) Date: Sun, 29 Jan 2017 12:08:02 +0100 Subject: [Python-ideas] A more readable way to nest functions In-Reply-To: <20170129023012.GL7345@ando.pearwood.info> References: <588BBB96.4080906@stoneleaf.us> <20170128141627.4i3xeyqgh27duqv5@BuGz.eclipse.m0g.net> <20170129023012.GL7345@ando.pearwood.info> Message-ID: <20170129110802.6gtgfrcs4yiyenfn@BuGz.eclipse.m0g.net> tl;dr: I agree with you, Steven, as proven by my former post, augmented with the details of your reply: there's no advantage to add a new operator and language construct for this use case.? 
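(For concreteness, a minimal sketch of the kind of | pipeline described above -- in the spirit of the ActiveState recipe, but with made-up names rather than its actual API -- showing that ordinary operator overloading already gives the left-to-right style with no new syntax:)

    class Stage:
        """Pipeline stage: `data | stage` calls the wrapped function on data."""
        def __init__(self, func):
            self.func = func

        def __ror__(self, data):          # data | stage
            return self.func(data)

        def __call__(self, *args, **kwargs):
            # Map(float) returns a new stage waiting for the piped-in data.
            return Stage(lambda data: self.func(*args, data, **kwargs))

    Filter = Stage(filter)   # filter(pred, data)
    Map = Stage(map)         # map(f, data)
    List = Stage(list)
    Print = Stage(print)

    data = [15, 22.5, 25, 31, 28]
    data | Filter(lambda n: 20 < n < 30) | Map(float) | List | Print
    # prints [22.5, 25.0, 28.0]

Because each stage only defines __ror__, nothing built-in changes behaviour; the pipeline stays a plain library construct, which is the point that no new operator is needed.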
On Sun, Jan 29, 2017 at 01:30:13PM +1100, Steven D'Aprano wrote: > On Sat, Jan 28, 2017 at 03:16:27PM +0100, zmo via Python-ideas wrote: > > This idea sounds fun, so as a thought experiment why not imagine one > > way of integrating it in what I believe would be pythonic enough. > This idea is sometimes called "the Collection Pipeline" design pattern, > and is used in various command shells. Martin Fowler wrote about this > design pattern here: > https://martinfowler.com/articles/collection-pipeline/ > and I wrote a recipe for it: > https://code.activestate.com/recipes/580625-collection-pipeline-in-python/ > with a working, although basic, implementation. > print(list(map(float, filter(lambda n: 20 < n < 30, data)))) > [?] > data | Filter(lambda n: 20 < n < 30) | Map(float) | List | Print It's indeed an interesting tip and idea, and using the pipe is not a bad idea as it's a good mnemonic for anyone who used a shell. About reading order, I'm personally agnostic. > (In principle, Python built-ins could support this sort of syntax so I > could write filter, map, list, print rather than custom versions Filter, > Map, etc. [?] But for Python that would be a *major* change, and not one I > wish to propose. [?]) Even as an external library, I would use that kind of syntax with extreme care in python. As a python developer, one of the things I really do enjoy is that any python code looks like a python code, and that's because changing meaning of operators depending on the context is discouraged. Then, unlike Scala, C++ or Ruby, you never end up with the language looking like a new DSL for each application or framework. > > On Sat, Jan 28, 2017 at 12:41:24PM +0000, Ed Kellett wrote: > > So, considering it's decided that the RHS is in charge of filling up all > > the arguments of the LHS, > Is that actually decided? it's not, it's part of the thought experiment of 'if we had such syntax', how could we handle arguments? > [?] so either we have a new, parallel series of functions including > Filter(...) or we write something like: > print XYZ list XYZ map XYZ lambda (f1, f2, arg): (f1, filter(f2, arg))(float, lambda n: 20 < n < 30, data) > which is simply horrid. Maybe there could be a series of helper > functions, but I don't think this idea is workable. [?] > > [?] > > All in all, it can be a nice syntactic sugar to have which could make it > > more flexible working with higher order functions, but it with the way > > I'm suggesting to comply with python's arguments handling, it offers > > little advantages when the RHS is not filling LHS arguments: > > [?] > I think that "literal advantage" is being very kind. The best you can > say is that you save two pairs of parentheses at the cost of three > operators and moving arguments away from the functions that use them. I said "little" not "literal" ? I started the whole reasoning trying to be objective and figure how such a new syntax would be integrated in python and what good use could be made of it. And in the end, I end up with something that can offer a nice syntax for a very niche case, and wouldn't be of much use most of the time. The fact that it can be implemented with some operator overload, as you nicely demonstrated just proves the fact further: this is not a good idea. > [...] 
> > But then it would be just another way to introduce currying as a > > language feature with an operator, so we should then just discuss on how > > to add currying as a language syntax "by the book", but I'm pretty sure > > that's a topic already discussed before I joined this list ;-) > The easiest way to support currying, or at least some form of it, is: > from functools import partial as p > p(map, float) # curries map with a single argument float > which is not quite the map(float) syntax Haskell programmers expect, > but its not awful. Indeed, I love having that available as a function! We could reopen the debate as to whether we should implement currying into python, but since my last post I've done a bit of searching, and found out it's been discussed 14 years ago: https://mail.python.org/pipermail/python-dev/2004-February/042668.html https://www.python.org/dev/peps/pep-0309/ and a few discussions, implementations of (real) currying published more recently: https://mtomassoli.wordpress.com/2012/03/18/currying-in-python/ http://code.activestate.com/recipes/577928-indefinite-currying-decorator-with-greedy-call-and/ https://gist.github.com/JulienPalard/021f1c7332507d6a494b I could argue that a nicer syntactic sugar and having it as a language feature could help in having it supported in a more optimised fashion, instead of using an added layer of abstraction. But, I won't ^^ Cheers, -- zmo From gerald.britton at gmail.com Sun Jan 29 07:37:11 2017 From: gerald.britton at gmail.com (Gerald Britton) Date: Sun, 29 Jan 2017 07:37:11 -0500 Subject: [Python-ideas] What are your opinions on .NET Core vs Python? Message-ID: It's an apples/oranges comparison. .NET is a library that can be used from many languages, including Python. (Not just IronPython, but also Python for .NET (pythonnet.sourceforge*.*net *))* Python is a language that can use many libraries, including .NET The set of libraries that can be used from all the languages that can also use .NET (out of the box, that is) is smaller. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ned at nedbatchelder.com Sun Jan 29 09:00:15 2017 From: ned at nedbatchelder.com (Ned Batchelder) Date: Sun, 29 Jan 2017 09:00:15 -0500 Subject: [Python-ideas] What are your opinions on .NET Core vs Python? In-Reply-To: References: Message-ID: <8c872aa7-b6d6-dfda-fdac-192bde189012@nedbatchelder.com> On 1/29/17 7:37 AM, Gerald Britton wrote: > It's an apples/oranges comparison. > > .NET is a library that can be used from many languages, including > Python. (Not just IronPython, but also Python for .NET > (pythonnet.sourceforge*.*net*))* > * > * > Python is a language that can use many libraries, including .NET > > The set of libraries that can be used from all the languages that can > also use .NET (out of the box, that is) is smaller. > > This list is for discussing proposals for changes to Python. For open-ended discussion, Python-List would be a better bet. --Ned. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From mertz at gnosis.cx  Sun Jan 29 15:38:06 2017
From: mertz at gnosis.cx (David Mertz)
Date: Sun, 29 Jan 2017 12:38:06 -0800
Subject: [Python-ideas] A more readable way to nest functions
In-Reply-To: <20170129110802.6gtgfrcs4yiyenfn@BuGz.eclipse.m0g.net>
References: <588BBB96.4080906@stoneleaf.us>
 <20170128141627.4i3xeyqgh27duqv5@BuGz.eclipse.m0g.net>
 <20170129023012.GL7345@ando.pearwood.info>
 <20170129110802.6gtgfrcs4yiyenfn@BuGz.eclipse.m0g.net>
Message-ID: 

The `toolz` package (and `cytoolz`, which has an identical API) provides an
`@curry` decorator that is more general and elegant than the links earlier
IMO. Maybe I'm biased because I work with principal author Matt Rocklin,
but toolz is really neat.

See: http://toolz.readthedocs.io/en/latest/curry.html

On Sun, Jan 29, 2017 at 3:08 AM, zmo via Python-ideas
<python-ideas at python.org> wrote:

> tl;dr: I agree with you, Steven, as proven by my former post, augmented
> with the details of your reply: there's no advantage to add a new
> operator and language construct for this use case.
>
> [...]

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons. Intellectual property is
to the 21st century what the slave trade was to the 16th.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From songofacandy at gmail.com  Mon Jan 30 04:21:57 2017
From: songofacandy at gmail.com (INADA Naoki)
Date: Mon, 30 Jan 2017 18:21:57 +0900
Subject: [Python-ideas] https://docs.python.org/fr/ ?
In-Reply-To: <56ED7EBD.5050804@palard.fr>
References: <56ED7EBD.5050804@palard.fr>
Message-ID: 

There are some updates about this topic.
And I have something to discuss to get things forward.
We (Japanese translation team) and Julien start sharing one Transifex project. Please see this dashboard. We have nice progress. https://www.transifex.com/python-doc/python-35/dashboard/ Julien want hosting french documentation at https://docs.python.org/fr/ I don't have strong opinion about where to hosting, but I want to share more efforts (automated build + hosting) with Julien and other language communities. Since Berker Peksag against about hosting translated document on docs.python.org [1], I'm considering about using github pages. I got "python-docs" organization already for it [2]. So translated documents can be hosted on URL like https://python-docs.github.io/py35-ja/ or https://python-docs.github.io/py35/fr/ . (first part of the path should be same to repository name). I already uses github pages for testing Japanese translation [3]. It's nice place to host webpage. We can get https and CDN for free. [1]: https://github.com/python/docsbuild-scripts/pull/8 [2]: https://github.com/python-docs [3]: https://python-doc-ja.github.io/py35/ So I want to discuss (and get consensus) about where should we host translated document. a) translated docs on docs.python.org / issue tracker on github.com/python-docs/ b) translated docs on python-docs.github.io / issue tracker on github.com/python-docs/ c) We shouldn't use neither "python-docs" nor "docs.python.org". Translation should be in other community. d) Other idea? What do you think? Regards, On Sun, Mar 20, 2016 at 1:30 AM, Julien Palard wrote: > o/ > > The french translation of the Python Documentation [1][2] has translated 20% > of the pageviews of docs.python.org. I think it's the right moment to push > it do docs.python.org. So there's some questions ! And I'd like feedback. > > TL;DR (with my personal choices): > - URL may be "http://docs.python.org/fr/" > - For localized variations of languages we should use dash and lowercase > like "docs.python.org/pt-br/" > - po files may be hosted on the python's github > - existing script to build doc may be patched to build translations > - each translations may crosslink to others > - untranslated strings may be visually marked as so > > I also opened: http://bugs.python.org/issue26546. > > # Chronology, dependencies > > The only blocking decision here is the URL, (also reviewing my patch ...), > with those two, translated docs can be pushed to production, and the other > steps can be discussed and applied one by one. > > # The URL > > ## CCTLD vs path vs subdomain > > I think we should use a variation of "docs.python.org/fr/" for simplicity > and clarity. > > I think we should avoid using CCTLDs as they're sometime hard or near > impossible to obtain (may cost a lot of time), also some are expensive, so > it's time and money we clearly don't need to loose. > > Last possibility I see is to use a subdomain, like fr.docs.python.org or > docs.fr.python.org but I don't think it's the role / responsibility of the > sub-domain to do it. > > So I'm for docs.python.org/LANGUAGE_TAG/ (without moving current > documentation inside a /en/). > > ## Language tag in path > > ### Dropping the default locale of a language > > I personally think we should not show the region in case it's redundant: so > to use "fr" instead of "fr-FR", "de" instead of "de-DE", but keeping the > possibility to use a locale code when it's not redundant like for "pt-br" or > "de-AT" (German ('de') as used in Austria ('AT')). 
> > I think so because I don't think we'll have a lot of locale variations (like > de-AT, fr-CH, fr-CA, ...) so it will be most of the time redundant (visually > heavy, longer to type, longer to read) but we'll still need some locale > (pt-BR typically). > > ### gettext VS IETF language tag format > > gettext goes by using an underscore between language and locale [3] and IETF > goes by using a dash [4][5]. > > As sphinx is using gettext, and gettext uses underscore we may choose > underscore too. But URLs are not here to leak the underlying implementation, > and the IETF looks like to be the standard way to represent language tags. > Also I visually prefer the dash over the underscore, so I'm for the dash > here. > > ### Lower case vs upper case local tag > > RFC 5646 section-2.1 tells us language tags are not case sensitive, yet > ISO3166-1 recommends that country codes (part of the language tag) be > capitalized. I personally prefer the all-lowercase one as paths in URLs > typically are lowercase. I searched for `inurl:"pt-br"` to see if I'm not > too far away from the usage here and usage seems to agree with me, although > there's some "pt-BR" in urls. > > # Where to host the translated files > > Currently we're hosting the *po* files in the afpy's (Francophone > association for python) [6] github [1] but it may make sense to use (in the > generation scripts) a more controlled / restricted clone in the python > github, at least to have a better view of who can push on the documentation. > > We may want to choose between aggregating all translations under the same > git repository but I don't feel it's useful. > > # How to > > Currently, a python script [7] is used to generate `docs.python.org`, I > proposed a patch in [8] to make this script clone and build the french > translation too, it's a simple and effective way, I don't think we need more > ? Any idea welcome. > > In our side, we have a Makefile [12] to build the translated doc which is > only a thin layer on top of the Sphinx Makefile. So my proposed patch to > build scripts "just" delegate the build to our Makefile which itself > delegate the hard work to the Sphinx Makefile. > > # Next ? > > ## Document how to translate Python > > I think I can (should) write a documentation on "how to start a Python doc > translation project" and "how to migrate existing [9][10][11] python doc > translation projects to docs.python.org" if french does goes docs.python.org > because it may hopefully motivate people to do the same, and I think our > structure is a nice way to do it (A Makefile to generate the doc, all > versions translated, people mainly working on latest version, scripts to > propagating translations to older version, etc...). > > ## Crosslinking between existing translations > > Once the translations are on `docs.python.org`, crosslinks may be > established so people on a version can be aware of other version, and easily > switch to them. I'm not a UI/UX man but I think we may have a select box > right before the existing select box about version, on the top-left corner. > Right before because it'll reflect the path: /fr/3.5/ -> [select box > fr][select box 3.5]. > > ## Marking as "untranslated, you can help" the untranslated paragraphs > > The translations will always need work to follow upstream modifications: > marking untranslated paragraphs as so may transform the "Oh they suck, this > paragraph is not even translated :-(" to "Hey, nice I can help translating > that !". 
There's an opened sphinx-doc ticket to do so [13] but I have not > worked on it yet. As previously said I'm real bad at designing user > interfaces, so I don't even visualize how I'd like it to be. > > > [1] http://www.afpy.org/doc/python/3.5/ > [2] https://github.com/afpy/python_doc_fr > [3] https://www.gnu.org/software/gettext/manual/html_node/Locale-Names.html > [4] http://tools.ietf.org/html/rfc5646 > [5] https://en.wikipedia.org/wiki/IETF_language_tag > [6] http://www.afpy.org/ > [7] https://github.com/python/docsbuild-scripts/ > [8] http://bugs.python.org/issue26546 > [9] http://docs.python.jp/3/ > [10] https://github.com/python-doc-ja/python-doc-ja > [11] http://docs.python.org.ar/tutorial/3/index.html > [12] https://github.com/AFPy/python_doc_fr/blob/master/Makefile > [13] https://github.com/sphinx-doc/sphinx/issues/1246 > > -- > Julien Palard > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From jsbueno at python.org.br Mon Jan 30 07:57:07 2017 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Mon, 30 Jan 2017 10:57:07 -0200 Subject: [Python-ideas] A more readable way to nest functions In-Reply-To: References: <588BBB96.4080906@stoneleaf.us> <20170128141627.4i3xeyqgh27duqv5@BuGz.eclipse.m0g.net> <20170129023012.GL7345@ando.pearwood.info> <20170129110802.6gtgfrcs4yiyenfn@BuGz.eclipse.m0g.net> Message-ID: Just for one mor eexample - I have a toy implementation here as well: https://github.com/jsbueno/chillicurry I decided to use the "." operator itself, and frame introspection to retrieve the function to be called from the calling context - that is rude, I know. But what I like of this approach is that by using the "." I have full control of the objects called as functions on the chain - which allowed me to use a constant value that will be replaced by the "dynamic" parameter from previous function calls. So, if I want "max(len(mytext), 10)", I can write "curry.max(DELAY, 10).len(mytext)" - the "max" call will only take place after len is evaluated. (And for functions wth a single parameter, there is no need for that) Lessons learned: 1) One wanting curry can do so without changing Python Syntax 2) For control of functions that need more than one parameter, one needs a lazy-call mechanism. Possibly the "lazy call" mechanism would be a more interesting "add on" to the language than currying per se. (It is due to not being able to lazy call that I resort to the transforms using ".") That said, if anyone like the the "chillicurry" approach enough that wants to help polishing it enough for pypi, just get in touch. js -><- On 29 January 2017 at 18:38, David Mertz wrote: > The `tools` (and `cytoolz` that has an identical API) provides an `@curry` > decorator that is more general and elegant than the links earlier IMO. > Maybe I'm biased because I work with principal author Matt Rocklin, but > toolz is really neat. > > See: http://toolz.readthedocs.io/en/latest/curry.html > > > > On Sun, Jan 29, 2017 at 3:08 AM, zmo via Python-ideas < > python-ideas at python.org> wrote: > >> tl;dr: I agree with you, Steven, as proven by my former post, augmented >> with the details of your reply: there's no advantage to add a new >> operator and language construct for this use case.? 
>> >> On Sun, Jan 29, 2017 at 01:30:13PM +1100, Steven D'Aprano wrote: >> > On Sat, Jan 28, 2017 at 03:16:27PM +0100, zmo via Python-ideas wrote: >> > > This idea sounds fun, so as a thought experiment why not imagine one >> > > way of integrating it in what I believe would be pythonic enough. >> >> > This idea is sometimes called "the Collection Pipeline" design pattern, >> > and is used in various command shells. Martin Fowler wrote about this >> > design pattern here: >> > https://martinfowler.com/articles/collection-pipeline/ >> > and I wrote a recipe for it: >> > https://code.activestate.com/recipes/580625-collection-pipel >> ine-in-python/ >> > with a working, although basic, implementation. >> > print(list(map(float, filter(lambda n: 20 < n < 30, data)))) >> > [?] >> > data | Filter(lambda n: 20 < n < 30) | Map(float) | List | Print >> >> It's indeed an interesting tip and idea, and using the pipe is not a bad >> idea as it's a good mnemonic for anyone who used a shell. About reading >> order, I'm personally agnostic. >> >> > (In principle, Python built-ins could support this sort of syntax so I >> > could write filter, map, list, print rather than custom versions Filter, >> > Map, etc. [?] But for Python that would be a *major* change, and not >> one I >> > wish to propose. [?]) >> >> Even as an external library, I would use that kind of syntax with >> extreme care in python. As a python developer, one of the things I >> really do enjoy is that any python code looks like a python code, and >> that's because changing meaning of operators depending on the context is >> discouraged. >> >> Then, unlike Scala, C++ or Ruby, you never end up with the language >> looking like a new DSL for each application or framework. >> >> > > On Sat, Jan 28, 2017 at 12:41:24PM +0000, Ed Kellett wrote: >> > > So, considering it's decided that the RHS is in charge of filling up >> all >> > > the arguments of the LHS, >> > Is that actually decided? >> >> it's not, it's part of the thought experiment of 'if we had such syntax', >> how could we handle arguments? >> >> > [?] so either we have a new, parallel series of functions including >> > Filter(...) or we write something like: >> > print XYZ list XYZ map XYZ lambda (f1, f2, arg): (f1, filter(f2, >> arg))(float, lambda n: 20 < n < 30, data) >> > which is simply horrid. Maybe there could be a series of helper >> > functions, but I don't think this idea is workable. [?] >> >> > > [?] >> > > All in all, it can be a nice syntactic sugar to have which could make >> it >> > > more flexible working with higher order functions, but it with the way >> > > I'm suggesting to comply with python's arguments handling, it offers >> > > little advantages when the RHS is not filling LHS arguments: >> > > [?] >> > I think that "literal advantage" is being very kind. The best you can >> > say is that you save two pairs of parentheses at the cost of three >> > operators and moving arguments away from the functions that use them. >> >> I said "little" not "literal" ? I started the whole reasoning trying to >> be objective and figure how such a new syntax would be integrated in >> python and what good use could be made of it. And in the end, I end up >> with something that can offer a nice syntax for a very niche case, and >> wouldn't be of much use most of the time. >> >> The fact that it can be implemented with some operator overload, as you >> nicely demonstrated just proves the fact further: this is not a good >> idea. >> >> > [...] 
>> > > But then it would be just another way to introduce currying as a >> > > language feature with an operator, so we should then just discuss on >> how >> > > to add currying as a language syntax "by the book", but I'm pretty >> sure >> > > that's a topic already discussed before I joined this list ;-) >> > The easiest way to support currying, or at least some form of it, is: >> > from functools import partial as p >> > p(map, float) # curries map with a single argument float >> > which is not quite the map(float) syntax Haskell programmers expect, >> > but its not awful. >> >> Indeed, I love having that available as a function! We could reopen the >> debate as to whether we should implement currying into python, but since >> my last post I've done a bit of searching, and found out it's been >> discussed 14 years ago: >> >> https://mail.python.org/pipermail/python-dev/2004-February/042668.html >> https://www.python.org/dev/peps/pep-0309/ >> >> and a few discussions, implementations of (real) currying published more >> recently: >> >> https://mtomassoli.wordpress.com/2012/03/18/currying-in-python/ >> http://code.activestate.com/recipes/577928-indefinite-curryi >> ng-decorator-with-greedy-call-and/ >> https://gist.github.com/JulienPalard/021f1c7332507d6a494b >> >> I could argue that a nicer syntactic sugar and having it as a language >> feature could help in having it supported in a more optimised fashion, >> instead of using an added layer of abstraction. But, I won't ^^ >> >> Cheers, >> >> -- >> zmo >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Mon Jan 30 09:41:06 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 30 Jan 2017 09:41:06 -0500 Subject: [Python-ideas] https://docs.python.org/fr/ ? In-Reply-To: References: <56ED7EBD.5050804@palard.fr> Message-ID: On 1/30/2017 4:21 AM, INADA Naoki wrote: A followup to Julien Palard's post Mar 2016. > We (Japanese translation team) and Julien start sharing one Transifex project. I am in favor of translations AND of making them easy to find. Sharing infrastructure seems sensible. Aside from the base url, I think each repository should mimic the current hierarchical structure of docs.python.org: .../version/document/page.html#paragraph > Please see this dashboard. We have nice progress. > https://www.transifex.com/python-doc/python-35/dashboard/ > > Julien want hosting french documentation at https://docs.python.org/fr/ and doc.python.org/jp, es, etc, as discussed in https://bugs.python.org/issue26546 Regardless of where the pages are physically located, which is to say, what the real base url is, I presume that docs.python.org/fr could be redirected to that location. 
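In other words, such a redirect only has to splice the language tag into
the path and leave the rest of the /version/document/page.html#fragment
hierarchy untouched. A toy sketch of the mapping (the helper name is made
up for illustration; this is not any real build or server script):

    from urllib.parse import urlsplit, urlunsplit

    def translated_url(url, lang):
        """Return the same docs URL with the language tag inserted after the host."""
        parts = urlsplit(url)
        return urlunsplit(parts._replace(path="/" + lang + parts.path))

    assert (translated_url("https://docs.python.org/3.5/library/functools.html#functools.partial", "fr")
            == "https://docs.python.org/fr/3.5/library/functools.html#functools.partial")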
Assuming a duplicated page hierarchy as suggested above, this would mean that one could access a translation of a page by inserting the country code into the url on the url bar. This in turn means that one could open both English original and translation on side-by-side tabs or windows by right-clicking on one of the permalinks on the page and selecting 'Open in new tab/window' and then opening translation on the current tab, as suggested above. (This could be made easier, but is good enough to start.) > I don't have strong opinion about where to hosting, but I want to share > more efforts (automated build + hosting) with Julien and other language > communities. > > Since Berker Peksag against about hosting translated document on > docs.python.org [1], I read the discussion and Berker is understandably against having to deal with translation issues on bugs.python.org. To mitigate against this, each translated page should have at the top the equivalent of "This is an unofficial translation of . It may be either incomplete or incorrect. Found a translation bug?" The last would link to a language-appropriate version of https://docs.python.org/3/bugs.html. In the page footer, "Fould a bug" and its link would be similarly replaced. If it seems necessary, use the red warning colors. On bugs.python.org, 'Documentation' could be augmented to 'Documentation - English original' Note that this is still a character short of the longest component: '2to3 (2.x to 3.x conversion tool)' My impression is that one of the arguments for moving to githup was that we could somehow enable people to submit doc pull requests while reading the web page. (I don't know the details.) The hope was and would be that this would mostly eliminate tracker issues for simple typo fixes, which are a nuisance in the sense of having a high overhead to improvement ratio. Any such requests generated on translated page would go to the translation site. If in spite of the above and any other efforts, there were still misdirected reports on bugs.python.org, the translators should deal with them. In any case, Berker and anyone else could ignore them. A last resort would be to remove the redirect links. -- Terry Jan Reedy From berker.peksag at gmail.com Mon Jan 30 10:08:39 2017 From: berker.peksag at gmail.com (=?UTF-8?Q?Berker_Peksa=C4=9F?=) Date: Mon, 30 Jan 2017 18:08:39 +0300 Subject: [Python-ideas] [docs] https://docs.python.org/fr/ ? In-Reply-To: References: <56ED7EBD.5050804@palard.fr> Message-ID: On Mon, Jan 30, 2017 at 12:21 PM, INADA Naoki wrote: > There are some updates about this topic. > And I have something to discuss to get things forward. > > We (Japanese translation team) and Julien start sharing one Transifex project. > Please see this dashboard. We have nice progress. > https://www.transifex.com/python-doc/python-35/dashboard/ > > Julien want hosting french documentation at https://docs.python.org/fr/ > I don't have strong opinion about where to hosting, but I want to share > more efforts (automated build + hosting) with Julien and other language > communities. > > Since Berker Peksag against about hosting translated document on > docs.python.org [1], I'm considering about using github pages. > > I got "python-docs" organization already for it [2]. So translated documents > can be hosted on URL like https://python-docs.github.io/py35-ja/ or > https://python-docs.github.io/py35/fr/ . (first part of the path > should be same to > repository name). > > I already uses github pages for testing Japanese translation [3]. 
> It's nice place to host webpage. We can get https and CDN for free. > > [1]: https://github.com/python/docsbuild-scripts/pull/8 > [2]: https://github.com/python-docs > [3]: https://python-doc-ja.github.io/py35/ > > > So I want to discuss (and get consensus) about where should we host > translated document. > > a) translated docs on docs.python.org / issue tracker on github.com/python-docs/ > b) translated docs on python-docs.github.io / issue tracker on > github.com/python-docs/ +1 for b) or any idea that would indicate that the Python developers don't maintain translations of the official documentation. I don't have a strong opinion on naming the GitHub organization (maybe python-docs-translations?) but that can be discussed later. Another advantage of this approach is that you can have separate issue trackers for each language (e.g. python-docs-fr) so people can easily report documentation issues in their native languages. --Berker From brett at python.org Mon Jan 30 13:03:43 2017 From: brett at python.org (Brett Cannon) Date: Mon, 30 Jan 2017 18:03:43 +0000 Subject: [Python-ideas] [docs] https://docs.python.org/fr/ ? In-Reply-To: References: <56ED7EBD.5050804@palard.fr> Message-ID: On Mon, 30 Jan 2017 at 07:16 Berker Peksa? wrote: > On Mon, Jan 30, 2017 at 12:21 PM, INADA Naoki > wrote: > > There are some updates about this topic. > > And I have something to discuss to get things forward. > > > > We (Japanese translation team) and Julien start sharing one Transifex > project. > > Please see this dashboard. We have nice progress. > > https://www.transifex.com/python-doc/python-35/dashboard/ > > > > Julien want hosting french documentation at https://docs.python.org/fr/ > > I don't have strong opinion about where to hosting, but I want to share > > more efforts (automated build + hosting) with Julien and other language > > communities. > > > > Since Berker Peksag against about hosting translated document on > > docs.python.org [1], I'm considering about using github pages. > > > > I got "python-docs" organization already for it [2]. So translated > documents > > can be hosted on URL like https://python-docs.github.io/py35-ja/ or > > https://python-docs.github.io/py35/fr/ . (first part of the path > > should be same to > > repository name). > > > > I already uses github pages for testing Japanese translation [3]. > > It's nice place to host webpage. We can get https and CDN for free. > > > > [1]: https://github.com/python/docsbuild-scripts/pull/8 > > [2]: https://github.com/python-docs > > [3]: https://python-doc-ja.github.io/py35/ > > > > > > So I want to discuss (and get consensus) about where should we host > > translated document. > > > > a) translated docs on docs.python.org / issue tracker on > github.com/python-docs/ > > b) translated docs on python-docs.github.io / issue tracker on > > github.com/python-docs/ > > +1 for b) or any idea that would indicate that the Python developers > don't maintain translations of the official documentation. I don't > have a strong opinion on naming the GitHub organization (maybe > python-docs-translations?) but that can be discussed later. Another > advantage of this approach is that you can have separate issue > trackers for each language (e.g. python-docs-fr) so people can easily > report documentation issues in their native languages. > Does hosting on Read the Docs makes any of this easier/harder? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mikhailwas at gmail.com Mon Jan 30 14:52:10 2017 From: mikhailwas at gmail.com (Mikhail V) Date: Mon, 30 Jan 2017 20:52:10 +0100 Subject: [Python-ideas] A more readable way to nest functions In-Reply-To: References: Message-ID: On 27 January 2017 at 22:07, Brent Brinkley wrote: > HI Everyone, > > One issue that I?ve seen in a lot of languages struggle with is nested > function calls. > Parenthesis when nested inherently create readability issues. > > Yes there is such issue. I don't see however that a radical change to nested notation can be a solution here. Not because it is too hard to find a consensus about how it should be, but also because in some sense the parenthesis nesting is sort of optimal for general case. It is hard to explain in simple words, but some of the answers gave hints already -- if you try some more complex nesting with your proposed example, it will end up with even *worse* schemas than parenthesis notation. One important note about parenthesis itself: often it looks so bad simply because in monospaced font, e.g. Courier, parenthesis is sort of "slightly bent letter I" which results in really bad look, and the *correct* parenthesis character is a rounded bracket, which extends to bottom ant top much further than letters. It has thin endpoints and thicker middle. And it should be given some space from left and right. So a big part of the problem lies in the font. As for the problem of long equations. Here could be many proposals. My opinion: nested equations must be broken into series of smaller ones, but unfortunately Python does not provide a standard solution here. *Theoretically* I see a solution by 'inlined' statements. Take a long example: print ( merge (a, b, merge ( long_variable2, long_variable2 ) ) Now just split it in 2 lines: tmp <> merge ( long_variable2, long_variable2 ) print ( merge (a, b, tmp ) ) So I'd for example invent a special sign which just marks statements that will be first collected as inline text, sort of macros. But as always, in such cases there is little chance to find any consensus due to many reasons. One of the reasons that there are too few good looking characters out there, and same applies for any possible improvement. Mikhail -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Mon Jan 30 15:25:54 2017 From: mertz at gnosis.cx (David Mertz) Date: Mon, 30 Jan 2017 12:25:54 -0800 Subject: [Python-ideas] A more readable way to nest functions In-Reply-To: References: Message-ID: On Mon, Jan 30, 2017 at 11:52 AM, Mikhail V wrote: > *Theoretically* I see a solution by 'inlined' statements. > Take a long example: > > print ( merge (a, b, merge ( long_variable2, long_variable2 ) ) > > Now just split it in 2 lines: > > tmp <> merge ( long_variable2, long_variable2 ) > print ( merge (a, b, tmp ) ) > > So I'd for example invent a special sign which just marks > statements that will be first collected as inline text, sort of macros. > I have a great idea for this special sign. We could use the equal sign '=' for this purpose of assigning a value into a temporary name. :-) tmp = merge(long_variable2, long_variable2) print (merge(a, b, tmp) ) -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From mikhailwas at gmail.com Mon Jan 30 18:54:01 2017 From: mikhailwas at gmail.com (Mikhail V) Date: Tue, 31 Jan 2017 00:54:01 +0100 Subject: [Python-ideas] A more readable way to nest functions In-Reply-To: References: Message-ID: On 30 January 2017 at 21:25, David Mertz wrote: > On Mon, Jan 30, 2017 at 11:52 AM, Mikhail V wrote: > >> *Theoretically* I see a solution by 'inlined' statements. >> Take a long example: >> >> print ( merge (a, b, merge ( long_variable2, long_variable2 ) ) >> >> Now just split it in 2 lines: >> >> tmp <> merge ( long_variable2, long_variable2 ) >> print ( merge (a, b, tmp ) ) >> >> So I'd for example invent a special sign which just marks >> statements that will be first collected as inline text, sort of macros. >> > > I have a great idea for this special sign. We could use the equal sign > '=' for this purpose of assigning a value into a temporary name. :-) > > tmp = merge(long_variable2, long_variable2) > print (merge(a, b, tmp) ) > > > Great idea :) But actually that was my idea initially, so just breaking it into two lines solves the readability issue perfectly with long expressions. Although if one is chasing some kind of optimisations... I don't know, I see very often people want to stick everything in one big expression. -------------- next part -------------- An HTML attachment was scrubbed... URL: From robertomartinezp at gmail.com Tue Jan 31 02:13:15 2017 From: robertomartinezp at gmail.com (=?UTF-8?Q?Roberto_Mart=C3=ADnez?=) Date: Tue, 31 Jan 2017 07:13:15 +0000 Subject: [Python-ideas] A decorator to call super() Message-ID: Hi, I find this type of code quite often: class MyOverridedClass(MyBaseClass): def mymethod(self, foo, **kwargs): # Do something return super().mymethod(**kwargs) What about creating a decorator to call super() after/before the overrided method? Something like that: class MyOverridedClass(MyBaseClass): @extendsuper def mymethod(self, foo): # Do something Sorry if this has already been proposed, I have not found anything similar in the list. Best regards, Roberto -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Tue Jan 31 07:07:52 2017 From: srkunze at mail.de (Sven R. Kunze) Date: Tue, 31 Jan 2017 13:07:52 +0100 Subject: [Python-ideas] A decorator to call super() In-Reply-To: References: Message-ID: <459115ea-f00c-dbe3-26e7-6abb134fe339@mail.de> Hi Roberto, On 31.01.2017 08:13, Roberto Mart?nez wrote: > class MyOverridedClass(MyBaseClass): > def mymethod(self, foo, **kwargs): > # Do something > return super().mymethod(**kwargs) > > What about creating a decorator to call super() after/before the > overrided method? Something like that: > > class MyOverridedClass(MyBaseClass): > @extendsuper > def mymethod(self, foo): > # Do something I could find this useful. There's just on bikeshedding issue: When should "super().mymethod(**kwargs)" be called: *before*, *after* or inbetween my specialized code? Depending on the baseclass either of those three is necessary. As far as I can tell, we encounter all of them regularly. Best, Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Tue Jan 31 07:13:15 2017 From: srkunze at mail.de (Sven R. 
Kunze) Date: Tue, 31 Jan 2017 13:13:15 +0100 Subject: [Python-ideas] A more readable way to nest functions In-Reply-To: References: Message-ID: <564200a8-97a9-551e-6cdf-dd76a1996f8c@mail.de> On 31.01.2017 00:54, Mikhail V wrote: > Great idea :) But actually that was my idea initially, so just > breaking it into two lines solves the readability issue perfectly with > long expressions. Although if one is chasing some kind of > optimisations... I don't know, I see very often people want to stick > everything in one big expression. Because it's natural. It's *sometimes* the best way to convey the data processing pipeline. It's the connection between separate parts that needs to be conveyed. Furthermore, inventing artificial names is sometimes not the best way. So, I think the behavior you've described can be explained quite easily. Best, Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From robertomartinezp at gmail.com Tue Jan 31 09:20:02 2017 From: robertomartinezp at gmail.com (=?UTF-8?Q?Roberto_Mart=C3=ADnez?=) Date: Tue, 31 Jan 2017 14:20:02 +0000 Subject: [Python-ideas] A decorator to call super() In-Reply-To: <459115ea-f00c-dbe3-26e7-6abb134fe339@mail.de> References: <459115ea-f00c-dbe3-26e7-6abb134fe339@mail.de> Message-ID: I think both are useful. I would make this configurable with a flag: class MyOverridedClass(MyBaseClass): @extendsuper(after=True) def mymethod(self, foo): ... Or maybe a pair of decorator is a better option: @pre_super and @post_super El mar., 31 ene. 2017 a las 13:07, Sven R. Kunze () escribi?: > Hi Roberto, > > > On 31.01.2017 08:13, Roberto Mart?nez wrote: > > class MyOverridedClass(MyBaseClass): > def mymethod(self, foo, **kwargs): > # Do something > return super().mymethod(**kwargs) > > What about creating a decorator to call super() after/before the overrided > method? Something like that: > > class MyOverridedClass(MyBaseClass): > @extendsuper > def mymethod(self, foo): > # Do something > > > I could find this useful. There's just on bikeshedding issue: > > When should "super().mymethod(**kwargs)" be called: *before*, *after* or > inbetween my specialized code? > > Depending on the baseclass either of those three is necessary. As far as I > can tell, we encounter all of them regularly. > > Best, > Sven > -------------- next part -------------- An HTML attachment was scrubbed... URL: From turnbull.stephen.fw at u.tsukuba.ac.jp Tue Jan 31 09:32:38 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Tue, 31 Jan 2017 23:32:38 +0900 Subject: [Python-ideas] A decorator to call super() In-Reply-To: References: Message-ID: <22672.40966.35986.131212@turnbull.sk.tsukuba.ac.jp> Roberto Mart?nez writes: > What about creating a decorator to call super() after/before the > overrided method? I think this is a reasonable idea, but you can do it yourself in a few lines, can't you? Are there any "gotchas" that make it hard to do correctly? Like Sven Kunze, I'm concerned about trying to standardize in the stdlib because a single method has ambiguous semantics (before, after, "during", asynchronously, ...) and the arguments to the decorated method are restricted so people will inevitably get it wrong. Alternatively, if you specify the semantics with an argument you end up with something like Lisp's "advice" function, which is a big hairball, or multiple decorators, to disambiguate. Personally, I don't think the explicit invocation is such a big deal to need a standardized decorator in the stdlib. 
YMMV, just expressing a few ideas. Steve From thomas at kluyver.me.uk Tue Jan 31 10:05:44 2017 From: thomas at kluyver.me.uk (Thomas Kluyver) Date: Tue, 31 Jan 2017 15:05:44 +0000 Subject: [Python-ideas] A decorator to call super() In-Reply-To: <22672.40966.35986.131212@turnbull.sk.tsukuba.ac.jp> References: <22672.40966.35986.131212@turnbull.sk.tsukuba.ac.jp> Message-ID: <1485875144.3543228.865523792.02B46626@webmail.messagingengine.com> On Tue, Jan 31, 2017, at 02:32 PM, Stephen J. Turnbull wrote: > Personally, I don't think the explicit invocation is such a big deal > to need a standardized decorator in the stdlib. +1. It's one line either way, and the explicit call to super() seems clearer for people reading the code. From jsbueno at python.org.br Tue Jan 31 10:57:39 2017 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Tue, 31 Jan 2017 13:57:39 -0200 Subject: [Python-ideas] A decorator to call super() In-Reply-To: <1485875144.3543228.865523792.02B46626@webmail.messagingengine.com> References: <22672.40966.35986.131212@turnbull.sk.tsukuba.ac.jp> <1485875144.3543228.865523792.02B46626@webmail.messagingengine.com> Message-ID: On 31 January 2017 at 13:05, Thomas Kluyver wrote: > On Tue, Jan 31, 2017, at 02:32 PM, Stephen J. Turnbull wrote: >> Personally, I don't think the explicit invocation is such a big deal >> to need a standardized decorator in the stdlib. > > +1. It's one line either way, and the explicit call to super() seems > clearer for people reading the code. I agree that the explict call to super() is clear and concise enough - moreover you are in full control of where to call, plus what parameters to forward. BUT - no, it is _not_ an easy decorator to craft - and I don't think it can be made to work cleanly without depending on implementation details of cPython. I've been hitting the Python shell for 40+ minutes now, trying to get, in pure Python, a way for a method decorator to get a reference to the superclass in the way "super" does - it is not feasible without a metaclass. (I mean...it may be feasable, but one will be tough - one does not have a reference to the superclasses inside a class body as it is being parsed - I tried to trick the Python runtime into creating an empty __class__ cell in the decorator body for a decorator defined outside the class, and have that filled in, but it does not work as well). Still, such @pre_super and @post_super decorators might be something cute to have around - and can't be made in a trivial way either on the code base or on a pure-python 3rd party package. I would say I am +0 to "+0.5" on them. js -><- From jsbueno at python.org.br Tue Jan 31 10:58:25 2017 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Tue, 31 Jan 2017 13:58:25 -0200 Subject: [Python-ideas] A decorator to call super() In-Reply-To: References: <22672.40966.35986.131212@turnbull.sk.tsukuba.ac.jp> <1485875144.3543228.865523792.02B46626@webmail.messagingengine.com> Message-ID: BTW, if one can come up with a pure-Python implementation for these, I'd like to take a peek at the code, please. On 31 January 2017 at 13:57, Joao S. O. Bueno wrote: > On 31 January 2017 at 13:05, Thomas Kluyver wrote: >> On Tue, Jan 31, 2017, at 02:32 PM, Stephen J. Turnbull wrote: >>> Personally, I don't think the explicit invocation is such a big deal >>> to need a standardized decorator in the stdlib. >> >> +1. It's one line either way, and the explicit call to super() seems >> clearer for people reading the code. 
I agree that the explicit call to super() is clear and concise enough
- moreover you are in full control of where to call, plus what parameters
to forward.

BUT - no, it is _not_ an easy decorator to craft - and I don't think it
can be made to work cleanly without depending on implementation details
of CPython.

I've been hitting the Python shell for 40+ minutes now, trying to get, in
pure Python, a way for a method decorator to get a reference to the
superclass in the way "super" does - it is not feasible without a
metaclass. (I mean... it may be feasible, but it will be tough - one does
not have a reference to the superclasses inside a class body as it is
being parsed - I tried to trick the Python runtime into creating an empty
__class__ cell in the decorator body for a decorator defined outside the
class, and have that filled in, but it does not work either).

Still, such @pre_super and @post_super decorators might be something cute
to have around - and can't be made in a trivial way either in the code
base or in a pure-Python 3rd party package.

I would say I am +0 to "+0.5" on them.


js
-><-

From jsbueno at python.org.br  Tue Jan 31 10:58:25 2017
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Tue, 31 Jan 2017 13:58:25 -0200
Subject: [Python-ideas] A decorator to call super()
In-Reply-To: 
References: <22672.40966.35986.131212@turnbull.sk.tsukuba.ac.jp>
 <1485875144.3543228.865523792.02B46626@webmail.messagingengine.com>
Message-ID: 

BTW, if one can come up with a pure-Python implementation for these, I'd
like to take a peek at the code, please.

On 31 January 2017 at 13:57, Joao S. O. Bueno wrote:
> On 31 January 2017 at 13:05, Thomas Kluyver wrote:
>> On Tue, Jan 31, 2017, at 02:32 PM, Stephen J. Turnbull wrote:
>>> Personally, I don't think the explicit invocation is such a big deal
>>> to need a standardized decorator in the stdlib.
>>
>> +1. It's one line either way, and the explicit call to super() seems
>> clearer for people reading the code.
From rymg19 at gmail.com Tue Jan 31 15:38:00 2017 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Tue, 31 Jan 2017 20:38:00 +0000 Subject: [Python-ideas] A decorator to call super() In-Reply-To: References: <22672.40966.35986.131212@turnbull.sk.tsukuba.ac.jp> <1485875144.3543228.865523792.02B46626@webmail.messagingengine.com> Message-ID: <5y8jf4zho2nnwvtxpptzj8dzd-0@mailer.nylas.com> https://github.com/kirbyfan64/mirasu \-- Ryan (????) Yoko Shimomura > ryo (supercell/EGOIST) > Hiroyuki Sawano >> everyone else -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsbueno at python.org.br Tue Jan 31 15:55:08 2017 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Tue, 31 Jan 2017 18:55:08 -0200 Subject: [Python-ideas] A decorator to call super() In-Reply-To: <6c998ac7-fb4c-5818-67a3-6a906538474e@nedbatchelder.com> References: <6c998ac7-fb4c-5818-67a3-6a906538474e@nedbatchelder.com> Message-ID: Sure - thanks - I did not even consider the descriptor mechanism, as I got focused in getting the equivalent from the __class__ cell inside the decorator code. And of course, now there is the "__init_subclass__" mechanism - a mixin version using that was as straight forward as it can be as well. On 31 January 2017 at 17:33, Ned Batchelder wrote: > On 1/31/17 2:13 AM, Roberto Mart?nez wrote: >> Hi, >> >> I find this type of code quite often: >> >> class MyOverridedClass(MyBaseClass): >> def mymethod(self, foo, **kwargs): >> # Do something >> return super().mymethod(**kwargs) >> >> What about creating a decorator to call super() after/before the >> overrided method? Something like that: >> >> class MyOverridedClass(MyBaseClass): >> @extendsuper >> def mymethod(self, foo): >> # Do something >> >> Sorry if this has already been proposed, I have not found anything >> similar in the list. > With all of the possible details that need to be covered (before/after, > what args to pass along, what to do with the return value), this doesn't > seem like a good idea to me. The most common use of super is in > __init__, where the value should not be returned, and the example given > here returns the value, so right off the bat, the example is at odds > with common usage. > > The super call is just one line, and the decorator would be one line, so > there's no savings, no improvement to expressivitiy, and a loss of > clarity: -1. > > --Ned. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/