From mail.yogi841 at gmail.com Sat Dec 1 03:58:21 2018 From: mail.yogi841 at gmail.com (Adam Johnson) Date: Sat, 1 Dec 2018 08:58:21 +0000 Subject: [Python-ideas] __len__() for map() In-Reply-To: <20181201011734.GN4319@ando.pearwood.info> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181201011734.GN4319@ando.pearwood.info> Message-ID: On Sat, 1 Dec 2018 at 01:17, Steven D'Aprano wrote: > > In principle, we could make this work, by turning the output of map() > into a view like dict.keys() etc, or a lazy sequence type like range(). > wrapping the underlying sequence. That might be worth exploring. I can't > think of any obvious problems with a view-like interface, but that > doesn't mean there aren't any. I've spent like 30 seconds thinking about > it, so the fact that I can't see any problems with it means little. Something to consider that, so far, seems to have been overlooked is that the total length of the resulting map isn't only dependent upon the iterable, but also the mapped function. It is a pretty pathological case, but there is no guarantee that the function is a pure function, free from side effects. If the iterable is mutable and the mapped function has a reference to it (either from scoping or the iterable (in)directly containing a reference to itself), there is nothing to prevent the function modifying the iterable as the map is evaluated. For example, map can be used as a filter: it = iter((0, 16, 1, 4, 8, 29, 2, 13, 42)) def filter_odd(x): while x % 2 == 0: x = next(it) return x tuple(map(filter_odd, it)) # (1, 29, 13) The above also illustrates the second way the total length of the map could differ from the length input iterable, even if is immutable. If StopIteration is raised within the mapped function, map finishes early, so can be used in a manner similar to takewhile: def takewhile_lessthan4(x): if x < 4: return x raise StopIteration tuple(map(takewhile_lessthan4, range(9))) # (0, 1, 2, 3) I really don't understand why this is true, under 'normal' usage, map shouldn't have any reason to silently swallow a StopIteration raised _within_ the mapped function. As I opened with, I wouldn't consider using map in either of these ways to be a good idea, and anyone doing so should probably be persuaded to find better alternatives, but it might be something to bear in mind. AJ From greg.ewing at canterbury.ac.nz Sat Dec 1 05:44:07 2018 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 01 Dec 2018 23:44:07 +1300 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181201011734.GN4319@ando.pearwood.info> Message-ID: <5C0265F7.3070303@canterbury.ac.nz> Adam Johnson wrote: > def takewhile_lessthan4(x): > if x < 4: > return x > raise StopIteration > > tuple(map(takewhile_lessthan4, range(9))) > # (0, 1, 2, 3) > > I really don't understand why this is true, under 'normal' usage, map > shouldn't have any reason to silently swallow a StopIteration raised > _within_ the mapped function. It's not -- the StopIteration isn't terminating the map, it's terminating the iteration being performed by tuple(). 
It's easy to show that map() is not swallowing the StopIteration: >>> m = map(takewhile_lessthan4, range(9)) >>> next(m) 0 >>> next(m) 1 >>> next(m) 2 >>> next(m) 3 >>> next(m) Traceback (most recent call last): File "", line 1, in File "", line 4, in takewhile_lessthan4 StopIteration -- Greg From mail.yogi841 at gmail.com Sat Dec 1 07:45:08 2018 From: mail.yogi841 at gmail.com (Adam Johnson) Date: Sat, 1 Dec 2018 12:45:08 +0000 Subject: [Python-ideas] __len__() for map() In-Reply-To: <5C0265F7.3070303@canterbury.ac.nz> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181201011734.GN4319@ando.pearwood.info> <5C0265F7.3070303@canterbury.ac.nz> Message-ID: On Sat, 1 Dec 2018 at 10:44, Greg Ewing wrote: > It's not -- the StopIteration isn't terminating the map, > it's terminating the iteration being performed by tuple(). That was a poor choice of wording on my part, it's rather that map doesn't do anything special in that regard. To whatever is iterating over the map, any unexpected StopIteration from the function isn't distinguishable from the expected one from the iterable(s) being exhausted. This issue was dealt with in generators by PEP-479 (by replacing the StopIteration with a RuntimeError). Whilst map, filter, and others may not be generators, I would expect them to be consistent with that PEP when handling the same issue. From paul-python at svensson.org Sat Dec 1 11:07:53 2018 From: paul-python at svensson.org (Paul Svensson) Date: Sat, 1 Dec 2018 11:07:53 -0500 (EST) Subject: [Python-ideas] __len__() for map() In-Reply-To: <20181201011734.GN4319@ando.pearwood.info> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181201011734.GN4319@ando.pearwood.info> Message-ID: On Sat, 1 Dec 2018, Steven D'Aprano wrote: > On Thu, Nov 29, 2018 at 08:13:12PM -0500, Paul Svensson wrote: > >> What's being proposed is simple, either: >> * len(map(f, x)) == len(x), or >> * both raise TypeError > > Simple, obvious, and problematic. > > Here's a map object I prepared earlier: > > from itertools import islice > mo = map(lambda x: x, "aardvark") > list(islice(mo, 3)) > > If I now pass you the map object, mo, what should len(mo) return? Five > or eight? mo = "aardvark" list(islice(mo, 3)) By what magic would the length change? Per the proposal, it can only be eight. Of course, that means mo can't, in this case, be an iterator. That's what the proposal would change. /Paul From mertz at gnosis.cx Sat Dec 1 11:27:31 2018 From: mertz at gnosis.cx (David Mertz) Date: Sat, 1 Dec 2018 11:27:31 -0500 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181201011734.GN4319@ando.pearwood.info> Message-ID: A proposal to make map() not return an iterator seems like a non-starter. Yes, Python 2 worked that way, but that was a long time ago and we know better now. In the simple example it doesn't matter much: mo = map(lambda x: x, "aardvark") But map() is more useful for the non-toy case: mo = map(expensive_db_lookup, list_of_keys) list_of_keys can be a concrete list, but I'm using map() mainly specifically to get lazy iterator behavior. On Sat, Dec 1, 2018, 11:10 AM Paul Svensson On Sat, 1 Dec 2018, Steven D'Aprano wrote: > > > On Thu, Nov 29, 2018 at 08:13:12PM -0500, Paul Svensson wrote: > > > >> What's being proposed is simple, either: > >> * len(map(f, x)) == len(x), or > >> * both raise TypeError > > > > Simple, obvious, and problematic. 
> > > > Here's a map object I prepared earlier: > > > > from itertools import islice > > mo = map(lambda x: x, "aardvark") > > list(islice(mo, 3)) > > > > If I now pass you the map object, mo, what should len(mo) return? Five > > or eight? > > mo = "aardvark" > list(islice(mo, 3)) > > By what magic would the length change? > Per the proposal, it can only be eight. > Of course, that means mo can't, in this case, be an iterator. > That's what the proposal would change. > > /Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Dec 1 11:53:20 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 2 Dec 2018 03:53:20 +1100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181201011734.GN4319@ando.pearwood.info> Message-ID: <20181201165320.GQ4319@ando.pearwood.info> On Sat, Dec 01, 2018 at 11:07:53AM -0500, Paul Svensson wrote: [...] > >Here's a map object I prepared earlier: > > > >from itertools import islice > >mo = map(lambda x: x, "aardvark") > >list(islice(mo, 3)) > > > >If I now pass you the map object, mo, what should len(mo) return? Five > >or eight? > > mo = "aardvark" > list(islice(mo, 3)) > > By what magic would the length change? > Per the proposal, it can only be eight. > Of course, that means mo can't, in this case, be an iterator. > That's what the proposal would change. I already discussed that: map is not currently a sequence, and just giving it a __len__ is not going to make it one. Making it a sequence, or a view of a sequence, is a bigger change, but worth considering, as I already said in part of my post you deleted. However, it is also a backwards incompatible change. In case its not obvious from my example above, I'll be explicit: # current behaviour mo = map(lambda x: x, "aardvark") list(islice(mo, 3)) # discard the first three items assert ''.join(mo) == 'dvark' => passes # future behaviour, with your proposal mo = map(lambda x: x, "aardvark") list(islice(mo, 3)) # discard the first three items assert ''.join(mo) == 'dvark' => fails with AssertionError Given the certainty that this change will break code (I know it will break *my* code, as I often rely on map() being an iterator not a sequence) it might be better to introduce a new "mapview" type rather than change the behaviour of map() itself. On the other hand, since the fix is simple enough: mo = iter(mo) perhaps all we need is a depreciation period of at least one full release before changing the behaviour. Either way, this isn't a simple or obvious change, and will probably need a PEP to nut out all the fine details. 
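To spell that fix out with the same example (a sketch only -- and note that iter(mo) is already a no-op on today's map objects, so this exact code passes both now and under the hypothetical new behaviour):

# future behaviour, with the one-line fix applied
mo = map(lambda x: x, "aardvark")
mo = iter(mo)  # explicitly opt back in to one-shot iteration
list(islice(mo, 3))  # discard the first three items
assert ''.join(mo) == 'dvark'
=> passes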
-- Steve From mertz at gnosis.cx Sat Dec 1 12:06:23 2018 From: mertz at gnosis.cx (David Mertz) Date: Sat, 1 Dec 2018 12:06:23 -0500 Subject: [Python-ideas] __len__() for map() In-Reply-To: <20181201165320.GQ4319@ando.pearwood.info> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181201011734.GN4319@ando.pearwood.info> <20181201165320.GQ4319@ando.pearwood.info> Message-ID: On Sat, Dec 1, 2018, 11:54 AM Steven D'Aprano # current behaviour > mo = map(lambda x: x, "aardvark") > list(islice(mo, 3)) # discard the first three items > assert ''.join(mo) == 'dvark' > => passes > > # future behaviour, with your proposal > assert ''.join(mo) == 'dvark' > => fails with AssertionError > > Given the certainty that this change will break code (I know it will > break *my* code, as I often rely on map() being an iterator not a > sequence) it might be better to introduce a new "mapview" type rather than > change the behaviour of map() itself. On the other hand, since the fix is > simple enough: > > mo = iter(mo) > Given that the anti-fix is just as simple and currently available, I don't see why we'd want a change: # map->sequence mo = list(mo) FWIW, I actually do write exactly that code fairly often, it's not hard. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Dec 1 12:10:18 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 2 Dec 2018 04:10:18 +1100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181201011734.GN4319@ando.pearwood.info> Message-ID: <20181201171018.GR4319@ando.pearwood.info> On Sat, Dec 01, 2018 at 11:27:31AM -0500, David Mertz wrote: > A proposal to make map() not return an iterator seems like a non-starter. > Yes, Python 2 worked that way, but that was a long time ago and we know > better now. Paul is certainly not suggesting reverting the behaviour to the Python2 map, at the very least map(func, iterator) will continue to return an iterator. What Paul is *precisely* proposing isn't clear to me, except that map(func, sequence) will be "loosely" a sequence. What that means is not obvious. What is especially unclear is what his map() will do when passed multiple iterable arguments. [...] > list_of_keys can be a concrete list, but I'm using map() mainly > specifically to get lazy iterator behavior. Indeed. That's often why I use it too. But there is a good use-case for having map(), or a map-like function, provide either a lazy sequence like range() or a view. But the devil is in the details. Terry was right to encourage people to experiment with their own map-like function (a subclass?) to identify any tricky corners in the proposal. -- Steve From steve at pearwood.info Sat Dec 1 12:23:07 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 2 Dec 2018 04:23:07 +1100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181201011734.GN4319@ando.pearwood.info> <20181201165320.GQ4319@ando.pearwood.info> Message-ID: <20181201172307.GS4319@ando.pearwood.info> On Sat, Dec 01, 2018 at 12:06:23PM -0500, David Mertz wrote: > Given that the anti-fix is just as simple and currently available, I don't > see why we'd want a change: > > # map->sequence > mo = list(mo) > > FWIW, I actually do write exactly that code fairly often, it's not hard. Sure, but that makes a copy of the original data and means you lose the benefit of map being lazy. 
Naturally we will always have the ability to call list and eagerly convert to a sequence, but these proposals are for a way of getting the advantages of sequence-like behaviour while still keeping the advantages of laziness. With iterators, the only way to get that advantage of laziness is to give up the ability to query length, random access to items, etc even when the underlying data is a sequence and that information would have been readily available. We can, at least sometimes, have the best of both worlds. Maybe. -- Steve From mertz at gnosis.cx Sat Dec 1 12:28:16 2018 From: mertz at gnosis.cx (David Mertz) Date: Sat, 1 Dec 2018 12:28:16 -0500 Subject: [Python-ideas] __len__() for map() In-Reply-To: <20181201172307.GS4319@ando.pearwood.info> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181201011734.GN4319@ando.pearwood.info> <20181201165320.GQ4319@ando.pearwood.info> <20181201172307.GS4319@ando.pearwood.info> Message-ID: Other than being able to ask len(), are there any advantages to a slightly less opaque map()? Getting the actual result of applying the function to the element is necessarily either eager or lazy, you can't have both. On Sat, Dec 1, 2018, 12:24 PM Steven D'Aprano On Sat, Dec 01, 2018 at 12:06:23PM -0500, David Mertz wrote: > > > Given that the anti-fix is just as simple and currently available, I > don't > > see why we'd want a change: > > > > # map->sequence > > mo = list(mo) > > > > FWIW, I actually do write exactly that code fairly often, it's not hard. > > Sure, but that makes a copy of the original data and means you lose the > benefit of map being lazy. > > Naturally we will always have the ability to call list and eagerly > convert to a sequence, but these proposals are for a way of getting the > advantages of sequence-like behaviour while still keeping the advantages > of laziness. > > With iterators, the only way to get that advantage of laziness is > to give up the ability to query length, random access to items, etc even > when the underlying data is a sequence and that information would have > been readily available. We can, at least sometimes, have the best of > both worlds. Maybe. > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Dec 1 14:08:03 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 2 Dec 2018 06:08:03 +1100 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181201011734.GN4319@ando.pearwood.info> <20181201165320.GQ4319@ando.pearwood.info> <20181201172307.GS4319@ando.pearwood.info> Message-ID: <20181201190803.GT4319@ando.pearwood.info> On Sat, Dec 01, 2018 at 12:28:16PM -0500, David Mertz wrote: > Other than being able to ask len(), are there any advantages to a slightly > less opaque map()? Getting the actual result of applying the function to > the element is necessarily either eager or lazy, you can't have both. I don't understand the point you think you are making here. There's no fundamental need to make a copy of a sequence just to apply a map function to it, especially if the function is cheap. (If it is expensive, you might want to add a cache.) 
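For instance (just a sketch, assuming the mapped function takes hashable arguments -- expensive_db_lookup and list_of_keys being David's hypothetical names from earlier in the thread):

from functools import lru_cache
cached_lookup = lru_cache(maxsize=None)(expensive_db_lookup)  # memoise the expensive call
mo = map(cached_lookup, list_of_keys)

so that computing the same item twice only does the expensive work once.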
This proof of concept wrapper class could have been written any time since Python 1.5 or earlier: class lazymap: def __init__(self, function, sequence): self.function = function self.wrapped = sequence def __len__(self): return len(self.wrapped) def __getitem__(self, item): return self.function(self.wrapped[item]) It is fully iterable using the sequence protocol, even in Python 3: py> x = lazymap(str.upper, 'aardvark') py> list(x) ['A', 'A', 'R', 'D', 'V', 'A', 'R', 'K'] Mapped items are computed on demand, not up front. It doesn't make a copy of the underlying sequence, it can be iterated over and over again, it has a length and random access. And if you want an iterator, you can just pass it to the iter() function. There are probably bells and whistles that can be added (a nicer repr? any other sequence methods? a cache?) and I haven't tested it fully. For backwards compatibilty reasons, we can't just make map() work like this, because that's a change in behaviour. There may be tricky corner cases I haven't considered, but as a proof of concept I think it shows that the basic premise is sound and worth pursuing. -- Steve From mertz at gnosis.cx Sat Dec 1 14:26:41 2018 From: mertz at gnosis.cx (David Mertz) Date: Sat, 1 Dec 2018 14:26:41 -0500 Subject: [Python-ideas] __len__() for map() In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181201011734.GN4319@ando.pearwood.info> <20181201165320.GQ4319@ando.pearwood.info> <20181201172307.GS4319@ando.pearwood.info> Message-ID: To illustrate the distinction that someone (I think Steven D'Aprano) makes, I think these two (modestly tested, but could have flaws) implementations are both sensible for some purposes. Both are equally "obvious," yet they are different: >>> import sys >>> from itertools import count >>> class map1(object): ... def __init__(self, fn, *seqs): ... try: # See if there is a length ... self._len = min(map(len, seqs)) ... except: # Fallback isn't in any sense accurate, just "large" ... self._len = sys.maxsize ... self._fn = fn ... self._seqs = seqs ... self._iters = [iter(seq) for seq in seqs] ... def __iter__(self): ... return self ... def __next__(self): ... args = [next(it) for it in self._iters] ... return self._fn(*args) ... def __len__(self): ... return self._len ... >>> class map2(map1): ... def __init__(self, fn, *seqs): ... super().__init__(fn, *seqs) ... def __next__(self): ... self._len -= 1 ... return super().__next__() ... >>> m1 = map1(add, [1,2,3,4], (5,6,7)) >>> len(m1) 3 >>> next(m1) 6 >>> len(m1) 3 >>> m2 = map2(add, [1,2,3,4], (5,6,7)) >>> len(m2) 3 >>> next(m2) 6 >>> len(m2) 2 >>> m1_inf = map1(lambda x: x, count()) >>> len(m1_inf) 9223372036854775807 >>> next(m1_inf) 0 >>> next(m1_inf) 1 I wasn't sure what to set self._len to where it doesn't make sense. I thought of None which makes len(mo) raise one exception, or -1 which makes len(mo) raise a different exception. I just choose an arbitrary "big" value in the above implementation. mo.__length_hint__() is a possibility, but that is specialized, not a way of providing a response to len(mo). I don't have to, but I do keep around mo._seqs as a handle to the underlying sequences. In concept those could be re-inspected for other properties as the user of the classes desired. On Sat, Dec 1, 2018 at 12:28 PM David Mertz wrote: > Other than being able to ask len(), are there any advantages to a slightly > less opaque map()? 
Getting the actual result of applying the function to > the element is necessarily either eager or lazy, you can't have both. > > On Sat, Dec 1, 2018, 12:24 PM Steven D'Aprano >> On Sat, Dec 01, 2018 at 12:06:23PM -0500, David Mertz wrote: >> >> > Given that the anti-fix is just as simple and currently available, I >> don't >> > see why we'd want a change: >> > >> > # map->sequence >> > mo = list(mo) >> > >> > FWIW, I actually do write exactly that code fairly often, it's not hard. >> >> Sure, but that makes a copy of the original data and means you lose the >> benefit of map being lazy. >> >> Naturally we will always have the ability to call list and eagerly >> convert to a sequence, but these proposals are for a way of getting the >> advantages of sequence-like behaviour while still keeping the advantages >> of laziness. >> >> With iterators, the only way to get that advantage of laziness is >> to give up the ability to query length, random access to items, etc even >> when the underlying data is a sequence and that information would have >> been readily available. We can, at least sometimes, have the best of >> both worlds. Maybe. >> >> >> -- >> Steve >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Sat Dec 1 20:07:16 2018 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 02 Dec 2018 14:07:16 +1300 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: <20181201190803.GT4319@ando.pearwood.info> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181201011734.GN4319@ando.pearwood.info> <20181201165320.GQ4319@ando.pearwood.info> <20181201172307.GS4319@ando.pearwood.info> <20181201190803.GT4319@ando.pearwood.info> Message-ID: <5C033044.9080907@canterbury.ac.nz> Steven D'Aprano wrote: > For backwards compatibilty reasons, we can't just make map() work like > this, because that's a change in behaviour. Actually, I think it's possible to get the best of both worlds. 
Consider this: from operator import itemgetter class MapView: def __init__(self, func, *args): self.func = func self.args = args self.iterator = None def __len__(self): return min(map(len, self.args)) def __getitem__(self, i): return self.func(*list(map(itemgetter(i), self.args))) def __iter__(self): return self def __next__(self): if not self.iterator: self.iterator = map(self.func, *self.args) return next(self.iterator) If you give it sequences, it behaves like a sequence: >>> a = [1, 2, 3, 4, 5] >>> b = [2, 3, 5] >>> from math import pow >>> m = MapView(pow, a, b) >>> print(list(m)) [1.0, 8.0, 243.0] >>> print(list(m)) [1.0, 8.0, 243.0] >>> print(len(m)) 3 >>> print(m[1]) 8.0 If you give it iterators, it behaves like an iterator: >>> m = MapView(pow, iter(a), iter(b)) >>> print(next(m)) 1.0 >>> print(list(m)) [8.0, 243.0] >>> print(list(m)) [] >>> print(len(m)) Traceback (most recent call last): File "", line 1, in File "/Users/greg/foo/mapview/mapview.py", line 14, in __len__ return min(map(len, self.args)) TypeError: object of type 'list_iterator' has no len() If you use it as an iterator after giving it sequences, it also behaves like an iterator: >>> m = MapView(pow, a, b) >>> print(next(m)) 1.0 >>> print(next(m)) 8.0 What do people think? Could we drop something like this in as a replacement for map() without disturbing anything too much? -- Greg From rosuav at gmail.com Sat Dec 1 20:24:19 2018 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 2 Dec 2018 12:24:19 +1100 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: <5C033044.9080907@canterbury.ac.nz> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181201011734.GN4319@ando.pearwood.info> <20181201165320.GQ4319@ando.pearwood.info> <20181201172307.GS4319@ando.pearwood.info> <20181201190803.GT4319@ando.pearwood.info> <5C033044.9080907@canterbury.ac.nz> Message-ID: On Sun, Dec 2, 2018 at 12:08 PM Greg Ewing wrote: > class MapView: > def __len__(self): > return min(map(len, self.args)) > > def __iter__(self): > return self > > def __next__(self): > if not self.iterator: > self.iterator = map(self.func, *self.args) > return next(self.iterator) I can't help thinking that it will be extremely surprising to have the length remain the same while the items get consumed. After you take a couple of elements off, the length of the map is exactly the same, yet the length of a list constructed from that map won't be. Are there any other non-pathological examples where len(x) != len(list(x))? ChrisA From greg.ewing at canterbury.ac.nz Sun Dec 2 08:04:31 2018 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 03 Dec 2018 02:04:31 +1300 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181201011734.GN4319@ando.pearwood.info> <20181201165320.GQ4319@ando.pearwood.info> <20181201172307.GS4319@ando.pearwood.info> <20181201190803.GT4319@ando.pearwood.info> <5C033044.9080907@canterbury.ac.nz> Message-ID: <5C03D85F.2040702@canterbury.ac.nz> Chris Angelico wrote: > I can't help thinking that it will be extremely surprising to have the > length remain the same while the items get consumed. That can be fixed. The following version raises an exception if you try to find the length after having used it as an iterator. (I also fixed a bug -- I had screwed up the sequence case, and it wasn't re-iterating properly.) 
class MapView: def __init__(self, func, *args): self.func = func self.args = args self.iterator = None def __len__(self): return min(map(len, self.args)) def __getitem__(self, i): return self.func(*list(map(itemgetter(i), self.args))) def __iter__(self): return map(self.func, *self.args) def __next__(self): if not self.iterator: self.iterator = iter(self) return next(self.iterator) >>> a = [1, 2, 3, 4, 5] >>> b = [2, 3, 5] >>> m = MapView(pow, a, b) >>> print(next(m)) 1 >>> print(len(m)) Traceback (most recent call last): File "", line 1, in File "/Users/greg/foo/mapview/mapview.py", line 12, in __len__ raise TypeError("Mapping iterator has no len()") TypeError: Mapping iterator has no len() It will still report a length if you use len() *before* starting to use it as an iterator, but the length it returns is correct at that point, so I don't think that's a problem. > Are there any > other non-pathological examples where len(x) != len(list(x))? No longer a problem: >>> m = MapView(pow, a, b) >>> len(m) == len(list(m)) True -- Greg From steve at pearwood.info Sun Dec 2 08:43:24 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 3 Dec 2018 00:43:24 +1100 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: <5C03D85F.2040702@canterbury.ac.nz> References: <20181201011734.GN4319@ando.pearwood.info> <20181201165320.GQ4319@ando.pearwood.info> <20181201172307.GS4319@ando.pearwood.info> <20181201190803.GT4319@ando.pearwood.info> <5C033044.9080907@canterbury.ac.nz> <5C03D85F.2040702@canterbury.ac.nz> Message-ID: <20181202134324.GV4319@ando.pearwood.info> On Mon, Dec 03, 2018 at 02:04:31AM +1300, Greg Ewing wrote: > Chris Angelico wrote: > >I can't help thinking that it will be extremely surprising to have the > >length remain the same while the items get consumed. > > That can be fixed. The following version raises an exception if > you try to find the length after having used it as an iterator. That's not really a "fix" as such, more of a violation of the principle of least astonishment. Perhaps more like the principle of most astonishment: the object changes from sized to unsized even if you don't modify its value or its type, but merely if you look at it the wrong way: # This is okay, doesn't change the nature of the object. for i in range(sys.maxint): try: print(mapview[i]) except IndexError: break # But this unexpectedly changes it from sized to unsized. for x in mapview: break That makes this object a fragile thing that can unexpectedly change from sized to unsized. Neither fish nor fowl with a confusing API that is not quite a sequence, not quite an iterator, not quite sized, but just enough of each to lead people into error. Or... 
at least that's what the code is supposed to do, the code you give doesn't actually work that way: > class MapView: > def __init__(self, func, *args): > self.func = func > self.args = args > self.iterator = None > def __len__(self): > return min(map(len, self.args)) > def __getitem__(self, i): > return self.func(*list(map(itemgetter(i), self.args))) > def __iter__(self): > return map(self.func, *self.args) > def __next__(self): > if not self.iterator: > self.iterator = iter(self) > return next(self.iterator) > > >>> a = [1, 2, 3, 4, 5] > >>> b = [2, 3, 5] > >>> m = MapView(pow, a, b) > >>> print(next(m)) > 1 > >>> print(len(m)) > Traceback (most recent call last): > File "", line 1, in > File "/Users/greg/foo/mapview/mapview.py", line 12, in __len__ > raise TypeError("Mapping iterator has no len()") > TypeError: Mapping iterator has no len() I can't reproduce that behaviour with the code you give above. When I try it, it returns the length 3, even after the iterator has been completely consumed. I daresay you could jerry-rig something to "fix" this bug, but I think this is a poor API that tries to make a single type act like two conceptually different things at the same time. -- Steve From greg.ewing at canterbury.ac.nz Sun Dec 2 17:52:07 2018 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 03 Dec 2018 11:52:07 +1300 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: <20181202134324.GV4319@ando.pearwood.info> References: <20181201011734.GN4319@ando.pearwood.info> <20181201165320.GQ4319@ando.pearwood.info> <20181201172307.GS4319@ando.pearwood.info> <20181201190803.GT4319@ando.pearwood.info> <5C033044.9080907@canterbury.ac.nz> <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> Message-ID: <5C046217.7010805@canterbury.ac.nz> Steven D'Aprano wrote: > Perhaps more like the principle of most > astonishment: the object changes from sized to unsized even if you don't > modify its value or its type, but merely if you look at it the wrong > way: Yes, but keep in mind the purpose of the whole thing is to provide a sequence interface while not breaking old code that expects an iterator interface. Code that was written to work with the existing map() will not be calling len() on it at all, because that would never have worked. > Neither fish nor fowl with a confusing API that is not > quite a sequence, not quite an iterator, not quite sized, but just > enough of each to lead people into error. Yes, it's a compromise in the interests of backwards compatibility. But there are no surprises as long as you stick to one interface or the other. Weird things happen if you mix them up, but sane code won't be doing that. > I can't reproduce that behaviour with the code you give above. When I > try it, it returns the length 3, even after the iterator has been > completely consumed. It sounds like you were still using the old version with a broken __iter__() method. 
This is my current complete code together with test cases: #----------------------------------------------------------- from operator import itemgetter class MapView: def __init__(self, func, *args): self.func = func self.args = args self.iterator = None def __len__(self): if self.iterator: raise TypeError("Mapping iterator has no len()") return min(map(len, self.args)) def __getitem__(self, i): return self.func(*list(map(itemgetter(i), self.args))) def __iter__(self): return map(self.func, *self.args) def __next__(self): if not self.iterator: self.iterator = iter(self) return next(self.iterator) if __name__ == "__main__": a = [1, 2, 3, 4, 5] b = [2, 3, 5] print("As a sequence:") m = MapView(pow, a, b) print(list(m)) print(list(m)) print(len(m)) print(m[1]) print() print("As an iterator:") m = MapView(pow, iter(a), iter(b)) print(next(m)) print(list(m)) print(list(m)) try: print(len(m)) except Exception as e: print("***", e) print() print("As an iterator over sequences:") m = MapView(pow, a, b) print(next(m)) print(next(m)) try: print(len(m)) except Exception as e: print("***", e) #----------------------------------------------------------- This is the output I get: As a sequence: [1, 8, 243] [1, 8, 243] 3 8 As an iterator: 1 [8, 243] [] *** Mapping iterator has no len() As an iterator over sequences: 1 8 *** Mapping iterator has no len() -- Greg From abedillon at gmail.com Wed Dec 5 21:43:44 2018 From: abedillon at gmail.com (Abe Dillon) Date: Wed, 5 Dec 2018 20:43:44 -0600 Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs In-Reply-To: References: Message-ID: [Marko Ristin-Kaufmann] > > What we do need at this moment, IMO, is a broad practical experience of > using contracts in Python. Once you make a change to the language, it's > impossible to undo. In contrast to what has been suggested in the previous > discussions (including my own voiced opinions), I actually now don't think > that introducing a language change would be beneficial *at this precise > moment*. I agree. That's why I prefaced this topic with [Brainstorm]. I want to explore the solution space to this problem and discuss some of the pros and cons of different ideas, *not* proceed straight to action. I also wanted to bring three thoughts to the table: 1. Fuzz testing and stateful testing like that provided by hypothesis might work together with contracts in an interesting way. 2. Tying tests/contracts to the bits of documentation that they validate is a great way to keep documentation in sync with code, but doctest does it a bit "backwards". Like in icontract-sphinx (or even this) it's better to construct documentation (partially) from test code than to write test code within documentation. In general, I find the relationship between documentation, testing, and type-checking interesting. The problems they each address seem to overlap quite a bit. 3. There seems like a lot of opportunity for the re-use of contracts, so maybe we should consider a mechanism to facilitate that. [Marko Ristin-Kaufmann] > I'd prefer to hear from people who actually use contracts in their > professional Python programming -- apart from the noisy syntax, how was the > experience? Did it help you catch bugs (and how many)? Were there big > problems with maintainability? Could you easily refactor? What were the > limits of the contracts you encountered? What kind of snapshot mechanism do > we need? How did you deal with multi-threading? And so on. That's a good point. 
I would argue that the concept of contracts isn't new, so there should be at least a few cases that we can draw on where others have tread before us (which you've obviously done to a large degree). That's not to belittle the work you've done on icontracts. It's a great tool for the reasons you describe. [Marko Ristin-Kaufmann] > *Multiple predicates per decorator. *The problem is that you can not deal > with toggling/describing individual contracts easily. While you can hack > your way through it (considering the arguments in the sequence, for > example), we found it clearer to have separate decorators. Moreover, > tracebacks are much easier to read, which is important when you debug a > program. I suppose it may be difficult to implement a clean, *backwards-compatible* solution, but yes; going through the arguments in a sequence would be my naive solution. Each entry has an optional description, a callable, and an optional tag or level to enable toggling (I would follow a simple model such as logging levels) *in that order*. It makes sense that the text description come first because that's the most relevant to a reader (like a doc-string), then the corresponding code, then the toggling flag which will often be an optimization detail which generally fall behind code correctness in priority. It may be less straight-forward to parse, but I wouldn't call it a "hack". I guess I'm not sure what to say about tracebacks being hard to read. [Marko Ristin-Kaufmann] > *Practicality of decorators. *We have retrospective meetings at the > company and I frequently survey the opinions related to the contracts > (explicitly asking about the readability and maintainability) -- so far > nobody had any difficulties and nobody was bothered by the noisy syntax. That's fair enough. I think the implementation you've come up with is pretty close to optimally concise given the tools at your disposal. I think something like Eiffel is a good goal for Python to eventually shoot for, but without new syntax; each step between icontracts and an Eiffel-esque platonic ideal would require significant hackery with diminishing returns on investment. On Thu, Nov 29, 2018 at 1:05 AM Marko Ristin-Kaufmann < marko.ristin at gmail.com> wrote: > Hi Abe, > Thanks for your suggestions! We actually already considered the two > alternatives you propose. > > *Multiple predicates per decorator. *The problem is that you can not deal > with toggling/describing individual contracts easily. While you can hack > your way through it (considering the arguments in the sequence, for > example), we found it clearer to have separate decorators. Moreover, > tracebacks are much easier to read, which is important when you debug a > program. > > *AST magic. *The problem with any approach based on parsing (be it > parsing the code or the description) is that parsing is slow so you end up > spending a lot of cycles on contracts which might not be enabled (many > contracts are applied only in the testing environment, not int he > production). Hence you must have an approach that offers practically zero > overhead cost to importing a module when its contracts are turned off. > > Decoding byte-code does not work as current decoding libraries can not > keep up with the changes in the language and the compiler hence they are > always lagging behind. > > *Practicality of decorators. 
*We have retrospective meetings at the > company and I frequently survey the opinions related to the contracts > (explicitly asking about the readability and maintainability) -- so far > nobody had any difficulties and nobody was bothered by the noisy syntax. > The decorator syntax is simply not beautiful, no discussion about that. But > when it comes to maintenance, there's a linter included ( > https://github.com/Parquery/pyicontract-lint), and if you want contracts > rendered in an appealing way, there's a documentation tool for sphinx ( > https://github.com/Parquery/sphinx-icontract). The linter facilitates the > maintainability a lot and sphinx tool gives you nice documentation for a > library so that you don't even have to look into the source code that often > if you don't want to. > > We need to be careful not to mistake issues of aesthetics for practical > issues. Something might not be beautiful, but can be useful unless it's > unreadable. > > *Conclusion. *What we do need at this moment, IMO, is a broad practical > experience of using contracts in Python. Once you make a change to the > language, it's impossible to undo. In contrast to what has been suggested > in the previous discussions (including my own voiced opinions), I actually > now don't think that introducing a language change would be beneficial *at > this precise moment*. We don't know what the use cases are, and there is > no practical experience to base the language change on. > > I'd prefer to hear from people who actually use contracts in their > professional Python programming -- apart from the noisy syntax, how was the > experience? Did it help you catch bugs (and how many)? Were there big > problems with maintainability? Could you easily refactor? What were the > limits of the contracts you encountered? What kind of snapshot mechanism do > we need? How did you deal with multi-threading? And so on. > > icontract library is already practically usable and, if you don't use > inheritance, dpcontracts is usable as well. I would encourage everybody to > try out programming with contracts using an existing library and just hold > their nose when writing the noisy syntax. Once we unearthed deeper problems > related to contracts, I think it will be much easier and much more > convincing to write a proposal for introducing contracts in the core > language. If I had to write a proposal right now, it would be only based on > the experience of writing a humble 100K code base by a team of 5-10 people. > Not very convincing. > > > Cheers, > Marko > > On Thu, 29 Nov 2018 at 02:26, Abe Dillon wrote: > >> Marko, I have a few thoughts that might improve icontract. 
>> First, multiple clauses per decorator: >> >> @pre( >> *lambda* x: x >= 0, >> *lambda* y: y >= 0, >> *lambda* width: width >= 0, >> *lambda* height: height >= 0, >> *lambda* x, width, img: x + width <= width_of(img), >> *lambda* y, height, img: y + height <= height_of(img)) >> @post( >> *lambda* self: (self.x, self.y) in self, >> *lambda* self: (self.x+self.width-1, self.y+self.height-1) in self, >> *lambda* self: (self.x+self.width, self.y+self.height) not in self) >> *def* __init__(self, img: np.ndarray, x: int, y: int, width: int, >> height: int) -> None: >> self.img = img[y : y+height, x : x+width].copy() >> self.x = x >> self.y = y >> self.width = width >> self.height = height >> >> *def* __contains__(self, pt: Tuple[int, int]) -> bool: >> x, y = pt >> return (self.x <= x < self.x + self.width) and (self.y <= y < self.y + >> self.height) >> >> >> You might be able to get away with some magic by decorating a method just >> to flag it as using contracts: >> >> >> @contract # <- does byte-code and/or AST voodoo >> *def* __init__(self, img: np.ndarray, x: int, y: int, width: int, >> height: int) -> None: >> pre(x >= 0, >> y >= 0, >> width >= 0, >> height >= 0, >> x + width <= width_of(img), >> y + height <= height_of(img)) >> >> # this would probably be declared at the class level >> inv(*lambda* self: (self.x, self.y) in self, >> *lambda* self: (self.x+self.width-1, self.y+self.height-1) in >> self, >> *lambda* self: (self.x+self.width, self.y+self.height) not in >> self) >> >> self.img = img[y : y+height, x : x+width].copy() >> self.x = x >> self.y = y >> self.width = width >> self.height = height >> >> That might be super tricky to implement, but it saves you some lambda >> noise. Also, I saw a forked thread in which you were considering some sort >> of transpiler with similar syntax to the above example. That also works. >> Another thing to consider is that the role of descriptors >> >> overlaps some with the role of invariants. I don't know what to do with >> that knowledge, but it seems like it might be useful. >> >> Anyway, I hope those half-baked thoughts have *some* value... >> >> On Wed, Nov 28, 2018 at 1:12 AM Marko Ristin-Kaufmann < >> marko.ristin at gmail.com> wrote: >> >>> Hi Abe, >>> >>> I've been pulling a lot of ideas from the recent discussion on design by >>>> contract (DBC), the elegance and drawbacks >>>> of doctests >>>> , and the amazing talk >>>> given by Hillel Wayne at >>>> this year's PyCon entitled "Beyond Unit Tests: Taking your Tests to the >>>> Next Level". >>>> >>> >>> Have you looked at the recent discussions regarding design-by-contract >>> on this list ( >>> https://groups.google.com/forum/m/#!topic/python-ideas/JtMgpSyODTU >>> and the following forked threads)? >>> >>> You might want to have a look at static checking techniques such as >>> abstract interpretation. I hope to be able to work on such a tool for >>> Python in some two years from now. We can stay in touch if you are >>> interested. >>> >>> Re decorators: to my own surprise, using decorators in a larger code >>> base is completely practical including the readability and maintenance of >>> the code. It's neither that ugly nor problematic as it might seem at first >>> look. >>> >>> We use our https://github.com/Parquery/icontract at the company. Most >>> of the design choices come from practical issues we faced -- so you might >>> want to read the doc even if you don't plant to use the library. 
>>> Some of the aspects we still haven't figured out are: how to approach >>> multi-threading (locking around the whole function with an additional >>> decorator?) and granularity of contract switches (right now we use >>> always/optimized, production/non-optimized and testing/slow, but it seems >>> that a larger system requires finer categories). >>> >>> Cheers Marko >>> >>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From marko.ristin at gmail.com Fri Dec 7 02:15:39 2018 From: marko.ristin at gmail.com (Marko Ristin-Kaufmann) Date: Fri, 7 Dec 2018 08:15:39 +0100 Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs In-Reply-To: References: Message-ID: Hi Abe, I agree. That's why I prefaced this topic with [Brainstorm]. I want to > explore the solution space to this problem and discuss some of the pros and > cons of different ideas, *not* proceed straight to action. You are right. My apologies -- I was so primed by the discussions we had in October 2018 that I didn't pay enough attention to "Brainstorm" in the subject. Fuzz testing and stateful testing like that provided by hypothesis might > work together with contracts in an interesting way. > You might want to look at the literature on automatic test generation. A possible entry point could be: https://www.research-collection.ethz.ch/handle/20.500.11850/69581 If I had time available, I would start with a tool that analyses a given module and automatically generates code for the Hypothesis test cases. The tool needs to select functions which accept primitive data types and for each one of them translate their contracts into Hypothesis code. If contracts are not trivially translatable to Hypothesis, the function is ignored. For readability and speed of development (of the code under test, not of the tool), I would prefer this tool *not* to be dynamic, so that the developer herself needs to re-run it if the function signatures change. The ingredients for such a tool are all there with icontract (similar to sphinx-icontract, you import the module and analyze its functions; you can copy/paste parts of the sphinx-icontract implementation for parsing and listing the AST of the contracts). (If you'd like to continue discussing this topic, let's create an issue on the icontract GitHub page or switch to private correspondence in order not to spam this mailing list). There seems like a lot of opportunity for the re-use of contracts, so maybe > we should consider a mechanism to facilitate that. > This was the case for the requests library. @James Lu was looking into it -- a lot of functions had very similar contracts. However, in our code base at the company (including the open-sourced libraries), there was not a single case where we thought that contract re-use would be beneficial. Either it would have hurt the readability and introduced unnecessary couplings (when the contracts were trivial) or it made sense to encapsulate more complex contracts in a separate function. >> *Multiple predicates per decorator. * >> > I suppose it may be difficult to implement a clean, *backwards-compatible* > solution, but yes; going through the arguments in a sequence would be my > naive solution. Each entry has an optional description, a callable, and an > optional tag or level to enable toggling (I would follow a simple model > such as logging levels) *in that order*. > I found that to be too error-prone in a larger code base, but that is my very subjective opinion. Maybe you could give an example?
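(To make sure we are talking about the same thing, here is roughly what I picture -- purely hypothetical syntax, *not* something icontract supports today, with width_of borrowed from your earlier example:

@pre(
    ("x must be non-negative", lambda x: x >= 0, "prod"),
    ("x + width must fit into img", lambda x, width, img: x + width <= width_of(img), "test"),
)
def crop(img, x, width):
    ...

i.e. the description first, then the callable, then the toggling level.)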
but without new syntax; each step between icontracts and an Eiffel-esque > platonic ideal would require significant hackery with diminishing returns > on investment. > I agree. There are also issues with core python interpreter which I expect to remain open for a long time (see the issues related to retrieving code text of lambda functions and decorators and tweaking dynamically the behavior of help(.) for functions). Cheers, Marko > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mhroncok at redhat.com Fri Dec 7 03:53:04 2018 From: mhroncok at redhat.com (=?UTF-8?Q?Miro_Hron=c4=8dok?=) Date: Fri, 7 Dec 2018 09:53:04 +0100 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads Message-ID: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> Hi, I see md5 checksums at a release download page such as [1]. My idea is to switch to sha512 for a more reliable outcome. I'm no security expert, but AFAK md5 is generally believed to be unsafe, as it was repeatedly proven it can be vulnerable [2]. [1] https://www.python.org/downloads/release/python-371/ [2] https://en.wikipedia.org/wiki/MD5#Security -- Miro Hron?ok -- Phone: +420777974800 IRC: mhroncok From solipsis at pitrou.net Fri Dec 7 04:39:30 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 7 Dec 2018 10:39:30 +0100 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> Message-ID: <20181207103930.565ce442@fsol> On Fri, 7 Dec 2018 09:53:04 +0100 Miro Hron?ok wrote: > Hi, > > I see md5 checksums at a release download page such as [1]. > > My idea is to switch to sha512 for a more reliable outcome. > > I'm no security expert, but AFAK md5 is generally believed to be unsafe, > as it was repeatedly proven it can be vulnerable [2]. md5 is only used for a quick integrity check here (think of it as a sophisticated checksum). For security you need to verify the corresponding GPG signature. Regards Antoine. From jeanpierreda at gmail.com Fri Dec 7 09:49:59 2018 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Fri, 7 Dec 2018 06:49:59 -0800 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads In-Reply-To: <20181207103930.565ce442@fsol> References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> Message-ID: On Fri, Dec 7, 2018 at 1:40 AM Antoine Pitrou wrote: > md5 is only used for a quick integrity check here (think of it as a > sophisticated checksum). For security you need to verify the > corresponding GPG signature. > More to the point: you're getting the hash from the same place as the binary. If one is vulnerable to modifications by attackers, both are. So it doesn't matter. The real defense most people are relying on is TLS. -- Devin -------------- next part -------------- An HTML attachment was scrubbed... URL: From prometheus235 at gmail.com Fri Dec 7 10:56:22 2018 From: prometheus235 at gmail.com (Nick Timkovich) Date: Fri, 7 Dec 2018 09:56:22 -0600 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads In-Reply-To: References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> Message-ID: Devils advocate: it might complicate things for someone that needs to use FIPS, where MD5 can be a pain to deal with. 
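For instance (a rough sketch -- details vary by distribution, and the file name is only illustrative): on some FIPS-enabled builds hashlib refuses to construct an md5 object at all, so a script that checks the published digest has to special-case it:

import hashlib

with open("Python-3.7.1.tgz", "rb") as f:  # illustrative file name
    data = f.read()
try:
    digest = hashlib.md5(data).hexdigest()  # may raise ValueError in FIPS mode
except ValueError:
    digest = None  # cannot check the published md5 on this machine

Publishing a sha256/sha512 digest would sidestep that wrinkle.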
On Fri, Dec 7, 2018 at 8:50 AM Devin Jeanpierre wrote: > On Fri, Dec 7, 2018 at 1:40 AM Antoine Pitrou wrote: > >> md5 is only used for a quick integrity check here (think of it as a >> sophisticated checksum). For security you need to verify the >> corresponding GPG signature. >> > > More to the point: you're getting the hash from the same place as the > binary. If one is vulnerable to modifications by attackers, both are. So it > doesn't matter. The real defense most people are relying on is TLS. > > -- Devin > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Fri Dec 7 13:47:24 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 7 Dec 2018 19:47:24 +0100 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> Message-ID: <20181207194724.7cb16fb5@fsol> On Fri, 7 Dec 2018 06:49:59 -0800 Devin Jeanpierre wrote: > On Fri, Dec 7, 2018 at 1:40 AM Antoine Pitrou wrote: > > > md5 is only used for a quick integrity check here (think of it as a > > sophisticated checksum). For security you need to verify the > > corresponding GPG signature. > > > > More to the point: you're getting the hash from the same place as the > binary. If one is vulnerable to modifications by attackers, both are. So it > doesn't matter. The real defense most people are relying on is TLS. If the site is vulnerable to modifications, then TLS doesn't help. Again: you must verify the GPG signatures (since they are produced by the release manager's private key, which is *not* stored on the python.org Web site). Regards Antoine. From bernardo at bernardosulzbach.com Fri Dec 7 14:46:29 2018 From: bernardo at bernardosulzbach.com (Bernardo Sulzbach) Date: Fri, 7 Dec 2018 17:46:29 -0200 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads In-Reply-To: References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> Message-ID: Would this change actually help people who need to use FIPS? Other than that this change would only decrease the already very small probability of a corrupted download hashing the same, which isn't a bad thing. If it could make some users' jobs easier, even if it by no means helps guaranteeing the authenticity of the downloaded file, it might be worth considering. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeanpierreda at gmail.com Fri Dec 7 14:54:59 2018 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Fri, 7 Dec 2018 11:54:59 -0800 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads In-Reply-To: <20181207194724.7cb16fb5@fsol> References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> <20181207194724.7cb16fb5@fsol> Message-ID: On Fri, Dec 7, 2018 at 10:48 AM Antoine Pitrou wrote: > If the site is vulnerable to modifications, then TLS doesn't help. > Again: you must verify the GPG signatures (since they are produced by > the release manager's private key, which is *not* stored on the > python.org Web site). > This is missing the point. They were asking why not to use SHA512. The answer is that the hash does not provide any extra security. 
GPG is separate: even if there was no GPG signature, SHA512 would still not provide any extra security. That's why I said "more to the point". :P Nobody "must" verify the GPG signatures. TLS doesn't protect against everything, but neither does GPG. A naive user might just download a public GPG key from a compromised python.org and use it to verify the compromised release, see everything is "OK", and still be hosed. -- Devin -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Dec 7 16:25:19 2018 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 7 Dec 2018 13:25:19 -0800 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads In-Reply-To: References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> <20181207194724.7cb16fb5@fsol> Message-ID: For this specific purpose, md5 is just as good as a proper hash. But all else being equal, it would still be better to use a proper hash, just so people don't have to go through the whole security analysis to check that. Of course all else isn't equal: switching from md5 to sha-whatever would require someone do the work. Is anyone volunteering? On Fri, Dec 7, 2018, 11:56 Devin Jeanpierre On Fri, Dec 7, 2018 at 10:48 AM Antoine Pitrou > wrote: > >> If the site is vulnerable to modifications, then TLS doesn't help. >> Again: you must verify the GPG signatures (since they are produced by >> the release manager's private key, which is *not* stored on the >> python.org Web site). >> > > This is missing the point. They were asking why not to use SHA512. The > answer is that the hash does not provide any extra security. GPG is > separate: even if there was no GPG signature, SHA512 would still not > provide any extra security. That's why I said "more to the point". :P > > Nobody "must" verify the GPG signatures. TLS doesn't protect against > everything, but neither does GPG. A naive user might just download a public > GPG key from a compromised python.org and use it to verify the > compromised release, see everything is "OK", and still be hosed. > > -- Devin > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Fri Dec 7 18:38:06 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 8 Dec 2018 10:38:06 +1100 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads In-Reply-To: References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> <20181207194724.7cb16fb5@fsol> Message-ID: <20181207233805.GE13061@ando.pearwood.info> On Fri, Dec 07, 2018 at 01:25:19PM -0800, Nathaniel Smith wrote: > For this specific purpose, md5 is just as good as a proper hash. But all > else being equal, it would still be better to use a proper hash, just so > people don't have to go through the whole security analysis to check that. I don't understand what you are trying to say here about "the whole security analysis" to check "that". What security analysis, and what is "that"? It seems to me that moving to a cryptographically-secure hash would give many people a false sense of security, that just because the hash matched, the download was not only not corrupted, but not compromised as well. 
For those two purposes: - testing for accidental corruption; - testing for deliberate compromise; md5 and sha512 are precisely equivalent: both are sufficient for the first, and useless for the second. But a crypto-hash can give a false sense of security. The original post in this thread is evidence of that. As such, I don't think we should move to anything stronger than md5. -- Steve From njs at pobox.com Fri Dec 7 19:35:56 2018 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 7 Dec 2018 16:35:56 -0800 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads In-Reply-To: <20181207233805.GE13061@ando.pearwood.info> References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> <20181207194724.7cb16fb5@fsol> <20181207233805.GE13061@ando.pearwood.info> Message-ID: On Fri, Dec 7, 2018 at 3:38 PM Steven D'Aprano wrote: > On Fri, Dec 07, 2018 at 01:25:19PM -0800, Nathaniel Smith wrote: > > > For this specific purpose, md5 is just as good as a proper hash. But all > > else being equal, it would still be better to use a proper hash, just so > > people don't have to go through the whole security analysis to check > that. > > I don't understand what you are trying to say here about "the whole > security analysis" to check "that". What security analysis, and > what is "that"? > The analysis that people posted in this thread, demonstrating that for the particular purpose at hand, md5 and sha-whatever are equally useful. > It seems to me that moving to a cryptographically-secure hash would give > many people a false sense of security, that just because the hash > matched, the download was not only not corrupted, but not compromised as > well. For those two purposes: > > - testing for accidental corruption; > - testing for deliberate compromise; > > md5 and sha512 are precisely equivalent: both are sufficient for the > first, and useless for the second. But a crypto-hash can give a false > sense of security. The original post in this thread is evidence of that. > If you're worried about giving people a false sense of security, I think it would be more effective to post a prominent notice or link describing how people should interpret the hashes. Maybe some people see md5 and think "ah-hah, this is their way of warning me that the hash is suitable for defending against accidental corruption but not malicious actors", but it must be a small minority :-). (That's certainly not what the OP thought.) Most people will just think we're fools who don't realize or care md5 is broken. Statistically, that's a pretty reasonable guess when you see someone using md5. -n -- Nathaniel J. Smith -- https://vorpus.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From songofacandy at gmail.com Fri Dec 7 21:05:43 2018 From: songofacandy at gmail.com (INADA Naoki) Date: Sat, 8 Dec 2018 11:05:43 +0900 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads In-Reply-To: <20181207233805.GE13061@ando.pearwood.info> References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> <20181207194724.7cb16fb5@fsol> <20181207233805.GE13061@ando.pearwood.info> Message-ID: > > It seems to me that moving to a cryptographically-secure hash would give > many people a false sense of security, that just because the hash > matched, the download was not only not corrupted, but not compromised as > well. 
For those two purposes: > > - testing for accidental corruption; > - testing for deliberate compromise; > > md5 and sha512 are precisely equivalent: both are sufficient for the > first, and useless for the second. But a crypto-hash can give a false > sense of security. The original post in this thread is evidence of that. > > As such, I don't think we should move to anything stronger than md5. > We already use SHA256 on PyPI. Many project in the world moving from md5 to SHA256. And at some time, SHA256 can be better than md5. When hash is delivered through other route than content, it's difficult to attack / easy to detect we're under attack. For example, sha256 is written in requirements.txt or Homebrew formula. When hash mismatch is happened, we can detect something go wrong. So it's worth to write stronger hash in such files. And if we use sha256 on download site, it's easy to check hash equality between formula and download site. If it's different, Homebrew or download site is under attack. So I think it's worth enough to moving to stronger and more used hash. (And by this reason, I prefer sha256 to sha512 for now.) -- INADA Naoki From steve at pearwood.info Fri Dec 7 23:09:26 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 8 Dec 2018 15:09:26 +1100 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads In-Reply-To: References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> <20181207194724.7cb16fb5@fsol> <20181207233805.GE13061@ando.pearwood.info> Message-ID: <20181208040925.GH13061@ando.pearwood.info> On Fri, Dec 07, 2018 at 04:35:56PM -0800, Nathaniel Smith wrote: > On Fri, Dec 7, 2018 at 3:38 PM Steven D'Aprano wrote: > > > On Fri, Dec 07, 2018 at 01:25:19PM -0800, Nathaniel Smith wrote: > > > > > For this specific purpose, md5 is just as good as a proper hash. But all > > > else being equal, it would still be better to use a proper hash, just so > > > people don't have to go through the whole security analysis to check > > > that. > > > > I don't understand what you are trying to say here about "the whole > > security analysis" to check "that". What security analysis, and > > what is "that"? > > > > The analysis that people posted in this thread, demonstrating that for the > particular purpose at hand, md5 and sha-whatever are equally useful. Okay, so your position is that even though there's no actual increase in security from using sha512, we ought to use it so that people who don't know any better won't complain that we're using a "less secure" hash. Is that accurate? As security theatre goes, I guess its less harmful than most :-) [...] > If you're worried about giving people a false sense of security, I think it > would be more effective to post a prominent notice or link describing how > people should interpret the hashes. I want to avoid encouraging a false sense of security. I'm not sure that we ought to extend that further to actively taking on the responsibility of teaching users about this. On the other hand, perhaps threads like this suggest that this is inevitable... on the gripping hand, many users won't read the notice regardless of what we do... How often does this issue come up? I'm not sure it is common enough to bother fixing, but others' judgement on that may differ. > Maybe some people see md5 and think > "ah-hah, this is their way of warning me that the hash is suitable for > defending against accidental corruption but not malicious actors", but it > must be a small minority :-). 
(That's certainly not what the OP thought.) I didn't think they would. > Most people will just think we're fools who don't realize or care md5 is > broken. Statistically, that's a pretty reasonable guess when you see > someone using md5. I don't think there's any way to know for sure, but I'd be shocked if "most people" even thought about the issue, or checked the hash, regardless of whether it is sha512, md5 or a CRC checksum. In my experience, browsers and downloaders like wget either download the data correctly, or they make it damn obvious that the download failed. YMMV. As for those who "think we're fools", that's not a reasonable guess by any means. Since we're not fools, and for the purposes we're using the hash there is no difference between md5 and sha512, such a guess would be a classic example of "a little knowledge is dangerous" and "not as clever or well-informed as you think you are" (that's a generic "you", not you personally). If they don't think we're fools for using md5, they'll probably think we're fools for some other reason. -- Steve From steve at pearwood.info Fri Dec 7 23:14:18 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 8 Dec 2018 15:14:18 +1100 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads In-Reply-To: References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> <20181207194724.7cb16fb5@fsol> <20181207233805.GE13061@ando.pearwood.info> Message-ID: <20181208041417.GI13061@ando.pearwood.info> On Sat, Dec 08, 2018 at 11:05:43AM +0900, INADA Naoki wrote: > We already use SHA256 on PyPI. > Many project in the world moving from md5 to SHA256. [...] How easy is it to use sha256 on the major platforms, compared to md5? On Linux, it is just as easy: [steve at ando ~]$ md5sum x.py 7008dcaa07fd35917474835425c6151a x.py [steve at ando ~]$ sha256sum x.py 6730dbf2b5ea5c874e789a39532b0e544af18fbea3c680880b01c81b773eabe2 x.py but how about Windows and Mac users? Do those platforms provide a sha256 checksum utility? (Maybe we should provide both hashes.) -- Steve From greg at krypto.org Fri Dec 7 23:55:53 2018 From: greg at krypto.org (Gregory P. Smith) Date: Fri, 7 Dec 2018 20:55:53 -0800 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads In-Reply-To: <20181207233805.GE13061@ando.pearwood.info> References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> <20181207194724.7cb16fb5@fsol> <20181207233805.GE13061@ando.pearwood.info> Message-ID: On Fri, Dec 7, 2018 at 3:38 PM Steven D'Aprano wrote: > On Fri, Dec 07, 2018 at 01:25:19PM -0800, Nathaniel Smith wrote: > > > For this specific purpose, md5 is just as good as a proper hash. But all > > else being equal, it would still be better to use a proper hash, just so > > people don't have to go through the whole security analysis to check > that. > > I don't understand what you are trying to say here about "the whole > security analysis" to check "that". What security analysis, and > what is "that"? > > It seems to me that moving to a cryptographically-secure hash would give > many people a false sense of security, that just because the hash > matched, the download was not only not corrupted, but not compromised as > well. For those two purposes: > > - testing for accidental corruption; > - testing for deliberate compromise; > > md5 and sha512 are precisely equivalent: both are sufficient for the > first, and useless for the second. But a crypto-hash can give a false > sense of security. 
The original post in this thread is evidence of that. > > As such, I don't think we should move to anything stronger than md5. > If we switched to sha2+ or listed 8 different hashes at once in the announcement text so that nobody can find the actual link content, we'd stop having people pipe up and complain that we used md5 for something. Less mailing list threads like this one seems like a benefit. :P Debian provides all of the popular FIPS hashes, in side files, so people can use whatever floats their boat for a content integrity check: https://cdimage.debian.org/debian-cd/current/ppc64el/iso-cd/ >From a semi-security perspective without verifying gpg signatures, listing a currently collision-resistant hash (sha2 onwards today) in widely disseminated release announcement that goes on mailing lists and gets forwarded and reposted in many places is still useful. Being not hosted in a single central place, if the downloads and hashes on the main servers change *after* their computation, publishing, and announcement - it serves as a widely distributed question mark. A pointless one, as the gpg signature also exists, but it is one none the less. As to windows and mac providing hashing functions on the command line, nope. assume nothing is provided. On linux my fingers would use "openssl hashname" rather than *sum commands. But none of those are ever required to be installed by anything. The only people who ever check hashes are those who already know what tools to use and how. Some could ironically install the downloaded python and use it to check its own hash. None of that is our problem. -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From phd at phdru.name Sat Dec 8 07:01:57 2018 From: phd at phdru.name (Oleg Broytman) Date: Sat, 8 Dec 2018 13:01:57 +0100 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads In-Reply-To: References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> <20181207194724.7cb16fb5@fsol> <20181207233805.GE13061@ando.pearwood.info> Message-ID: <20181208120157.x7rb3sh2rc37hvit@phdru.name> On Fri, Dec 07, 2018 at 08:55:53PM -0800, "Gregory P. Smith" wrote: > Debian provides all of the popular FIPS hashes... [skip] > https://cdimage.debian.org/debian-cd/current/ppc64el/iso-cd/ And they protect the hash files by signing them instead of signing CDs/DVDs. > -gps Oleg. -- Oleg Broytman https://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From christian at python.org Sat Dec 8 10:06:51 2018 From: christian at python.org (Christian Heimes) Date: Sat, 8 Dec 2018 16:06:51 +0100 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads In-Reply-To: References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> <20181207194724.7cb16fb5@fsol> <20181207233805.GE13061@ando.pearwood.info> Message-ID: On 08/12/2018 05.55, Gregory P. Smith wrote: > > On Fri, Dec 7, 2018 at 3:38 PM Steven D'Aprano > > wrote: > > On Fri, Dec 07, 2018 at 01:25:19PM -0800, Nathaniel Smith wrote: > > > For this specific purpose, md5 is just as good as a proper hash. > But all > > else being equal, it would still be better to use a proper hash, > just so > > people don't have to go through the whole security analysis to > check that. > > I don't understand what you are trying to say here about "the whole > security analysis" to check "that". What security analysis, and > what is "that"? 
> > It seems to me that moving to a cryptographically-secure hash would > give > many people a false sense of security, that just because the hash > matched, the download was not only not corrupted, but not > compromised as > well. For those two purposes: > > - testing for accidental corruption; > - testing for deliberate compromise; > > md5 and sha512 are precisely equivalent: both are sufficient for the > first, and useless for the second. But a crypto-hash can give a false > sense of security. The original post in this thread is evidence of that. > > As such, I don't think we should move to anything stronger than md5. > > > If we switched to sha2+ or listed 8 different hashes at once in the > announcement text so that nobody can find the actual link content, we'd > stop having people pipe up and complain that we used md5 for something. > Less mailing list threads like this one seems like a benefit. :P > > Debian provides all of the popular FIPS hashes, in side files, so people > can use whatever floats their boat for a content integrity check: > https://cdimage.debian.org/debian-cd/current/ppc64el/iso-cd/ By the way, it's a common misunderstanding that FIPS forbids MD5 in general. FIPS is more complicated than black and white lists of algorithms. FIPS also takes into account how an algorithm is used. For example, and if I recall correctly, AES-GCM is only allowed in network communication protocols but not for persistent storage. Simply speaking: In FIPS mode, MD5 is still allowed in **non-security contexts**. You cannot use MD5 to make any security claims like file integrity. However you are still allowed to use MD5 as a non-secure hash function to detect file corruption. The design and documentation must clearly state that you are only guarding against accidental file corruption caused by a network or hardware issue, but not as protection against a malicious attacker. Christian From solipsis at pitrou.net Sat Dec 8 11:54:31 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 8 Dec 2018 17:54:31 +0100 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> <20181207194724.7cb16fb5@fsol> Message-ID: <20181208175431.3ad22bfb@fsol> On Fri, 7 Dec 2018 11:54:59 -0800 Devin Jeanpierre wrote: > On Fri, Dec 7, 2018 at 10:48 AM Antoine Pitrou wrote: > > > If the site is vulnerable to modifications, then TLS doesn't help. > > Again: you must verify the GPG signatures (since they are produced by > > the release manager's private key, which is *not* stored on the > > python.org Web site). > > This is missing the point. Why do you think I missed anything here? Regards Antoine. From jamtlu at gmail.com Sat Dec 8 16:27:13 2018 From: jamtlu at gmail.com (James Lu) Date: Sat, 8 Dec 2018 16:27:13 -0500 Subject: [Python-ideas] [Brainstorm] Testing with Documented ABCs In-Reply-To: References: Message-ID: > Interesting. In the thread you linked on DBC, it seemed like Steve D'Aprano and David Mertz (and possibly others) were put off by the verbosity and noisiness of the decorator-based solution you provided with icontract (though I think there are ways to streamline that solution). It seems like syntactic support could offer a more concise and less noisy implementation. Btw, it would be relatively easy to create a parser for Python. Python doesn't have any crazy grammar constructs like the lexer hack AFAIK. I'm imagining using Bison: 1.
convert python's grammar ( https://github.com/python/cpython/blob/master/Lib/lib2to3/Grammar.txt) to Bison format. 2. write a lexer to parse tokens and convert indentation to indent/dedent tokens. 3. extend the grammar however you want it. Call these custom AST nodes "contract nodes." 4. create a simple AST, really an annotated parse tree. I think we can use a simple one that's a bunch of nested lists: ["for_stmt", "for i in range(10):", [ ["exprlist", "i", [ ... ]], ["testlist", "range(10)", [ ... ]] ]] # ["node_type", "", ] The AST can be made more detailed on an as-needed basis. 5. traverse the AST, and "rewrite" the the AST by pasting traditional python AST nodes where contract nodes are. This example from the Babel handbook may help if you have trouble understanding what this step means. https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/plugin-handbook.md#toc-writing-your-first-babel-plugin 6. turn the AST back into python source. Since we're storing the source code from the beginning, this should be fairly easy. (Bison lets your lexer tell the parser the line and column numbers of each token.) --- I made a joke language with Bison once, it's really flexible and well-suited for this kind of task. This 6-step p Tip: I found Bison's C++ mode too complicated, so I used it in C mode with the C++ Standard Library and C++ references enabled. --- I'm interested, what contract-related functionality do you think Python's existing syntax is inadequate for? You could look into using with statements and a python program that takes the AST and snips contract-related with statements to produce optimized code, though I suppose that's one step below the custom-parser method. On Wed, Nov 28, 2018 at 3:29 PM Abe Dillon wrote: > [Marko Ristin-Kaufmann] >> >> Have you looked at the recent discussions regarding design-by-contract on >> this list > > > I tried to read through them all before posting, but I may have missed > some of the forks. There was a lot of good discussion! > > [Marko Ristin-Kaufmann] > >> You might want to have a look at static checking techniques such as >> abstract interpretation. I hope to be able to work on such a tool for >> Python in some two years from now. We can stay in touch if you are >> interested. > > > I'll look into that! I'm very interested! > > [Marko Ristin-Kaufmann] > >> Re decorators: to my own surprise, using decorators in a larger code base >> is completely practical including the readability and maintenance of the >> code. It's neither that ugly nor problematic as it might seem at first look. > > > Interesting. In the thread you linked on DBC, it seemed like Steve > D'Aprano and David Mertz (and possibly others) were put off by the > verbosity and noisiness of the decorator-based solution you provided with > icontract (though I think there are ways to streamline that solution). It > seems like syntactic support could offer a more concise and less noisy > implementation. > > One thing that I can get on a soap-box about is the benefit putting the > most relevant information to the reader in the order of top to bottom and > left to right whenever possible. I've written many posts about this. I > think a lot of Python syntax gets this right. It would have been easy to > follow the same order as for-loops when designing comprehensions, but > expressions allow you some freedom to order things differently, so now > comprehensions read: > > squares = ... > # squares is > > squares = [... > # squares is a list > > squares = [number*number... 
> # squares is a list of num squared > > squares = [number*number for num in numbers] > # squares is a list of num squared 'from' numbers > > I think decorators sort-of break this rule because they can put a lot of > less important information (like, that a function is logged or timed) > before more important information (like the function's name, signature, > doc-string, etc...). It's not a huge deal because they tend to be > de-emphasized by my IDE and there typically aren't dozens of them on each > function, but I definitely prefer Eiffel's syntax > over > decorators for that reason. > > I understand that syntax changes have an very high bar for very good > reasons. Hillel Wayne's PyCon talk got me thinking that we might be close > enough to a really great solution to a wide variety of testing problems > that it might justify some new syntax or perhaps someone has an idea that > wouldn't require new syntax that I didn't think of. > > [Marko Ristin-Kaufmann] > >> Some of the aspects we still haven't figured out are: how to approach >> multi-threading (locking around the whole function with an additional >> decorator?) and granularity of contract switches (right now we use >> always/optimized, production/non-optimized and teating/slow, but it seems >> that a larger system requires finer categories). > > > Yeah... I don't know anything about testing concurrent or parallel code. > > On Wed, Nov 28, 2018 at 1:12 AM Marko Ristin-Kaufmann < > marko.ristin at gmail.com> wrote: > >> Hi Abe, >> >> I've been pulling a lot of ideas from the recent discussion on design by >>> contract (DBC), the elegance and drawbacks >>> of doctests >>> , and the amazing talk >>> given by Hillel Wayne at >>> this year's PyCon entitled "Beyond Unit Tests: Taking your Tests to the >>> Next Level". >>> >> >> Have you looked at the recent discussions regarding design-by-contract on >> this list ( >> https://groups.google.com/forum/m/#!topic/python-ideas/JtMgpSyODTU >> and the following forked threads)? >> >> You might want to have a look at static checking techniques such as >> abstract interpretation. I hope to be able to work on such a tool for >> Python in some two years from now. We can stay in touch if you are >> interested. >> >> Re decorators: to my own surprise, using decorators in a larger code base >> is completely practical including the readability and maintenance of the >> code. It's neither that ugly nor problematic as it might seem at first look. >> >> We use our https://github.com/Parquery/icontract at the company. Most of >> the design choices come from practical issues we faced -- so you might want >> to read the doc even if you don't plant to use the library. >> >> Some of the aspects we still haven't figured out are: how to approach >> multi-threading (locking around the whole function with an additional >> decorator?) and granularity of contract switches (right now we use >> always/optimized, production/non-optimized and teating/slow, but it seems >> that a larger system requires finer categories). >> >> Cheers Marko >> >> >> >> _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ronaldoussoren at mac.com Sun Dec 9 02:26:06 2018 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Sun, 9 Dec 2018 08:26:06 +0100 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads In-Reply-To: <20181208041417.GI13061@ando.pearwood.info> References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> <20181207194724.7cb16fb5@fsol> <20181207233805.GE13061@ando.pearwood.info> <20181208041417.GI13061@ando.pearwood.info> Message-ID: <19C76C80-0055-4980-953B-E94288CEC06D@mac.com> > On 8 Dec 2018, at 05:14, Steven D'Aprano wrote: > > On Sat, Dec 08, 2018 at 11:05:43AM +0900, INADA Naoki wrote: > >> We already use SHA256 on PyPI. >> Many project in the world moving from md5 to SHA256. > [...] > > > How easy is it to use sha256 on the major platforms, compared to md5? > > On Linux, it is just as easy: > > [steve at ando ~]$ md5sum x.py > 7008dcaa07fd35917474835425c6151a x.py > [steve at ando ~]$ sha256sum x.py > 6730dbf2b5ea5c874e789a39532b0e544af18fbea3c680880b01c81b773eabe2 x.py > > but how about Windows and Mac users? Do those platforms provide a sha256 > checksum utility? > > (Maybe we should provide both hashes.) macOS has a shasum tool that does the same thing: $ shasum -a 256 __init__.py 8db2fe0b21deec50d134895a6d5cfbb5300b23922bf2d9bb5b4b63ac40c6a22e __init__.py There?s also python itself that can be used to calculate the checksum :-) Ronald -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at barrys-emacs.org Sun Dec 9 09:54:44 2018 From: barry at barrys-emacs.org (Barry Scott) Date: Sun, 9 Dec 2018 14:54:44 +0000 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads In-Reply-To: <19C76C80-0055-4980-953B-E94288CEC06D@mac.com> References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> <20181207194724.7cb16fb5@fsol> <20181207233805.GE13061@ando.pearwood.info> <20181208041417.GI13061@ando.pearwood.info> <19C76C80-0055-4980-953B-E94288CEC06D@mac.com> Message-ID: <1B3C5E4A-4B1A-4E1E-972E-FC59D736834A@barrys-emacs.org> On Windows 10 this works: c:Downloads> certutil -hashfile python-3.7.1-amd64.exe sha512 SHA512 hash of python-3.7.1-amd64.exe: 7dec6362c402b38a9c29b85b204398d7d3fd19509f05279bf713a92abe5b485d4c0c4b175c4edb47f81fd800a599bc2283642a8f0c666edd9e971b5cedf18041 CertUtil: -hashfile command completed successfully. Barry From p.f.moore at gmail.com Sun Dec 9 12:31:22 2018 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 9 Dec 2018 17:31:22 +0000 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads In-Reply-To: <1B3C5E4A-4B1A-4E1E-972E-FC59D736834A@barrys-emacs.org> References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> <20181207194724.7cb16fb5@fsol> <20181207233805.GE13061@ando.pearwood.info> <20181208041417.GI13061@ando.pearwood.info> <19C76C80-0055-4980-953B-E94288CEC06D@mac.com> <1B3C5E4A-4B1A-4E1E-972E-FC59D736834A@barrys-emacs.org> Message-ID: On Sun, 9 Dec 2018 at 15:13, Barry Scott wrote: > > On Windows 10 this works: > > c:Downloads> certutil -hashfile python-3.7.1-amd64.exe sha512 > SHA512 hash of python-3.7.1-amd64.exe: > 7dec6362c402b38a9c29b85b204398d7d3fd19509f05279bf713a92abe5b485d4c0c4b175c4edb47f81fd800a599bc2283642a8f0c666edd9e971b5cedf18041 > CertUtil: -hashfile command completed successfully. In Powershell, there's Get-FileHash python-3.7.1-amd64.exe -Algorithm sha512. The default algorithm is SHA256. 
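(As Ronald notes, Python itself can compute the digest on any platform. A minimal hashlib sketch, for illustration only -- the file name is just the example used earlier in this thread:

    import hashlib

    def file_digest(path, algorithm="sha256", chunk_size=8192):
        # Read in chunks so a large installer never has to fit in memory.
        digest = hashlib.new(algorithm)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    print(file_digest("python-3.7.1-amd64.exe"))            # sha256
    print(file_digest("python-3.7.1-amd64.exe", "sha512"))  # sha512

The same few lines work unchanged on Windows, macOS and Linux.)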
On Windows, it's surprisingly often the case that things which traditionally fell under "Windows users probably don't have a tool to do that" are available in Powershell. None of which is that relevant, the fact still remains that no matter what algorithm is used, the hash only has limited value as a security measure. Paul From ronaldoussoren at mac.com Mon Dec 10 01:31:44 2018 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Mon, 10 Dec 2018 07:31:44 +0100 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads In-Reply-To: References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> <20181207194724.7cb16fb5@fsol> <20181207233805.GE13061@ando.pearwood.info> <20181208041417.GI13061@ando.pearwood.info> <19C76C80-0055-4980-953B-E94288CEC06D@mac.com> <1B3C5E4A-4B1A-4E1E-972E-FC59D736834A@barrys-emacs.org> Message-ID: <9156E866-0904-4804-B3DE-52F6745B6D44@mac.com> > On 9 Dec 2018, at 18:31, Paul Moore wrote: > > None of which is that relevant, the fact still remains that no matter > what algorithm is used, the hash only has limited value as a security > measure. That's true, but it does show that switching from MD5 to SHA2 doesn't make it harder to validate the checksum on major platforms. I don't have a strong opinion either way, I'm slightly in favour of switching to the same algorithm as used on PyPI to be consistent within these PSF properties. BTW. I wonder how many actually verify these checksums, I personally generally assume that HTTPS downloads are reliable enough and don't verify checksums unless I do the download in an automation pipeline. Ronald From mhroncok at redhat.com Mon Dec 10 05:11:21 2018 From: mhroncok at redhat.com (=?UTF-8?Q?Miro_Hron=c4=8dok?=) Date: Mon, 10 Dec 2018 11:11:21 +0100 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads In-Reply-To: References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> Message-ID: <6220c108-701c-ce04-656a-4a7d2210bfcc@redhat.com> On 07. 12. 18 at 15:49, Devin Jeanpierre wrote: > On Fri, Dec 7, 2018 at 1:40 AM Antoine Pitrou > wrote: > > md5 is only used for a quick integrity check here (think of it as a > sophisticated checksum). For security you need to verify the > corresponding GPG signature. > > > More to the point: you're getting the hash from the same place as the > binary. If one is vulnerable to modifications by attackers, both are. So > it doesn't matter. The real defense most people are relying on is TLS. Yes, I rely on TLS, but no, I'm not necessarily getting the archive from python.org. I might get it from a 3rd party that claims it's genuine. Such a party might be a Linux distro or another package manager (e.g. homebrew). I can of course use GPG to verify it, but for a quick check a sha512 sum works for me, while md5 not so much. In Fedora, we use sha512 checksums [1]. In homebrew they use sha256 [2].
[1] https://src.fedoraproject.org/rpms/python3/blob/master/f/sources [2] https://github.com/Homebrew/homebrew-core/blob/master/Formula/python.rb -- Miro Hrončok -- Phone: +420777974800 IRC: mhroncok From solipsis at pitrou.net Mon Dec 10 05:26:15 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 10 Dec 2018 11:26:15 +0100 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> <20181207194724.7cb16fb5@fsol> <20181207233805.GE13061@ando.pearwood.info> <20181208041417.GI13061@ando.pearwood.info> <19C76C80-0055-4980-953B-E94288CEC06D@mac.com> <1B3C5E4A-4B1A-4E1E-972E-FC59D736834A@barrys-emacs.org> <9156E866-0904-4804-B3DE-52F6745B6D44@mac.com> Message-ID: <20181210112615.51d37cfb@fsol> On Mon, 10 Dec 2018 07:31:44 +0100 Ronald Oussoren via Python-ideas wrote: > > That's true, but it does show that switching from MD5 to SHA2 doesn't make it harder to validate the checksum on major platforms. > > I don't have a strong opinion either way, I'm slightly in favour of switching to the same algorithm as used on PyPI to be consistent within these PSF properties. > > BTW. I wonder how many actually verify these checksums, I personally generally assume that HTTPS downloads are reliable enough and don't verify checksums unless I do the download in an automation pipeline. Ah, the automation use case is a good point in favor of stronger hashes. You may have checked the initial download hash and then use it in a script to make sure later downloads haven't been tampered with. Regards Antoine. From erik.m.bray at gmail.com Mon Dec 10 08:22:22 2018 From: erik.m.bray at gmail.com (E. Madison Bray) Date: Mon, 10 Dec 2018 14:22:22 +0100 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: <5C046217.7010805@canterbury.ac.nz> References: <20181201011734.GN4319@ando.pearwood.info> <20181201165320.GQ4319@ando.pearwood.info> <20181201172307.GS4319@ando.pearwood.info> <20181201190803.GT4319@ando.pearwood.info> <5C033044.9080907@canterbury.ac.nz> <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> <5C046217.7010805@canterbury.ac.nz> Message-ID: On Sun, Dec 2, 2018 at 11:52 PM Greg Ewing wrote: > > Steven D'Aprano wrote: > > Perhaps more like the principle of most > > astonishment: the object changes from sized to unsized even if you don't > > modify its value or its type, but merely if you look at it the wrong > > way: > > Yes, but keep in mind the purpose of the whole thing is to > provide a sequence interface while not breaking old code > that expects an iterator interface. Code that was written > to work with the existing map() will not be calling len() > on it at all, because that would never have worked. > > > Neither fish nor fowl with a confusing API that is not > > quite a sequence, not quite an iterator, not quite sized, but just > > enough of each to lead people into error. > > Yes, it's a compromise in the interests of backwards > compatibility. But there are no surprises as long as you > stick to one interface or the other. Weird things happen > if you mix them up, but sane code won't be doing that. Indeed; I believe it is very useful to have a map-like object that is effectively an augmented list/sequence.
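To make "augmented list/sequence" concrete, this is the kind of behaviour meant -- hypothetical here, since the current built-in map() supports none of it:

    lst = [1, 2, 3]
    m = map(str, lst)           # imagine this returned an augmented sequence
    assert len(m) == len(lst)   # 3
    assert m[1] == str(lst[1])  # '2', computed on demand
    assert list(m) == ['1', '2', '3']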
From bernardo at bernardosulzbach.com Mon Dec 10 09:44:43 2018 From: bernardo at bernardosulzbach.com (Bernardo Sulzbach) Date: Mon, 10 Dec 2018 12:44:43 -0200 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads In-Reply-To: <6220c108-701c-ce04-656a-4a7d2210bfcc@redhat.com> References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> <6220c108-701c-ce04-656a-4a7d2210bfcc@redhat.com> Message-ID: If the discussion gets to which SHA-2 should be used, I would like to point out that SHA-512 is not only twice the width of SHA-256 but also faster to compute (anecdotally) on most 64-bit platforms. -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.eliziario at gmail.com Mon Dec 10 10:05:49 2018 From: marcos.eliziario at gmail.com (Marcos Eliziario) Date: Mon, 10 Dec 2018 13:05:49 -0200 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads In-Reply-To: References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> <6220c108-701c-ce04-656a-4a7d2210bfcc@redhat.com> Message-ID: My two cents. Automation tools should check the PGP signature. The public keys should be obtained once via https from an odd number of different trustworthy sources from a set of well-known domains that use DNSSEC. Users should be advised to check the certificate chain from those domains the first time those keys are downloaded and explicitly agree. This is a more secure scheme than simply relying on a checksum that you've got from the same site you've used to download the code. Moving from MD5 to SHA obscures this, by making people believe that this hash should be used for anything more than checking for file corruption. On Mon, 10 Dec 2018 at 12:45, Bernardo Sulzbach < bernardo at bernardosulzbach.com> wrote: > If the discussion gets to which SHA-2 should be used, I would like to > point out that SHA-512 is not only twice the width of SHA-256 but also > faster to compute (anecdotally) on most 64-bit platforms. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Marcos Eliziário Santos mobile/whatsapp/telegram: +55(21) 9-8027-0156 skype: marcos.eliziario at gmail.com linked-in : https://www.linkedin.com/in/eliziario/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.eliziario at gmail.com Mon Dec 10 10:28:31 2018 From: marcos.eliziario at gmail.com (Marcos Eliziario) Date: Mon, 10 Dec 2018 13:28:31 -0200 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads In-Reply-To: References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> <6220c108-701c-ce04-656a-4a7d2210bfcc@redhat.com> Message-ID: A hash is surely useful in the context of locking versions of software packages in Pipfile.lock, because it tells us that the code we are downloading has not changed since the first time we saw this particular version of the package, but only a signature scheme tells us with a reasonable degree of certainty (though, not absolute) that this particular version of the code came from who it claims to have come from.
If an attacker is able to hijack the github repository of a project and its website, especially on low-activity projects, nothing would prevent them from releasing a rogue version, with people downloading it and using it for some time until the rightful maintainers of said project are able to take back control of it. Signing of course is as secure as the ability of said project maintainers to keep their private keys safe. But while we know that nothing can be made 100% secure, a culture that relies on signatures is inherently more secure than relying only on hashes, no matter how cryptographically strong they may be. Hashes tell us that the code we've downloaded is the same as some other blob of code stored somewhere that, for whatever reason, we trust. PGP tells us that there is a high probability, assuming the private keys haven't been compromised, and that a lot of people agree that the public key we have came from the right person or organization, that this blob of code came from who it says it came from. On Mon, 10 Dec 2018 at 13:05, Marcos Eliziario < marcos.eliziario at gmail.com> wrote: > My two cents. > Automation tools should check the PGP signature. The public keys should be > obtained once via https from an odd number of different trustworthy sources > from a set of well-known domains that use DNSSEC. Users should be advised to > check the certificate chain from those domains the first time those keys > are downloaded and explicitly agree. This is a more secure scheme than > simply relying on a checksum that you've got from the same site you've used > to download the code. > Moving from MD5 to SHA obscures this, by making people believe that this > hash should be used for anything more than checking for file corruption. > > On Mon, 10 Dec 2018 at 12:45, Bernardo Sulzbach < > bernardo at bernardosulzbach.com> wrote: > >> If the discussion gets to which SHA-2 should be used, I would like to >> point out that SHA-512 is not only twice the width of SHA-256 but also >> faster to compute (anecdotally) on most 64-bit platforms. >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > -- > Marcos Eliziário Santos > mobile/whatsapp/telegram: +55(21) 9-8027-0156 > skype: marcos.eliziario at gmail.com > linked-in : https://www.linkedin.com/in/eliziario/ > > -- Marcos Eliziário Santos mobile/whatsapp/telegram: +55(21) 9-8027-0156 skype: marcos.eliziario at gmail.com linked-in : https://www.linkedin.com/in/eliziario/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.franke at campus.tu-berlin.de Mon Dec 10 16:47:07 2018 From: m.franke at campus.tu-berlin.de (Franke, Maximilian Julian Shawn) Date: Mon, 10 Dec 2018 21:47:07 +0000 Subject: [Python-ideas] TAPS Implementation Message-ID: <8fe47843f248469cb7c221bee1053408@EX-MBX-02.tubit.win.tu-berlin.de> Hi, I am a student worker with the Internet Networks Architecture department at TU Berlin and I am working with APIs for network protocols. We are currently looking into implementing TAPS, a novel way to offer transport layer services to the application layer. The idea is to offer an API on top of multiple different transport protocols, such as TCP and QUIC. Instead of explicitly choosing a transport protocol, the application only provides abstract requirements, e.g., reliability.
The TAPS system maps these properties to transport protocols, potentially trying out multiple protocols in parallel. Furthermore, TAPS can select between multiple local interfaces and remote IP addresses. A short talk (~25 minutes) from the All systems go! conference about it is available here: https://media.ccc.de/v/ASG2018-188-the_future_of_networking_apis. TAPS is currently being standardized in the IETF (https://datatracker.ietf.org/wg/taps/about/). Here you can find the proposed architecture: https://datatracker.ietf.org/doc/draft-ietf-taps-arch/, interface: https://datatracker.ietf.org/doc/draft-ietf-taps-interface/ and an informal draft on implementation considerations: https://datatracker.ietf.org/doc/draft-ietf-taps-impl/. One of the implementations currently in the works is done by Apple in the form of their Network.framework API (https://developer.apple.com/documentation/network). While this implementation is relatively advanced, it is so far only available for macOS, iOS and its derivatives. As such, it would be favorable to have a platform-agnostic and open-source implementation as well. From what we can tell, asyncio seems to offer a lot of the groundwork necessary to implement it efficiently, so here are some questions we have before beginning with the implementation: - Is something like this in scope to become part of the standard Python library, or something that would be done in an external library? If it is in scope, what would the requirements for it to become part of the standard library be? - Are there currently any other active efforts to implement new network functionality in the standard library? - Are there currently any considerations to expand the standard transports offered by asyncio (TCP, UDP and SSL) with additional ones like SCTP, or more importantly QUIC? Any comments or further pointers to sources that could be helpful with this would be greatly appreciated. Best regards Max -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Mon Dec 10 17:31:36 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 11 Dec 2018 09:31:36 +1100 Subject: [Python-ideas] TAPS Implementation In-Reply-To: <8fe47843f248469cb7c221bee1053408@EX-MBX-02.tubit.win.tu-berlin.de> References: <8fe47843f248469cb7c221bee1053408@EX-MBX-02.tubit.win.tu-berlin.de> Message-ID: <20181210223135.GB13061@ando.pearwood.info> Hi Max, and welcome! On Mon, Dec 10, 2018 at 09:47:07PM +0000, Franke, Maximilian Julian Shawn wrote: [...] > We are currently looking into implementing TAPS, a novel way to offer > transport layer services to the application layer. [...] > TAPS is currently being standardized ... > Here you can find the proposed architecture ... These are factors which strongly go against TAPS being implemented in the standard library: it is novel and the usage of it is unproven, and it hasn't been standardized yet. Generally speaking, the Python standard library only provides proven, standardized protocols. A few reasons for this: - We don't have the resources of Apple, we can't support everything, so we have to choose those which are most likely to be useful; that means those with a proven track-record, not experimental or novel protocols. - We take backwards-compatibility seriously, so with a few exceptions, any API we offer would have to be stable. (There are ways around this, but we don't use them lightly.)
- The Python release cycle is relatively sedate and slow, and experimental libraries usually need a much faster release cycle. This is not to absolutely rule out a std lib implementation. If the networking experts among the core developers think this is a good idea, it could happen, regardless of how novel it is. But in the meantime, I recommend that you consider writing a library and offering it on PyPI as a third-party library: https://pypi.org/ If you are still keen to push for a standard library implementation, you will probably need to write a PEP: https://www.python.org/dev/peps/ At the very least, reading over some successful PEPs will suggest what sort of arguments you should make in order to get TAPS approved. -- Steve From solipsis at pitrou.net Mon Dec 10 17:50:33 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 10 Dec 2018 23:50:33 +0100 Subject: [Python-ideas] TAPS Implementation References: <8fe47843f248469cb7c221bee1053408@EX-MBX-02.tubit.win.tu-berlin.de> <20181210223135.GB13061@ando.pearwood.info> Message-ID: <20181210235033.47381eea@fsol> On Tue, 11 Dec 2018 09:31:36 +1100 Steven D'Aprano wrote: > Hi Max, and welcome! > > On Mon, Dec 10, 2018 at 09:47:07PM +0000, Franke, Maximilian Julian Shawn wrote: > [...] > > We are currently looking into implementing TAPS, a novel way to offer > > transport layer services to the application layer. > [...] > > TAPS is currently being standardized ... > > Here you can find the proposed architecture ... > > These are factors which strongly go against TAPS being implemented in > the standard library: it is novel and the usage of it is unproven, and > it hasn't been standardized yet. I agree that TAPS doesn't look proven at all (I would also add that I'm a bit skeptical it will achieve the stated goals -- but we'll see). IMO it's not a good candidate for standard library inclusion. Regards Antoine. From chris.barker at noaa.gov Mon Dec 10 20:15:36 2018 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 10 Dec 2018 17:15:36 -0800 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: References: <20181201011734.GN4319@ando.pearwood.info> <20181201165320.GQ4319@ando.pearwood.info> <20181201172307.GS4319@ando.pearwood.info> <20181201190803.GT4319@ando.pearwood.info> <5C033044.9080907@canterbury.ac.nz> <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> <5C046217.7010805@canterbury.ac.nz> Message-ID: On Mon, Dec 10, 2018 at 5:23 AM E. Madison Bray wrote: > Indeed; I believe it is very useful to have a map-like object that is > effectively an augmented list/sequence. but what IS a "map-like object" -- I'm trying to imagine what that actually means. "map" takes a function and maps it onto a interable, returning a new iterable. So a map object is an iterable -- what's under the hood being used to create it is (and should remain) opaque. Back in the day, Python was "all about sequences" -- so map() took a sequence and returned a sequence (an actual list, but that's not the point here). And that's pretty classic "map". With py3, there was a big shift toward iterables, rather than sequences as the core type to work with. There are a few other benefits, but the main one is that often sequences were made, simply so that they could be immediately iterated over, and that was a waste of resources. for i, item in enumerate(a_sequence): ... for x, y in zip(seq1, seq2): ... 
These two are pretty obvious, but the same approach was taken over much of python: dict.keys(), map(), range(), .... So now in Python, you need to decide, when writing code, what your API is -- does your function take a sequence? or does it take an iterable? Of course, a sequence is an iterable, but an iterable is not (necessarily) a sequence. -- so back in the day, you didn't really need to make the decision. So in the case of the Sage example -- I wonder what the real problem is -- if you have an API that requires a sequence, on Py2, folks may well have been passing it the result of a map() call. -- note that they weren't passing a "map object" that is now somehow different than it used to be -- they were passing a list plain and simple. And there are all sorts of places, when converting from py2 to py3, where you will now get an iterable that isn't a proper sequence, and if the API you are using requires a sequence, you need to wrap a list() or tuple() or some such around it to make the sequence. Note that you can write your code to work under either 2 or 3, but it's really hard to write a library so that your users can run it under either 2 or 3 without any change in their code! But note: the fact that it's a map object is just one special case. I suppose one could write an API now that actually expects a map object (rather than a generic sequence or iterable) but it was literally impossible in py2 -- there was no such object. I'm still confused -- what's so wrong with: list(map(func, some_iterable)) if you need a sequence? You can, of course, make lazy-evaluated sequences (like range), and so you could make a map-like function that required a sequence as input, and would lazily evaluate that sequence. This could be useful if you weren't going to work with the entire collection, but really wanted to only index out a few items, but I'm trying to imagine a use case for that, and I haven't. And I don't think that's the use case that started this thread... -CHB > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Dec 10 20:23:20 2018 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 10 Dec 2018 17:23:20 -0800 Subject: [Python-ideas] Using sha512 instead of md5 on python.org/downloads In-Reply-To: <9156E866-0904-4804-B3DE-52F6745B6D44@mac.com> References: <775682f6-16f0-e7a7-dd17-7e3ccfb7e772@redhat.com> <20181207103930.565ce442@fsol> <20181207194724.7cb16fb5@fsol> <20181207233805.GE13061@ando.pearwood.info> <20181208041417.GI13061@ando.pearwood.info> <19C76C80-0055-4980-953B-E94288CEC06D@mac.com> <1B3C5E4A-4B1A-4E1E-972E-FC59D736834A@barrys-emacs.org> <9156E866-0904-4804-B3DE-52F6745B6D44@mac.com> Message-ID: On Sun, Dec 9, 2018 at 10:32 PM Ronald Oussoren via Python-ideas < python-ideas at python.org> wrote: > BTW.
I wonder how many actually verify these checksums, > Hardly anyone -- most of us verify the download by trying to use it :-) Which doesn't mean that we shouldn't have it -- but it will indeed make very little difference to the vast majority of users -- and those that do check it are generally pretty sophisticated -- shouldn't be hard to use a different hash algorithm. Though people would have to update their workflows, which could be annoying. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From erik.m.bray at gmail.com Tue Dec 11 05:37:25 2018 From: erik.m.bray at gmail.com (E. Madison Bray) Date: Tue, 11 Dec 2018 11:37:25 +0100 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: References: <20181201011734.GN4319@ando.pearwood.info> <20181201165320.GQ4319@ando.pearwood.info> <20181201172307.GS4319@ando.pearwood.info> <20181201190803.GT4319@ando.pearwood.info> <5C033044.9080907@canterbury.ac.nz> <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> <5C046217.7010805@canterbury.ac.nz> Message-ID: On Tue, Dec 11, 2018 at 2:16 AM Chris Barker wrote: > On Mon, Dec 10, 2018 at 5:23 AM E. Madison Bray wrote: >> >> Indeed; I believe it is very useful to have a map-like object that is >> effectively an augmented list/sequence. > > > but what IS a "map-like object" -- I'm trying to imagine what that actually means. > > "map" takes a function and maps it onto a interable, returning a new iterable. So a map object is an iterable -- what's under the hood being used to create it is (and should remain) opaque. I don't understand why this is confusing. Greg gave an example of what this *might* mean up thread. It's not the only possible approach but it is one that makes a lot of sense to me. The way you're defining "map" is arbitrary and post-hoc. It's a definition that makes sense for "map" that's restricted to iterating over arbitrary iterators. It's how it happens to be defined in Python 3 for various reasons that you took time to explain at great length, which I regret to inform you was time wasted explaining things I already know. For something like a fixed sequence a "map" could just as easily be defined as a pair (, ) that applies , which I'm claiming is a pure function, to every element returned by the . This transformation can be applied lazily on a per-element basis whether I'm iterating over it, or performing random access (since is known for all N). Python has no formal notion of a pure function, but I'm an adult and can accept responsibility if I try to use this "map-like" object in a way that is not logically consistent. The stuff about Sage is beside the point. I'm not even talking about that anymore. 
From p.f.moore at gmail.com Tue Dec 11 06:13:12 2018 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 11 Dec 2018 11:13:12 +0000 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: References: <20181201011734.GN4319@ando.pearwood.info> <20181201165320.GQ4319@ando.pearwood.info> <20181201172307.GS4319@ando.pearwood.info> <20181201190803.GT4319@ando.pearwood.info> <5C033044.9080907@canterbury.ac.nz> <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> <5C046217.7010805@canterbury.ac.nz> Message-ID: On Tue, 11 Dec 2018 at 10:38, E. Madison Bray wrote: > I don't understand why this is confusing. [...] > For something like a fixed sequence a "map" could just as easily be > defined as a pair (, ) that applies , > which I'm claiming is a pure function, to every element returned by > the . This transformation can be applied lazily on a > per-element basis whether I'm iterating over it, or performing random > access (since is known for all N). What's confusing to *me*, at least, is what's actually being suggested here. There's a lot of theoretical discussion, but I've lost track of how it's grounded in reality: 1. If we're saying that "it would be nice if there were a function that acted like map but kept references to its arguments", that's easy to do as a module on PyPI. Go for it - no-one will have any problem with that. 2. If we're saying "the builtin map needs to behave like that", then 2a. *Why*? What is so special about this situation that the builtin has to be changed? 2b. Compatibility questions need to be addressed. Is this important enough to code that "needs" it that such code is OK with being Python 3.8+ only? If not, why aren't the workarounds needed for Python 3.7 good enough? (Long term improvement and simplification of the code *is* a sufficient reason here, it's just something that should be explicit, as it means that the benefits are long-term rather than immediate). 2c. Weird corner case questions, while still being rare, *do* need to be addressed - once a certain behaviour is in the stdlib, changing it is a major pain, so we have a responsibility to get even the corner cases right. 2d. It's not actually clear to me how critical that need actually is. Nice to have, sure (you only need a couple of people who would use a feature for it to be "nice to have") but beyond that I haven't seen a huge number of people offering examples of code that would benefit (you mentioned Sage, but that example rapidly degenerated into debates about Sage's design, and while that's a very good reason for not wanting to continue using that as a use case, it does leave us with few actual use cases, and none that I'm aware of that are in production code...) 3. If we're saying something else (your comment "map could just as easily be defined as..." suggests that you might be) then I'm not clear what it is. Can you describe your proposal as pseudo-code, or a Python implementation of the "map" replacement you're proposing? Paul From erik.m.bray at gmail.com Tue Dec 11 06:48:10 2018 From: erik.m.bray at gmail.com (E. 
Madison Bray) Date: Tue, 11 Dec 2018 12:48:10 +0100 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: References: <20181201011734.GN4319@ando.pearwood.info> <20181201165320.GQ4319@ando.pearwood.info> <20181201172307.GS4319@ando.pearwood.info> <20181201190803.GT4319@ando.pearwood.info> <5C033044.9080907@canterbury.ac.nz> <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> <5C046217.7010805@canterbury.ac.nz> Message-ID: On Tue, Dec 11, 2018 at 12:13 PM Paul Moore wrote: > > On Tue, 11 Dec 2018 at 10:38, E. Madison Bray wrote: > > I don't understand why this is confusing. > [...] > > For something like a fixed sequence a "map" could just as easily be > > defined as a pair (<function>, <sequence>) that applies <function>, > > which I'm claiming is a pure function, to every element returned by > > the <sequence>. This transformation can be applied lazily on a > > per-element basis whether I'm iterating over it, or performing random > > access (since <sequence>[N] is known for all N). > > What's confusing to *me*, at least, is what's actually being suggested > here. There's a lot of theoretical discussion, but I've lost track of > how it's grounded in reality: It's true, this has been a wide-ranging discussion and it's confusing. Right now I'm specifically responding to the sub-thread that Greg started "Suggested MapView object", so I'm considering this a mostly clean slate from the previous thread "__len__() for map()". Different ideas have been tossed around and the discussion has me thinking about broader possibilities. I responded to this thread because I liked Greg's proposal and the direction he's suggesting. I think that the motivation underlying much of this discussion, for both the OP who started the original thread, as well as myself, and others is that before Python 3 changed the implementation of map() there were certain assumptions one could make about map() called on a list* which, under normal circumstances were quite reasonable and sane (e.g. len(map(func, lst)) == len(lst), or map(func, lst)[N] == func(lst[N])). Python 3 broke all of these assumptions, for reasons that I personally have no disagreement with, in terms of motivation. However, in retrospect, it might have been nice if more consideration were given to backwards compatibility for some "obvious" simple cases. This isn't a Python 2 vs Python 3 whine though: I'm just trying to think about how I might expect map() to work on different types of arguments, and I see no problem--so long as it's properly documented--with making its behavior somewhat polymorphic on the types of arguments. The idea would be to now enhance the existing built-ins to restore at least some previously lost assumptions, at least in the relevant cases. To give an analogy, Python 3.0 replaced range() with (effectively) xrange(). This broke a lot of assumptions that the object returned by range(N) would work much like a list, and Python 3.2 restored some of that list-like functionality by adding support for slicing and negative indexing on range(N). I believe it's worth considering such enhancements for filter() and map() as well, though these are obviously a bit trickier. * or other fixed-length sequence, but let's just use list as a shorthand, and assume for the sake of simplicity a single list as well. > 1. If we're saying that "it would be nice if there were a function > that acted like map but kept references to its arguments", that's easy > to do as a module on PyPI. Go for it - no-one will have any problem > with that.
Sure, though since this is about the behavior of global built-ins that are commonly used by users at all experience levels the problem is a bit hairier. Anybody can implement anything they want and put it in a third-party module. That doesn't mean anyone will use it. I still have to write code that handles map objects. In retrospect I think Guido might have had the right idea of wanting to move map() and filter() into functools along with reduce(). There's a surprisingly lot more at stake in terms of backwards compatibility and least-astonishment when it comes to built-ins. I think that's in part why the new Python 3 definitions of map() and filter() were kept so simple: although they were not backwards compatible I do think they were well designed to minimize astonishment. That's why I don't necessarily disagree with the choices made (but still would like to think about how we can make enhancements going forward). > 2. If we're saying "the builtin map needs to behave like that", then > 2a. *Why*? What is so special about this situation that the builtin > has to be changed? Same question could apply to last time it was changed. I think now we're trying to find some middle-ground. > 2b. Compatibility questions need to be addressed. Is this important > enough to code that "needs" it that such code is OK with being Python > 3.8+ only? If not, why aren't the workarounds needed for Python 3.7 > good enough? (Long term improvement and simplification of the code > *is* a sufficient reason here, it's just something that should be > explicit, as it means that the benefits are long-term rather than > immediate). That's a good point: I think the same arguments as for enhancing range() apply here, but this is worth further consideration (though having a more concrete proposal in the first place should come first). > 2c. Weird corner case questions, while still being rare, *do* need > to be addressed - once a certain behaviour is in the stdlib, changing > it is a major pain, so we have a responsibility to get even the corner > cases right. It depends on what you mean by getting them "right". It's definitely worth going over as one can think of. Not all corner cases have a satisfying resolution (and may be highly context-dependent). In those cases getting it "right" is probably no more than documenting that corner case and perhaps warning against it. > 2d. It's not actually clear to me how critical that need actually > is. Nice to have, sure (you only need a couple of people who would use > a feature for it to be "nice to have") but beyond that I haven't seen > a huge number of people offering examples of code that would benefit > (you mentioned Sage, but that example rapidly degenerated into debates > about Sage's design, and while that's a very good reason for not > wanting to continue using that as a use case, it does leave us with > few actual use cases, and none that I'm aware of that are in > production code...) That's a fair point worthy of further consideration. To me, at least, map on a list working as an augmented list is obvious, clear, useful, at solves most of the use-cases where having map.__len__ might be desirable, among others. > 3. If we're saying something else (your comment "map could just as > easily be defined as..." suggests that you might be) then I'm not > clear what it is. Can you describe your proposal as pseudo-code, or a > Python implementation of the "map" replacement you're proposing? Again, I'm mostly responding to Greg's proposal which I like. 
To extend it, I'm suggesting that a call to map() where all the arguments are sequences** might return something like his MapView. If even that idea is crazy or impractical though, I can accept that. But I think it's quite analogous to how map on arbitrary iterables went from immediate evaluation to lazy evaluation while iterating: in the same way map on some sequence(s) can be evaluated lazily on random access. ** I have a separate complaint that there's no great way, at the Python level, to define a class that is explicitly a "sequence" as opposed to a more general "mapping", but that's a topic for another thread... From p.f.moore at gmail.com Tue Dec 11 07:53:38 2018 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 11 Dec 2018 12:53:38 +0000 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: References: <20181201011734.GN4319@ando.pearwood.info> <20181201165320.GQ4319@ando.pearwood.info> <20181201172307.GS4319@ando.pearwood.info> <20181201190803.GT4319@ando.pearwood.info> <5C033044.9080907@canterbury.ac.nz> <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> <5C046217.7010805@canterbury.ac.nz> Message-ID: On Tue, 11 Dec 2018 at 11:49, E. Madison Bray wrote: > The idea would be to now enhance the existing built-ins to restore at > least some previously lost assumptions, at least in the relevant > cases. To give an analogy, Python 3.0 replaced range() with > (effectively) xrange(). This broken a lot of assumptions that the > object returned by range(N) would work much like a list, and Python > 3.2 restored some of that list-like functionality by adding support > for slicing and negative indexing on range(N). I believe it's worth > considering such enhancements for filter() and map() as well, though > these are obviously a bit trickier. Thanks. That clarifies the situation for me very well. I agree with most of the comments you made, although I don't have any good answers. I think you're probably right that Guido's original idea to move map and filter to functools might have been better, forcing users to explicitly choose between a genexp and a list comprehension. On the other hand, it might have meant people used more lists than they needed to, as a result. Paul From steve at pearwood.info Tue Dec 11 09:47:30 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 12 Dec 2018 01:47:30 +1100 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: References: <20181201172307.GS4319@ando.pearwood.info> <20181201190803.GT4319@ando.pearwood.info> <5C033044.9080907@canterbury.ac.nz> <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> <5C046217.7010805@canterbury.ac.nz> Message-ID: <20181211144726.GE13061@ando.pearwood.info> On Mon, Dec 10, 2018 at 05:15:36PM -0800, Chris Barker via Python-ideas wrote: [...] > I'm still confused -- what's so wrong with: > > list(map(func, some_iterable)) > > if you need a sequence? You might need a sequence. Why do you think that has to be an *eager* sequence? I can think of two obvious problems with eager sequences: space and time. They can use too much memory, and they can take too much time to generate them up-front and too much time to reap when they become garbage. And if you have an eager sequence, and all you want is the first item, you still have to generate all of them even though they aren't needed. 
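A small illustration of that cost difference -- expensive() here is just a hypothetical stand-in for a slow lookup or a large allocation:

    calls = 0

    def expensive(x):
        global calls
        calls += 1              # count how many times the real work is done
        return x * 2

    lazy = map(expensive, range(1000))
    first = next(lazy)          # calls == 1: only the item we asked for was computed

    calls = 0
    eager = list(map(expensive, range(1000)))
    first = eager[0]            # calls == 1000: everything was computed up front

A lazy sequence view would keep the first property (work done only on demand) while also keeping len() and indexing.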
We can afford to be profligate with memory when the data is small, but eventually you run into cases where having two copies of the data is one copy too many. > You can, of course mike lazy-evaluated sequences (like range), and so you > could make a map-like function that required a sequence as input, and would > lazy evaluate that sequence. This could be useful if you weren't going to > work with the entire collection, Or even if you *are* going to work with the entire collection, but you don't need them all at once. I once knew a guy whose fondest dream was to try the native cuisine of every nation of the world ... but not all in one meal. This is a classic time/space tradeoff: for the cost of calling the mapping function anew each time we index the sequence, we can avoid allocating a potentially huge list and calling a potentially expensive function up front for items we're never going to use. Instead, we call it only on demand. These are the same principles that justify (x)range and dict views. Why eagerly generate a list up front, if you only need the values one at a time on demand? Why make a copy of the dict keys, if you don't need a copy? These are not rhetorical questions. This is about avoiding the need to make unnecessary copies for those times we *don't* need an eager sequence generated up front, keeping the laziness of iterators and the random-access of sequences. map(func, sequence) is a great candidate for this approach. It has to hold onto a reference to the sequence even as an iterator. The function is typically side-effect free (a pure function), and if it isn't, "consenting adults" applies. We've already been told there's at least one major Python project, Sage, where this would have been useful. There's a major functional language, Haskell, where nearly all sequence processing follows this approach. I suggest we provide a separate mapview() type that offers only the lazy sequence API, without trying to be an iterator at the same time. If you want an eager sequence, or an iterator, they're only a single function call away: list(mapview_instance) iter(mapview_instance) # or just stick to map() Rather than trying to guess whether people want to treat their map objects as sequences or iterators, we let them choose which they want and be explicit about it. Consider the history of dict.keys(), values() and items() in Python 2. Originally they returned eager lists. Did we try to retrofit view-like and iterator-like behaviour onto the existing dict.keys() method, returning a cunning object which somehow turned from a list to a view to an iterator as needed? Hell no! We introduced *six new methods* on dicts: - dict.iterkeys() - dict.viewkeys() and similar for items() and values(). Compared to that, adding a single variant on map() that expects a sequence and returns a view on the sequence seems rather timid. -- Steve From steve at pearwood.info Tue Dec 11 11:26:27 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 12 Dec 2018 03:26:27 +1100 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: References: <5C033044.9080907@canterbury.ac.nz> <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> <5C046217.7010805@canterbury.ac.nz> Message-ID: <20181211162627.GF13061@ando.pearwood.info> On Tue, Dec 11, 2018 at 12:48:10PM +0100, E. 
Madison Bray wrote: > Right now I'm specifically responding to the sub-thread that Greg > started "Suggested MapView object", so I'm considering this a mostly > clean slate from the previous thread "__len__() for map()". Different > ideas have been tossed around and the discussion has me thinking about > broader possibilities. I responded to this thread because I liked > Greg's proposal and the direction he's suggesting. Greg's code can be found here: https://mail.python.org/pipermail/python-ideas/2018-December/054659.html His MapView tries to be both an iterator and a sequence at the same time, but it is neither. The iterator protocol is that iterators must: - have a __next__ method; - have an __iter__ method which returns self; and the test for an iterator is: obj is iter(obj) https://docs.python.org/3/library/stdtypes.html#iterator-types Greg's MapView object is an *iterable* with a __next__ method, which makes it neither a sequence nor a iterator, but a hybrid that will surprise people who expect it to act considently as either. This is how iterators work: py> x = iter("abcdef") # An actual iterator. py> next(x) 'a' py> next(x) 'b' py> next(iter(x)) 'c' Greg's hybrid violates that expected behaviour: py> x = MapView(str.upper, "abcdef") # An imposter. py> next(x) 'A' py> next(x) 'B' py> next(iter(x)) 'A' As an iterator, it is officially "broken", continuing to yield values even after it is exhausted: py> x = MapView(str.upper, 'a') py> next(x) 'A' py> next(x) Traceback (most recent call last): File "", line 1, in File "/home/steve/gregmapview.py", line 24, in __next__ return next(self.iterator) StopIteration py> list(x) # But wait! There's more! ['A'] py> list(x) # And even more! ['A'] This hybrid is fragile: whether operations succeed or not depend on the order that you call them: py> x = MapView(str.upper, "abcdef") py> len(x)*next(x) # Safe. But only ONCE. 'AAAAAA' py> y = MapView(str.upper, "uvwxyz") py> next(y)*len(y) # Looks safe. But isn't. Traceback (most recent call last): File "", line 1, in File "/home/steve/gregmapview.py", line 12, in __len__ raise TypeError("Mapping iterator has no len()") TypeError: Mapping iterator has no len() (For brevity, from this point on I shall trim the tracebacks and show only the final error message.) Things that work once, don't work a second time. py> len(x)*next(x) # Worked a moment ago, but now it is broken. TypeError: Mapping iterator has no len() If you pass your MapView object to another function, it can accidentally sabotage your code: py> def innocent_looking_function(obj): ... next(obj) ... py> x = MapView(str.upper, "abcdef") py> len(x) 6 py> innocent_looking_function(x) py> len(x) TypeError: Mapping iterator has no len() I presume this is just an oversight, but indexing continues to work even when len() has been broken. Greg seems to want to blame the unwitting coder who runs into these boobytraps: "But there are no surprises as long as you stick to one interface or the other. Weird things happen if you mix them up, but sane code won't be doing that." (URL as above). This MapView class offers a hybrid "sequence plus iterator, together at last!" double-headed API, and even its creator says that sane code shouldn't use that API. Unfortunately, you can't use the iterator API, because its broken as an iterator, and you can't use it as a sequence, because any function you pass it to might use it as an iterator and pull the rug out from under your feet. 
Greg's code is, apart from the addition of the __next__ method, almost identical to the version of mapview I came up with in my own testing. Except Greg's is even better, since I didn't bother handling the multiple-sequences case and his does. It's the __next__ method which ruins it, by trying to graft on almost- but-not-really iterator behaviour onto something which otherwise is a sequence. I don't think there's any way around that: I think that any attempt to make a single MapView object work as either a sequence with a length and indexing AND an iterator with next() and no length and no indexing is doomed to the same problems. Far from minimizing surprise, it will maximise it. Look at how many violations of the Principle Of Least Surprise Greg's MapView has: - If an object has a __len__ method, calling len() on it shouldn't raise TypeError; - If you called len() before, and it succeeded, calling it again should also succeed; - if an object has a __next__ method, it should be an iterator, and that means iter(obj) is obj; - if it isn't an iterator, you shouldn't be able to call next() on it; - if it is an iterator, once it is exhausted, it should stay exhausted; - iterating over an object (calling next() or iter() on it) shouldn't change it from a sequence to a non-sequence; - passing a sequence to another function, shouldn't result in that sequence no longer supporting len() or indexing; - if an object has a length, then it should still have a length even after iterating over it. I may have missed some. -- Steve From chris.barker at noaa.gov Tue Dec 11 12:01:27 2018 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Tue, 11 Dec 2018 09:01:27 -0800 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: <20181211144726.GE13061@ando.pearwood.info> References: <20181201172307.GS4319@ando.pearwood.info> <20181201190803.GT4319@ando.pearwood.info> <5C033044.9080907@canterbury.ac.nz> <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> <5C046217.7010805@canterbury.ac.nz> <20181211144726.GE13061@ando.pearwood.info> Message-ID: Perhaps I got confused by the early part of this discussion. My point was that there is no "map-like" object at the Python level. (That is, no Map abc). Py2's map produced a sequence. Py3's map produced an iterable. So any API that was expecting a sequence could accept the result of a py2 map, but not a py3 map. There is absolutely nothing special about map here. The example of range has been brought up, but I don't think it's analogous -- py2 range returns a list, py3 range returns an immutable sequence. Because that's as close as we can get to a sequence while preserving the lazy evaluation that is wanted. I _think_ someone may be advocating that map() could return an iterable if it is passed an iterable, and a sequence if it is passed a sequence. Yes, it could, but that seems like a bad idea to me. But folks are proposing a "map" that would produce a lazy-evaluated sequence. Sure -- as Paul said, put it up on pypi and see if folks find it useful. Personally, I'm still finding it hard to imagine a use case where you need the sequence features, but also lazy evaluation is important. Sure: range() has that, but it came at almost zero cost, and I'm not sure the sequence features are used much. Note: the one use-case I can think of for a lazy evaluated sequence instead of an iterable is so that I can pick a random element with random.choice(). (Try to pick a random item from a dict), but that doesn't apply here -- pick a random item from the source sequence instead. But this is a specific example of a general use case: you need to access only a subset of the mapped sequence (or access it out of order) so using the iterable version won't work, and it may be large enough that making a new sequence is too resource intensive. Seems rare to me, and in many cases, you could do the subsetting before applying the function, so I think it's a pretty rare use case. But go ahead and make it -- I've been wrong before :-) -CHB Sent from my iPhone > On Dec 11, 2018, at 6:47 AM, Steven D'Aprano wrote: > >> On Mon, Dec 10, 2018 at 05:15:36PM -0800, Chris Barker via Python-ideas wrote: >> [...] >> I'm still confused -- what's so wrong with: >> >> list(map(func, some_iterable)) >> >> if you need a sequence? > > You might need a sequence. Why do you think that has to be an *eager* > sequence? > > I can think of two obvious problems with eager sequences: space and > time. They can use too much memory, and they can take too much time to > generate them up-front and too much time to reap when they become > garbage. And if you have an eager sequence, and all you want is the > first item, you still have to generate all of them even though they > aren't needed. > > We can afford to be profligate with memory when the data is small, but > eventually you run into cases where having two copies of the data is one > copy too many. > > >> You can, of course make lazy-evaluated sequences (like range), and so you >> could make a map-like function that required a sequence as input, and would >> lazy evaluate that sequence. This could be useful if you weren't going to >> work with the entire collection, > > Or even if you *are* going to work with the entire collection, but you > don't need them all at once. I once knew a guy whose fondest dream was > to try the native cuisine of every nation of the world ... but not all > in one meal. > > This is a classic time/space tradeoff: for the cost of calling the > mapping function anew each time we index the sequence, we can avoid > allocating a potentially huge list and calling a potentially expensive > function up front for items we're never going to use. Instead, we call > it only on demand. > > These are the same principles that justify (x)range and dict views. Why > eagerly generate a list up front, if you only need the values one at a > time on demand? Why make a copy of the dict keys, if you don't need a > copy? These are not rhetorical questions. > > This is about avoiding the need to make unnecessary copies for those > times we *don't* need an eager sequence generated up front, keeping the > laziness of iterators and the random-access of sequences. > > map(func, sequence) is a great candidate for this approach. It has to > hold onto a reference to the sequence even as an iterator. The function > is typically side-effect free (a pure function), and if it isn't, > "consenting adults" applies. We've already been told there's at least > one major Python project, Sage, where this would have been useful. > > There's a major functional language, Haskell, where nearly all sequence > processing follows this approach. > > I suggest we provide a separate mapview() type that offers only the lazy > sequence API, without trying to be an iterator at the same time.
If you > want an eager sequence, or an iterator, they're only a single function > call away: > > list(mapview_instance) > iter(mapview_instance) # or just stick to map() > > Rather than trying to guess whether people want to treat their map > objects as sequences or iterators, we let them choose which they want > and be explicit about it. > > Consider the history of dict.keys(), values() and items() in Python 2. > Originally they returned eager lists. Did we try to retrofit view-like > and iterator-like behaviour onto the existing dict.keys() method, > returning a cunning object which somehow turned from a list to a view to > an iterator as needed? Hell no! We introduced *six new methods* on > dicts: > > - dict.iterkeys() > - dict.viewkeys() > > and similar for items() and values(). > > Compared to that, adding a single variant on map() that expects a > sequence and returns a view on the sequence seems rather timid. > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From tjreedy at udel.edu Tue Dec 11 12:51:17 2018 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 11 Dec 2018 12:51:17 -0500 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: <5C033044.9080907@canterbury.ac.nz> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181201011734.GN4319@ando.pearwood.info> <20181201165320.GQ4319@ando.pearwood.info> <20181201172307.GS4319@ando.pearwood.info> <20181201190803.GT4319@ando.pearwood.info> <5C033044.9080907@canterbury.ac.nz> Message-ID: On 12/1/2018 8:07 PM, Greg Ewing wrote: > Steven D'Aprano wrote: After defining a separate iterable mapview sequence class >> For backwards compatibility reasons, we can't just make map() work like >> this, because that's a change in behaviour. > > Actually, I think it's possible to get the best of both worlds. I presume you mean the '(iterable) sequence' 'iterator' worlds. I don't think they should be mixed. A sequence is reiterable, an iterator is once through and done. > Consider this: > > from operator import itemgetter > > class MapView: > > def __init__(self, func, *args): > self.func = func > self.args = args > self.iterator = None > > def __len__(self): > return min(map(len, self.args)) > > def __getitem__(self, i): > return self.func(*list(map(itemgetter(i), self.args))) > > def __iter__(self): > return self > > def __next__(self): > if not self.iterator: > self.iterator = map(self.func, *self.args) > return next(self.iterator) The last two (unnecessarily) restrict this to being a once through iterator.
I think much better would be def __iter__(self): return map(self.func, *self.args) -- Terry Jan Reedy From tjreedy at udel.edu Tue Dec 11 13:06:47 2018 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 11 Dec 2018 13:06:47 -0500 Subject: [Python-ideas] __len__() for map() In-Reply-To: <20181201190803.GT4319@ando.pearwood.info> References: <3e46b3e5-09e0-b53e-16f3-a1605c88df3f@thekunderts.net> <20181201011734.GN4319@ando.pearwood.info> <20181201165320.GQ4319@ando.pearwood.info> <20181201172307.GS4319@ando.pearwood.info> <20181201190803.GT4319@ando.pearwood.info> Message-ID: On 12/1/2018 2:08 PM, Steven D'Aprano wrote: > This proof of concept wrapper class could have been written any time > since Python 1.5 or earlier: > > class lazymap: > def __init__(self, function, sequence): One could now add at the top of the file from collections.abc import Sequence and here if not isinstance(sequence, Sequence): raise TypeError(f'{sequence} is not a sequence') > self.function = function > self.wrapped = sequence > def __len__(self): > return len(self.wrapped) > def __getitem__(self, item): > return self.function(self.wrapped[item]) For 3.x, I would add def __iter__(self): return map(self.function, self.wrapped) but your point that iteration is possible even without, with the old protocol, is well made. > It is fully iterable using the sequence protocol, even in Python 3: > > py> x = lazymap(str.upper, 'aardvark') > py> list(x) > ['A', 'A', 'R', 'D', 'V', 'A', 'R', 'K'] > > > Mapped items are computed on demand, not up front. It doesn't make a > copy of the underlying sequence, it can be iterated over and over again, > it has a length and random access. And if you want an iterator, you can > just pass it to the iter() function. > > There are probably bells and whistles that can be added (a nicer repr? > any other sequence methods? a cache?) and I haven't tested it fully. > > For backwards compatibility reasons, we can't just make map() work like > this, because that's a change in behaviour. There may be tricky corner > cases I haven't considered, but as a proof of concept I think it shows > that the basic premise is sound and worth pursuing. -- Terry Jan Reedy From tjreedy at udel.edu Tue Dec 11 13:41:32 2018 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 11 Dec 2018 13:41:32 -0500 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: References: <20181201011734.GN4319@ando.pearwood.info> <20181201165320.GQ4319@ando.pearwood.info> <20181201172307.GS4319@ando.pearwood.info> <20181201190803.GT4319@ando.pearwood.info> <5C033044.9080907@canterbury.ac.nz> <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> <5C046217.7010805@canterbury.ac.nz> Message-ID: On 12/11/2018 6:48 AM, E. Madison Bray wrote: > The idea would be to now enhance the existing built-ins to restore at > least some previously lost assumptions, at least in the relevant > cases. To give an analogy, Python 3.0 replaced range() with > (effectively) xrange(). This broke a lot of assumptions that the > object returned by range(N) would work much like a list, A range represents an arithmetic sequence. Any usage of range that could be replaced by xrange, which is nearly all uses, made no assumption broken by xrange. The basic assumption was and is that a range/xrange could be repeatedly iterated. That this assumption was met in the first case by returning a list was somewhat of an implementation detail.
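To make the re-iteration point concrete (expected results shown in the comments):

    r = range(3)
    list(r), list(r)        # ([0, 1, 2], [0, 1, 2]) -- a range can be iterated again and again
    m = map(str, range(3))
    list(m), list(m)        # (['0', '1', '2'], []) -- a 3.x map is spent after one pass

A sequence-style wrapper such as the lazymap sketch above would behave like range here, not like map.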
In terms of mutability, a tuple would have been better, as range objects should not be mutable. (If [2,4,6] is mutated to [2,3,7], it is no longer a range (arithmetic sequence).) > and Python 3.2 restored some of that list-like functionality As I see it, xranges were unfinished as sequence objects and 3.2 finished the job. This included having the min() and max() builtins calculate the min and max efficiently, as a human would, as the first or last of the sequence, rather than uselessly iterating and comparing all the items in the sequence. A proper analogy to range would be a re-iterable mapview (or 'mapseq') like what Steven D'Aprano proposes. > ** I have a separate complaint that there's no great way, at the > Python level, to define a class that is explicitly a "sequence" as > opposed to a more general "mapping", You mean like this? >>> from collections.abc import Sequence as S >>> isinstance((), S) True >>> isinstance([], S) True >>> isinstance(range(5), S) True >>> isinstance({}, S) False >>> isinstance(set(), S) False >>> class NItems(S): def __init__(self, n, item): self.len = n self.item = item def __getitem__(self, i): # missing index check return self.item def __len__(self): return self.len >>> isinstance(NItems(2, 3), S) True -- Terry Jan Reedy From tjreedy at udel.edu Tue Dec 11 14:08:47 2018 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 11 Dec 2018 14:08:47 -0500 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: References: <20181201172307.GS4319@ando.pearwood.info> <20181201190803.GT4319@ando.pearwood.info> <5C033044.9080907@canterbury.ac.nz> <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> <5C046217.7010805@canterbury.ac.nz> <20181211144726.GE13061@ando.pearwood.info> Message-ID: On 12/11/2018 12:01 PM, Chris Barker - NOAA Federal via Python-ideas wrote: > Perhaps I got confused by the early part of this discussion. > > My point was that there is no "map-like" object at the Python level. > (That is, no Map abc). > > Py2's map produced a sequence. Py3's map produced an iterable. > > So any API that was expecting a sequence could accept the result of a > py2 map, but not a py3 map. There is absolutely nothing special about > map here. > > The example of range has been brought up, but I don't think it's > analogous -- py2 range returns a list, py3 range returns an immutable > sequence. Because that's as close as we can get to a sequence while > preserving the lazy evaluation that is wanted. > > I _think_ someone may be advocating that map() could return an > iterable if it is passed an iterable, I believe you mean 'iterator' rather than 'iterable' here and below as a sequence is an iterable. > and a sequence if it is passed a sequence. > Yes, it could, but that seems like a bad idea to me. > > But folks are proposing a "map" that would produce a lazy-evaluated > sequence. Sure -- as Paul said, put it up on pypi and see if folks find > it useful. > > Personally, I'm still finding it hard to imagine a use case where you > need the sequence features, but also lazy evaluation is important. > > Sure: range() has that, but it came at almost zero cost, and I'm not > sure the sequence features are used much. > > Note: the one use-case I can think of for a lazy evaluated sequence > instead of an iterable is so that I can pick a random element with > random.choice(). (Try to pick a random item from a dict), but that > doesn't apply here -- pick a random item from the source sequence > instead.
> > But this is a specific example of a general use case: you need to access > only a subset of the mapped sequence (or access it out of order) so > using the iterable version won't work, and it may be large enough that > making a new sequence is too resource intensive. > > Seems rare to me, and in many cases, you could do the subsetting > before applying the function, so I think it's a pretty rare use case. > > But go ahead and make it -- I've been wrong before :-) From greg.ewing at canterbury.ac.nz Tue Dec 11 17:31:03 2018 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 12 Dec 2018 11:31:03 +1300 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: <20181211144726.GE13061@ando.pearwood.info> References: <20181201172307.GS4319@ando.pearwood.info> <20181201190803.GT4319@ando.pearwood.info> <5C033044.9080907@canterbury.ac.nz> <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> <5C046217.7010805@canterbury.ac.nz> <20181211144726.GE13061@ando.pearwood.info> Message-ID: <5C103AA7.8020909@canterbury.ac.nz> Steven D'Aprano wrote: > I suggest we provide a separate mapview() type that offers only the lazy > sequence API, without trying to be an iterator at the same time. Then we would be back to the bad old days of having two functions that do almost exactly the same thing. My suggestion was made in the interests of moving the language in the direction of having fewer warts, rather than adding more or moving the existing ones around. I acknowledge that the dual interface is itself a bit wartish, but it's purely for backwards compatibility, so it could be deprecated and eventually removed if desired. -- Greg From chris.barker at noaa.gov Tue Dec 11 18:46:20 2018 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 11 Dec 2018 15:46:20 -0800 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: References: <20181201172307.GS4319@ando.pearwood.info> <20181201190803.GT4319@ando.pearwood.info> <5C033044.9080907@canterbury.ac.nz> <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> <5C046217.7010805@canterbury.ac.nz> <20181211144726.GE13061@ando.pearwood.info> Message-ID: On Tue, Dec 11, 2018 at 11:10 AM Terry Reedy wrote: > > I _think_ someone may be advocating that map() could return an > > iterable if it is passed an iterable, > > I believe you mean 'iterator' rather than 'iterable' here and below as a > sequence is an iterable. > well, the iterator / iterable distinction is important in this thread in many places, so I should have been more careful about that -- but not for this reason. Yes, a sequence is an iterable, but what I meant was an "iterable-that-is-not-a-sequence". -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed...
URL: From greg.ewing at canterbury.ac.nz Tue Dec 11 18:50:41 2018 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 12 Dec 2018 12:50:41 +1300 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: <20181211162627.GF13061@ando.pearwood.info> References: <5C033044.9080907@canterbury.ac.nz> <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> <5C046217.7010805@canterbury.ac.nz> <20181211162627.GF13061@ando.pearwood.info> Message-ID: <5C104D51.7090704@canterbury.ac.nz> Steven D'Aprano wrote: > The iterator protocol is that iterators must: > > - have a __next__ method; > - have an __iter__ method which returns self; > > and the test for an iterator is: > > obj is iter(obj) By that test, it identifies as a sequence, as does testing it for the presence of __len__: >>> m is iter(m) False >>> hasattr(m, '__len__') True So, code that doesn't know whether it has a sequence or iterator and tries to find out, will conclude that it has a sequence. Presumably it will then proceed to treat it as a sequence, which will work fine. > py> x = MapView(str.upper, "abcdef") # An imposter. > py> next(x) > 'A' > py> next(x) > 'B' > py> next(iter(x)) > 'A' That's a valid point, but it can be fixed: def __iter__(self): return self.iterator or map(self.func, *self.args) Now it gives >>> next(x) 'A' >>> list(x) [] There is still one case that will behave differently from the current map(), i.e. using list() first and then expecting it to behave like an exhausted iterator. I'm finding it hard to imagine real code that would depend on that behaviour, though. > whether operations succeed or not depend on the > order that you call them: > > py> x = MapView(str.upper, "abcdef") > py> len(x)*next(x) # Safe. But only ONCE. But what sane code is going to do that? Remember, the iterator interface is only there for backwards compatibility. That would fail under both Python 2 and the current Python 3. > py> def innocent_looking_function(obj): > ... next(obj) > ... > py> x = MapView(str.upper, "abcdef") > py> len(x) > 6 > py> innocent_looking_function(x) > py> len(x) > TypeError: Mapping iterator has no len() If you're using len(), you clearly expect to have a sequence, not an iterator, so why are you calling a function that blindly expects an iterator? Again, this cannot be and could never have been working code. > I presume this is just an oversight, but indexing continues to work even > when len() has been broken. That could be fixed. > This MapView class offers a hybrid "sequence plus iterator, together at > last!" double-headed API, and even its creator says that sane code > shouldn't use that API. No. I would document it like this: It provides a sequence API. It also, *for backwards compatibility*, implements some parts of the iterator API, but new code should not rely on that, nor should any code expect to be able to use both interfaces on the same object. The backwards compatibility would not be perfect, but I think it would work in the vast majority of cases. I also envisage that the backwards compatibility provisions would not be kept forever, and that it would eventually become a pure sequence object. I'm not necessarily saying this *should* be done, just pointing out that it's a possible strategy for migrating map() from an iterator to a view, if we want to do that. 
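For reference, this is roughly the MapView class quoted earlier in the thread with that __iter__ change folded in -- still only a sketch carrying all the caveats discussed above, not a finished proposal:

    from operator import itemgetter

    class MapView:
        def __init__(self, func, *args):
            self.func = func
            self.args = args
            self.iterator = None            # only created if next() is ever used

        def __len__(self):
            return min(map(len, self.args))

        def __getitem__(self, i):
            return self.func(*list(map(itemgetter(i), self.args)))

        def __iter__(self):
            # hand back the live iterator if next() has already been used,
            # otherwise a fresh map, so iter() after next() does not restart
            return self.iterator or map(self.func, *self.args)

        def __next__(self):
            if not self.iterator:
                self.iterator = map(self.func, *self.args)
            return next(self.iterator)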
-- Greg From steve at pearwood.info Tue Dec 11 20:24:27 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 12 Dec 2018 12:24:27 +1100 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: <5C103AA7.8020909@canterbury.ac.nz> References: <20181201190803.GT4319@ando.pearwood.info> <5C033044.9080907@canterbury.ac.nz> <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> <5C046217.7010805@canterbury.ac.nz> <20181211144726.GE13061@ando.pearwood.info> <5C103AA7.8020909@canterbury.ac.nz> Message-ID: <20181212012425.GH13061@ando.pearwood.info> On Wed, Dec 12, 2018 at 11:31:03AM +1300, Greg Ewing wrote: > Steven D'Aprano wrote: > >I suggest we provide a separate mapview() type that offers only the lazy > >sequence API, without trying to be an iterator at the same time. > > Then we would be back to the bad old days of having two functions > that do almost exactly the same thing. They aren't "almost exactly the same thing". One is a sequence, which is a rich API that includes random access to items and a length; the other is an iterator, which is an intentionally simple API which fails to meet the needs of some users. > My suggestion was made in > the interests of moving the language in the direction of having > less warts, rather than adding more or moving the existing ones > around. > > I acknowledge that the dual interface is itself a bit wartish, It's a "bit wartish" in the same way that the sun is "a bit warmish". > but it's purely for backwards compatibility And it fails at that too. x = map(str.upper, "abcd") x is iter(x) returns True with the current map, an actual iterator, and False with your hybrid. Current map() is a proper, non-broken iterator; your hybrid is a broken iterator. (That's not me being derogative: its the official term for iterators which don't stay exhausted.) I'd be more charitable if I thought the flaws were mere bugs that could be fixed. But I don't think there is any way to combine two incompatible interfaces, the sequence and iterator APIs, into one object without these sorts of breakages. Take the __next__ method out of your object, and it is a better version of what I proposed earlier. With the __next__ method, its just broken. -- Steve From tjreedy at udel.edu Tue Dec 11 22:36:24 2018 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 11 Dec 2018 22:36:24 -0500 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: <5C104D51.7090704@canterbury.ac.nz> References: <5C033044.9080907@canterbury.ac.nz> <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> <5C046217.7010805@canterbury.ac.nz> <20181211162627.GF13061@ando.pearwood.info> <5C104D51.7090704@canterbury.ac.nz> Message-ID: On 12/11/2018 6:50 PM, Greg Ewing wrote: > I'm not necessarily saying this *should* be done, just pointing > out that it's a possible strategy for migrating map() from > an iterator to a view, if we want to do that. Python has list and list_iterator, tuple and tuple_iterator, set and set_iterator, dict and dict_iterator, range and range_iterator. In 3.0, we could have turned map into a finite sequence analogous to range, and add a new map_iterator. To be completely lazy, such a map would have to restrict input to Sequences. To be compatible with 2.0 map, it would have to use list(iterable) to turn other finite iterables into concrete lists, making it only semi-lazy. 
Since I am too lazy to write the multi-iterable version, here is the one-iterable version to show the idea. def __init__(self, func, iterable): self.func = func self.seq = iterable if isinstance(iterable, Sequence) else list(iterable) Given the apparent little need for the extra complication, and the possibility of keeping a reference to sequences and explicitly applying list otherwise, it was decided to rebind 'map' to the fully lazy and general itertools.imap. -- Terry Jan Reedy From steve at pearwood.info Wed Dec 12 03:12:50 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 12 Dec 2018 19:12:50 +1100 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: <5C104D51.7090704@canterbury.ac.nz> References: <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> <5C046217.7010805@canterbury.ac.nz> <20181211162627.GF13061@ando.pearwood.info> <5C104D51.7090704@canterbury.ac.nz> Message-ID: <20181212081249.GI13061@ando.pearwood.info> On Wed, Dec 12, 2018 at 12:50:41PM +1300, Greg Ewing wrote: > Steven D'Aprano wrote: > >The iterator protocol is that iterators must: > > > >- have a __next__ method; > >- have an __iter__ method which returns self; > > > >and the test for an iterator is: > > > > obj is iter(obj) > > By that test, it identifies as a sequence, as does testing it > for the presence of __len__: Since existing map objects are iterators, that breaks backwards compatibility. For code that does something like this: if obj is iter(obj): process_iterator() else: n = len(obj) process_sequence() it will change behaviour, shifting map objects from the iterator branch to the sequence branch. That's a definite change in behaviour, which alone could change the meaning of the code. E.g. if the two process_* functions use different algorithms. Or it could break the code outright, because your MapView objects can raise TypeError when you call len() on them. I know that any object with a __len__ could in principle raise TypeError. But for anything else, we are justified in calling it a bug in the __len__ implementation. You're trying to sell it as a feature. > >>> m is iter(m) > False > >>> hasattr(m, '__len__') > True > > So, code that doesn't know whether it has a sequence or iterator > and tries to find out, will conclude that it has a sequence. > Presumably it will then proceed to treat it as a sequence, which > will work fine. It will work fine, unless something has called __next__, which will cause len() to blow up in their face by raising TypeError. I call these sorts of designs "landmines". They're absolutely fine, right up to the point where you hit the right combination of actions and step on the landmine. For anything else, this sort of thing would be a bug. You're calling it a feature. > >py> x = MapView(str.upper, "abcdef") # An imposter. > >py> next(x) > >'A' > >py> next(x) > >'B' > >py> next(iter(x)) > >'A' > > That's a valid point, but it can be fixed: > > def __iter__(self): > return self.iterator or map(self.func, *self.args) > > Now it gives > > >>> next(x) > 'A' > >>> list(x) > [] > > There is still one case that will behave differently from the > current map(), i.e. using list() first and then expecting it > to behave like an exhausted iterator. I'm finding it hard to > imagine real code that would depend on that behaviour, though. That's not the only breakage. This is a pattern which I sometimes use: def test(iterator): # Process items up to some symbol one way, # and items after that symbol another way.
for a in iterator: print(1, a) if a == 'C': break # This relies on iterator NOT resetting to the beginning, # but continuing from where we left off # i.e. not being broken for b in iterator: print(2, b) Being an iterator, right now I can pass map() objects directly to that code, and it works as expected: py> test(map(str.upper, 'abcde')) 1 A 1 B 1 C 2 D 2 E Your MapView does not: py> test(MapView(str.upper, 'abcde')) 1 A 1 B 1 C 2 A 2 B 2 C 2 D 2 E This is why such iterators are deemed to be "broken". > > whether operations succeed or not depend on the > >order that you call them: > > > >py> x = MapView(str.upper, "abcdef") > >py> len(x)*next(x) # Safe. But only ONCE. > > But what sane code is going to do that? You have an object that supports len() and next(). Why shouldn't people use both len() and next() on it when both are supported methods? They don't have to be in a single expression: x = MapView(blah blah blah) a = some_function_that_calls_len(x) b = some_function_that_calls_next(x) That works. But reverse the order, and you step on a landmine: b = some_function_that_calls_next(x) a = some_function_that_calls_len(x) The caller may not even know that the functions call next() or len(), they could be implementation details buried deep inside some library function they didn't even know they were calling. Do you still think that it is the caller's code that is insane? > Remember, the iterator > interface is only there for backwards compatibility. Famous last words. > That would fail under both Python 2 and the current Python 3. Honestly Greg, you've been around long enough that you ought to recognise *minimal examples* for what they are. They're not meant to be real-world production code. They're the simplest, most minimal example that demonstates the existence of a problem. The fact that they are *simple* is to make it easy to see the underlying problem, not to give you an excuse to dismiss it. You're supposed to imagine that in real-life code, the call to next() could be buried deep, deep, deep in a chain of 15 function calls in some function in some third party library that I don't even know is being called, and it took me a week to debug why len(obj) would sometimes fail mysteriously. The problem is not the caller, or even the library code, but that your class magically and implictly swaps from a sequence to a pseudo-iterator whether I want it to or not. A perfect example of why DWIM code is so hated: http://www.catb.org/jargon/html/D/DWIM.html > >py> def innocent_looking_function(obj): > >... next(obj) > >... > >py> x = MapView(str.upper, "abcdef") > >py> len(x) > >6 > >py> innocent_looking_function(x) > >py> len(x) > >TypeError: Mapping iterator has no len() > > If you're using len(), you clearly expect to have a sequence, > not an iterator, so why are you calling a function that blindly > expects an iterator? *Minimal example* again. You ought to be able to imagine the actual function is fleshed out, without expecting me to draw you a picture: if hasattr(obj, '__next__'): first = next(obj, sentinel) Or if you prefer: try: first = next(obj) except TypeError: # fall back on sequence algorithm except StopIteration: # empty iterator None of this boilerplate adds any insight at all to the discussion. There's a reason bug reports ask for minimal examples. The point is, I'm calling some innocent looking function, and it breaks my sequence: len(obj) worked before I called the function, and afterwards, it raises TypeError. 
I wouldn't have to care about the implementation if your MapView object didn't magically flip from sequence to iterator behind my back. -- Steve From chris.barker at noaa.gov Wed Dec 12 23:06:17 2018 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Wed, 12 Dec 2018 20:06:17 -0800 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: <20181212081249.GI13061@ando.pearwood.info> References: <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> <5C046217.7010805@canterbury.ac.nz> <20181211162627.GF13061@ando.pearwood.info> <5C104D51.7090704@canterbury.ac.nz> <20181212081249.GI13061@ando.pearwood.info> Message-ID: >>> and the test for an iterator is: >>> >>> obj is iter(obj) Is that a hard and fast rule? I know it?s the vast majority of cases, but I imagine you could make an object that behaved exactly like an iterator, but returned some proxy object rather that itself. Not sure why one would do that, but it should be possible. - CHB From rosuav at gmail.com Wed Dec 12 23:45:09 2018 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 13 Dec 2018 15:45:09 +1100 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: References: <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> <5C046217.7010805@canterbury.ac.nz> <20181211162627.GF13061@ando.pearwood.info> <5C104D51.7090704@canterbury.ac.nz> <20181212081249.GI13061@ando.pearwood.info> Message-ID: On Thu, Dec 13, 2018 at 3:07 PM Chris Barker - NOAA Federal via Python-ideas wrote: > > >>> and the test for an iterator is: > >>> > >>> obj is iter(obj) > > Is that a hard and fast rule? I know it?s the vast majority of cases, > but I imagine you could make an object that behaved exactly like an > iterator, but returned some proxy object rather that itself. > > Not sure why one would do that, but it should be possible. Yes, it is. https://docs.python.org/3/library/stdtypes.html#iterator-types For an iterable, __iter__ needs to return an appropriate iterator. For an iterator, __iter__ needs to return self (which is, by definition, the "appropriate iterator"). Note also that the behaviour around StopIteration is laid out there, including that an iterator whose __next__ has raised SI but then subsequently doesn't continue to raise SI is broken. (Though it *is* legit to raise StopIteration with a value the first time, and then raise a vanilla SI subsequently. Generators do this, rather than retain the return value indefinitely.) ChrisA From steve at pearwood.info Thu Dec 13 00:11:22 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 13 Dec 2018 16:11:22 +1100 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: References: <5C046217.7010805@canterbury.ac.nz> <20181211162627.GF13061@ando.pearwood.info> <5C104D51.7090704@canterbury.ac.nz> <20181212081249.GI13061@ando.pearwood.info> Message-ID: <20181213051111.GS13061@ando.pearwood.info> On Wed, Dec 12, 2018 at 08:06:17PM -0800, Chris Barker - NOAA Federal wrote: > >>> and the test for an iterator is: > >>> > >>> obj is iter(obj) > > Is that a hard and fast rule? Yes, that's the rule for the iterator protocol. Any object can have an __iter__ method which returns anything you want. (It doesn't even have to be iterable, this is Python, and if you want to shoot yourself in the foot, you can.) But to be an iterator, the rule is that obj.__iter__() must return obj itself. Otherwise we say that obj is an iterable, not an iterator. 
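A minimal way to see that rule at the interactive prompt:

    nums = [1, 2, 3]        # an iterable, but not an iterator
    it = iter(nums)
    iter(nums) is nums      # False -- a list hands out a separate iterator object
    iter(it) is it          # True  -- an iterator's __iter__ returns itself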
https://docs.python.org/3/library/stdtypes.html#iterator.__iter__ -- Steve From greg.ewing at canterbury.ac.nz Thu Dec 13 00:53:54 2018 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 13 Dec 2018 18:53:54 +1300 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: References: <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> <5C046217.7010805@canterbury.ac.nz> <20181211162627.GF13061@ando.pearwood.info> <5C104D51.7090704@canterbury.ac.nz> <20181212081249.GI13061@ando.pearwood.info> Message-ID: <5C11F3F2.7020106@canterbury.ac.nz> Chris Angelico wrote: > On Thu, Dec 13, 2018 at 3:07 PM Chris Barker - NOAA Federal via > Python-ideas wrote: > >>>>> obj is iter(obj) >> >>Is that a hard and fast rule? > Yes, it is. > > https://docs.python.org/3/library/stdtypes.html#iterator-types The docs aren't very clear on this point. They claim this is necessary so that the iterator can be used in a for-loop, but that's obviously not strictly true, since a proxy object could also be used. They also make no mention about whether one should be able to rely on this as a definitive test of iterator-ness. In any case, I don't claim that my MapView implements the full iterator protocol, only enough of it to pass for an iterator in most likely scenarios that assume one. -- Greg From rosuav at gmail.com Thu Dec 13 01:16:34 2018 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 13 Dec 2018 17:16:34 +1100 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: <5C11F3F2.7020106@canterbury.ac.nz> References: <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> <5C046217.7010805@canterbury.ac.nz> <20181211162627.GF13061@ando.pearwood.info> <5C104D51.7090704@canterbury.ac.nz> <20181212081249.GI13061@ando.pearwood.info> <5C11F3F2.7020106@canterbury.ac.nz> Message-ID: On Thu, Dec 13, 2018 at 4:54 PM Greg Ewing wrote: > > Chris Angelico wrote: > > On Thu, Dec 13, 2018 at 3:07 PM Chris Barker - NOAA Federal via > > Python-ideas wrote: > > > >>>>> obj is iter(obj) > >> > >>Is that a hard and fast rule? > > Yes, it is. > > > > https://docs.python.org/3/library/stdtypes.html#iterator-types > > The docs aren't very clear on this point. They claim this is necessary > so that the iterator can be used in a for-loop, but that's obviously > not strictly true, since a proxy object could also be used. > iterator.__iter__() Return the iterator object itself. I do believe "the iterator object itself" means that "iterator.__iter__() is iterator" should always be true. But maybe there's some other way to return "the object itself" other than actually returning "the object itself"? ChrisA From p.f.moore at gmail.com Thu Dec 13 04:25:26 2018 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 13 Dec 2018 09:25:26 +0000 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: <5C11F3F2.7020106@canterbury.ac.nz> References: <5C03D85F.2040702@canterbury.ac.nz> <20181202134324.GV4319@ando.pearwood.info> <5C046217.7010805@canterbury.ac.nz> <20181211162627.GF13061@ando.pearwood.info> <5C104D51.7090704@canterbury.ac.nz> <20181212081249.GI13061@ando.pearwood.info> <5C11F3F2.7020106@canterbury.ac.nz> Message-ID: On Thu, 13 Dec 2018 at 05:55, Greg Ewing wrote: > > Chris Angelico wrote: > > On Thu, Dec 13, 2018 at 3:07 PM Chris Barker - NOAA Federal via > > Python-ideas wrote: > > > >>>>> obj is iter(obj) > >> > >>Is that a hard and fast rule? > > Yes, it is. 
> > > > https://docs.python.org/3/library/stdtypes.html#iterator-types > > The docs aren't very clear on this point. They claim this is necessary > so that the iterator can be used in a for-loop, but that's obviously > not strictly true, since a proxy object could also be used. See also https://docs.python.org/3.7/glossary.html#term-iterator, which reiterates the point that "Iterators are required to have an __iter__() method that returns the iterator object itself". By that point, I'd say the docs are pretty clear... > They also make no mention about whether one should be able to rely > on this as a definitive test of iterator-ness. That glossary entry is linked from https://docs.python.org/3.7/library/collections.abc.html#collections.abc.Iterator, so it would be pretty hard to argue that it's not part of the "definitive test of iterator-ness". > In any case, I don't claim that my MapView implements the full > iterator protocol, only enough of it to pass for an iterator in > most likely scenarios that assume one. But not enough that it's legitimate to describe it as an "iterator". It may well be a useful class, and returning it from a map-like function may be a practical and effective thing to do, but describing it as an "iterator" does nothing apart from leading to distracting debates on how it doesn't work the same as an iterator. Better to just accept that it's *not* an iterator, and focus on whether it's useful... IMO, it sounds like it's useful, but it's not backward compatible (because it's not an iterator ;-)). Whether it's *sufficiently* useful to justify breaking backward compatibility is a different discussion (all I can say on that question is that I've never personally had a case where the current Python 3 behaviour of map is a problem). Paul From steve at pearwood.info Thu Dec 13 06:08:19 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 13 Dec 2018 22:08:19 +1100 Subject: [Python-ideas] Suggested MapView object (Re: __len__() for map()) In-Reply-To: <5C11F3F2.7020106@canterbury.ac.nz> References: <20181211162627.GF13061@ando.pearwood.info> <5C104D51.7090704@canterbury.ac.nz> <20181212081249.GI13061@ando.pearwood.info> <5C11F3F2.7020106@canterbury.ac.nz> Message-ID: <20181213110819.GT13061@ando.pearwood.info> On Thu, Dec 13, 2018 at 06:53:54PM +1300, Greg Ewing wrote: > In any case, I don't claim that my MapView implements the full > iterator protocol, only enough of it to pass for an iterator in > most likely scenarios that assume one. Whether your hybrid sequence+iterator is close enough to an iterator or not isn't the critical point here. If we really wanted to, we could break backwards compatibility, with or without a future import or a deprecation period, and simply declare that this is how map() will work in the future. Doing that, or not, becomes a question of whether the gain is worth the breakages. The critical question here is whether a builtin ought to include the landmines your hybrid class does. *By design*, your class will blow up in people's faces if they try to use the full API offered. It violates at least two expected properties: - As an iterator, it is officially "broken" because in at least two reasonable scenarios, it automatically resets after being exhausted. (Although presumably we could fix that with an "is_exhausted" flag.) 
- As a sequence, it violates the expectation that if an object is Sized (it has a __len__ method) calling len() on it should not raise TypeError; As a sequence, it is fragile and easily breakable, changing from a sequence to a (pseudo-)iterator whether the caller wants it to or not. Third-party code could easily flip the switch, leading to obscure errors. That second one is critical to your "Do What I Mean" design; the whole point of your class is for the object to automagically swap from behaving like a sequence to behaving like an iterator according to how it is used. Rather than expecting the user to make an explicit choice of which behaviour they want: - use map() to get current iterator behaviour; - use mapview() to get lazy-sequence behaviour; your class tries to do both, and then guesses what the user wants depending on how the map object happens to get used. -- Steve From jcrmatos at gmail.com Thu Dec 13 07:23:45 2018 From: jcrmatos at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Matos?=) Date: Thu, 13 Dec 2018 04:23:45 -0800 (PST) Subject: [Python-ideas] It would be great if the json module would allow and clear existing comments. Message-ID: <25ac1595-2f9c-40f8-a096-d743591154ae@googlegroups.com> Hello, Comments in JSON files are a great way to document a configuration file for example. Even JSON's Douglas Crockford agrees that it is a helpful thing and it suggests using JSMin before handing it to the JSON parser in here https://plus.google.com/+DouglasCrockfordEsq/posts/RK8qyGVaGSr So, I would like to suggest adding that feature do the json module. A simple boolean argument, clear_comments, with the default False to keep previous compatibility. With the clear_comments=True and following JSMin "rules", comments in the // form should be replaced with linefeeds and comments in the /* */ form with spaces. Best regards, JM -------------- next part -------------- An HTML attachment was scrubbed... URL: From chbailly at gmail.com Sun Dec 16 03:21:14 2018 From: chbailly at gmail.com (Christophe Bailly) Date: Sun, 16 Dec 2018 09:21:14 +0100 Subject: [Python-ideas] [asyncio] Suggestion for a major PEP Message-ID: Hello, I copy paste the main idea from an article I have written: contextual async " Imagine you have some code written for monothread. And you want to include your code in a multithread environment. Do you need to adapt all your code which is what you do when you want to migrate to async code ? The answer is no. Functionnally these constraints are not justified neither technically Do we have the tools to do this ? Yes because thanks to boost::context we can switch context between tasks. When a task suspends, it just calls a function (the event loop or reactor) to potentially switch to another task. Just like threads switch contexts? Async/Await logic has introduced a symetric relation wich introduces unnecessary contraints. We should just the same logic as thread logic. " Read the examples in the article I have developped a prototype in C++ and everything works perfectly. My opinion is that sooner or later, it will have to switch to this logic because chaining async/aswait is a huge contraints and does not make sense in my opinion. Maybe I am missing something, Feel free to give me your feedback. Regards, Chris -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Sun Dec 16 04:15:57 2018 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 16 Dec 2018 01:15:57 -0800 Subject: [Python-ideas] [asyncio] Suggestion for a major PEP In-Reply-To: References: Message-ID: If you want this style of concurrency, you don't need to write a PEP, just 'pip install gevent' :-) But unfortunately you're years too late to argue for making asyncio work this way. This was discussed extensively at the time, and the decision to use special syntax was made intentionally, and after studying existing systems like gevent that made the other choice. This section of the trio docs explain why explicit async/await syntax makes life easier for developers: https://trio.readthedocs.io/en/latest/reference-core.html#checkpoints It's also awkward but very doable to support both sync and async mode with a single code base: https://github.com/python-trio/unasync/ In fact, when doing this, the async/await syntax isn't really the hard part ? the hard part is that different libraries have very different networking APIs. E.g., the stdlib socket API and the stdlib asyncio API are totally different. -n On Sun, Dec 16, 2018 at 12:21 AM Christophe Bailly wrote: > > Hello, > > I copy paste the main idea from an article I have written: > contextual async > > " > > Imagine you have some code written for monothread. And you want to include your code in a multithread environment. Do you need to adapt all your code which is what you do when you want to migrate to async code ? The answer is no. > > Functionnally these constraints are not justified neither technically > > Do we have the tools to do this ? Yes because thanks to boost::context we can switch context between tasks. When a task suspends, it just calls a function (the event loop or reactor) to potentially switch to another task. Just like threads switch contexts? > > Async/Await logic has introduced a symetric relation wich introduces unnecessary contraints. We should just the same logic as thread logic. > > " > > Read the examples in the article I have developped a prototype in C++ and everything works perfectly. > > My opinion is that sooner or later, it will have to switch to this logic because chaining async/aswait is a huge contraints and does not make sense in my opinion. > > Maybe I am missing something, > > Feel free to give me your feedback. > > Regards, > > > Chris > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Nathaniel J. Smith -- https://vorpus.org From steve at pearwood.info Sun Dec 16 04:44:34 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 16 Dec 2018 20:44:34 +1100 Subject: [Python-ideas] [asyncio] Suggestion for a major PEP In-Reply-To: References: Message-ID: <20181216094432.GZ13061@ando.pearwood.info> On Sun, Dec 16, 2018 at 09:21:14AM +0100, Christophe Bailly wrote: > Async/Await logic has introduced a symetric relation wich introduces > unnecessary contraints. We should just the same logic as thread logic. I'm not an expert on async, but whenever I hear people saying "we should use (something just like) threads" I'm reminded of something that Jamie Zawinski could have (but didn't) say: Some people, when confronted with a problem, think "I know, I'll use threads." Nothwey htwo pavroble ems. I know, that's a sound-bite, not a reasoned argument. 
But if your intention is to make async code just like threads, how do you avoid the well-known perils of threading? The point of async code is to make context switches explicit, rather than implicit like threading. So at first glance, it seems like you are suggesting we take the major benefit of async (explicitness) and replace it with the major disadvantage of other concurrency models (implicitness). -- Steve From chbailly at gmail.com Sun Dec 16 04:47:01 2018 From: chbailly at gmail.com (Christophe Bailly) Date: Sun, 16 Dec 2018 10:47:01 +0100 Subject: [Python-ideas] [asyncio] Suggestion for a major PEP In-Reply-To: References: Message-ID: Hello, Thanks for your answer. The advantage of this method is that you still follow the logic of async/await. In fact you make it even easier,. It is just a different implementation but with fewer constraints. So my suggestion is to keep this logic because it is a very good logic !!!. , But we should remove this unjusfied chaining of async/await methods. To be clear, this is an async /await logic, except you have an async on one end and an await at the other end and you remove everything in between ! I could have written my examples with async/await, I have used the future syntax but it is the same. I can rewrite my examples if you prefer, I will just use other keywords. That is my opinon, we differ on this but I think there is something really wrong when you add unjustified syntax. You suggest gevent but where do you see async await in gevent ? >From my experience, this is really a pain to mix async code with sync code with the current implementation. Of course you can create threads instead but it is better and simpler to remain async if possible I think there is a flaw in this logic, again this is my opinion, my main intention is to share ideas. I perfectly understand that it is late to implement this, but we could take also into account the real limitations that are difficult to overcome with asyncio. I think I do not need to post links, many will undertand the constraints I am talking about. Regards, Chris On Sun, 16 Dec 2018 at 10:16, Nathaniel Smith wrote: > If you want this style of concurrency, you don't need to write a PEP, > just 'pip install gevent' :-) > > But unfortunately you're years too late to argue for making asyncio > work this way. This was discussed extensively at the time, and the > decision to use special syntax was made intentionally, and after > studying existing systems like gevent that made the other choice. > > This section of the trio docs explain why explicit async/await syntax > makes life easier for developers: > https://trio.readthedocs.io/en/latest/reference-core.html#checkpoints > > It's also awkward but very doable to support both sync and async mode > with a single code base: https://github.com/python-trio/unasync/ > > In fact, when doing this, the async/await syntax isn't really the hard > part ? the hard part is that different libraries have very different > networking APIs. E.g., the stdlib socket API and the stdlib asyncio > API are totally different. > > -n > On Sun, Dec 16, 2018 at 12:21 AM Christophe Bailly > wrote: > > > > Hello, > > > > I copy paste the main idea from an article I have written: > > contextual async > > > > " > > > > Imagine you have some code written for monothread. And you want to > include your code in a multithread environment. Do you need to adapt all > your code which is what you do when you want to migrate to async code ? The > answer is no. 
> > > > Functionnally these constraints are not justified neither technically > > > > Do we have the tools to do this ? Yes because thanks to boost::context > we can switch context between tasks. When a task suspends, it just calls a > function (the event loop or reactor) to potentially switch to another task. > Just like threads switch contexts? > > > > Async/Await logic has introduced a symetric relation wich introduces > unnecessary contraints. We should just the same logic as thread logic. > > > > " > > > > Read the examples in the article I have developped a prototype in C++ > and everything works perfectly. > > > > My opinion is that sooner or later, it will have to switch to this logic > because chaining async/aswait is a huge contraints and does not make sense > in my opinion. > > > > Maybe I am missing something, > > > > Feel free to give me your feedback. > > > > Regards, > > > > > > Chris > > > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > -- > Nathaniel J. Smith -- https://vorpus.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Sun Dec 16 08:03:44 2018 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 17 Dec 2018 02:03:44 +1300 Subject: [Python-ideas] [asyncio] Suggestion for a major PEP In-Reply-To: References: Message-ID: <5C164D30.9090506@canterbury.ac.nz> Christophe Bailly wrote: > I copy paste the main idea from an article I have written: > contextual async > All of your examples there are C++. It's not clear how any of this relates to Python. > Do we have the tools to do this ? Yes because thanks to boost::context > we can switch context between tasks. How does it work? Is it pulling some kind of stack switching trick? If so, I would be skeptical about the portability and reliability. Also, if it requires the Python interpreter to become C++ or depend on a C++ runtime, it's not going to be accepted. -- Greg From chbailly at gmail.com Sun Dec 16 13:18:45 2018 From: chbailly at gmail.com (Christophe Bailly) Date: Sun, 16 Dec 2018 19:18:45 +0100 Subject: [Python-ideas] Fwd: [asyncio] Suggestion for a major PEP In-Reply-To: References: Message-ID: Hello everybody, I thought I had sent this mail to everybody, so here was my answer to Nathaniel, Regards, Chris ---------- Forwarded message --------- From: Christophe Bailly Date: Sun, 16 Dec 2018 at 13:56 Subject: Re: [Python-ideas] [asyncio] Suggestion for a major PEP To: Nathaniel Smith Hello Nathaniel, After reading many papers, I understand that it is not as simple as I could imagine in my first thought. Sorry about that, I have learned something. Explicit vs implicit asynchronism is a complex topic. Regards, Chris On Sun, 16 Dec 2018 at 10:16, Nathaniel Smith wrote: > If you want this style of concurrency, you don't need to write a PEP, > just 'pip install gevent' :-) > > But unfortunately you're years too late to argue for making asyncio > work this way. This was discussed extensively at the time, and the > decision to use special syntax was made intentionally, and after > studying existing systems like gevent that made the other choice. 
> > This section of the trio docs explain why explicit async/await syntax > makes life easier for developers: > https://trio.readthedocs.io/en/latest/reference-core.html#checkpoints > > It's also awkward but very doable to support both sync and async mode > with a single code base: https://github.com/python-trio/unasync/ > > In fact, when doing this, the async/await syntax isn't really the hard > part ? the hard part is that different libraries have very different > networking APIs. E.g., the stdlib socket API and the stdlib asyncio > API are totally different. > > -n > On Sun, Dec 16, 2018 at 12:21 AM Christophe Bailly > wrote: > > > > Hello, > > > > I copy paste the main idea from an article I have written: > > contextual async > > > > " > > > > Imagine you have some code written for monothread. And you want to > include your code in a multithread environment. Do you need to adapt all > your code which is what you do when you want to migrate to async code ? The > answer is no. > > > > Functionnally these constraints are not justified neither technically > > > > Do we have the tools to do this ? Yes because thanks to boost::context > we can switch context between tasks. When a task suspends, it just calls a > function (the event loop or reactor) to potentially switch to another task. > Just like threads switch contexts? > > > > Async/Await logic has introduced a symetric relation wich introduces > unnecessary contraints. We should just the same logic as thread logic. > > > > " > > > > Read the examples in the article I have developped a prototype in C++ > and everything works perfectly. > > > > My opinion is that sooner or later, it will have to switch to this logic > because chaining async/aswait is a huge contraints and does not make sense > in my opinion. > > > > Maybe I am missing something, > > > > Feel free to give me your feedback. > > > > Regards, > > > > > > Chris > > > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > -- > Nathaniel J. Smith -- https://vorpus.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s-ball at laposte.net Tue Dec 18 04:10:51 2018 From: s-ball at laposte.net (s-ball at laposte.net) Date: Tue, 18 Dec 2018 10:10:51 +0100 (CET) Subject: [Python-ideas] Use lazy loading with hashtable in python gettext module In-Reply-To: <1652737848.3435845.1545123047394.JavaMail.zimbra@laposte.net> Message-ID: <650517947.3648989.1545124251040.JavaMail.zimbra@laposte.net> In a project of mine, I have used the gettext module from Python Standard Library. I have found that several tools could be used to generate the Machine Object (mo) file from the source Portable Object (one): pybabel ( http://babel.pocoo.org/en/latest/ ), msgfmt.py from Python tools or the original msgfmt from GNU gettext. I could find that only the original msgfmt was able to generate a hashtable, and that anyway the Python gettext module loaded everything in memory and did not use it. But I also find a TODO note saying # TODO: # - Lazy loading of .mo files. Currently the entire catalog is loaded into # memory, but that's probably bad for large translated programs. Instead, # the lexical sort of original strings in GNU .mo files should be exploited # to do binary searches and lazy initializations. 
Or you might want to use # the undocumented double-hash algorithm for .mo files with hash tables, but # you'll need to study the GNU gettext code to do this. I have studied GNU gettext code and found that implemententing the hashing algorithm in Python would not be that hard. The undocumented features required for implementation are: - the version number can safely stay to 0 when processing Python code - the size of the hash table is the first odd prime greater than or equal to 4 * n / 3 where n is the number of strings - the first hashing function uses a variant of PJW hash function described in https://en.wikipedia.org/wiki/PJW_hash_function, where the line h = h & ~ high is replaced with h = h ^ high, and using 32 bits integers. The index in the table in the result of the function modulus the size of the hash table - when there is a conflict (the slot given by the first hashing function is already used by another string) the following is used: - let h be the result of the PJW variant hash function and size be the size of the hash table, an increment value is set to 1 +( h % (size -2)) - that increment is repeatedly added to the index in the hash table (modulus the table size) until an empty slot is found (or the correct original string is found) For now, my (alpha) code is able to generate in pure Python the same mo file that GNU msgfmt generates, and use the hashtable to access the strings. Remaining problems: - I had to read GPL copyrighted code to find the undocumented features. I have of course wrote my own code from scratch, but may I use an Apache Free License 2.1 on it? - the current code for gettext loads everything from the mo file and immediately closes it. My own code keeps the file opened to be able to access it with the mmap module. There could be use case where first option is better - I should either rely on the current way (load everything in memory) or implement a binary search algo for the case where the hash table is not present (it is of course optional) - it would be an important change, and I think that options should be allow to choose between an eager or lazy access Before going further, I would like to know whether implementing lazy access through the hash table that way seems to be a interesting improvement or a dead end. -------------- next part -------------- An HTML attachment was scrubbed... URL: From phd at phdru.name Tue Dec 18 08:14:11 2018 From: phd at phdru.name (Oleg Broytman) Date: Tue, 18 Dec 2018 14:14:11 +0100 Subject: [Python-ideas] Use lazy loading with hashtable in python gettext module In-Reply-To: <650517947.3648989.1545124251040.JavaMail.zimbra@laposte.net> References: <1652737848.3435845.1545123047394.JavaMail.zimbra@laposte.net> <650517947.3648989.1545124251040.JavaMail.zimbra@laposte.net> Message-ID: <20181218131411.a2jmjhef7a3rpqs6@phdru.name> Hi! On Tue, Dec 18, 2018 at 10:10:51AM +0100, Serge Ballesta via Python-ideas wrote: > In a project of mine, I have used the gettext module from Python Standard Library. I have found that several tools could be used to generate the Machine Object (mo) file from the source Portable Object (one): pybabel ( http://babel.pocoo.org/en/latest/ ), msgfmt.py from Python tools or the original msgfmt from GNU gettext. I use gettext quite extensively. I use Python's msgfmt to generate .mo files. I also use Django's compilemessage; I don't know what it uses internally, it could be an independent implementation or Python's msgfmt. 
> I could find that only the original msgfmt was able to generate a hashtable, and that anyway the Python gettext module loaded everything in memory and did not use it. But I also find a TODO note saying > > # TODO: > # - Lazy loading of .mo files. Currently the entire catalog is loaded into > # memory, but that's probably bad for large translated programs. Instead, > # the lexical sort of original strings in GNU .mo files should be exploited > # to do binary searches and lazy initializations. Or you might want to use > # the undocumented double-hash algorithm for .mo files with hash tables, but > # you'll need to study the GNU gettext code to do this. > > I have studied GNU gettext code and found that implemententing the hashing algorithm in Python would not be that hard. That's interesting! > The undocumented features required for implementation are: > - the version number can safely stay to 0 when processing Python code > - the size of the hash table is the first odd prime greater than or equal to 4 * n / 3 where n is the number of strings > - the first hashing function uses a variant of PJW hash function described in https://en.wikipedia.org/wiki/PJW_hash_function, where the line h = h & ~ high is replaced with h = h ^ high, and using 32 bits integers. The index in the table in the result of the function modulus the size of the hash table > - when there is a conflict (the slot given by the first hashing function is already used by another string) the following is used: > - let h be the result of the PJW variant hash function and size be the size of the hash table, an increment value is set to 1 +( h % (size -2)) > - that increment is repeatedly added to the index in the hash table (modulus the table size) until an empty slot is found (or the correct original string is found) > > For now, my (alpha) code is able to generate in pure Python the same mo file that GNU msgfmt generates, and use the hashtable to access the strings. > > Remaining problems: > - I had to read GPL copyrighted code to find the undocumented features. I have of course wrote my own code from scratch, but may I use an Apache Free License 2.1 on it? You should ask a lawyer and I am not. But my understanding is that you can borrow ideas from a GPL-protected code without contaminating your code with GPL. You cannot copy code -- that makes your code GPL'd. > - the current code for gettext loads everything from the mo file and immediately closes it. My own code keeps the file opened to be able to access it with the mmap module. There could be use case where first option is better There is the third option -- open and close the file. I'd prefer the option as file descriptors are precious resources limited in supply. There is a twist though. The file could be replaced while closed so you have to find a way to verify the was replaced and reread the has table from it. Perhaps checking timestamp of the file (date/time of the last modification) is enough. > - I should either rely on the current way (load everything in memory) or implement a binary search algo for the case where the hash table is not present (it is of course optional) > - it would be an important change, and I think that options should be allow to choose between an eager or lazy access > > Before going further, I would like to know whether implementing lazy access through the hash table that way seems to be a interesting improvement or a dead end. Well, I mus admit my .po/.mo aren't that big. The biggest .po is 60k, its corresponding .mo is only 30k bytes. 
I don't know if using the hash table gives me improvement. Oleg. -- Oleg Broytman https://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From s-ball at laposte.net Tue Dec 18 13:20:23 2018 From: s-ball at laposte.net (Serge Ballesta) Date: Tue, 18 Dec 2018 19:20:23 +0100 Subject: [Python-ideas] Use lazy loading with hashtable in python gettext module In-Reply-To: <20181218131411.a2jmjhef7a3rpqs6@phdru.name> References: <1652737848.3435845.1545123047394.JavaMail.zimbra@laposte.net> <650517947.3648989.1545124251040.JavaMail.zimbra@laposte.net> <20181218131411.a2jmjhef7a3rpqs6@phdru.name> Message-ID: <09c7c055-2842-d9d3-9b74-e17151a9e17e@laposte.net> Hi! >... > I use gettext quite extensively. I use Python's msgfmt to generate > .mo files. I also use Django's compilemessage; I don't know what it uses > internally, it could be an independent implementation or Python's msgfmt. > Never used Django's implementation and I do not know its features. I'll try to have a look to have a more exhaustive context. >> ... >> Remaining problems: >> - I had to read GPL copyrighted code to find the undocumented features. I have of course wrote my own code from scratch, but may I use an Apache Free License 2.1 on it? > > You should ask a lawyer and I am not. But my understanding is that > you can borrow ideas from a GPL-protected code without contaminating > your code with GPL. You cannot copy code -- that makes your code GPL'd > That is one of the reasons I have described here what I have done. I believe that it is correct, but I would be glad to have more experienced people's advice. >> - the current code for gettext loads everything from the mo file and immediately closes it. My own code keeps the file opened to be able to access it with the mmap module. There could be use case where first option is better > > There is the third option -- open and close the file. I'd prefer the > option as file descriptors are precious resources limited in supply. > There is a twist though. The file could be replaced while closed so > you have to find a way to verify the was replaced and reread the has > table from it. Perhaps checking timestamp of the file (date/time of the > last modification) is enough. > Yeah, the problem is there: file descriptors are a scarce resource, but opening a file is a costly operation. Here again that's why I considere an option to let the library users choose according to their own use case. Serge From barry at barrys-emacs.org Tue Dec 18 17:09:30 2018 From: barry at barrys-emacs.org (Barry Scott) Date: Tue, 18 Dec 2018 22:09:30 +0000 Subject: [Python-ideas] Use lazy loading with hashtable in python gettext module In-Reply-To: <650517947.3648989.1545124251040.JavaMail.zimbra@laposte.net> References: <650517947.3648989.1545124251040.JavaMail.zimbra@laposte.net> Message-ID: > On 18 Dec 2018, at 09:10, Serge Ballesta via Python-ideas wrote: > > In a project of mine, I have used the gettext module from Python Standard Library. I have found that several tools could be used to generate the Machine Object (mo) file from the source Portable Object (one): pybabel (http://babel.pocoo.org/en/latest/ ), msgfmt.py from Python tools or the original msgfmt from GNU gettext. snip > Before going further, I would like to know whether implementing lazy access through the hash table that way seems to be a interesting improvement or a dead end I think about it this way. 
Based on the largest project I have worked on that was internationalised into 14 languages the British English text translated to American English (en-US) created a 350KiB MO file. The largest mo file was for Thai (th-TH) at 680KiB. Is it worth the complexity of the hash code to save that memory? Will the hash code improve the load time? We never noticed the load time and we reloaded the MO on ever web page access. As for FDs it uses 1 and on my linux system I have 1.6M to play with. Barry -------------- next part -------------- An HTML attachment was scrubbed... URL: From s-ball at laposte.net Tue Dec 18 17:58:56 2018 From: s-ball at laposte.net (Serge Ballesta) Date: Tue, 18 Dec 2018 23:58:56 +0100 Subject: [Python-ideas] Use lazy loading with hashtable in python gettext module In-Reply-To: References: <650517947.3648989.1545124251040.JavaMail.zimbra@laposte.net> Message-ID: <5a07f8ce-b7d5-07d8-ad30-25d1665436a9@laposte.net> Le 18/12/2018 ? 23:09, Barry Scott a ?crit?: > > >> On 18 Dec 2018, at 09:10, Serge Ballesta via Python-ideas >> > wrote: >> >> In a project of mine, I have used the gettext module from Python >> Standard Library. I have found that several tools could be used to >> generate the Machine Object (mo) file from the source Portable Object >> (one): pybabel (http://babel.pocoo.org/en/latest/), msgfmt.py from >> Python tools or the original msgfmt from GNU gettext. > > snip > >> Before going further, I would like to know whether implementing lazy >> access through the hash table that way seems to be a interesting >> improvement or a dead end > > I think about it this way. > > Based on the largest project I have worked on that was internationalised > into > 14 languages the British English text translated to American English > (en-US) created a 350KiB MO file. > > The largest mo file was for Thai (th-TH) at 680KiB. > > Is it worth the complexity of the hash code to save that memory? > The hash code is not that complex. The main problem was that it is not documented except in the source code. > Will the hash code improve the load time? > We never noticed the load time and we reloaded the MO on ever web page > access. > > As for FDs it uses 1 and on my linux system I have 1.6M to play with. > > Barry > What make me think that it deserves a try is that it is the way it is implemented in original GNU gettext, and that a TODO note said it should be considered. But the documentation also explains that the hash table is optional... Serge From s-ball at laposte.net Sun Dec 23 12:06:48 2018 From: s-ball at laposte.net (Serge Ballesta) Date: Sun, 23 Dec 2018 18:06:48 +0100 Subject: [Python-ideas] Use lazy loading with hashtable in python gettext module In-Reply-To: <650517947.3648989.1545124251040.JavaMail.zimbra@laposte.net> References: <650517947.3648989.1545124251040.JavaMail.zimbra@laposte.net> Message-ID: <875b596b-4e81-d163-eda0-493ab16de220@laposte.net> Hi all, The feed back on my initial mail convinced me that it was important to allow the current behaviour of eagerly loading the whole catalog, and that keeping the files opened should also be optional. All that lead to this proposal: Features: ======== The gettext module should be allowed to load lazily the catalogs from mo file. This lazy load should be optional and make use of the hash tables from mo files when they are present or revert to a binary search. The translation strings should be cached for better performances. 
API changes: ============ 3 functions from the gettext module will have 2 new optional parameter named caching, and keepopen: gettext.bindtextdomain(domain, localedir=None) would become gettext.bindtextdomain(domain, localedir=None, caching=None, keepopen=False) gettext.translation(domain, localedir=None, languages=None, class_=None, fallback=False, codeset=None) would become gettext.translation(domain, localedir=None, languages=None, class_=None, fallback=False, codeset=None, caching=None, keepopen=False) gettext.install(domain, localedir=None, codeset=None, names=None) would become gettext.install(domain, localedir=None, codeset=None, names=None, caching=None, keepopen=False) The new caching parameter could receive the following values: caching=None: revert to the previour eager loading of the full catalog. It will be the default to allow previous application to see no change caching=1: lazy loading with unlimited cache caching=n where n is a positive (>=0) integer value: lazy loading with a LRU cache limited to n strings The keepopen parameter would be a boolean: keepopen=False (default): the mo file is only opened before loading a translation string and closed immediately after - it is also opened once when the GNUTranslation class is initialized to load the file description keepopen=True: the mo file is kept open during the lifetime of the GNUTranslation object. This parameter is ignored and not used if caching is None Implementation: ============== The current GNUTranslation class loads the content of the mo file to build a dictionnary where the original strings are the keys and the translated keys the values. Plural forms use a special processing: the key is a 2 tuple (singular original string, order), and the value is the corresponding translated string - order=0 is normally for the singular translated string. The proposed implementation would simply replace this dictionary with a special mapping subclass when caching is not None. That subclass would use same keys as the original directory and would: - first search in its cache - if not found in cache and if the hashtable has not a zero size search the original string by hash - if not found in cache and if the hashtable has a zero size, search the original string with a binary search algorithm. - if a string is found, it should feed the LRU cache, eventually throwing away the oldest entry (entries) That should allow to implement the new feature with minimal refactoring for the gettext module. Le 18/12/2018 ? 10:10, Serge Ballesta via Python-ideas a ?crit?: > In a project of mine, I have used the gettext module from Python > Standard Library. I have found that several tools could be used to > generate the Machine Object (mo) file from the source Portable Object > (one): pybabel (http://babel.pocoo.org/en/latest/), msgfmt.py from > Python tools or the original msgfmt from GNU gettext. > > I could find that only the original msgfmt was able to generate a > hashtable, and that anyway the Python gettext module loaded everything > in memory and did not use it. But I also find a TODO note saying > > # TODO: > # - Lazy loading of .mo files.? Currently the entire catalog is loaded into > #?? memory, but that's probably bad for large translated programs.? Instead, > #?? the lexical sort of original strings in GNU .mo files should be > exploited > #?? to do binary searches and lazy initializations.? Or you might want > to use > #?? the undocumented double-hash algorithm for .mo files with hash > tables, but > #?? 
you'll need to study the GNU gettext code to do this. > > I have studied GNU gettext code and found that implemententing the > hashing algorithm in Python would not be that hard. > > The undocumented features required for implementation are: > - the version number can safely stay to 0 when processing Python code > - the size of the hash table is the first odd prime greater than or > equal to 4 * n / 3 where n is the number of strings > - the first hashing function uses a variant of PJW hash function > described in https://en.wikipedia.org/wiki/PJW_hash_function, where the > line h = h & ~high is replaced with h = h ^ high, and using 32 bits > integers. The index in the table in the result of the function modulus > the size of the hash table > - when there is a conflict (the slot given by the first hashing function > is already used by another string) the following is used: > ? - let h be the result of the PJW variant hash function and size be > the size of the hash table, an increment value is set to 1 +( h % (size -2)) > ? - that increment is repeatedly added to the index in the hash table > (modulus the table size) until an empty slot is found (or the correct > original string is found) > > For now, my (alpha) code is able to generate in pure Python the same mo > file that GNU msgfmt generates, and use the hashtable to access the strings. > > Remaining problems: > - I had to read GPL copyrighted code to find the undocumented features. > I have of course wrote my own code from scratch, but may I use an Apache > Free License 2.1 on it? > - the current code for gettext loads everything from the mo file and > immediately closes it. My own code keeps the file opened to be able to > access it with the mmap module. There could be use case where first > option is better > - I should either rely on the current way (load everything in memory) or > implement a binary search algo for the case where the hash table is not > present (it is of course optional) > - it would be an important change, and I think that options should be > allow to choose between an eager or lazy access > > Before going further, I would like to know whether implementing lazy > access through the hash table that way seems to be a interesting > improvement or a dead end. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From phylimo at 163.com Mon Dec 24 05:21:31 2018 From: phylimo at 163.com (=?UTF-8?B?5p2O6buY?=) Date: Mon, 24 Dec 2018 18:21:31 +0800 (GMT+08:00) Subject: [Python-ideas] About the passing the function arguments in Keyword form. Message-ID: <2cf41470.9511.167dfbbe20e.Coremail.phylimo@163.com> I am having an idea on loosing the argument validity check when passing the function arguments in keyword way. For example: ------------------------------- deff(x, y): print(x, y) defcall_f(): f(x=7, y=9, z=9) call_f() ------------------------------ In the current of python, the extra pass of 'z' would let the interpreter raise an exception and stop work. My idea is that the interpreter need not stop because all the needed args are completely provided. Of course for this toy example, 'f' can be define as f(x, y, **kwargs) to achieve the same goal. However, essentially it is reasonably to keep interpreter going as long as enough args are passed. And this modification can bring more freedom of programming. 
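(The example above lost its formatting in transit; spelled out, it is the following.)

def f(x, y):
    print(x, y)

def call_f():
    f(x=7, y=9, z=9)

call_f()
# current behaviour: TypeError: f() got an unexpected keyword argument 'z'
# under this idea the call would instead print "7 9" and ignore z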
Think about the following situations: situation 1) there are many 'f's written by other people, and their args are very similar and your job is to run each of them to get some results. --------------------- ##########code by others: def f0(): ... def f1(x): ... def f2(x, y): ... def f3(x, y, z): ... #if passing extra args are valid, you can run all the functions in the following way, which is very compact and easy to read. def test_universal_call(): funcs = [f0, f1, f2, f3] args = {'x':1, 'y':5, 'z':8} for f in funcs: f(**args) ------------------ situation 2) there are several steps for make one product, each step is in an individual function and needs different args. ------------------ def make_oil(oil): ... def make_water( water): ... def make_powder(powder): ... ## if passing extra args are valid, you can run all the functions in the following way, which is very compact and easy to read. def dish(): procedures = [make_oil, make_water, make_powder] args = {'oil' : 1, 'water': 10, 'powder': 4} for f in procedures: f(**args) --------------- This idea is different from **kwargs. **kwargs are used when user wants to record all the keywords passed. This idea is that even if the user doesn?t want to record the arguments, that extra pass of keyword arguments wont?t cause an exception. Sorry for bothering you guys if this is a stupid idea. Happy to hear your suggestions. Li Mo -------------- next part -------------- An HTML attachment was scrubbed... URL: From dwarwick96 at gmail.com Mon Dec 24 06:11:01 2018 From: dwarwick96 at gmail.com (Drew Warwick) Date: Mon, 24 Dec 2018 06:11:01 -0500 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: References: Message-ID: The struct unpack API is inconvenient to use with files. I must do: struct.unpack(fmt, file.read(struct.calcsize(fmt)) every time I want to read a struct from the file. I ended up having to create a utility function for this due to how frequently I was using struct.unpack with files: def unpackStruct(fmt, frm): if isinstance(frm, io.IOBase): return struct.unpack(fmt, frm.read(struct.calcsize(fmt))) else: return struct.unpack(fmt, frm) This seems like something that should be built into the default implementation -- struct.unpack already has all the information it needs with just the struct format and open binary file. Current behavior is an error since struct.unpack only supports bytes-like objects, so this should be backwards compatible except in the case where a developer is relying on that to error in a try block instead of verifying the buffer type beforehand. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Mon Dec 24 06:24:32 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 24 Dec 2018 22:24:32 +1100 Subject: [Python-ideas] About the passing the function arguments in Keyword form. In-Reply-To: <2cf41470.9511.167dfbbe20e.Coremail.phylimo@163.com> References: <2cf41470.9511.167dfbbe20e.Coremail.phylimo@163.com> Message-ID: <20181224112431.GO13061@ando.pearwood.info> On Mon, Dec 24, 2018 at 06:21:31PM +0800, ?? wrote: > I am having an idea on loosing the argument validity check when passing the function arguments in keyword way. > For example: > ------------------------------- > deff(x, y): > print(x, y) > defcall_f(): > f(x=7, y=9, z=9) > > > call_f() > ------------------------------ > In the current of python, the extra pass of 'z' would let the > interpreter raise an exception and stop work. Correct. 
As the Zen of Python says: Errors should never pass silently. Passing an unexpected argument "z" is an error, regardless of whether you pass it by keyword or as a positional argument. It should raise an exception. Don't think about toy examples like your f above with single character names. Think about code with proper names: def download(url, output_file=None, overwrite=True): if output_file is None: output_file = generate_filename(url) ... # Oops, a typo, which silently deletes data. download(url, override=False) > My idea is that the > interpreter need not stop because all the needed args are completely > provided. Of course for this toy example, 'f' can be define as f(x, > y, **kwargs) to achieve the same goal. However, essentially it is > reasonably to keep interpreter going as long as enough args are > passed. I don't agree that it is reasonable. To quote Chris Smith: "I find it amusing when novice programmers believe their main job is preventing programs from crashing. ... More experienced programmers realize that correct code is great, code that crashes could use improvement, but incorrect code that doesn?t crash is a horrible nightmare." Functions which silently ignore unexpected arguments instead of telling us that we have made a mistake ("got an unexpected keyword argument z") just *hides* the error, instead of reporting it so we can fix it. > And this modification can bring more freedom of programming. Freedom to have more hard to diagnose bugs in our code. -- Steve From boxed at killingar.net Mon Dec 24 07:05:17 2018 From: boxed at killingar.net (=?utf-8?Q?Anders_Hovm=C3=B6ller?=) Date: Mon, 24 Dec 2018 13:05:17 +0100 Subject: [Python-ideas] About the passing the function arguments in Keyword form. In-Reply-To: <2cf41470.9511.167dfbbe20e.Coremail.phylimo@163.com> References: <2cf41470.9511.167dfbbe20e.Coremail.phylimo@163.com> Message-ID: <8805015A-3A0D-49B3-BA99-2C5145F83B79@killingar.net> > On 24 Dec 2018, at 11:21, ?? wrote: > > I am having an idea on loosing the argument validity check when passing the function arguments in keyword way. > For example: > ------------------------------- > def f(x, y): > print(x, y) > def call_f(): > f(x=7, y=9, z=9) > > call_f() > ------------------------------ > In the current of python, the extra pass of 'z' would let the interpreter raise an exception and stop work. My idea is that the interpreter need not stop because all the needed args are completely provided. Of course for this toy example, 'f' can be define as f(x, y, **kwargs) to achieve the same goal. However, essentially it is reasonably to keep interpreter going as long as enough args are passed. And this modification can bring more freedom of programming. Similar features exists in JavaScript (where you can also do the same thing with positional arguments), and Clojure to make two. I personally think this is extremely bad. This type of behavior can make error in your code slip by undetected for a very long time. Let's take a concrete example! We have a function: def foo(*, a, b=3, c): .... People call it like so: foo(a=7, b=1, c=11) Now what happens if we rename argument b to q? The above code still runs! It just now passes 3 (the default value) to foo instead of the intended 1. I hope this example is enough to convince you of the danger of such a feature. It's certainly the reason why I think JavaScript and Clojure are terrible when it comes to passing arguments :) Best regards Anders -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From andrew.svetlov at gmail.com Mon Dec 24 08:01:07 2018 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Mon, 24 Dec 2018 15:01:07 +0200 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: References: Message-ID: Handling files overcomplicates both implementation and mental space for API saving. Files can be opened in text mode, what to do in this case? What exception should be raised? How to handle OS errors? On Mon, Dec 24, 2018 at 1:11 PM Drew Warwick wrote: > The struct unpack API is inconvenient to use with files. I must do: > > struct.unpack(fmt, file.read(struct.calcsize(fmt)) > > every time I want to read a struct from the file. I ended up having to > create a utility function for this due to how frequently I was using > struct.unpack with files: > > def unpackStruct(fmt, frm): > if isinstance(frm, io.IOBase): > return struct.unpack(fmt, frm.read(struct.calcsize(fmt))) > else: > return struct.unpack(fmt, frm) > > This seems like something that should be built into the default > implementation -- struct.unpack already has all the information it needs > with just the struct format and open binary file. Current behavior is an > error since struct.unpack only supports bytes-like objects, so this should > be backwards compatible except in the case where a developer is relying on > that to error in a try block instead of verifying the buffer type > beforehand. > >> _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Thanks, Andrew Svetlov -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Mon Dec 24 08:33:14 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 25 Dec 2018 00:33:14 +1100 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: References: Message-ID: <20181224133313.GP13061@ando.pearwood.info> On Mon, Dec 24, 2018 at 03:01:07PM +0200, Andrew Svetlov wrote: > Handling files overcomplicates both implementation and mental space for API > saving. Perhaps. Although the implementation doesn't seem that complicated, and the mental space for the API not that much more difficult: unpack from bytes, or read from a file; versus unpack from bytes, which you might read from a file Seems about the same to me, except that with the proposal you don't have to calculate the size of the struct before reading. I haven't thought about this very deeply, but at first glance, I like Drew's idea of being able to just pass an open file to unpack and have it read from the file. > Files can be opened in text mode, what to do in this case? What > exception should be raised? That is easy to answer: the same exception you get if you pass text to unpack() when it is expecting bytes: py> struct.unpack(fmt, "a") Traceback (most recent call last): File "", line 1, in TypeError: a bytes-like object is required, not 'str' There should be no difference whether the text comes from a literal, a variable, or is read from a file. > How to handle OS errors? unpack() shouldn't try to handle them. If an OS error occurs, raise an exception, exactly the same way file.read() would raise an exception. 
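Roughly this, as a sketch of those semantics only (the helper name and the error messages are invented here, it is not a concrete implementation proposal):

import struct

def unpack_from_file(fmt, f):
    size = struct.calcsize(fmt)
    data = f.read(size)     # any OSError simply propagates, as it would for f.read()
    if not isinstance(data, bytes):
        # e.g. a file opened in text mode: same complaint as unpacking a str
        raise TypeError('a bytes-like object is required, not %r'
                        % type(data).__name__)
    if len(data) != size:
        raise struct.error('expected %d bytes, got %d' % (size, len(data)))
    return struct.unpack(fmt, data)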
-- Steve From 2QdxY4RzWzUUiLuE at potatochowder.com Mon Dec 24 09:17:20 2018 From: 2QdxY4RzWzUUiLuE at potatochowder.com (Dan Sommers) Date: Mon, 24 Dec 2018 08:17:20 -0600 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: <20181224133313.GP13061@ando.pearwood.info> References: <20181224133313.GP13061@ando.pearwood.info> Message-ID: <85edba54-4995-e20f-ba85-a8d2ac6b1883@potatochowder.com> On 12/24/18 7:33 AM, Steven D'Aprano wrote: > On Mon, Dec 24, 2018 at 03:01:07PM +0200, Andrew Svetlov wrote: >> Handling files overcomplicates both implementation and mental space >> for API saving. > I haven't thought about this very deeply, but at first glance, I like > Drew's idea of being able to just pass an open file to unpack and have > it read from the file. The json module has load for files, and loads for bytes and strings, That said, JSON is usually read and decoded all at once, but I can see lots of use cases for ingesting "unpackable" data in little chunks. Similarly (but not really), print takes an optional destination that overrides the default destination of stdout. Ironically, StringIO adapts strings so that they can be used in places that expect open files. What about something like gzip.GzipFile (call it struct.StructFile?), which is basically a specialized file-like class that packs data on writes and unpacks data on reads? Dan From jheiv at jheiv.com Mon Dec 24 10:19:35 2018 From: jheiv at jheiv.com (James Edwards) Date: Mon, 24 Dec 2018 10:19:35 -0500 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: <85edba54-4995-e20f-ba85-a8d2ac6b1883@potatochowder.com> References: <20181224133313.GP13061@ando.pearwood.info> <85edba54-4995-e20f-ba85-a8d2ac6b1883@potatochowder.com> Message-ID: Here's a snippet of semi-production code we use: def read_and_unpack(handle, fmt): size = struct.calcsize(fmt) data = handle.read(size) if len(data) < size: return None return struct.unpack(fmt, data) which was originally something like: def read_and_unpack(handle, fmt, offset=None): if offset is not None: handle.seek(*offset) size = struct.calcsize(fmt) data = handle.read(size) if len(data) < size: return None return struct.unpack(fmt, data) until we pulled file seeking up out of the function. Having struct.unpack and struct.unpack_from support files would seem straightforward and be a nice quality of life change, imo. On Mon, Dec 24, 2018 at 9:36 AM Dan Sommers < 2QdxY4RzWzUUiLuE at potatochowder.com> wrote: > On 12/24/18 7:33 AM, Steven D'Aprano wrote: > > On Mon, Dec 24, 2018 at 03:01:07PM +0200, Andrew Svetlov wrote: > > >> Handling files overcomplicates both implementation and mental space > >> for API saving. > > > I haven't thought about this very deeply, but at first glance, I like > > Drew's idea of being able to just pass an open file to unpack and have > > it read from the file. > > The json module has load for files, and loads for bytes and strings, > That said, JSON is usually read and decoded all at once, but I can see > lots of use cases for ingesting "unpackable" data in little chunks. > > Similarly (but not really), print takes an optional destination that > overrides the default destination of stdout. > > Ironically, StringIO adapts strings so that they can be used in places > that expect open files. > > What about something like gzip.GzipFile (call it struct.StructFile?), > which is basically a specialized file-like class that packs data on > writes and unpacks data on reads? 
> > Dan > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Mon Dec 24 10:36:07 2018 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 24 Dec 2018 15:36:07 +0000 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: <20181224133313.GP13061@ando.pearwood.info> References: <20181224133313.GP13061@ando.pearwood.info> Message-ID: On Mon, 24 Dec 2018 at 13:39, Steven D'Aprano wrote: > > > Files can be opened in text mode, what to do in this case? What > > exception should be raised? > > That is easy to answer: the same exception you get if you pass text to > unpack() when it is expecting bytes: > > py> struct.unpack(fmt, "a") > Traceback (most recent call last): > File "", line 1, in > TypeError: a bytes-like object is required, not 'str' > > There should be no difference whether the text comes from a literal, a > variable, or is read from a file. One difference is that with a file, it's (as far as I can see) impossible to determine whether or not you're going to get bytes or text without reading some data (and so potentially affecting the state of the file object). This might be considered irrelevant (personally, I don't see a problem with a function definition that says "parameter fd must be an object that has a read(length) method that returns bytes" - that's basically what duck typing is all about) but it *is* a distinguishing feature of files over in-memory data. There is also the fact that read() is only defined to return *at most* the requested number of bytes. Non-blocking reads and objects like pipes that can return additional data over time add extra complexity. Again, not insoluble, and potentially simple enough to handle with "read N bytes, if you got something other than bytes or fewer than N of them, raise an error", but still enough that the special cases start to accumulate. The suggestion is a nice convenience method, and probably a useful addition for the majority of cases where it would do exactly what was needed, but still not completely trivial to actually implement and document (if I were doing it, I'd go with the naive approach, and just raise a ValueError when read(N) returns anything other than N bytes, for what it's worth). Paul From eric at trueblade.com Mon Dec 24 11:30:59 2018 From: eric at trueblade.com (Eric V. Smith) Date: Mon, 24 Dec 2018 11:30:59 -0500 Subject: [Python-ideas] About the passing the function arguments in Keyword form. In-Reply-To: <2cf41470.9511.167dfbbe20e.Coremail.phylimo@163.com> References: <2cf41470.9511.167dfbbe20e.Coremail.phylimo@163.com> Message-ID: <227e4e6f-5c39-fad9-654d-89da3ae411f7@trueblade.com> On 12/24/2018 5:21 AM, ?? wrote: > I am having an idea on loosing the argument validity check when passing > the function arguments in keyword way. > For example: > ------------------------------- > deff(x, y): > > print(x, y) def call_f(): f(x=7, y=9, z=9) > > call_f() > > ------------------------------ > > In the current of python, the extra pass of 'z' would let the > interpreter raise an exception and stop work. My idea is that the > interpreter need not stop because all the needed args are completely > provided. Of course for this toy example, 'f' can be define as f(x, y, > **kwargs) to achieve the same goal. 
> However, essentially it is reasonable to keep the interpreter going as long as enough args are passed. And this modification can bring more freedom of programming.
>
> Think about the following situations:
>
> situation 1) there are many 'f's written by other people, and their args are very similar and your job is to run each of them to get some results.
>
> ---------------------
>
> ##########code by others:
>
> def f0():
>     ...
> def f1(x):
>     ...
> def f2(x, y):
>     ...
> def f3(x, y, z):
>     ...
>
> #if passing extra args are valid, you can run all the functions in the following way, which is very compact and easy to read.
>
> def test_universal_call():
>     funcs = [f0, f1, f2, f3]
>     args = {'x':1, 'y':5, 'z':8}
>     for f in funcs:
>         f(**args)
>
> ------------------
>
> situation 2) there are several steps to make one product; each step is in an individual function and needs different args.
>
> ------------------
>
> def make_oil(oil):
>     ...
>
> def make_water(water):
>     ...
>
> def make_powder(powder):
>     ...
>
> ## if passing extra args are valid, you can run all the functions in the following way, which is very compact and easy to read.
>
> def dish():
>     procedures = [make_oil, make_water, make_powder]
>     args = {'oil': 1, 'water': 10, 'powder': 4}
>     for f in procedures:
>         f(**args)
>
> ---------------
>
> This idea is different from **kwargs. **kwargs are used when the user wants to record all the keywords passed. This idea is that even if the user doesn't want to record the arguments, that extra passing of keyword arguments won't cause an exception.

I agree with other posters that we definitely do not want this as the default behavior in Python. However, it's also sometimes a useful pattern. I use it when I have a large plugin architecture that can take dozens or hundreds of possible parameters, but any given plugin is likely to only use a few parameters. I've written calllib (https://pypi.org/project/calllib/) to support this. It might achieve your goals. This code:

-------------------
from calllib import apply

def f0():
    print('f0')

def f1(x):
    print(f'f1 {x!r}')

def f2(x, y):
    print(f'f2 {x!r} {y!r}')

def f3(x, y, z):
    print(f'f3 {x!r} {y!r} {z!r}')

def test_universal_call():
    funcs = [f0, f1, f2, f3]
    args = {'x':1, 'y':5, 'z':8}
    for f in funcs:
        apply(f, args)

test_universal_call()
-------------------

produces:

f0
f1 1
f2 1 5
f3 1 5 8

Eric From boxed at killingar.net Mon Dec 24 13:07:46 2018 From: boxed at killingar.net (=?utf-8?Q?Anders_Hovm=C3=B6ller?=) Date: Mon, 24 Dec 2018 19:07:46 +0100 Subject: [Python-ideas] About the passing the function arguments in Keyword form. In-Reply-To: <227e4e6f-5c39-fad9-654d-89da3ae411f7@trueblade.com> References: <2cf41470.9511.167dfbbe20e.Coremail.phylimo@163.com> <227e4e6f-5c39-fad9-654d-89da3ae411f7@trueblade.com> Message-ID: <85301CD0-BB63-407D-A8FF-39413D0E9FB5@killingar.net> > I agree with other posters that we definitely do not want this as the default behavior in Python. However, it's also sometimes a useful pattern. I use it when I have a large plugin architecture that can take dozens or hundreds of possible parameters, but any given plugin is likely to only use a few parameters. I've written calllib (https://pypi.org/project/calllib/) to support this. It might achieve your goals. We do the same for various libs (tri.table for example) and our solution is just to say that you need to include **_ in your arguments for such functions. Simpler and more obvious than a simple DI system imo.
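For anyone curious what that pattern looks like in practice, filtering the argument dict against each callee's signature is enough. The sketch below shows only the general idea (it is not how calllib or tri.table are actually implemented, and the helper name is made up):

    import inspect

    def call_with_accepted_args(func, args):
        """Call func with only the keyword arguments it can accept."""
        params = inspect.signature(func).parameters
        # If func already takes **kwargs, pass everything through.
        if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
            return func(**args)
        return func(**{k: v for k, v in args.items() if k in params})

    def f2(x, y):
        return x + y

    print(call_with_accepted_args(f2, {'x': 1, 'y': 5, 'z': 8}))   # 6

The point of doing this in a helper or decorator, rather than in the language, is that the normal "unexpected keyword argument" TypeError stays intact for every other call site.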
/ Anders From steve at pearwood.info Mon Dec 24 16:17:33 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 25 Dec 2018 08:17:33 +1100 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: References: <20181224133313.GP13061@ando.pearwood.info> Message-ID: <20181224211733.GQ13061@ando.pearwood.info> On Mon, Dec 24, 2018 at 03:36:07PM +0000, Paul Moore wrote: > > There should be no difference whether the text comes from a literal, a > > variable, or is read from a file. > > One difference is that with a file, it's (as far as I can see) > impossible to determine whether or not you're going to get bytes or > text without reading some data (and so potentially affecting the state > of the file object). Here are two ways: look at the type of the file object, or look at the mode of the file object: py> f = open('/tmp/spam.binary', 'wb') py> g = open('/tmp/spam.text', 'w') py> type(f), type(g) (, ) py> f.mode, g.mode ('wb', 'w') > This might be considered irrelevant Indeed :-) > (personally, > I don't see a problem with a function definition that says "parameter > fd must be an object that has a read(length) method that returns > bytes" - that's basically what duck typing is all about) but it *is* a > distinguishing feature of files over in-memory data. But it's not a distinguishing feature between the proposal, and writing: unpack(fmt, f.read(size)) which will also read from the file and affect the file state before failing. So its a difference that makes no difference. > There is also the fact that read() is only defined to return *at most* > the requested number of bytes. Non-blocking reads and objects like > pipes that can return additional data over time add extra complexity. How do they add extra complexity? According to the proposal, unpack() attempts the read. If it returns the correct number of bytes, the unpacking succeeds. If it doesn't, you get an exception, precisely the same way you would get an exception if you manually did the read and passed it to unpack(). Its the caller's responsibility to provide a valid file object. If your struct needs 10 bytes, and you provide a file that returns 6 bytes, you get an exception. There's no promise made that unpack() should repeat the read over and over again, hoping that its a pipe and more data becomes available. It either works with a single read, or it fails. Just like similar APIs as those provided by pickle, json etc which provide load() and loads() functions. In hindsight, the precedent set by pickle, json, etc suggests that we ought to have an unpack() function that reads from files and an unpacks() function that takes a string, but that ship has sailed. > Again, not insoluble, and potentially simple enough to handle with > "read N bytes, if you got something other than bytes or fewer than N > of them, raise an error", but still enough that the special cases > start to accumulate. I can understand the argument that the benefit of this is trivial over unpack(fmt, f.read(calcsize(fmt)) Unlike reading from a pickle or json record, its pretty easy to know how much to read, so there is an argument that this convenience method doesn't gain us much convenience. 
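For concreteness, the semantics being defended here fit in a few lines. The name unpack_from_file is invented for illustration; the behaviour shown (one read, short reads fail loudly) is just the proposal as described above:

    import io, struct

    def unpack_from_file(fmt, f):
        # exactly one read of calcsize(fmt) bytes; no retry loop
        return struct.unpack(fmt, f.read(struct.calcsize(fmt)))

    good = io.BytesIO(struct.pack('<ii', 1, 2))
    print(unpack_from_file('<ii', good))       # (1, 2)

    short = io.BytesIO(b'\x01\x00\x00\x00')    # only 4 of the 8 bytes needed
    unpack_from_file('<ii', short)             # raises struct.error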
But I'm just not seeing where all the extra complexity and special case handing is supposed to be, except by having unpack make promises that the OP didn't request: - read partial structs from non-blocking files without failing - deal with file system errors without failing - support reading from text files when bytes are required without failing - if an exception occurs, the state of the file shouldn't change Those promises *would* add enormous amounts of complexity, but I don't think we need to make those promises. I don't think the OP wants them, I don't want them, and I don't think they are reasonable promises to make. > The suggestion is a nice convenience method, and probably a useful > addition for the majority of cases where it would do exactly what was > needed, but still not completely trivial to actually implement and > document (if I were doing it, I'd go with the naive approach, and just > raise a ValueError when read(N) returns anything other than N bytes, > for what it's worth). Indeed. Except that we should raise precisely the same exception type that struct.unpack() currently raises in the same circumstances: py> struct.unpack("ddd", b"a") Traceback (most recent call last): File "", line 1, in struct.error: unpack requires a bytes object of length 24 rather than ValueError. -- Steve From andrew.svetlov at gmail.com Mon Dec 24 18:28:02 2018 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Tue, 25 Dec 2018 01:28:02 +0200 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: <20181224211733.GQ13061@ando.pearwood.info> References: <20181224133313.GP13061@ando.pearwood.info> <20181224211733.GQ13061@ando.pearwood.info> Message-ID: The proposal can generate cryptic messages like `a bytes-like object is required, not 'NoneType'` To produce more informative exception text all mentioned cases should be handled: > - read partial structs from non-blocking files without failing > - deal with file system errors without failing > - support reading from text files when bytes are required without failing > - if an exception occurs, the state of the file shouldn't change I can add a couple of cases but the list is long enough for demonstration purposes. When a user calls unpack(fmt, f.read(calcsize(fmt)) the user is responsible for handling all edge cases (or ignore them most likely). If it is a part of a library -- robustness is the library responsibility. On Mon, Dec 24, 2018 at 11:23 PM Steven D'Aprano wrote: > On Mon, Dec 24, 2018 at 03:36:07PM +0000, Paul Moore wrote: > > > > There should be no difference whether the text comes from a literal, a > > > variable, or is read from a file. > > > > One difference is that with a file, it's (as far as I can see) > > impossible to determine whether or not you're going to get bytes or > > text without reading some data (and so potentially affecting the state > > of the file object). > > Here are two ways: look at the type of the file object, or look at the > mode of the file object: > > py> f = open('/tmp/spam.binary', 'wb') > py> g = open('/tmp/spam.text', 'w') > py> type(f), type(g) > (, ) > > py> f.mode, g.mode > ('wb', 'w') > > > > This might be considered irrelevant > > Indeed :-) > > > > (personally, > > I don't see a problem with a function definition that says "parameter > > fd must be an object that has a read(length) method that returns > > bytes" - that's basically what duck typing is all about) but it *is* a > > distinguishing feature of files over in-memory data. 
> > But it's not a distinguishing feature between the proposal, and writing: > > unpack(fmt, f.read(size)) > > which will also read from the file and affect the file state before > failing. So its a difference that makes no difference. > > > > There is also the fact that read() is only defined to return *at most* > > the requested number of bytes. Non-blocking reads and objects like > > pipes that can return additional data over time add extra complexity. > > How do they add extra complexity? > > According to the proposal, unpack() attempts the read. If it returns the > correct number of bytes, the unpacking succeeds. If it doesn't, you get > an exception, precisely the same way you would get an exception if you > manually did the read and passed it to unpack(). > > Its the caller's responsibility to provide a valid file object. If your > struct needs 10 bytes, and you provide a file that returns 6 bytes, you > get an exception. There's no promise made that unpack() should repeat > the read over and over again, hoping that its a pipe and more data > becomes available. It either works with a single read, or it fails. > > Just like similar APIs as those provided by pickle, json etc which > provide load() and loads() functions. > > In hindsight, the precedent set by pickle, json, etc suggests that we > ought to have an unpack() function that reads from files and an > unpacks() function that takes a string, but that ship has sailed. > > > > Again, not insoluble, and potentially simple enough to handle with > > "read N bytes, if you got something other than bytes or fewer than N > > of them, raise an error", but still enough that the special cases > > start to accumulate. > > I can understand the argument that the benefit of this is trivial over > > unpack(fmt, f.read(calcsize(fmt)) > > Unlike reading from a pickle or json record, its pretty easy to know how > much to read, so there is an argument that this convenience method > doesn't gain us much convenience. > > But I'm just not seeing where all the extra complexity and special case > handing is supposed to be, except by having unpack make promises that > the OP didn't request: > > - read partial structs from non-blocking files without failing > - deal with file system errors without failing > - support reading from text files when bytes are required without failing > - if an exception occurs, the state of the file shouldn't change > > Those promises *would* add enormous amounts of complexity, but I don't > think we need to make those promises. I don't think the OP wants them, > I don't want them, and I don't think they are reasonable promises to > make. > > > > The suggestion is a nice convenience method, and probably a useful > > addition for the majority of cases where it would do exactly what was > > needed, but still not completely trivial to actually implement and > > document (if I were doing it, I'd go with the naive approach, and just > > raise a ValueError when read(N) returns anything other than N bytes, > > for what it's worth). > > Indeed. Except that we should raise precisely the same exception type > that struct.unpack() currently raises in the same circumstances: > > py> struct.unpack("ddd", b"a") > Traceback (most recent call last): > File "", line 1, in > struct.error: unpack requires a bytes object of length 24 > > rather than ValueError. 
> > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Thanks, Andrew Svetlov -------------- next part -------------- An HTML attachment was scrubbed... URL: From robertve92 at gmail.com Tue Dec 25 05:03:36 2018 From: robertve92 at gmail.com (Robert Vanden Eynde) Date: Tue, 25 Dec 2018 11:03:36 +0100 Subject: [Python-ideas] About the passing the function arguments in Keyword form. In-Reply-To: <2cf41470.9511.167dfbbe20e.Coremail.phylimo@163.com> References: <2cf41470.9511.167dfbbe20e.Coremail.phylimo@163.com> Message-ID: It's very important that f(z=5) Raises an exception if z is not an argument. For your case, I'd do a wrapper, instead lf calling f(z=5) you can call UniversalCall(f, x=1, y=2, z=5) if you want to specify it on the caller side. Or else, you can create a decorator : @universal_callable def f(x, y): ... f(x=1, y=2, z=5) # works ! On Mon, 24 Dec 2018, 11:21 ?? I am having an idea on loosing the argument validity check when passing > the function arguments in keyword way. > For example: > ------------------------------- > def f(x, y): > > print(x, y)def call_f(): > f(x=7, y=9, z=9) > > call_f() > > ------------------------------ > > In the current of python, the extra pass of 'z' would let the interpreter raise an exception and stop work. My idea is that the interpreter need not stop because all the needed args are completely provided. Of course for this toy example, 'f' can be define as f(x, y, **kwargs) to achieve the same goal. However, essentially it is reasonably to keep interpreter going as long as enough args are passed. And this modification can bring more freedom of programming. > > > Think about the following situations: > > situation 1) there are many 'f's written by other people, and their args are very similar and your job is to run each of them to get some results. > > --------------------- > > ##########code by others: > > def f0(): > ... > def f1(x): > ... > def f2(x, y): > ... > def f3(x, y, z): > ... > > #if passing extra args are valid, you can run all the functions in the following way, which is very compact and easy to read. > > def test_universal_call(): > > funcs = [f0, f1, f2, f3] > args = {'x':1, 'y':5, 'z':8} > for f in funcs: > f(**args) > > ------------------ > > > situation 2) there are several steps for make one product, each step is in an individual function and needs different args. > > ------------------ > > def make_oil(oil): > ... > > def make_water( water): > ... > > def make_powder(powder): > ... > > ## if passing extra args are valid, you can run all the functions in the following way, which is very compact and easy to read. > > def dish(): > procedures = [make_oil, make_water, make_powder] > > args = {'oil' : 1, 'water': 10, 'powder': 4} > for f in procedures: > f(**args) > > > --------------- > > > This idea is different from **kwargs. **kwargs are used when user wants to record all the keywords passed. This idea is that even if the user doesn?t want to record the arguments, that extra pass of keyword arguments wont?t cause an exception. > > > > Sorry for bothering you guys if this is a stupid idea. > > Happy to hear your suggestions. 
> > > Li Mo > > > > > > > > > > > > > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eryksun at gmail.com Tue Dec 25 17:51:18 2018 From: eryksun at gmail.com (eryk sun) Date: Tue, 25 Dec 2018 16:51:18 -0600 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: References: Message-ID: On 12/24/18, Drew Warwick wrote: > The struct unpack API is inconvenient to use with files. I must do: > > struct.unpack(fmt, file.read(struct.calcsize(fmt)) Alternatively, we can memory-map the file via mmap. An important difference is that the mmap buffer interface is low-level (e.g. no file pointer and the offset has to be page aligned), so we have to slice out bytes for the given offset and size. We can avoid copying via memoryview slices. We can also use ctypes instead of memoryview/struct. From steve at pearwood.info Tue Dec 25 20:16:32 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 26 Dec 2018 12:16:32 +1100 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: References: Message-ID: <20181226011629.GS13061@ando.pearwood.info> On Tue, Dec 25, 2018 at 04:51:18PM -0600, eryk sun wrote: > On 12/24/18, Drew Warwick wrote: > > The struct unpack API is inconvenient to use with files. I must do: > > > > struct.unpack(fmt, file.read(struct.calcsize(fmt)) > > Alternatively, we can memory-map the file via mmap. An important > difference is that the mmap buffer interface is low-level (e.g. no > file pointer and the offset has to be page aligned), so we have to > slice out bytes for the given offset and size. We can avoid copying > via memoryview slices. Seems awfully complicated. How do we do all these things, and what advantage does it give? > We can also use ctypes instead of > memoryview/struct. Only if you want non-portable code. What advantage over struct is ctypes? -- Steve From cs at cskk.id.au Tue Dec 25 21:05:51 2018 From: cs at cskk.id.au (Cameron Simpson) Date: Wed, 26 Dec 2018 13:05:51 +1100 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: References: Message-ID: <20181226020551.GA52216@cskk.homeip.net> On 24Dec2018 10:19, James Edwards wrote: >Here's a snippet of semi-production code we use: > > def read_and_unpack(handle, fmt): > size = struct.calcsize(fmt) > data = handle.read(size) > if len(data) < size: return None > return struct.unpack(fmt, data) > >which was originally something like: > > def read_and_unpack(handle, fmt, offset=None): > if offset is not None: > handle.seek(*offset) > size = struct.calcsize(fmt) > data = handle.read(size) > if len(data) < size: return None > return struct.unpack(fmt, data) > >until we pulled file seeking up out of the function. > >Having struct.unpack and struct.unpack_from support files would seem >straightforward and be a nice quality of life change, imo. These days I go the other way. I make it easy to get bytes from what I'm working with and _expect_ to parse from a stream of bytes. I have a pair of modules cs.buffer (for getting bytes from things) and cs.binary (for parsing structures from binary data). (See PyPI.) cs.buffer primarily offers a CornuCopyBuffer which manages access to any iterable of bytes objects. It has a suite of factories to make these from binary files, bytes, bytes[], a mmap, etc. 
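The underlying idea -- buffer an iterable of bytes chunks and hand back exactly n bytes at a time -- can be sketched independently of cs.buffer. The toy class below is not Cameron's implementation, just an illustration of the concept:

    class ChunkBuffer:
        """Toy 'take exactly n bytes' buffer over an iterable of bytes chunks."""

        def __init__(self, chunks):
            self._chunks = iter(chunks)
            self._pending = b''

        def take(self, n):
            parts = [self._pending]
            have = len(self._pending)
            while have < n:
                try:
                    chunk = next(self._chunks)
                except StopIteration:
                    raise EOFError('wanted %d bytes, only %d available' % (n, have))
                parts.append(chunk)
                have += len(chunk)
            data = b''.join(parts)
            self._pending = data[n:]       # keep any excess for the next take()
            return data[:n]

    import struct
    buf = ChunkBuffer([b'\x01\x00', b'\x00\x00\x02\x00\x00\x00'])
    print(struct.unpack('<i', buf.take(4)))    # (1,)
    print(struct.unpack('<i', buf.take(4)))    # (2,)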
Once you've got one of these you have access to a suite of convenient methods. Particularly for grabbing structs, these's a .take() method which obtains a precise number of bytes. (Think that looks like a file read? Yes, and it offers a basic file-like suite of methods too.) Anyway, cs.binary is based of a PacketField base class oriented around pulling a binary structure from a CornuCopyBuffer. Obviously, structs are very common, and cs.binary has a factory: def structtuple(class_name, struct_format, subvalue_names): which gets you a PacketField subclass whose parse methods read a struct and return it to you in a nice namedtuple. Also, PacketFields self transcribe: you can construct one from its values and have it write out the binary form. Once you've got these the tendency is just to make a PacketField instances from that function for the structs you need and then to just grab things from a CornuCopyBuffer providing the data. And you no longer have to waste effort on different code for bytes or files. Example from cs.iso14496: PDInfo = structtuple('PDInfo', '>LL', 'rate initial_delay') Then you can just use PDInfo.from_buffer() or PDInfo.from_bytes() to parse out your structures from then on. I used to have tedious duplicated code for bytes and files in various placed; I'm ripping it out and replacing with this as I encounter it. Far more reliable, not to mention smaller and easier. Cheers, Cameron Simpson From eryksun at gmail.com Tue Dec 25 22:52:44 2018 From: eryksun at gmail.com (eryk sun) Date: Tue, 25 Dec 2018 21:52:44 -0600 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: <20181226011629.GS13061@ando.pearwood.info> References: <20181226011629.GS13061@ando.pearwood.info> Message-ID: On 12/25/18, Steven D'Aprano wrote: > On Tue, Dec 25, 2018 at 04:51:18PM -0600, eryk sun wrote: >> >> Alternatively, we can memory-map the file via mmap. An important >> difference is that the mmap buffer interface is low-level (e.g. no >> file pointer and the offset has to be page aligned), so we have to >> slice out bytes for the given offset and size. We can avoid copying >> via memoryview slices. > > Seems awfully complicated. How do we do all these things, and what > advantage does it give? Refer to the mmap and memoryview docs. It is more complex, not significantly, but not something I'd suggest to a novice. Anyway, another disadvantage is that this requires a real OS file, not just a file-like interface. One possible advantage is that we can work naively and rely on the OS to move pages of the file to and from memory on demand. However, making this really convenient requires the ability to access memory directly with on-demand conversion, as is possible with ctypes (records & arrays) or numpy (arrays). Out of the box, multiprocessing works like this for shared-memory access. For example: import ctypes import multiprocessing class Record(ctypes.LittleEndianStructure): _pack_ = 1 _fields_ = (('a', ctypes.c_int), ('b', ctypes.c_char * 4)) a = multiprocessing.Array(Record, 2) a[0].a = 1 a[0].b = b'spam' a[1].a = 2 a[1].b = b'eggs' >>> a._obj Shared values and arrays are accessed out of a heap that uses arenas backed by mmap instances: >>> a._obj._wrapper._state ((, 0, 16), 16) >>> a._obj._wrapper._state[0][0].buffer The two records are stored in this shared memory: >>> a._obj._wrapper._state[0][0].buffer[:16] b'\x01\x00\x00\x00spam\x02\x00\x00\x00eggs' >> We can also use ctypes instead of >> memoryview/struct. > > Only if you want non-portable code. 
ctypes has good support for at least Linux and Windows, but it's an optional package in CPython's standard library and not necessarily available with other implementations. > What advantage over struct is ctypes? If it's available, I find that ctypes is often more convenient than the manual pack/unpack approach of struct. If we're writing to the file, ctypes lets us directly assign data to arrays and the fields of records on disk (the ctypes instance knows the address and its data descriptors handle converting values implicitly). The tradeoff is that defining structures in ctypes can be tedious (_pack_, _fields_) compared to the simple format strings of the struct module. With ctypes it helps to already be fluent in C. From steve at pearwood.info Wed Dec 26 00:11:35 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 26 Dec 2018 16:11:35 +1100 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: References: <20181224133313.GP13061@ando.pearwood.info> <20181224211733.GQ13061@ando.pearwood.info> Message-ID: <20181226051134.GV13061@ando.pearwood.info> On Tue, Dec 25, 2018 at 01:28:02AM +0200, Andrew Svetlov wrote: > The proposal can generate cryptic messages like > `a bytes-like object is required, not 'NoneType'` How will it generate such a message? That's not obvious to me. The message doesn't seem cryptic to me. It seems perfectly clear: a bytes-like object is required, but you provided None instead. The only thing which is sub-optimal is the use of "NoneType" (the name of the class) instead of None. > To produce more informative exception text all mentioned cases should be > handled: Why should they? How are the standard exceptions not good enough? The standard library is full of implementations which use ducktyping, and if you pass a chicken instead of a duck you get errors like AttributeError: 'Chicken' object has no attribute 'bill' Why isn't that good enough for this function too? We already have a proof-of-concept implementation, given by the OP. Here is it again: import io, struct def unpackStruct(fmt, frm): if isinstance(frm, io.IOBase): return struct.unpack(fmt, frm.read(struct.calcsize(fmt))) else: return struct.unpack(fmt, frm) Here's the sort of exceptions it generates. For brevity, I have cut the tracebacks down to only the final line: py> unpackStruct("ddd", open("/tmp/spam", "w")) io.UnsupportedOperation: not readable Is that not clear enough? (This is not a rhetorical question.) In what way do you think that exception needs enhancing? It seems perfectly fine to me. Here's another exception that may be fine as given. If the given file doesn't contain enough bytes to fill the struct, you get this: py> __ = open("/tmp/spam", "wb").write(b"\x10") py> unpackStruct("ddd", open("/tmp/spam", "rb")) struct.error: unpack requires a bytes object of length 24 It might be *nice*, but hardly *necessary*, to re-word the error message to make it more obvious that we're reading from a file, but honestly that should be obvious from context. There are certainly worse error messages in Python. Here is one exception which should be reworded: py> unpackStruct("ddd", open("/tmp/spam", "r")) Traceback (most recent call last): File "", line 1, in File "", line 3, in unpackStruct TypeError: a bytes-like object is required, not 'str' For production use, that should report that the file needs to be opened in binary mode, not text mode. Likewise similar type errors should report "bytes-like or file-like" object. 
These are minor enhancements to exception reporting, and aren't what I consider to be adding complexity in any meaningful sense. Of course we should expect that library-quality functions will have more error checking and better error reporting than a simple utility function for you own use. The OP's simple implementation is a five line function. Adding more appropriate error messages might, what? Triple it? That surely is an argument for *doing it right, once* in the standard library, rather than having people re-invent the wheel over and over. def unpackStruct(fmt, frm): if isinstance(frm, io.IOBase): if isinstance(frm, io.TextIOBase): raise TypeError('file must be opened in binary mode, not text') n = struct.calcsize(fmt) value = frm.read(n) assert isinstance(value, bytes) if len(value) != n: raise ValueError( 'expected %d bytes but only got %d' % (n, len(value)) ) return struct.unpack(fmt, value) else: return struct.unpack(fmt, frm) I think this is a useful enhancement to unpack(). If we were designing the struct module from scratch today, we'd surely want unpack() to read from files and unpacks() to read from a byte-string, mirroring the API of json, pickle, and similar. But given the requirement for backwards compatibility, we can't change the fact that unpack() works with byte-strings. So we can either add a new function, unpack_from_file() or simply make unpack() a generic function that accepts either a byte-like interface or a file-like interface. I vote for the generic function approach. (Or do nothing, of course.) So far, I'm not seeing any substantial arguments for why this isn't useful, or too difficult to implement. If anything, the biggest argument against it is that it is too simple to bother with (but that argument would apply equally to the pickle and json APIs). "Not every ~~one~~ fifteen line function needs to be in the standard library." -- Steve From andrew.svetlov at gmail.com Wed Dec 26 02:48:15 2018 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Wed, 26 Dec 2018 09:48:15 +0200 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: <20181226051134.GV13061@ando.pearwood.info> References: <20181224133313.GP13061@ando.pearwood.info> <20181224211733.GQ13061@ando.pearwood.info> <20181226051134.GV13061@ando.pearwood.info> Message-ID: On Wed, Dec 26, 2018 at 7:12 AM Steven D'Aprano wrote: > On Tue, Dec 25, 2018 at 01:28:02AM +0200, Andrew Svetlov wrote: > > > The proposal can generate cryptic messages like > > `a bytes-like object is required, not 'NoneType'` > > How will it generate such a message? That's not obvious to me. > > The message doesn't seem cryptic to me. It seems perfectly clear: a > bytes-like object is required, but you provided None instead. > > The only thing which is sub-optimal is the use of "NoneType" (the name > of the class) instead of None. > > The perfect demonstration of io objects complexity. `stream.read(N)` can return None by spec if the file is non-blocking and have no ready data. Confusing but still possible and documented behavior. > > > To produce more informative exception text all mentioned cases should be > > handled: > > Why should they? How are the standard exceptions not good enough? The > standard library is full of implementations which use ducktyping, and if > you pass a chicken instead of a duck you get errors like > > AttributeError: 'Chicken' object has no attribute 'bill' > > Why isn't that good enough for this function too? > > We already have a proof-of-concept implementation, given by the OP. 
> Here is it again: > > > import io, struct > def unpackStruct(fmt, frm): > if isinstance(frm, io.IOBase): > return struct.unpack(fmt, frm.read(struct.calcsize(fmt))) > else: > return struct.unpack(fmt, frm) > > > Here's the sort of exceptions it generates. For brevity, I have cut the > tracebacks down to only the final line: > > > py> unpackStruct("ddd", open("/tmp/spam", "w")) > io.UnsupportedOperation: not readable > > > Is that not clear enough? (This is not a rhetorical question.) In what > way do you think that exception needs enhancing? It seems perfectly fine > to me. > > Here's another exception that may be fine as given. If the given file > doesn't contain enough bytes to fill the struct, you get this: > > > py> __ = open("/tmp/spam", "wb").write(b"\x10") > py> unpackStruct("ddd", open("/tmp/spam", "rb")) > struct.error: unpack requires a bytes object of length 24 > > > It might be *nice*, but hardly *necessary*, to re-word the error message > to make it more obvious that we're reading from a file, but honestly > that should be obvious from context. There are certainly worse error > messages in Python. > > Here is one exception which should be reworded: > > py> unpackStruct("ddd", open("/tmp/spam", "r")) > Traceback (most recent call last): > File "", line 1, in > File "", line 3, in unpackStruct > TypeError: a bytes-like object is required, not 'str' > > For production use, that should report that the file needs to be opened > in binary mode, not text mode. > > Likewise similar type errors should report "bytes-like or file-like" > object. > > These are minor enhancements to exception reporting, and aren't what I > consider to be adding complexity in any meaningful sense. Of course we > should expect that library-quality functions will have more error > checking and better error reporting than a simple utility function for > you own use. > > > The OP's simple implementation is a five line function. Adding more > appropriate error messages might, what? Triple it? That surely is an > argument for *doing it right, once* in the standard library, rather than > having people re-invent the wheel over and over. > > > def unpackStruct(fmt, frm): > if isinstance(frm, io.IOBase): > if isinstance(frm, io.TextIOBase): > raise TypeError('file must be opened in binary mode, not text') > n = struct.calcsize(fmt) > value = frm.read(n) > assert isinstance(value, bytes) > if len(value) != n: > raise ValueError( > 'expected %d bytes but only got %d' > % (n, len(value)) > ) > return struct.unpack(fmt, value) > else: > return struct.unpack(fmt, frm) > > You need to repeat reads until collecting the value of enough size. `.read(N)` can return less bytes by definition, that's true starting from very low-level read(2) syscall. Otherwise a (low) change of broken code with very non-obvious error message exists. > > I think this is a useful enhancement to unpack(). If we were designing > the struct module from scratch today, we'd surely want unpack() to read > from files and unpacks() to read from a byte-string, mirroring the API > of json, pickle, and similar. > > But given the requirement for backwards compatibility, we can't change > the fact that unpack() works with byte-strings. So we can either add a > new function, unpack_from_file() or simply make unpack() a generic > function that accepts either a byte-like interface or a file-like > interface. I vote for the generic function approach. > > (Or do nothing, of course.) 
> > So far, I'm not seeing any substantial arguments for why this isn't > useful, or too difficult to implement. > > If anything, the biggest argument against it is that it is too simple to > bother with (but that argument would apply equally to the pickle and > json APIs). "Not every ~~one~~ fifteen line function needs to be in > the standard library." > > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Thanks, Andrew Svetlov -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Wed Dec 26 04:25:19 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 26 Dec 2018 20:25:19 +1100 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: References: <20181224133313.GP13061@ando.pearwood.info> <20181224211733.GQ13061@ando.pearwood.info> <20181226051134.GV13061@ando.pearwood.info> Message-ID: <20181226092518.GX13061@ando.pearwood.info> On Wed, Dec 26, 2018 at 09:48:15AM +0200, Andrew Svetlov wrote: > The perfect demonstration of io objects complexity. > `stream.read(N)` can return None by spec if the file is non-blocking > and have no ready data. > > Confusing but still possible and documented behavior. https://docs.python.org/3/library/io.html#io.RawIOBase.read Regardless, my point doesn't change. That has nothing to do with the behaviour of unpack. If you pass a non-blocking file-like object which returns None, you get exactly the same exception as if you wrote unpack(fmt, f.read(size)) and the call to f.read returned None. Why is it unpack's responsibility to educate the caller that f.read can return None? Let's see what other functions with similar APIs do. py> class FakeFile: ... def read(self, n=-1): ... return None ... def readline(self): ... return None ... py> pickle.load(FakeFile()) Traceback (most recent call last): File "", line 1, in TypeError: a bytes-like object is required, not 'NoneType' py> json.load(FakeFile()) Traceback (most recent call last): File "", line 1, in File "/usr/local/lib/python3.5/json/__init__.py", line 268, in load parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw) File "/usr/local/lib/python3.5/json/__init__.py", line 312, in loads s.__class__.__name__)) TypeError: the JSON object must be str, not 'NoneType' If it is good enough for pickle and json load() functions to report a TypeError like this, it is good enough for unpack(). Not every exception needs a custom error message. > You need to repeat reads until collecting the value of enough size. That's not what the OP has asked for, it isn't what the OP's code does, and its not what I've suggested. Do pickle and json block and repeat the read until they have a complete object? I'm pretty sure they don't -- the source for json.load() that I have says: return loads(fp.read(), ... ) so it definitely doesn't repeat the read. I think it is so unlikely that pickle blocks waiting for extra input that I haven't even bothered to look. Looping and repeating the read is a clear case of YAGNI. Don't over-engineer the function, and then complain that the over- engineered function is too complex. There is no need for unpack() to handle streaming input which can output anything less than a complete struct per read. > `.read(N)` can return less bytes by definition, Yes, we know that. 
And if it returns fewer bytes, then you get a nice, clear exception. -- Steve From andrew.svetlov at gmail.com Wed Dec 26 05:18:23 2018 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Wed, 26 Dec 2018 12:18:23 +0200 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: <20181226092518.GX13061@ando.pearwood.info> References: <20181224133313.GP13061@ando.pearwood.info> <20181224211733.GQ13061@ando.pearwood.info> <20181226051134.GV13061@ando.pearwood.info> <20181226092518.GX13061@ando.pearwood.info> Message-ID: On Wed, Dec 26, 2018 at 11:26 AM Steven D'Aprano wrote: > On Wed, Dec 26, 2018 at 09:48:15AM +0200, Andrew Svetlov wrote: > > > The perfect demonstration of io objects complexity. > > `stream.read(N)` can return None by spec if the file is non-blocking > > and have no ready data. > > > > Confusing but still possible and documented behavior. > > https://docs.python.org/3/library/io.html#io.RawIOBase.read > > Regardless, my point doesn't change. That has nothing to do with the > behaviour of unpack. If you pass a non-blocking file-like object which > returns None, you get exactly the same exception as if you wrote > > unpack(fmt, f.read(size)) > > and the call to f.read returned None. Why is it unpack's responsibility > to educate the caller that f.read can return None? > > Let's see what other functions with similar APIs do. > > > py> class FakeFile: > ... def read(self, n=-1): > ... return None > ... def readline(self): > ... return None > ... > py> pickle.load(FakeFile()) > Traceback (most recent call last): > File "", line 1, in > TypeError: a bytes-like object is required, not 'NoneType' > py> json.load(FakeFile()) > Traceback (most recent call last): > File "", line 1, in > File "/usr/local/lib/python3.5/json/__init__.py", line 268, in load > parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, > **kw) > File "/usr/local/lib/python3.5/json/__init__.py", line 312, in loads > s.__class__.__name__)) > TypeError: the JSON object must be str, not 'NoneType' > > > If it is good enough for pickle and json load() functions to report a > TypeError like this, it is good enough for unpack(). > > Not every exception needs a custom error message. > > > > > You need to repeat reads until collecting the value of enough size. > > That's not what the OP has asked for, it isn't what the OP's code does, > and its not what I've suggested. > > Do pickle and json block and repeat the read until they have a complete > object? I'm pretty sure they don't -- the source for json.load() that I > have says: > > return loads(fp.read(), ... ) > > so it definitely doesn't repeat the read. I think it is so unlikely that > pickle blocks waiting for extra input that I haven't even bothered to > look. Looping and repeating the read is a clear case of YAGNI. > > json is correct: if `read()` is called without argument it reads the whole content until EOF. But with size argument the is different for interactive and non-interactive streams. RawIOBase and BufferedIOBase also have slightly different behavior for `.read()`. Restriction fp to BufferedIOBase looks viable though, but it is not a file-like object. Also I'm thinking about type annotations in typeshed. Now the type is Union[array[int], bytes, bytearray, memoryview] Should it be Union[io.BinaryIO, array[int], bytes, bytearray, memoryview] ? What is behavior of unpack_from(fp, offset=120)? Should iter_unpack() read the whole buffer from file into a memory before emitting a first value? 
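For reference, the "repeat the read until you have enough bytes" behaviour being discussed would look roughly like the sketch below. The helper names are invented, and whether unpack() should ever do this is exactly the point in dispute:

    import struct

    def read_exact(f, n):
        """Keep calling f.read() until n bytes are collected, or give up."""
        parts = []
        remaining = n
        while remaining > 0:
            chunk = f.read(remaining)
            if not chunk:                  # b'' (EOF) or None (no data ready)
                raise EOFError('expected %d bytes, got %d' % (n, n - remaining))
            parts.append(chunk)
            remaining -= len(chunk)
        return b''.join(parts)

    def unpack_exact(fmt, f):
        return struct.unpack(fmt, read_exact(f, struct.calcsize(fmt)))

Note that even this loop punts on genuinely non-blocking files: a read() that returns None is treated the same as EOF rather than retried.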
> Don't over-engineer the function, and then complain that the over- > engineered function is too complex. There is no need for unpack() to > handle streaming input which can output anything less than a complete > struct per read. > > > > > `.read(N)` can return less bytes by definition, > > Yes, we know that. And if it returns fewer bytes, then you get a nice, > clear exception. > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Thanks, Andrew Svetlov -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Dec 26 06:10:05 2018 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 26 Dec 2018 03:10:05 -0800 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: References: <20181224133313.GP13061@ando.pearwood.info> <20181224211733.GQ13061@ando.pearwood.info> <20181226051134.GV13061@ando.pearwood.info> <20181226092518.GX13061@ando.pearwood.info> Message-ID: On Wed, Dec 26, 2018, 02:19 Andrew Svetlov > Also I'm thinking about type annotations in typeshed. > Now the type is Union[array[int], bytes, bytearray, memoryview] > Should it be Union[io.BinaryIO, array[int], bytes, bytearray, memoryview] ? > Yeah, trying to support both buffers and file-like objects in the same function seems like a clearly bad idea. If we do this at all it should be by adding new convenience functions/methods that take file-like objects exclusively, like the ones several people posted on the thread. I don't really have an opinion on whether this is worth doing at all. I guess I can think of some arguments against: Packing/unpacking multiple structs to the same file-like object may be less efficient than using a single buffer + a single call to read/write. And it's unfortunate that the obvious pack_into/unpack_from names are already taken. And it's only 2 lines of code to write your own helpers. But none of these are particularly strong arguments either, and clearly some people would find them handy. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Wed Dec 26 06:25:06 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 26 Dec 2018 22:25:06 +1100 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: References: <20181224133313.GP13061@ando.pearwood.info> <20181224211733.GQ13061@ando.pearwood.info> <20181226051134.GV13061@ando.pearwood.info> <20181226092518.GX13061@ando.pearwood.info> Message-ID: <20181226112506.GY13061@ando.pearwood.info> On Wed, Dec 26, 2018 at 03:10:05AM -0800, Nathaniel Smith wrote: > On Wed, Dec 26, 2018, 02:19 Andrew Svetlov > > > > Also I'm thinking about type annotations in typeshed. > > Now the type is Union[array[int], bytes, bytearray, memoryview] > > Should it be Union[io.BinaryIO, array[int], bytes, bytearray, memoryview] ? > > > > Yeah, trying to support both buffers and file-like objects in the same > function seems like a clearly bad idea. It might be clear to you, but it's not clear to me. Why is it a bad idea? The OP has a function which does precisely that, and it works well for him. 
-- Steve From steve at pearwood.info Wed Dec 26 06:42:30 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 26 Dec 2018 22:42:30 +1100 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: References: <20181224133313.GP13061@ando.pearwood.info> <20181224211733.GQ13061@ando.pearwood.info> <20181226051134.GV13061@ando.pearwood.info> <20181226092518.GX13061@ando.pearwood.info> Message-ID: <20181226114229.GZ13061@ando.pearwood.info> On Wed, Dec 26, 2018 at 12:18:23PM +0200, Andrew Svetlov wrote: [...] > > json is correct: if `read()` is called without argument it reads the whole > content until EOF. > But with size argument the is different for interactive and non-interactive > streams. > RawIOBase and BufferedIOBase also have slightly different behavior for > `.read()`. This is complexity that isn't the unpack() function's responsibility to care about. All it wants is to call read(N) and get back N bytes. If it gets back anything else, that's an error. > Restriction fp to BufferedIOBase looks viable though, but it is not a > file-like object. There is no need to restrict it to BufferedIOBase. In hindsight, I am not even sure we should do an isinstance check at all. Surely all we care about is that the object has a read() method which takes a single argument, and returns that number of bytes? Here's another proof-of-concept implementation which doesn't require any isinstance checks on the argument. The only type checking it does is to verify that the read returns bytes, and even that is only a convenience so it can provide a friendly error message. def unpackStruct(fmt, frm): try: read = frm.read except AttributeError: return struct.unpack(fmt, frm) n = struct.calcsize(fmt) value = read(n) if not isinstance(value, bytes): raise TypeError('read method must return bytes') if len(value) != n: raise ValueError('expected %d bytes but only got %d' % (n, len(value))) return struct.unpack(fmt, value) [...] > What is behavior of unpack_from(fp, offset=120)? I don't know. What does the "offset" parameter do, and who requested it? I didn't, and neither did the OP Drew Warwick. James Edwards wrote that he too uses a similar function in production, one which originally did support file seeking, but they took it out. If you are suggesting an offset parameter to the unpack() function, it is up to you to propose what meaning it will have and justify why it should be part of unpack's API. Until then, YAGNI. > Should iter_unpack() read the whole buffer from file into a memory before > emitting a first value? Nobody has requested any changes to iter_unpack(). -- Steve From p.f.moore at gmail.com Wed Dec 26 08:32:38 2018 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 26 Dec 2018 13:32:38 +0000 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: <20181226092518.GX13061@ando.pearwood.info> References: <20181224133313.GP13061@ando.pearwood.info> <20181224211733.GQ13061@ando.pearwood.info> <20181226051134.GV13061@ando.pearwood.info> <20181226092518.GX13061@ando.pearwood.info> Message-ID: On Wed, 26 Dec 2018 at 09:26, Steven D'Aprano wrote: > Regardless, my point doesn't change. That has nothing to do with the > behaviour of unpack. If you pass a non-blocking file-like object which > returns None, you get exactly the same exception as if you wrote > > unpack(fmt, f.read(size)) > > and the call to f.read returned None. Why is it unpack's responsibility > to educate the caller that f.read can return None? 
Abstraction, basically - once the unpack function takes responsibility for doing the read, and hiding the fact that there's a read going on behind an API unpack(fmt, f), it *also* takes on responsibility for managing all of the administration of that read call. It's perfectly at liberty to do so by saying "we do a read() behind the scenes, so you get the same behaviour as if you did that read() yourself", but that's a pretty thin layer of abstraction (and people often expect something less transparent). As I say, you *can* define the behaviour as you say, but it shouldn't be surprising if people expect a bit more (even if, as you've said a few times, "no-one has asked for that"). Designing an API that meets people's (often unstated) expectations isn't always as easy as just writing a convenience function. Paul PS I remain neutral on whether the OP's proposal is worth adding, but the conversation has drifted more into abstract questions about what "needs" to be in this API, so take the above on that basis. From cs at cskk.id.au Wed Dec 26 18:02:09 2018 From: cs at cskk.id.au (Cameron Simpson) Date: Thu, 27 Dec 2018 10:02:09 +1100 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: References: Message-ID: <20181226230209.GA68021@cskk.homeip.net> On 26Dec2018 12:18, Andrew Svetlov wrote: >On Wed, Dec 26, 2018 at 11:26 AM Steven D'Aprano >wrote: > >> On Wed, Dec 26, 2018 at 09:48:15AM +0200, Andrew Svetlov wrote: >> > The perfect demonstration of io objects complexity. >> > `stream.read(N)` can return None by spec if the file is non-blocking >> > and have no ready data. >> > >> > Confusing but still possible and documented behavior. >> >> https://docs.python.org/3/library/io.html#io.RawIOBase.read >> >> Regardless, my point doesn't change. That has nothing to do with the >> behaviour of unpack. If you pass a non-blocking file-like object which >> returns None, you get exactly the same exception as if you wrote >> >> unpack(fmt, f.read(size)) >> >> and the call to f.read returned None. Why is it unpack's responsibility >> to educate the caller that f.read can return None? [...] >> > You need to repeat reads until collecting the value of enough size. >> >> That's not what the OP has asked for, it isn't what the OP's code does, >> and its not what I've suggested. >> >> Do pickle and json block and repeat the read until they have a complete >> object? I'm pretty sure they don't [...] >> json is correct: if `read()` is called without argument it reads the >> whole >content until EOF. >But with size argument the is different for interactive and non-interactive >streams. Oh, it is better than that. At the low level, even blocking streams can return short reads - particularly serial streams like ttys and TCP connections. >RawIOBase and BufferedIOBase also have slightly different behavior for >`.read()`. > >Restriction fp to BufferedIOBase looks viable though, but it is not a >file-like object. > >Also I'm thinking about type annotations in typeshed. >Now the type is Union[array[int], bytes, bytearray, memoryview] >Should it be Union[io.BinaryIO, array[int], bytes, bytearray, >memoryview] ? And this is why I, personally, think augumenting struct.unpack and json.read and a myriad of other arbitrary methods to accept both file-like things and bytes is an open ended can of worms. And it is why I wrote myself my CornuCopyBuffer class (see my other post in this thread). Its entire purpose is to wrap an iterable of bytes-like objects and do all that work via convenient methods. 
And which has factory methods to make these from files or other common things. Given a CornuCopyBuffer `bfr`: S = struct('spec-here...') sbuf = bfr.take(S.size) result = S.unpack(sbuf) Under the covers `bfr` take care of short "reads" (iteraion values) etc in the underlying iterable. The return from .take is typically a memoryview from `bfr`'s internal buffer - it is _always_ exactly `size` bytes long if you don't pass short_ok=True, or it raises an exception. And so on. The point here is: make a class to get what you actually need, and _don't_ stuff variable and hard to agree on extra semantics inside multiple basic utility classes like struct. For myself, the CornuCopyBuffer is now my universal interface to byte streams (binary files, TCP connections, whatever) which need binary parsing, and it has the methods and internal logic to provide that, including presenting a simple read only file-like interface with read and seek-forward, should I need to pass it to a file-expecting object. Do it _once_, and don't megacomplicatise all the existing utility classes. Cheers, Cameron Simpson From steve at pearwood.info Wed Dec 26 19:42:30 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 27 Dec 2018 11:42:30 +1100 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: References: <20181224133313.GP13061@ando.pearwood.info> <20181224211733.GQ13061@ando.pearwood.info> <20181226051134.GV13061@ando.pearwood.info> <20181226092518.GX13061@ando.pearwood.info> Message-ID: <20181227004230.GC10079@ando.pearwood.info> On Wed, Dec 26, 2018 at 01:32:38PM +0000, Paul Moore wrote: > On Wed, 26 Dec 2018 at 09:26, Steven D'Aprano wrote: > > Regardless, my point doesn't change. That has nothing to do with the > > behaviour of unpack. If you pass a non-blocking file-like object which > > returns None, you get exactly the same exception as if you wrote > > > > unpack(fmt, f.read(size)) > > > > and the call to f.read returned None. Why is it unpack's responsibility > > to educate the caller that f.read can return None? > > Abstraction, basically - once the unpack function takes responsibility > for doing the read, and hiding the fact that there's a read going on > behind an API unpack(fmt, f), it *also* takes on responsibility for > managing all of the administration of that read call. As I keep pointing out, the json.load and pickle.load functions don't take on all that added administration. Neither does marshal, or zipfile, and I daresay there are others. Why does "abstraction" apply to this proposal but not the others? If you pass a file-like object to marshal.load that returns less than a full record, it simply raises an exception. There's no attempt to handle non-blocking streams and re-read until it has a full record: py> class MyFile: ... def read(self, n=-1): ... print("reading") ... return marshal.dumps([1, "a"])[:5] ... py> marshal.load(MyFile()) reading Traceback (most recent call last): File "", line 1, in EOFError: EOF read where object expected The use-case for marshall.load is to read a valid, complete marshall record from a file on disk. Likewise for json.load and pickle.load. There's no need to complicate the implementation by handling streams from ttys and other exotic file-like objects. Likewise there's zipfile, which also doesn't take on this extra responsibility. It doesn't try to support non-blocking streams which return None, for example. It assumes the input file is seekable, and doesn't raise a dedicated error for the case that it isn't. 
Nor does it support non-blocking streams by looping until it has read the data it expects. The use-case for unpack with a file object argument is the same. Why should we demand that it alone take on this unnecessary, unwanted, unused extra responsibility? It seems to me that only people insisting that unpack() take on this extra responsibility are those who are opposed to the proposal. We're asking for a battery, and they're insisting that we actually need a nuclear reactor, and rejecting the proposal because nuclear reactors are too complex. Here are some of the features that have been piled on to the proposal: - you need to deal with non-blocking streams that return None; - if you read an incomplete struct, you need to block and read in a loop until the struct is complete; - you need to deal with OS errors in some unspecified way, apart from just letting them bubble up to the caller. The response to all of these are: No we don't need to do these things, they are all out of scope for the proposal and other similar functions in the standard library don't do them. These are examples of over-engineering and YAGNI. *If* (a very big if!) somebody requests these features in the future, then they'll be considered as enhancement requests. The effort required versus the benefit will be weighed up, and if the benefit exceeds the costs, then the function may be enhanced to support streams which return partial records. The benefit will need to be more than just "abstraction". If there are objective, rational reasons for unpack() taking on these extra responsibilities, when other stdlib code doesn't, then I wish people would explain what those reasons are. Why does "abstraction" apply to struct.unpack() but not json.load()? I'm willing to be persuaded, I can change my mind. When Andrew suggested that unpack would need extra code to generate better error messages, I tested a few likely exceptions, and ended up agreeing that at least one and possibly two such enhancements were genuinely necessary. Those better error messages ended up in my subsequent proof-of-concept implementations, tripling the size from five lines to fifteen. (A second implementation reduced it to twelve.) But it irks me when people unnecessarily demand that new proposals are written to standards far beyond what the rest of the stdlib is written to. (I'm not talking about some of the venerable old, crufty parts of the stdlib dating back to Python 1.4, I'm talking about actively maintained, modern parts like json.) Especially when they seem unwilling or unable to explain *why* we need to apply such a high standard. What's so specially about unpack() that it has to handle these additional use-cases? If an objection to a proposal equally applies to parts of the stdlib that are in widepread use without actually being a problem in practice, then the objection is probably invalid. Remember the Zen: Now is better than never. Although never is often better than *right* now. Even if we do need to deal with rare, exotic or unusual input, we don't need to deal with them *right now*. When somebody submits an enhancement request "support non-blocking streams", we can deal with it then. Probably by rejecting it. 
-- Steve From boxed at killingar.net Wed Dec 26 20:53:44 2018 From: boxed at killingar.net (=?utf-8?Q?Anders_Hovm=C3=B6ller?=) Date: Thu, 27 Dec 2018 02:53:44 +0100 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: <20181226230209.GA68021@cskk.homeip.net> References: <20181226230209.GA68021@cskk.homeip.net> Message-ID: <49CC98BB-A877-41F1-A772-4A5D14B16461@killingar.net> > And this is why I, personally, think augumenting struct.unpack and json.read and a myriad of other arbitrary methods to accept both file-like things and bytes is an open ended can of worms. > > And it is why I wrote myself my CornuCopyBuffer class (see my other post in this thread). Seems like that should be in the standard library then! / Anders From steve at pearwood.info Wed Dec 26 20:59:39 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 27 Dec 2018 12:59:39 +1100 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: <20181226230209.GA68021@cskk.homeip.net> References: <20181226230209.GA68021@cskk.homeip.net> Message-ID: <20181227015939.GA13061@ando.pearwood.info> On Thu, Dec 27, 2018 at 10:02:09AM +1100, Cameron Simpson wrote: [...] > >Also I'm thinking about type annotations in typeshed. > >Now the type is Union[array[int], bytes, bytearray, memoryview] > >Should it be Union[io.BinaryIO, array[int], bytes, bytearray, > >memoryview] ? > > And this is why I, personally, think augumenting struct.unpack and > json.read and a myriad of other arbitrary methods to accept both > file-like things and bytes is an open ended can of worms. I presume you mean json.load(), not read, except that it already reads from files. Nobody is talking about augmenting "a myriad of other arbitrary methods" except for you. We're talking about enhancing *one* function to be a simple generic function. I assume you have no objection to the existence of json.load() and json.loads() functions. (If you do think they're a bad idea, I don't know what to say.) Have they lead to "an open ended can of worms"? If we wrote a simple wrapper: def load(obj, *args, **kwargs): if isinstance(obj, str): return json.loads(obj, *args, **kwargs) else: return json.load(obj, *args, **kwargs) would that lead to "an open ended can of worms"? These aren't rhetoricial questions. I'd like to understand your objection. You have dismissed what seems to be a simple enhancement with a vague statement about hypothetical problems. Please explain in concrete terms what these figurative worms are. Let's come back to unpack. Would you object to having two separate functions that matched (apart from the difference in name) the API used by json, pickle, marshal etc? - unpack() reads from files - unpacks() reads from strings Obviously this breaks backwards compatibility, but if we were designing struct from scratch today, would this API open a can of worms? (Again, this is not a rhetorical question.) Let's save backwards compatibility: - unpack() reads from strings - unpackf() reads from files Does this open a can of worms? Or we could use a generic function. There is plenty of precedent for generic files in the stdlib. For example, zipfile accepts either a file name, or an open file object. def unpack(fmt, frm): if hasattr(frm, "read"): return _unpack_file(fmt, frm) else: return _unpack_bytes(fmt, frm) Does that generic function wrapper create "an open ended can of worms"? If so, in what way? 
I'm trying to understand where the problem lies, between the existing APIs used by json etc (presumably they are fine) and the objections to using what seems to be a very similar API for unpack, offerring the same functionality but differing only in spelling (a single generic function instead of two similarly-named functions). > And it is why I wrote myself my CornuCopyBuffer class (see my other post > in this thread). [...] > The return from .take is typically a > memoryview from `bfr`'s internal buffer - it is _always_ exactly `size` > bytes long if you don't pass short_ok=True, or it raises an exception. That's exactly the proposed semantics for unpack, except there's no "short_ok" parameter. If the read is short, you get an exception. > And so on. > > The point here is: make a class to get what you actually need Do you know better than the OP (Drew Warwick) and James Edwards what they "actually need"? How would you react if I told you that your CornuCopyBuffer class, is an over-engineered, over-complicated, over-complex class that you don't need? You'd probably be pretty pissed off at my arrogance in telling you what you do or don't need for your own use-cases. (Especially since I don't know your use-cases.) Now consider that you are telling Drew and James that they don't know their own use-cases, despite the fact that they've been working successfully with this simple enhancement for years. I'm happy for you that CornuCopyBuffer solves real problems for you, and if you want to propose it for the stdlib I'd be really interested to learn more about it. But this is actually irrelevant to the current proposal. Even if we had a CornuCopyBuffer in the std lib, how does that help? We will still need to call struct.calcsize(format) by hand, still need to call read(size) by hand. Your CornuCopyBuffer does nothing to avoid that. The point of this proposal is to avoid that tedious make-work, not increase it by having to wrap our simple disk files in a CornuCopyBuffer before doing precisely the same make-work we didn't want to do in the first case. Drew has asked for a better hammer, and you're telling him he really wants a space shuttle. -- Steve From rosuav at gmail.com Wed Dec 26 21:15:46 2018 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 27 Dec 2018 13:15:46 +1100 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: <20181227015939.GA13061@ando.pearwood.info> References: <20181226230209.GA68021@cskk.homeip.net> <20181227015939.GA13061@ando.pearwood.info> Message-ID: I'm quoting Steve's post here but am responding more broadly to the whole thread too. On Thu, Dec 27, 2018 at 1:00 PM Steven D'Aprano wrote: > I assume you have no objection to the existence of json.load() and > json.loads() functions. (If you do think they're a bad idea, I don't > know what to say.) Have they lead to "an open ended can of worms"? Personally, I'd actually be -0 on json.load if it didn't already exist. It's just a thin wrapper around json.loads() - it doesn't actually add anything. This proposal is _notably better_ in that it will (attempt to) read the correct number of bytes. The only real reason to have json.load/json.loads is to match pickle etc. (Though pickle does things the other way around, at least in the Python source code I have handy - loads is implemented using BytesIO, so it's the file-based API that's fundamental, as opposed to JSON where the string-based API is fundamental. I guess maybe that's a valid reason? 
To allow either one to be implemented in terms of the other?) But reading a struct *and then leaving the rest behind* is, IMO, a more valuable feature. > Let's save backwards compatibility: > > - unpack() reads from strings > - unpackf() reads from files > > Does this open a can of worms? Not in my opinion, but I also don't think it gains you anything much. It isn't consistent with other stdlib modules, and it isn't very advantageous over the OP's idea of just having the same function able to cope with files as well as strings. The only advantage that I can see is that unpackf() might be made able to accept a pathlike, which it will open, read from, and close. (Since a pathlike could be a string, the single function would technically be ambiguous.) And I'd drop that idea in the YAGNI basket. > Or we could use a generic function. There is plenty of precedent for > generic files in the stdlib. For example, zipfile accepts either > a file name, or an open file object. > > def unpack(fmt, frm): > if hasattr(frm, "read"): > return _unpack_file(fmt, frm) > else: > return _unpack_bytes(fmt, frm) FTR, I am +0.9 on this kind of proposal - basically "just make it work" within the existing API. It's a small amount of additional complexity to support a quite reasonable use-case. > Drew has asked for a better hammer, and you're telling him he really > wants a space shuttle. But but.... a space shuttle is very effective at knocking nails into wood... also, I just want my own space shuttle. Plz? Thx. Bye! :) ChrisA From cs at cskk.id.au Thu Dec 27 00:06:46 2018 From: cs at cskk.id.au (Cameron Simpson) Date: Thu, 27 Dec 2018 16:06:46 +1100 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: <49CC98BB-A877-41F1-A772-4A5D14B16461@killingar.net> References: <49CC98BB-A877-41F1-A772-4A5D14B16461@killingar.net> Message-ID: <20181227050646.GA29362@cskk.homeip.net> On 27Dec2018 02:53, Anders Hovm?ller wrote: > >> And this is why I, personally, think augumenting struct.unpack and json.read and a myriad of other arbitrary methods to accept both file-like things and bytes is an open ended can of worms. >> >> And it is why I wrote myself my CornuCopyBuffer class (see my other post in this thread). > >Seems like that should be in the standard library then! It is insufficiently used at present. The idea seems sound - a flexible adapter of bytes sources providing easy methods to aid parsing - based on how useful it has been to me. But it has rough edges and one needs to convince others of its utility before entry into the stdlib. So it is on PyPI for easy use. If you're in the binary I/O/parsing space, pip install it (and cs.binary, which utilises it) and see how they work for you. Complain to me about poor semantics or bugs. And then we'll see how general purpose it really is. The PyPI package pages for each have doco derived from the module docstrings. Cheers, Cameron Simpson From cs at cskk.id.au Thu Dec 27 01:14:35 2018 From: cs at cskk.id.au (Cameron Simpson) Date: Thu, 27 Dec 2018 17:14:35 +1100 Subject: [Python-ideas] struct.unpack should support open files In-Reply-To: <20181227015939.GA13061@ando.pearwood.info> References: <20181227015939.GA13061@ando.pearwood.info> Message-ID: <20181227061435.GA41503@cskk.homeip.net> On 27Dec2018 12:59, Steven D'Aprano wrote: >On Thu, Dec 27, 2018 at 10:02:09AM +1100, Cameron Simpson wrote: >[...] >> >Also I'm thinking about type annotations in typeshed. 
>> >Now the type is Union[array[int], bytes, bytearray, memoryview] >> >Should it be Union[io.BinaryIO, array[int], bytes, bytearray, >> >memoryview] ? >> >> And this is why I, personally, think augumenting struct.unpack and >> json.read and a myriad of other arbitrary methods to accept both >> file-like things and bytes is an open ended can of worms. > >I presume you mean json.load(), not read, except that it already reads >from files. Likely. Though the json module is string oriented (though if one has UTF-8 data, turning binary into that is easy). >Nobody is talking about augmenting "a myriad of other arbitrary methods" >except for you. We're talking about enhancing *one* function to be a >simple generic function. Yes, but that is how the rot sets in. Some here want to enhance json.load/loads. The OP wants to enhance struct.unpack. Yay. Now let's also do csv.reader. Etc. I think my point is twofold: once you start down this road you (a) start doing it to every parser in the stdlib and (b) we all start bikeshedding about semantics. There are at least two roads to such enhancement: make the functions polymorphic, coping with files or bytes/strs (depending), or make a parallel suite of functions like json.load/loads. The latter is basicly API bloat to little advantage. The former is rather slippery - I've a few functions myself with accept-str-or-file call modes, and _normally_ the "str" flavour is taken as a filename. But... if the function is a string parser, maybe it should parse the string itself? Already the choices are messy. And both approaches have much bikeshedding. Some of us would like something like struct.unpack to pull enough data from the file even if the file returns short reads. You, I gather, generally like the shim to be very shallow and have a short read cause an exception through insufficient data. Should the file version support an optional seek/offset argument? The example from James suggests that such a thing would benefit him. And so on. And this argument has to play out for _every_ parser interface you want to adapt for both files and direct bytes/str (again, depending). >I assume you have no objection to the existence of json.load() and >json.loads() functions. (If you do think they're a bad idea, I don't >know what to say.) Have they lead to "an open ended can of worms"? On their own, no. The isolated example never starts that way. But really consistency argues that the entire stdlib should have file and str/bytes parallel functions across all parsers. And _that_ is a can of worms. >If we wrote a simple wrapper: > >def load(obj, *args, **kwargs): > if isinstance(obj, str): > return json.loads(obj, *args, **kwargs) > else: > return json.load(obj, *args, **kwargs) > >would that lead to "an open ended can of worms"? Less so. I've a decorator of my own called @strable, which wraps other functions; it intercepts the first positional argument if it is a str and replaces it with something derived from it. The default mode is an open file, with the str as the filename, but it is slightly pluggable. Such a decorator could reside in a utility stdlib module and become heavily used in places like json.load if desired. >These aren't rhetoricial questions. I'd like to understand your >objection. You have dismissed what seems to be a simple enhancement with >a vague statement about hypothetical problems. Please explain in >concrete terms what these figurative worms are. 
I'm hoping my discussion above shows where I think the open ended side of the issue arises: once we do it to one function we sort of want to do it to all similar functions, and there are multiple defensible ways to do it. >Let's come back to unpack. Would you object to having two separate >functions that matched (apart from the difference in name) the API used >by json, pickle, marshal etc? > >- unpack() reads from files >- unpacks() reads from strings Well, yeah. (Presuming you mean bytes rather than strings above in the Python 3 domain.) API bloat. They are essentially identical functions in terms of utility. >Obviously this breaks backwards compatibility, but if we were designing >struct from scratch today, would this API open a can of worms? >(Again, this is not a rhetorical question.) Only in that it opens the door to doing the same for every other similar function in the stdlib. And wouldn't it be nice to have a third form to take a filename and open it? >Let's save backwards compatibility: Some degree of objection: API bloat requiring repeated bloat elsewhere. Let's set backwards compatibility aside: it halves the discussion and examples. >Or we could use a generic function. There is plenty of precedent for >generic files in the stdlib. For example, zipfile accepts either >a file name, or an open file object. Indeed, and here we are with flavour #3: the string isn't a byte sequence to parse, it is now a filename. In Python 3 we can disambiguate if we parse bytes and treat str as a filename. But what if we're parsing str, as JSON does? Now we don't know and must make a policy decision. >def unpack(fmt, frm): > if hasattr(frm, "read"): > return _unpack_file(fmt, frm) > else: > return _unpack_bytes(fmt, frm) > >Does that generic function wrapper create "an open ended can of worms"? >If so, in what way? If you were to rewrite the above in the form of my @strable decorator, provide it in a utility library, and _use_ it in unpack, I'd be +1, because the _same_ utility can be reused elsewhere by anyone for any API. Embedding it directly in unpack complicates unpack's semantics for what is essentially a shim. Here's my @strable, minus its docstring: @decorator def strable(func, open_func=None): if open_func is None: open_func = open def accepts_str(arg, *a, **kw): if isinstance(arg, str): with Pfx(arg): with open_func(arg) as opened: return func(opened, *a, **kw) return func(arg, *a, **kw) return accepts_str and an example library function: @strable def count_lines(f): count = 0 for line in f: count += 1 return count and there's a function taking an open file or a filename. But suppose we want to supply a string whose lines need counting, not a filename. We could _either_ change our policy decision from "accepts a filename" to "accepts an input string", _or_ we can start adding a third mode on top of the existing two modes. All three modes are reasonable. >I'm trying to understand where the problem lies, between the existing >APIs used by json etc (presumably they are fine) They're historic. I think I'm -0 on having 2 functions. But only because it is so easy to hand file contents to loads. >and the objections to >using what seems to be a very similar API for unpack, offerring the same >functionality but differing only in spelling (a single generic function >instead of two similarly-named functions).
I hope I've made it more clear above that my objection is to either approach (polymorphic or parallel functions) because one can write a general purpose shim and use it with almost anything, and then we can make things like json or struct accept _only_ str or bytes respectively, with _no_ complication extra semantics. Because once we do it for these 2 we _should_ do it for every parser for consistency. Yes, yes, stripping json _back_ to just loads would break backwards compatibility; I'm not proposing that for real. I'm proposing resisting extra semantic bloat in favour of a help class or decorator. Consider: from shimutils import bytes_from_file from struct import unpack unpackf = bytes_from_file(unpack) Make a bunch of shims for the common use cases and the burden on users of the various _other_ modules becomes very small, and we don't have to go to every parser API and bloat it out. Especially since we've seen the bikeshedding on semantics even on this small suggestion ("accept a file"). >> And it is why I wrote myself my CornuCopyBuffer class (see my other >> post in this thread). >[...] >> The return from .take is typically a >> memoryview from `bfr`'s internal buffer - it is _always_ exactly `size` >> bytes long if you don't pass short_ok=True, or it raises an exception. > >That's exactly the proposed semantics for unpack, except there's no >"short_ok" parameter. If the read is short, you get an exception. And here we are. Bikeshedding already! My CCB.take (for short) raises an exception on _insufficient_ data, not a short read. It does enough reads to get the data demanded. If I _want_ to know that a read was short I can pass short_ok=True and examine the result before use. Its whole point is to give the right data to the caller. Let me give you some examples: I run som binary protocols over TCP streams. They're not network packets; the logical packets can span IP packets, and of course conversely several small protocol packets may fit in a single network packet because they're assembled in a buffer at the sending end (via plain old file.write). Via a CCB the receiver _doesn't care_. Ask for the required data, the CCB gathers enough and hands it over. I parse MP4 files. The ISO14496 packet structure has plenty of structures of almost arbitrary size, particularly the media data packet (MDAT) which can be gigabytes in size. You're _going_ to get a short read there. I'd be annoyed by an exception. >> And so on. >> >> The point here is: make a class to get what you actually need > >Do you know better than the OP (Drew Warwick) and James Edwards what >they "actually need"? No, but I know what _I_ need. A flexible controller with several knobs to treat input in various common ways. >How would you react if I told you that your CornuCopyBuffer class, is an >over-engineered, over-complicated, over-complex class that you don't >need? You'd probably be pretty pissed off at my arrogance in telling you >what you do or don't need for your own use-cases. (Especially since I >don't know your use-cases.) Some examples above. There's a _little_ over engineering, but it actually solves a _lot_ of problems, making everything else MUCH MUCH simpler. >Now consider that you are telling Drew and James that they don't know >their own use-cases, despite the fact that they've been working >successfully with this simple enhancement for years. I'm not. 
I'm _suggesting_ that _instead_ of embedding extra semantics, which we can't even all agree on, into parser libraries, it is often better to make it easy to give the parser what its _current_ API accepts. And that the tool to do that should be _outside_ those parser modules, not inside, because it can be generally applicable. >I'm happy for you that CornuCopyBuffer solves real problems for you, >and if you want to propose it for the stdlib I'd be really interested >to learn more about it. Not yet. Slightly rough and the user audience is basically me right now. But feel free to pip install cs.buffer and cs.binary and have a look. >But this is actually irrelevant to the current proposal. Even if we had >a CornuCopyBuffer in the std lib, how does that help? We will still need >to call struct.calcsize(format) by hand, still need to call read(size) >by hand. Your CornuCopyBuffer does nothing to avoid that. No, but its partner cs.binary _does_. As described in my first post to this thread. Have a quick reread, particularly near the "PDInfo" example. >The point of this proposal is to avoid that tedious make-work, not >increase it by having to wrap our simple disk files in a CornuCopyBuffer >before doing precisely the same make-work we didn't want to do in the >first case. > >Drew has asked for a better hammer, and you're telling him he really >wants a space shuttle. To my eye he asked to make unpack into a multitool (bytes and files), and I'm suggesting maybe he should get a screwdriver to go with his hammer (to use as a chisel, of course). Anyway, I'm making 2 arguments: - don't bloat the stdlib APIs to accommodate things much beyond their core - offer a tool to make the things beyond the core _easily_ available for use in the core way The latter can then _also_ be used with other APIs not yet extended. Cheers, Cameron Simpson From malincns at 163.com Thu Dec 27 06:48:40 2018 From: malincns at 163.com (Ma Lin) Date: Thu, 27 Dec 2018 19:48:40 +0800 Subject: [Python-ideas] Add regex pattern literal p"" Message-ID: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> We can use this literal to represent a compiled pattern, for example: >>> p"(?i)[a-z]".findall("a1B2c3") ['a', 'B', 'c'] >>> compiled = p"(?<=abc)def" >>> m = compiled.search('abcdef') >>> m.group(0) 'def' >>> rp'\W+'.split('Words, words, words.') ['Words', 'words', 'words', ''] This allows the peephole optimizer to store the compiled pattern in the .pyc file; we can get a performance optimization like replacing a constant set by a frozenset in the .pyc file. Then such issue [1] can be solved perfectly. [1] Optimize base64.b16decode to use compiled regex [1] https://bugs.python.org/issue35559 Two shortcomings: 1, Elevating a class in a module (re.Pattern) to language level, this sounds not very natural. This makes Python look like Perl. 2, We can't use regex module as a drop-in replacement: import regex as re IMHO, I would like to see regex module be adopted into stdlib after cutting off its "full case-folding" and "fuzzy matching" features. Related links: [2] Chris Angelico conceived of "compiled regexes be stored in .pyc file" in March 2013. [2] https://mail.python.org/pipermail/python-ideas/2013-March/020043.html [3] Ken Hilton conceived of "Give regex operations more sugar" in June 2018.
[3] https://mail.python.org/pipermail/python-ideas/2018-June/051395.html From rosuav at gmail.com Thu Dec 27 07:11:29 2018 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 27 Dec 2018 23:11:29 +1100 Subject: [Python-ideas] Add regex pattern literal p"" In-Reply-To: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> Message-ID: On Thu, Dec 27, 2018 at 10:49 PM Ma Lin wrote: > > We can use this literal to represent a compiled pattern, for example: > > >>> p"(?i)[a-z]".findall("a1B2c3") > ['a', 'B', 'c'] > > >>> compiled = p"(?<=abc)def" > >>> m = compiled.search('abcdef') > >>> m.group(0) > 'def' > > >>> rp'\W+'.split('Words, words, words.') > ['Words', 'words', 'words', ''] > > This allows peephole optimizer to store compiled pattern in .pyc file, > we can get performance optimization like replacing constant set by > frozenset in .pyc file. Before discussing something specific like regex literal syntax, I would love to see a way to measure that sort of performance difference. Does anyone here have MacroPy experience or something and could mock something up that would precompile and save a regex? In theory, it would be possible to tag ANY value as "constant once evaluated" and have it saved in the pyc. It'd be good to know just how much benefit this precompilation actually grants. > [2] Chris Angelico conceived of "compiled regexes be stored in .pyc > file" in March 2013. > [2] https://mail.python.org/pipermail/python-ideas/2013-March/020043.html Wow that's an old post of mine :) ChrisA From malincns at 163.com Thu Dec 27 08:15:10 2018 From: malincns at 163.com (Ma Lin) Date: Thu, 27 Dec 2018 21:15:10 +0800 Subject: [Python-ideas] Add regex pattern literal p"" In-Reply-To: References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> Message-ID: <473b160f-a7a4-45de-7343-e18567941817@163.com> > It'd be good to know just how much benefit this precompilation actually grants. As far as I know, Pattern objects in regex module can be pickled, don't know if it's useful. >>> import pickle >>> import regex >>> p = regex.compile('[a-z]') >>> b = pickle.dumps(p) >>> p = pickle.loads(b) > Wow that's an old post of mine I searched on Google before post this, hope there is no omission. From boxed at killingar.net Thu Dec 27 09:01:02 2018 From: boxed at killingar.net (=?utf-8?Q?Anders_Hovm=C3=B6ller?=) Date: Thu, 27 Dec 2018 15:01:02 +0100 Subject: [Python-ideas] Add regex pattern literal p"" In-Reply-To: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> Message-ID: > We can use this literal to represent a compiled pattern, for example: > > >>> p"(?i)[a-z]".findall("a1B2c3") > ['a', 'B', 'c'] There are some other advantages to this. For me the most interesting is that we can know from code easier that something is a regex. For my mutation tester mutmut I have an experimental regex mutation system but it just feels wrong to write hacky heuristics to guess if a string is a regex. And it's complicated to look at too much context (although I'm working on ways to make that type of thing radically nicer to do). It would be much nicer if I could just know based on the AST node type. I guess the same goes for static analyzers. 
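To make that concrete, the kind of heuristic I mean today is roughly this (a simplified sketch, not mutmut's actual code; the function name is made up):

import ast

def guess_regex_literals(source):
    # Heuristic: collect string literals passed as the first argument to re.compile().
    found = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "compile"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "re"
                and node.args
                and isinstance(node.args[0], ast.Str)):
            found.append(node.args[0].s)
    return found

A pattern assigned to a variable first, or passed through a helper, is invisible to this, which is exactly why a dedicated AST node (or literal) would be nicer to work with.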
/ Anders From stefan_ml at behnel.de Thu Dec 27 09:42:17 2018 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 27 Dec 2018 15:42:17 +0100 Subject: [Python-ideas] Add regex pattern literal p"" In-Reply-To: <473b160f-a7a4-45de-7343-e18567941817@163.com> References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> <473b160f-a7a4-45de-7343-e18567941817@163.com> Message-ID: Ma Lin schrieb am 27.12.18 um 14:15: >> It'd be good to know just how much benefit this precompilation actually > grants. > > As far as I know, Pattern objects in regex module can be pickled, don't > know if it's useful. > >>>> import pickle >>>> import regex That's from the external regex package, not the stdlib re module. >>>> p = regex.compile('[a-z]') >>>> b = pickle.dumps(p) >>>> p = pickle.loads(b) Look a little closer: >>> import pickle, re >>> p = re.compile("[abc]") >>> pickle.dumps(p) b'\x80\x03cre\n_compile\nq\x00X\x05\x00\x00\x00[abc]q\x01K \x86q\x02Rq\x03.' What this does, essentially, is to make the pickle loader pass the original regex pattern string into re.compile() to "unpickle" it. Meaning, it compiles the regex on the way in. Thus, there isn't much to gain from using (the current form of) regex pickling here. I'm not saying that this can't be changed, but personally, this is exactly what I would do if I was asked to make a compiled regex picklable. Everything else would probably get you into portability hell. Stefan From rosuav at gmail.com Thu Dec 27 12:27:46 2018 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 28 Dec 2018 04:27:46 +1100 Subject: [Python-ideas] Add regex pattern literal p"" In-Reply-To: <473b160f-a7a4-45de-7343-e18567941817@163.com> References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> <473b160f-a7a4-45de-7343-e18567941817@163.com> Message-ID: On Fri, Dec 28, 2018 at 12:15 AM Ma Lin wrote: > > > It'd be good to know just how much benefit this precompilation > actually grants. > > As far as I know, Pattern objects in regex module can be pickled, don't > know if it's useful. > > >>> import pickle > >>> import regex > >>> p = regex.compile('[a-z]') > >>> b = pickle.dumps(p) > >>> p = pickle.loads(b) What Stefan pointed out regarding the stdlib's "re" module is also true of the third party "regex" - unpickling just compiles from the original string. Regarding pyc files, though, pickle is less significant than marshal. And both re.compile() and regex.compile() return unmarshallable objects. Fortunately, marshal doesn't need to produce cross-compatible files, so the portability issues don't apply. So, let's suppose that marshalling a compiled regex became possible. It would need to be (a) absolutely guaranteed to have the same effect as compiling the original text string, and (b) faster than compiling the original text string, otherwise it's useless. This is where testing would be needed: can it actually save any significant amount of time? > > Wow that's an old post of mine > I searched on Google before post this, hope there is no omission. You're absolutely fine :) I was amused to find that a post of mine from nearly six years ago should be the most notable on the subject, is all. Good work digging it up. 
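As a first cut, something along these lines would at least bound the possible win (a rough sketch, not a rigorous benchmark; the pattern and the resulting numbers are obviously arbitrary):

import re, timeit

pattern = r"(?i)[a-z]+\d{2,4}"

# Full recompilation every time: re.purge() empties re's internal cache.
full_compile = timeit.timeit("re.purge(); re.compile(pattern)",
                             globals=globals(), number=10_000)
# The normal path today: re.match() finds the compiled pattern in the cache.
cached = timeit.timeit("re.match(pattern, 'a1B2c3')",
                       globals=globals(), number=100_000)
# Matching on an already-compiled pattern: the ceiling for any literal.
compiled = re.compile(pattern)
precompiled = timeit.timeit("compiled.match('a1B2c3')",
                            globals=globals(), number=100_000)

print(full_compile, cached, precompiled)

The gap between the last two numbers is roughly the per-call cost of the cache lookup, and the first number bounds what a stored, pre-compiled pattern could save at import time.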
ChrisA From python at mrabarnett.plus.com Thu Dec 27 12:47:46 2018 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 27 Dec 2018 17:47:46 +0000 Subject: [Python-ideas] Add regex pattern literal p"" In-Reply-To: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> Message-ID: <4c7d51ba-9022-8e8e-b847-61ce4f575bf4@mrabarnett.plus.com> On 2018-12-27 11:48, Ma Lin wrote: [snip] > 2, We can't use regex module as a drop-in replacement: import regex as re > IMHO, I would like to see regex module be adopted into stdlib after > cutting off its "full case-folding" and "fuzzy matching" features. > I think that omitting full casefolding would be a bad idea; after all, strings (in Python 3) have a .casefold method. From steve at pearwood.info Thu Dec 27 17:00:43 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 28 Dec 2018 09:00:43 +1100 Subject: [Python-ideas] Add regex pattern literal p"" In-Reply-To: <4c7d51ba-9022-8e8e-b847-61ce4f575bf4@mrabarnett.plus.com> References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> <4c7d51ba-9022-8e8e-b847-61ce4f575bf4@mrabarnett.plus.com> Message-ID: <20181227220043.GB13061@ando.pearwood.info> On Thu, Dec 27, 2018 at 05:47:46PM +0000, MRAB wrote: > On 2018-12-27 11:48, Ma Lin wrote: > [snip] > >2, We can't use regex module as a drop-in replacement: import regex as re > >IMHO, I would like to see regex module be adopted into stdlib after > >cutting off its "full case-folding" and "fuzzy matching" features. > > > I think that omitting full casefolding would be a bad idea; after all, > strings (in Python 3) have a .casefold method. And I don't understand why omitting fuzzy matching is a good idea. If you don't want fuzzy matching, don't use it in your code. But why remove it? -- Steve From malincns at 163.com Fri Dec 28 04:54:48 2018 From: malincns at 163.com (Ma Lin) Date: Fri, 28 Dec 2018 17:54:48 +0800 Subject: [Python-ideas] Add regex pattern literal p"" In-Reply-To: References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> <473b160f-a7a4-45de-7343-e18567941817@163.com> Message-ID: Reply to Stefan Behnel and Chris Angelico. On 18-12-27 22:42, Stefan Behnel wrote: > >>> import pickle, re > >>> p = re.compile("[abc]") > >>> pickle.dumps(p) > b'\x80\x03cre\n_compile\nq\x00X\x05\x00\x00\x00[abc]q\x01K \x86q\x02Rq\x03.' > > What this does, essentially, is to make the pickle loader pass the original regex pattern string into re.compile() to "unpickle" it. Meaning, it compiles the regex on the way in. Thus, there isn't much to gain from using (the current form of) regex pickling here. Yes, the re module only pickles the pattern string and flags, so it's safe for cross-version pickle/unpickle. re module's pickle code:

def _pickle(p):
    return _compile, (p.pattern, p.flags)
copyreg.pickle(Pattern, _pickle, _compile)

On 18-12-28 1:27, Chris Angelico wrote: > What Stefan pointed out regarding the stdlib's "re" module is also > true of the third party "regex" - unpickling just compiles from the > original string. I have followed the regex module for a year; it does pickle the compiled data. This is its code:

def _pickle(pattern):
    return _regex.compile, pattern._pickled_data
_copy_reg.pickle(Pattern, _pickle)

// in _regex.c file
self->pickled_data = Py_BuildValue("OnOOOOOnOnn", pattern, flags,
    code_list, groupindex, indexgroup, named_lists, named_list_indexes,
    req_offset, required_chars, req_flags, public_group_count);
if (!self->pickled_data) {
    Py_DECREF(self);
    return NULL;
}

From malincns at 163.com Fri Dec 28 04:55:46 2018 From: malincns at 163.com (Ma Lin) Date: Fri, 28 Dec 2018 17:55:46 +0800 Subject: [Python-ideas] In fact, I'm a bit worry about this literal p"" In-Reply-To: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> Message-ID: <2841cffb-c448-9e9d-5d67-7437cf0c6f57@163.com> Maybe this literal will encourage people to finish tasks using regex, or even lead to abusing regex; will this change Python's style? What's worse is, people using mixed manners in the same project: one_line.split(',') ... p','.split(one_line) Maybe it will break Python's style, reduce code readability, is this worth it? From jsbueno at python.org.br Fri Dec 28 09:54:00 2018 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Fri, 28 Dec 2018 12:54:00 -0200 Subject: [Python-ideas] Add regex pattern literal p"" In-Reply-To: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> Message-ID: I am a full -1 on this idea - > Two shortcomings: > > 1, Elevating a class in a module (re.Pattern) to language level, this > sounds not very natural. > This makes Python look like Perl. > > 2, We can't use regex module as a drop-in replacement: import regex as re > IMHO, I would like to see regex module be adopted into stdlib after > cutting off its "full case-folding" and "fuzzy matching" features. > Sorry for sounding over-reactive, but yes, this could make Python look like Perl. I think one full advantage of Python is exactly that regexps are treated fairly, with no special syntax. You call a function, or build an instance, and have the regex power, and that is it. And you can just plug any third-party regex module, and it will work just like the one that is built into the language. This proposal at least keeps the ' " ' quotes - so we don't end up like Javascript which has a "squeashy regexy" thing that can sneak in code and you are never sure when it is run, or even if it can be assigned to a variable at all. I am quite sure that if the matter is performance, a way to pickle, or somehow store pre-compiled regexes can be found without requiring special syntax. And a 3rd shortcoming - flags can't be passed as parameters, and have to be built into the regexp themselves, further complicating the readability even for very simple regular expressions. Other than that it would not be much different from the ' f" ' strings thing, indeed. On Thu, 27 Dec 2018 at 09:49, Ma Lin wrote: > Related links: > > [2] Chris Angelico conceived of "compiled regexes be stored in .pyc > file" in March 2013. > [2] https://mail.python.org/pipermail/python-ideas/2013-March/020043.html > > [3] Ken Hilton conceived of "Give regex operations more sugar" in June 2018. > [3] https://mail.python.org/pipermail/python-ideas/2018-June/051395.html > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From malincns at 163.com Fri Dec 28 21:42:13 2018 From: malincns at 163.com (Ma Lin) Date: Sat, 29 Dec 2018 10:42:13 +0800 Subject: [Python-ideas] Add regex pattern literal p"" In-Reply-To: References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> Message-ID: <95793ea0-2c9a-c76b-6b0a-f3c9f97787ed@163.com> On 18-12-28 22:54, Joao S. O. Bueno wrote: > Sorry for sounding over-reactive, but yes, this could make Python look > like Perl.
Yes, this may introduce Perl's style irreversibly; we need to be cautious about this. I'm thinking: if people ask these questions in their minds when reading a piece of Python code: 1, "Is this Python code?" 2, "What's the purpose of this code?" 3, "How can I modify it if I want to ... ?" then maybe Python is on a doubtful path. There is an interesting question: Will a literal p"" ruin the style of Python (or of other dynamic languages like Ruby)? Why would this happen? > And a 3rd shortcoming - flags can't be passed as parameters, and have > to be built into the regexp themselves, further complicating the readability even > for very simple regular expressions. IMO this is an advantage, it's hard to omit flags when reading/copying a regex pattern. From python at 2sn.net Sat Dec 29 00:29:32 2018 From: python at 2sn.net (Alexander Heger) Date: Sat, 29 Dec 2018 16:29:32 +1100 Subject: [Python-ideas] Add regex pattern literal p"" In-Reply-To: <95793ea0-2c9a-c76b-6b0a-f3c9f97787ed@163.com> References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> <95793ea0-2c9a-c76b-6b0a-f3c9f97787ed@163.com> Message-ID: for regular strings one can write "aaa" + "bbb" which also works for f-strings, r-strings, etc.; in regular expressions, there is, e.g., parameter counting and references to numbered matches. How would that be dealt with in a compound p-string? Either it would have to be re-compiled or not, either way could lead to unexpected results p"(\d)\1" + p"(\s)\1" or p"^(\w)" + p"^(\d)" regular strings can be added, but the results of p-strings could not - well, they are not strings. This brings me to the point that the key difference is that f- and r- strings actually return strings, whereas p- string would return a different kind of object. That would seem certainly very confusing to novices - and also for the language standard as a whole. -Alexander -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Dec 29 01:52:44 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 29 Dec 2018 17:52:44 +1100 Subject: [Python-ideas] Add regex pattern literal p"" In-Reply-To: References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> <95793ea0-2c9a-c76b-6b0a-f3c9f97787ed@163.com> Message-ID: <20181229065243.GH13616@ando.pearwood.info> On Sat, Dec 29, 2018 at 04:29:32PM +1100, Alexander Heger wrote: > for regular strings one can write > > "aaa" + "bbb" > > which also works for f-strings, r-strings, etc.; in regular expressions, > there is, e.g., parameter counting and references to numbered matches. How > would that be dealt with in a compound p-string? Either it would have to > be re-compiled or not, either way could lead to unexpected results What does Perl do? > p"(\d)\1" + p"(\s)\1" Since + is used for concatenation, then that would obviously be the same as: p"(\d)\1(\s)\1" Whether it gets done at compile-time or run-time depends on how smart the keyhole optimiser is. If it is smart enough to recognise regex literals, it could fold the two strings together and regex-compile them at python-compile time, otherwise it could be equivalent to: _t1 = re.compile(r"(\d)\1") # compile-time _t2 = re.compile(r"(\s)\1") # compile-time re.compile(_t1.pattern + _t2.pattern) # run-time Obviously that defeats the purpose of using a p"" pre-compiled regex object, but the answer to that is either: 1. Don't do that then; or 2. We better make sure the keyhole optimizer is smarter. Or we just ban concatenation. "P-strings" aren't strings, even though they look like them.
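(For what it's worth, banning concatenation would also match the status quo for compiled patterns today. A quick interactive check, output reproduced from memory so the exact wording may vary by version:

>>> import re
>>> re.compile(r"(\d)\1") + re.compile(r"(\s)\1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 're.Pattern' and 're.Pattern'

so a p"" object that refused + would at least be consistent with re.compile().)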
> This brings me to the point that > the key difference is that f- and r- strings actually return strings, To be precise, f-"strings" are actually code that returns a string when executed at runtime; r-strings are literal syntax for strings. > whereas p- string would return a different kind of object. > That would seem certainly very confusing to novices - and also for the > language standard as a whole. Indeed. Perhaps something like \\regex\\ would be better, *if* this feature is desired. -- Steve From neatnate at gmail.com Sat Dec 29 01:56:19 2018 From: neatnate at gmail.com (Nathan Schneider) Date: Sat, 29 Dec 2018 01:56:19 -0500 Subject: [Python-ideas] Add regex pattern literal p"" In-Reply-To: References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> <95793ea0-2c9a-c76b-6b0a-f3c9f97787ed@163.com> Message-ID: On Sat, Dec 29, 2018 at 12:30 AM Alexander Heger wrote: > for regular strings one can write > > "aaa" + "bbb" > > which also works for f-strings, r-strings, etc.; in regular expressions, > there is, e.g., parameter counting and references to numbered matches. How > would that be dealt with in a compound p-string? Either it would have to > re-compiled or not, either way could lead to unexpected results > > p"(\d)\1" + p"(\s)\1" > > or > > p"^(\w)" + p"^(\d)" > > regular strings can be added, bu the results of p-string could not - well, > their are not strings. > Isn't this a feature, not a bug, of encouraging literals to be specified as patterns: addition of patterns would raise an error (as is currently the case for addition of compiled patterns in the re and regex modules)? Currently, I find it easiest to use r-strings for patterns and call re.search() etc. without precompiling them, which means that I could accidentally concatenate two patterns together that would silently produce an unmatchable pattern. Using p-literals for most patterns would mean I have to be explicit in the exceptional case where I do want to assemble a pattern from multiple parts: FIRSTNAME = p"[A-Z][-A-Za-z']+" LASTNAME = p"[-A-Za-z']([-A-Za-z' ]+[-A-Za-z'])?" FULLNAME = FIRSTNAME + p' ' + LASTNAME # error FIRSTNAME = r"[A-Z][-A-Za-z']+" LASTNAME = r"[-A-Za-z']([-A-Za-z' ]+[-A-Za-z'])?" FULLNAME = re.compile(FIRSTNAME + ' ' + LASTNAME) # success Another potential advantage is that an ill-formed p-literal (such as a mismatched parenthesis) would be caught immediately, rather than when it is first used. This could pay off, for example, if I am defining a data structure with a bunch of regexes that would get used for different input. (But there may be performance tradeoffs here.) > This brings me to the point that > the key difference is that f- and r- strings actually return strings, > whereas p- string would return a different kind of object. > That would seem certainly very confusing to novices - and also for the > language standard as a whole. > > The b prefix produces a bytes literal. Is a bytes object a kind of string, more so than a regex pattern is? I could see an argument that bytes is a particular encoding of sequential character data, whereas a regex pattern represents a string *language*, i.e. an abstraction over string data. But...this distinction starts to feel very theoretical rather than practical. If novices are expected to read code with regular expressions in it, why would they have trouble understanding that the "p" prefix means "pattern"? 
As someone who works with text a lot, I think there's a decent practicality-beats-purity argument in favor of p-literals, which would make regex operations more easily accessible and prevent patterns from being mixed up with string data. A potential downside, though, is that it will be tempting to introduce flags as prefixes, too. Do we want to go down the road of pui"my Unicode-compatible case-insensitive pattern"? Nathan > -Alexander > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From malincns at 163.com Sat Dec 29 06:49:41 2018 From: malincns at 163.com (Ma Lin) Date: Sat, 29 Dec 2018 19:49:41 +0800 Subject: [Python-ideas] Use p"" to represent `pattern_str` -- a subclass of `str` In-Reply-To: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> Message-ID: I have a compromise idea, here is some points: 1, Create a built-in class `pattern_str` which is a subclass of `str`, it's dedicated to regex pattern string. 2, Use p"" to represent `pattern_str`. Some advantages: 1, Since it's a subclass of `str`, we can use it as normal `str`. 2, IDE/linter/compiler can identify it as an regex pattern, something like type hint in language level. 3, We can still store compiled pattern in .pyc file *quietly*. 4, Won't introduce Perl style into Python, to avoid abusing regex in some degree. We still using regex in the old way: import re re.search(p"(?i)[a-z]", s) But if re.search() find the pattern is a `pattern_str`, it load compiled pattern from .pyc file directly. From greg.ewing at canterbury.ac.nz Sun Dec 30 17:55:34 2018 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 31 Dec 2018 11:55:34 +1300 Subject: [Python-ideas] Add regex pattern literal p"" In-Reply-To: <20181229065243.GH13616@ando.pearwood.info> References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> <95793ea0-2c9a-c76b-6b0a-f3c9f97787ed@163.com> <20181229065243.GH13616@ando.pearwood.info> Message-ID: <5C294CE6.5090403@canterbury.ac.nz> Steven D'Aprano wrote: > _t1 = re.compile(r"(\d)\1") # compile-time > _t2 = re.compile(r"(\s)\1") # compile-time > re.compile(_t1.pattern + _t2.pattern) # run-time It would be weird if p"(\d)\1" + p"(\s)\1" worked but re.compile(r"(\d)\1") + re.compile(r"(\s)\1") didn't. -- Greg From greg.ewing at canterbury.ac.nz Sun Dec 30 17:44:00 2018 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 31 Dec 2018 11:44:00 +1300 Subject: [Python-ideas] Add regex pattern literal p"" In-Reply-To: <95793ea0-2c9a-c76b-6b0a-f3c9f97787ed@163.com> References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> <95793ea0-2c9a-c76b-6b0a-f3c9f97787ed@163.com> Message-ID: <5C294A30.6000102@canterbury.ac.nz> I don't see a justification for baking REs into the syntax of Python. In the Python world, REs are just one tool in a toolbox containing a great many tools. What's more, it's a tool that should be used with considerable reluctance, because REs are essentially unreadable, so every time you use one you're creating a maintenance headache. This quality is quite the opposite of what one would expect from a core language feature. 
-- Greg From python at 2sn.net Sun Dec 30 18:35:49 2018 From: python at 2sn.net (Alexander Heger) Date: Mon, 31 Dec 2018 10:35:49 +1100 Subject: [Python-ideas] Add regex pattern literal p"" In-Reply-To: <5C294A30.6000102@canterbury.ac.nz> References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> <95793ea0-2c9a-c76b-6b0a-f3c9f97787ed@163.com> <5C294A30.6000102@canterbury.ac.nz> Message-ID: > What's more, it's a tool that should be used > with considerable reluctance, because REs are essentially unreadable, > so every time you use one you're creating a maintenance headache. Well, it requires some experience to read REs; I have written many, and I still need to test even many basic ones thoroughly to make sure they really do what they are supposed to do. And then there is the issue that there are many different implementations; what you have to escape, etc., varies between python (raw and regular strings), emacs, grep, overleaf, ... Never mind, my main point is that they return an object that is qualitatively different from a string, for example, in terms of concatenation. I also think it is too specialised, and time-critical constant REs can be stored in the module body, etc., if need be. I do that. But since this is the ideas mailing list, and taking this thread on an excursion, maybe an "addition" operator could be defined for REs, such that re.compile(s1 + s2) == re.compile(s1) + re.compile(s2) with the restriction that s1 and s2 are each strings that are valid REs. Even that would leave questions about how to deal with compile flags; they probably should be treated the same as if they were embedded at the beginning of each string. -Alexander -------------- next part -------------- An HTML attachment was scrubbed... URL: From ubershmekel at gmail.com Mon Dec 31 03:48:56 2018 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Mon, 31 Dec 2018 00:48:56 -0800 Subject: [Python-ideas] In fact, I'm a bit worry about this literal p"" In-Reply-To: <2841cffb-c448-9e9d-5d67-7437cf0c6f57@163.com> References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> <2841cffb-c448-9e9d-5d67-7437cf0c6f57@163.com> Message-ID: On Fri, Dec 28, 2018 at 1:56 AM Ma Lin wrote: > Maybe this literal will encourage people to finish tasks using regex, > or even lead to abusing regex; will this change Python's style? > > What's worse is, people using mixed manners in the same project: > > one_line.split(',') > ... > p','.split(one_line) > > Maybe it will break Python's style, reduce code readability, is this > worth it? > > The bar for introducing a new type of literal should be very high. Do performance numbers show this change would have a large impact for a large number of libraries and programs? In my opinion, only if this change would make 50% of programs run 50% faster then it might be worth discussing. The damage to readability, the burden of changing syntax and the burden of yet another language feature for newcomers to learn are too high. Cheers, Yuval -------------- next part -------------- An HTML attachment was scrubbed...
URL: From steve at pearwood.info Mon Dec 31 05:54:37 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 31 Dec 2018 21:54:37 +1100 Subject: [Python-ideas] In fact, I'm a bit worry about this literal p"" In-Reply-To: References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> <2841cffb-c448-9e9d-5d67-7437cf0c6f57@163.com> Message-ID: <20181231105437.GM13616@ando.pearwood.info> On Mon, Dec 31, 2018 at 12:48:56AM -0800, Yuval Greenfield wrote: > In my opinion, only if this change would make 50% of programs run 50% > faster then it might be worth discussing. What if it were 100% of programs 25% faster? *wink* Generally speaking, we don't introduce new syntax as a speed optimization. The main reasons to introduce syntax is for convenience and to improve the expressiveness of code. That's why we usually prefer to use operators like + and == instead of functions add() and equal(). There's nothing a list comprehension can do that a for-loop can't, but list comps are often more expressive. And the class statement is just syntactic sugar for type(name, bases, dict), but much more convenient. In this specific case, I don't think that regex literals will add much expressiveness: regex = re.compile(r"...") regex = p("...") is not that much different. -- Steve From solipsis at pitrou.net Mon Dec 31 06:23:16 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 31 Dec 2018 12:23:16 +0100 Subject: [Python-ideas] No need to add a regex pattern literal References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> Message-ID: <20181231122316.28d49afc@fsol> On Thu, 27 Dec 2018 19:48:40 +0800 Ma Lin wrote: > We can use this literal to represent a compiled pattern, for example: > > >>> p"(?i)[a-z]".findall("a1B2c3") > ['a', 'B', 'c'] > > >>> compiled = p"(?<=abc)def" > >>> m = compiled.search('abcdef') > >>> m.group(0) > 'def' > > >>> rp'\W+'.split('Words, words, words.') > ['Words', 'words', 'words', ''] > > This allows peephole optimizer to store compiled pattern in .pyc file, > we can get performance optimization like replacing constant set by > frozenset in .pyc file. > > Then such issue [1] can be solved perfectly. > [1] Optimize base64.b16decode to use compiled regex > [1] https://bugs.python.org/issue35559 The simple solution to the perceived performance problem (not sure how much of a problem it is in real life) is to have a stdlib function that lazily-compiles a regex (*). Just like "re.compile", but lazy: you don't bear the cost of compiling when simply importing the module, but once the pattern is compiled, there is no overhead for looking up a global cache dict. No need for a dedicated literal. (*) Let's call it "re.pattern", for example. Regards Antoine. From antoine at python.org Mon Dec 31 06:47:10 2018 From: antoine at python.org (Antoine Pitrou) Date: Mon, 31 Dec 2018 12:47:10 +0100 Subject: [Python-ideas] No need to add a regex pattern literal In-Reply-To: References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> <20181231122316.28d49afc@fsol> Message-ID: <264c35f0-ccfd-7326-931c-ce5aa098709c@python.org> Le 31/12/2018 ? 12:31, M.-A. Lemburg a ?crit?: > > We already have re.search() and re.match() which deal with compilation > on-the-fly and caching. Perhaps the documentation should hint at this > more explicitly... The complaint is that the global cache is still too costly. See measurements in https://bugs.python.org/issue35559 Regards Antoine. From mal at egenix.com Mon Dec 31 06:31:06 2018 From: mal at egenix.com (M.-A. 
Lemburg) Date: Mon, 31 Dec 2018 12:31:06 +0100 Subject: [Python-ideas] No need to add a regex pattern literal In-Reply-To: <20181231122316.28d49afc@fsol> References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> <20181231122316.28d49afc@fsol> Message-ID: On 31.12.2018 12:23, Antoine Pitrou wrote: > On Thu, 27 Dec 2018 19:48:40 +0800 > Ma Lin wrote: >> We can use this literal to represent a compiled pattern, for example: >> >> >>> p"(?i)[a-z]".findall("a1B2c3") >> ['a', 'B', 'c'] >> >> >>> compiled = p"(?<=abc)def" >> >>> m = compiled.search('abcdef') >> >>> m.group(0) >> 'def' >> >> >>> rp'\W+'.split('Words, words, words.') >> ['Words', 'words', 'words', ''] >> >> This allows peephole optimizer to store compiled pattern in .pyc file, >> we can get performance optimization like replacing constant set by >> frozenset in .pyc file. >> >> Then such issue [1] can be solved perfectly. >> [1] Optimize base64.b16decode to use compiled regex >> [1] https://bugs.python.org/issue35559 > > The simple solution to the perceived performance problem (not sure how > much of a problem it is in real life) is to have a stdlib function that > lazily-compiles a regex (*). Just like "re.compile", but lazy: you don't > bear the cost of compiling when simply importing the module, but once > the pattern is compiled, there is no overhead for looking up a global > cache dict. > > No need for a dedicated literal. > > (*) Let's call it "re.pattern", for example. No need for a new function :-) We already have re.search() and re.match() which deal with compilation on-the-fly and caching. Perhaps the documentation should hint at this more explicitly... https://docs.python.org/3.7/library/re.html -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Dec 31 2018) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From boxed at killingar.net Mon Dec 31 07:07:56 2018 From: boxed at killingar.net (=?utf-8?Q?Anders_Hovm=C3=B6ller?=) Date: Mon, 31 Dec 2018 13:07:56 +0100 Subject: [Python-ideas] In fact, I'm a bit worry about this literal p"" In-Reply-To: <20181231105437.GM13616@ando.pearwood.info> References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> <2841cffb-c448-9e9d-5d67-7437cf0c6f57@163.com> <20181231105437.GM13616@ando.pearwood.info> Message-ID: <1A069A45-DF5F-4B12-98D4-44805A79DAB3@killingar.net> > regex = re.compile(r"...") > regex = p("...") > > is not that much different. True, but when the literal is put somewhere far from the compile() call it becomes a problem for static analysis. Conceptually a regex is not a string but an embedded foreign language. That's why I think this discussion is worth having. It would be nice with a way to mark up foreign languages in a way that had some other advantages so people would be incentivised to do it, but just a way to mark it with comments would be fine too I think if it's standardized. Maybe the discussion should be expanded to cover the general case of embedded foreign languages? 
SQL, HTML, CSS and (obviously) regex come to mind. One could also think of C for stuff like CFFI. / Anders From malincns at 163.com Mon Dec 31 08:02:56 2018 From: malincns at 163.com (Ma Lin) Date: Mon, 31 Dec 2018 21:02:56 +0800 Subject: [Python-ideas] No need to add a regex pattern literal In-Reply-To: <264c35f0-ccfd-7326-931c-ce5aa098709c@python.org> References: <20f68a19-dd5d-b5cf-dbd0-3ec1a6181138@163.com> <20181231122316.28d49afc@fsol> <264c35f0-ccfd-7326-931c-ce5aa098709c@python.org> Message-ID: On 18-12-31 19:47, Antoine Pitrou wrote: > The complaint is that the global cache is still too costly. > See measurements in https://bugs.python.org/issue35559 In this issue, using a global variable `_has_non_base16_digits` [1] gives roughly a 30% speedup. Is the re module's internal cache [2] really so bad? If we rewrite the re module's cache in C and use a custom data structure, maybe we will get a small speedup. [1] `_has_non_base16_digits` in PR11287 [1] https://github.com/python/cpython/pull/11287/files [2] re module's internal cache code: [2] https://github.com/python/cpython/blob/master/Lib/re.py#L268-L295

_cache = {}  # ordered!

_MAXCACHE = 512

def _compile(pattern, flags):
    # internal: compile pattern
    if isinstance(flags, RegexFlag):
        flags = flags.value
    try:
        return _cache[type(pattern), pattern, flags]
    except KeyError:
        pass
    ...
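For comparison, the lazy-compile approach Antoine suggested can already be sketched in pure Python, with no new syntax (illustrative only: `LazyPattern` is a made-up name, not an existing API, and the base16 check merely mimics what base64.b16decode does with its regex):

import re

class LazyPattern:
    # Compile on first use, then delegate attribute access to the compiled pattern.
    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._compiled = None

    def __getattr__(self, name):
        if self._compiled is None:
            self._compiled = re.compile(self._pattern, self._flags)
        return getattr(self._compiled, name)

_non_base16 = LazyPattern(rb'[^0-9A-F]')   # module level, no compile cost at import

def b16check(data):
    return _non_base16.search(data) is None

After the first call the pattern object is used directly, so there is no per-call lookup in re's global cache, which is the overhead measured in the issue linked above.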