From ron3200 at gmail.com  Wed Jul 1 06:25:06 2015
From: ron3200 at gmail.com (Ron Adam)
Date: Wed, 01 Jul 2015 00:25:06 -0400
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To:
References: <20150628185249.61624b82@x230> <559071BC.8090603@gmail.com> <20150629094439.4f8a8efa@x230>
Message-ID:

On 06/30/2015 12:08 AM, Nick Coghlan wrote:
> On 30 June 2015 at 07:51, Ron Adam wrote:
>>
>> On 06/29/2015 07:23 AM, Nick Coghlan wrote:

> Some completely untested conceptual code that may not even compile, let alone run, but hopefully conveys what I mean better than English does:

It seems (to me) like there are more layers here than needed. I suppose since this is a higher order functionality, it may be the nature of it.

>     def get_awaitables(async_iterable):
>         """Gets a list of awaitables from an asynchronous iterator"""
>         asynciter = async_iterable.__aiter__()
>         awaitables = []
>         while True:
>             try:
>                 awaitables.append(asynciter.__anext__())
>             except StopAsyncIteration:
>                 break
>         return awaitables
>
>     async def wait_for_result(awaitable):
>         """Simple coroutine to wait for a single result"""
>         return await awaitable
>
>     def iter_coroutines(async_iterable):
>         """Produces coroutines to wait for each result from an
>         asynchronous iterator"""
>         for awaitable in get_awaitables(async_iterable):
>             yield wait_for_result(awaitable)
>
>     def iter_tasks(async_iterable, eventloop=None):
>         """Schedules event loop tasks to wait for each result from an
>         asynchronous iterator"""
>         if eventloop is None:
>             eventloop = asyncio.get_event_loop()
>         for coroutine in iter_coroutines(async_iterable):
>             yield eventloop.create_task(coroutine)
>
>     class aiter_parallel:
>         """Asynchronous iterator to wait for several asynchronous
>         operations in parallel"""
>         def __init__(self, async_iterable):
>             # Concurrent evaluation of future results is launched immediately
>             self._tasks = tasks = list(iter_tasks(async_iterable))
>             self._taskiter = iter(tasks)
>         def __aiter__(self):
>             return self
>         def __anext__(self):
>             try:
>                 return next(self._taskiter)
>             except StopIteration:
>                 raise StopAsyncIteration
>
>     # Example reduction function
>     async def sum_async(async_iterable, start=0):
>         tally = start
>         async for x in aiter_parallel(async_iterable):
>             tally += x
>         return tally
>
>     # Parallel sum from synchronous code:
>     result = asyncio.get_event_loop().run_until_complete(sum_async(async_iterable))
>
>     # Parallel sum from asynchronous code:
>     result = await sum_async(async_iterable)
>
> As the definition of "aiter_parallel" shows, we don't offer any nice syntactic sugar for defining asynchronous iterators yet (hence the question that started this thread). Hopefully the above helps illustrate the complexity hidden behind such a deceptively simple question :)

While browsing the asyncio module, I decided to take a look at the multiprocessing module...

    from multiprocessing import Pool

    def async_map(fn, args):
        with Pool(processes=4) as pool:
            yield from pool.starmap(fn, args)

    def add(a, b):
        return a + b

    values = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
    print(sum(async_map(add, values)))    #---> 55

That's really very nice. Are there advantages to asyncio over the multiprocessing module?
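For comparison, I think the concurrent.futures version would look something like this (an untested sketch on my part):

    from concurrent.futures import ProcessPoolExecutor

    def add(a, b):
        return a + b

    values = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]

    if __name__ == '__main__':
        with ProcessPoolExecutor(max_workers=4) as pool:
            # Executor.map takes parallel iterables, so unzip the pairs
            print(sum(pool.map(add, *zip(*values))))    #---> 55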
Cheers,
   Ron

From ncoghlan at gmail.com  Wed Jul 1 07:56:19 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 1 Jul 2015 15:56:19 +1000
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To:
References: <20150628185249.61624b82@x230> <559071BC.8090603@gmail.com> <20150629094439.4f8a8efa@x230>
Message-ID:

On 1 July 2015 at 14:25, Ron Adam wrote:
>
> That's really very nice. Are there advantages to asyncio over the
> multiprocessing module?

I find it most useful to conceive of asyncio as an implementation of an "event driven programming" development paradigm. This means that after starting out with imperative procedural programming in Python, you can branch out into other more advanced complexity management models like object-oriented programming (class statements, dunder protocols, descriptors, metaclasses, type hinting), functional programming (comprehensions, generators, decorators, closures, functools, itertools), array oriented programming (memoryview, __matmul__, NumPy, SciPy), and event driven programming (asyncio, Twisted, Tornado).

The stark difference between event driven programming and the first three alternate development models I noted is that you can readily implement the notions of "imperative shell, OO core", "imperative shell, functional core", and "imperative shell, array oriented core", where you expose a regular procedural API to other code, and implement it internally using whichever approach makes the most sense for your particular component. Even generators follow this same basic notion of having a clear "start of iteration" and "end of iteration".

The concurrent execution model that most readily aligns with this "imperative shell" approach is concurrent.futures (https://docs.python.org/3/library/concurrent.futures.html) - it's designed to let you easily take particular input->output operations and dispatch them for execution in parallel in separate threads or processes.

By contrast, event driven programming fundamentally changes your execution model from "I will accept inputs at the beginning of the program, and produce outputs at the end of the program" to "I will start waiting for events, responding to them as they arrive, until one of them indicates I should cease operation". "Waiting for an event" becomes a core development concept, as now indicated by the "await" keyword in PEP 492. The "async" keyword in that same PEP indicates that the marked construct may need to wait for events as part of its operation (async def, async for, async with), but exactly *where* those wait points are depends on the specific construct (await expressions in the function body for async def, protocol method invocations for async for and async with).

For the design of asyncio (and similar frameworks) to make any sense at all, it's necessary to approach them with that "event driven programming" mindset - they seem entirely nonsensical when approached with an inputs -> outputs algorithmic mindset, but more logical when considered from a "receive request -> send other requests -> receive responses -> send response" perspective. For folks that primarily deal with algorithmic problems where inputs are converted to outputs, the event driven model addresses a kind of problem that *they don't have*, so it can seem entirely pointless. However, there really are significant categories of problems (such as network service development) where the event driven model is a genuinely superior design tool.
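To illustrate the shape of that, here's a rough, untested sketch of a handler written in this style ("backend.example" is just a placeholder host) - note how it reads as a linear series of wait points rather than as a pile of callbacks:

    import asyncio

    async def handle_client(reader, writer):
        # receive request -> send other requests -> receive responses
        # -> send response
        request = await reader.readline()
        backend_reader, backend_writer = await asyncio.open_connection(
            'backend.example', 8001)
        backend_writer.write(request)
        response = await backend_reader.readline()
        writer.write(response)
        await writer.drain()
        writer.close()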
Like array oriented programming (and even object-oriented and functional programming), the benefits can unfortunately be hard to explain to folks that haven't personally experienced the problems these tools address, so folks end up having to take it on faith that we're applying the "Complex is better than complicated" line from the Zen of Python when introducing new modelling techniques into the core language.

Regards,
Nick.

P.S. It *is* possible to implement the "imperative shell, event driven core" model, but it requires a synchronous-to-asynchronous adapter like gevent, or an event-loop-per-thread model and extensive use of "run_until_complete()". It's much more complex than "just use concurrent.futures".

P.P.S. Credit to Gary Bernhardt for the "imperative shell, functional core" phrasing for low coupling component design where the external API design is distinct from the internal architectural design

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From oscar.j.benjamin at gmail.com  Wed Jul 1 14:01:50 2015
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Wed, 1 Jul 2015 13:01:50 +0100
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To:
References:
Message-ID:

On 24 June 2015 at 10:00, Adam Bartoš wrote:
>
> I had a generator producing pairs of values and wanted to feed all the first members of the pairs to one consumer and all the second members to another consumer. For example:
>
> def pairs():
>     for i in range(4):
>         yield (i, i ** 2)
>
> biconsumer(sum, list)(pairs()) -> (6, [0, 1, 4, 9])
>
> The point is I wanted the consumers to be suspended and resumed in a coordinated manner: The first consumer is invoked, it wants the first element. The coordinator implemented by the biconsumer function invokes pairs(), gets the first pair and yields its first member to the first consumer. Then it wants the next element, but now it's the second consumer's turn, so the first consumer is suspended and the second consumer is invoked and fed with the second member of the first pair. Then the second consumer wants the next element, but it's the first consumer's turn... and so on. In the end, when the stream of pairs is exhausted, StopIteration is thrown to both consumers and their results are combined.

Unfortunately this is not possible with generators or with coroutines. Remember that the async coroutine stuff doesn't actually add any fundamental new capability to the language. It's really just a cleaner syntax for a particular way of using generators. Anything you can do with coroutines is also possible with generators (hence 3.4's asyncio does all its stuff with ordinary generators).

The problem is fundamental: iterable consumers like sum, list etc. drive the flow control of execution. You can suspend them by feeding in a generator but they aren't really suspended the way a generator is, because the consumer remains at the base of the call stack.

If you can rewrite the consumers though, it is possible to rewrite them in a fairly simple way using generators so that you can push values in, suspending after each push. Suppose I have a consumer function that looks like:

    def func(iterable):
        <initialisation>
        for x in iterable:
            <process x>
        return <result>

I can rewrite it as a feed-in generator like so:

    def gfunc():
        <initialisation>
        yield lambda: <result>
        while True:
            x = yield
            <process x>

When I call this function I get a generator. I can call next on that generator to get a result function. I can then push values in with the send method. When I'm done pushing values in I can call the result function to get the final result.
Example:

    >>> def gsum():
    ...     total = 0
    ...     yield lambda: total
    ...     while True:
    ...         x = yield
    ...         total += x
    ...
    >>> summer = gsum()
    >>> result = next(summer)
    >>> next(summer)  # Advance to next yield
    >>> summer.send(1)
    >>> summer.send(2)
    >>> summer.send(3)
    >>> result()
    6

You can make a decorator to handle the awkwardness of calling the generator and next-ing it. Also you can use the decorator to provide a consumer function with the inverted consumer behaviour as an attribute:

    import functools

    def inverted_consumer(func):
        @functools.wraps(func)
        def consumer(iterable):
            push, result = inverted()
            for x in iterable:
                push(x)
            return result()
        def inverted():
            gen = func()
            try:
                result = next(gen)
                next(gen)
            except StopIteration:
                raise RuntimeError
            return gen.send, result
        consumer.invert = inverted
        return consumer

    @inverted_consumer
    def mean():
        total = 0
        count = 0
        yield lambda: total / count
        while True:
            x = yield
            total += x
            count += 1

    print(mean([4, 5, 6]))   # prints 5

    push, result = mean.invert()
    push(4)
    push(5)
    push(6)
    print(result())          # Also prints 5

Having implemented your consumer functions in this way you can use them normally, but you can also implement the biconsumer functionality that you wanted (with obvious generalisation to an N-consumer function):

    def biconsumer(consumer1, consumer2, iterable):
        push1, result1 = consumer1.invert()
        push2, result2 = consumer2.invert()
        for val1, val2 in iterable:
            push1(val1)
            push2(val2)
        return result1(), result2()

Given some of the complaints about two colours of functions in other posts in this thread, perhaps asyncio could take a similar approach. There could be a decorator so that I could define an async function with:

    @sync_callable
    def func(...):
        ...

Then in asynchronous code I could call it as

    x = await func()

or in synchronous code it would be

    x = func.sync_call()

Presumably the sync_call version would fire up an event-loop and run the function until complete. Perhaps it could also take other arguments and have a signature like:

    def sync_call_wrapper(args, kwargs, *, loop=None, timeout=None):
        ...

I'm not sure how viable this is given that different asynchronous functions might need different event loops etc., but maybe there's some sensible way to do it.
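As a very rough sketch of that (untested, assuming asyncio, and ignoring the per-function event loop and timeout questions above), the decorator could be little more than:

    import asyncio
    import functools

    def sync_callable(func):
        """Attach a blocking .sync_call() to an async function."""
        @functools.wraps(func)
        def sync_call(*args, **kwargs):
            # Fire up a private event loop and run to completion
            loop = asyncio.new_event_loop()
            try:
                return loop.run_until_complete(func(*args, **kwargs))
            finally:
                loop.close()
        func.sync_call = sync_call
        return func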
--
Oscar

From ron3200 at gmail.com  Thu Jul 2 03:28:20 2015
From: ron3200 at gmail.com (Ron Adam)
Date: Wed, 01 Jul 2015 21:28:20 -0400
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To:
References: <20150628185249.61624b82@x230> <559071BC.8090603@gmail.com> <20150629094439.4f8a8efa@x230>
Message-ID:

On 07/01/2015 01:56 AM, Nick Coghlan wrote:
> On 1 July 2015 at 14:25, Ron Adam wrote:
>>
>> That's really very nice. Are there advantages to asyncio over the
>> multiprocessing module?

> I find it most useful to conceive of asyncio as an implementation of
> an "event driven programming" development paradigm.

It makes sense to think of events as IO. Then on top of that, have some sort of dispatch mechanism, which could be objective, imperative, or functional.

A robot control program would be a good practical example to use to test these things. I think it's a good direction for python to go in. They need to process multiple sensor inputs, and control multiple external devices all at the same time. I think anything that makes that easier will be good. (It is the future.)

> P.S. It *is* possible to implement the "imperative shell, event driven
> core" model, but it requires a synchronous-to-asynchronous adapter
> like gevent, or an event-loop-per-thread model and extensive use of
> "run_until_complete()". It's much more complex than "just use
> concurrent.futures".

I'm not sure what an event driven core is exactly. It seems to me it would be an event driven (functional, objective, imperative) core. The closest thing I can think of that wouldn't be one of those would be a neural net.

Of course it may also be a matter of how we choose to think of things. It's quite possible to have many layers of imperative, functional, and objective, on top of each other. Then we need to indicate the part of the program we are referring to as being X shell/Y core.

Cheers,
   Ron

From pierre.quentel at gmail.com  Thu Jul 2 08:30:53 2015
From: pierre.quentel at gmail.com (Pierre Quentel)
Date: Thu, 2 Jul 2015 08:30:53 +0200
Subject: [Python-ideas] Pass a function as the argument "step" of range()
Message-ID:

In languages such as Javascript, the incrementation of a for loop counter can be done by an operation, for instance :

    for(i=1; i<=N; i*=2)

would iterate on the powers of 2 lesser than N.

To achieve the same thing in Python we currently can't use range() because it increments by an integer (the argument "step"). An option is to build a generator like :

    def gen(N):
        i = 1
        while i<=N:
            yield i
            i *= 2

then we can iterate on gen(N).

My proposal is that besides an integer, range() would accept a function as the "step" argument, taking the current counter as its argument and returning the new counter value. Here is a basic pure-Python implementation :

    import operator

    class Range:

        def __init__(self, start, stop, incrementor):
            self.start, self.stop = start, stop
            self.incrementor = incrementor
            # Function to compare current counter and stop value : <= or >=
            self.comp = operator.ge if self.stop > self.start else operator.le
            self.counter = None

        def __iter__(self):
            return self

        def __next__(self):
            if self.counter is None:
                self.counter = self.start
            else:
                self.counter = self.incrementor(self.counter)
            if self.comp(self.counter, self.stop):
                raise StopIteration
            return self.counter

Iterating on the powers of 2 below N would be done by :

    for i in Range(1, N, lambda x: x*2)

I haven't seen this discussed before, but I may not have searched enough.

Any opinions ?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ben+python at benfinney.id.au  Thu Jul 2 08:56:06 2015
From: ben+python at benfinney.id.au (Ben Finney)
Date: Thu, 02 Jul 2015 16:56:06 +1000
Subject: [Python-ideas] Pass a function as the argument "step" of range()
References:
Message-ID: <851tgrrqt5.fsf@benfinney.id.au>

Pierre Quentel <pierre.quentel at gmail.com> writes:

> To achieve the same thing in Python we currently can't use range()
> because it increments by an integer (the argument "step"). An option
> is to build a generator like :
>
> def gen(N):
>     i = 1
>     while i<=N:
>         yield i
>         i *= 2
>
> then we can iterate on gen(N).

Generators can be defined in expressions, of course::

    ((x * 2) for x in range(n))

So the full function definition above is misleading for this example.

Your single example defines the “step” function in-line as a lambda expression::

> Iterating on the powers of 2 below N would be done by :
>
> for i in Range(1, N, lambda x:x*2)

So why not define the generator as an expression::

    for i in ((x * 2) for x in range(n)):

That seems quite clear given existing syntax.

Your proposal goes further than that and requires “range” itself to accept a function argument where it currently expects an integer. But your example demonstrates, to me, that it wouldn't improve the code.

Do you have some real-world code that would be materially improved by the change you're proposing?

--
 \     “I don't know anything about music. In my line you don't have |
  `\                             to.” —Elvis Aaron Presley (1935–1977) |
_o__)                                                                  |
Ben Finney

From njs at pobox.com  Thu Jul 2 09:12:46 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 2 Jul 2015 00:12:46 -0700
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To: <851tgrrqt5.fsf@benfinney.id.au>
References: <851tgrrqt5.fsf@benfinney.id.au>
Message-ID:

On Jul 1, 2015 23:56, "Ben Finney" <ben+python at benfinney.id.au> wrote:
>
> Pierre Quentel <pierre.quentel at gmail.com> writes:
>
> > To achieve the same thing in Python we currently can't use range()
> > because it increments by an integer (the argument "step"). An option
> > is to build a generator like :
> >
> > def gen(N):
> >     i = 1
> >     while i<=N:
> >         yield i
> >         i *= 2
> >
> > then we can iterate on gen(N).
>
> Generators can be defined in expressions, of course::
>
>     ((x * 2) for x in range(n))
>
> So the full function definition above is misleading for this example.
>
> Your single example defines the "step" function in-line as a lambda
> expression::
>
> > Iterating on the powers of 2 below N would be done by :
> >
> > for i in Range(1, N, lambda x:x*2)
>
> So why not define the generator as an expression::
>
>     for i in ((x * 2) for x in range(n)):
>
> That seems quite clear given existing syntax.

I believe the original example was actually

    for i in (2 ** (x + 1) for x in range(int(log2(n)))):

or similar... which is clearly making some sort of argument about clarity but I'm not sure what.

This isn't going to work for range() anyway though AFAICT because range isn't an iterator, it's an iterable that offers O(1) membership tests.

I could see an argument for putting something along these lines in itertools. itertools.orbit, maybe. But I've never felt an urgent need for such a thing myself.

-n
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From abarnert at yahoo.com  Thu Jul 2 09:32:35 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 2 Jul 2015 00:32:35 -0700
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To:
References: <851tgrrqt5.fsf@benfinney.id.au>
Message-ID:

On Jul 2, 2015, at 00:12, Nathaniel Smith <njs at pobox.com> wrote:
>
> I believe the original example was actually
>
>     for i in (2 ** (x + 1) for x in range(int(log2(n)))):
>
> or similar... which is clearly making some sort of argument about clarity
> but I'm not sure what.
>
> This isn't going to work for range() anyway though AFAICT because range
> isn't an iterator, it's an iterable that offers O(1) membership tests.
>
> I could see an argument for putting something along these lines in
> itertools. itertools.orbit, maybe. But I've never felt an urgent need for
> such a thing myself.

You can already do this with accumulate; you just have to write lambda x, _: x*2.

Of course it doesn't include the built-in bounds, but I don't think you'd want that anyway. With accumulate, you can bound on the domain by passing range instead of count for the input, bound on the range with takewhile, or generate an infinite iterator, or anything else you think might be useful.

Or one more of the various combinations of things you can trivially build out of these pieces might be useful as a recipe ("irange"?) and/or in the third-party more-itertools.
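For instance, a quick sketch of that "irange" recipe (untested, and the bounds semantics are just one possible choice):

    from itertools import accumulate, repeat, takewhile

    def irange(start, stop, func):
        """start, func(start), func(func(start)), ... while below stop."""
        steps = accumulate(repeat(start), lambda acc, _: func(acc))
        return takewhile(lambda x: x < stop, steps)

    list(irange(1, 100, lambda x: x * 2))   # [1, 2, 4, 8, 16, 32, 64]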
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From p.f.moore at gmail.com  Thu Jul 2 11:57:40 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 2 Jul 2015 10:57:40 +0100
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To:
References: <20150628185249.61624b82@x230> <559071BC.8090603@gmail.com> <20150629094439.4f8a8efa@x230>
Message-ID:

On 1 July 2015 at 06:56, Nick Coghlan wrote:
> For folks that primarily deal with algorithmic problems where inputs
> are converted to outputs, the event driven model addresses a kind of
> problem that *they don't have*, so it can seem entirely pointless.
> However, there really are significant categories of problems (such as
> network service development) where the event driven model is a
> genuinely superior design tool. Like array oriented programming (and
> even object-oriented and functional programming), the benefits can
> unfortunately be hard to explain to folks that haven't personally
> experienced the problems these tools address, so folks end up having
> to take it on faith that we're applying the "Complex is better than
> complicated" line from the Zen of Python when introducing new
> modelling techniques into the core language.

Hmm, I see what you're getting at here, but my "event driven model" background is with GUI event loops, not with event driven IO. The async/await approach still gives me problems, because I can't map the details of the approach onto the problem domain I'm familiar with.

What I can't quite work out is whether that's simply because asyncio is fundamentally designed around the IO problem (the module name suggests that might be the case, but a lot of the module content around tasks, etc, doesn't seem to be), and so doesn't offer any sort of mental model for understanding how GUI event loop code based on async/await would work, or if it's because the async/await design genuinely doesn't map well onto GUI event loop problems.

I've been poking in the background at trying to decouple the "IO" aspects of asyncio from the "general" ones, but honestly, I'm not getting very far yet. I think what I need to do is to work out how to write a GUI event loop that drives async/await style coroutines, and see if that helps me understand the model better. But there aren't many examples of event loops around to work from (the asyncio one is pretty complex, and it's hard to know how much of that complexity is needed, and how much is because it was developed before async/await were available).
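As a strawman for my own understanding, the bare mechanics seem to be roughly this (an untested toy that just drives coroutines via send() in a round-robin, ignoring timers, IO and cancellation entirely):

    from collections import deque

    class ToyLoop:
        def __init__(self):
            self.ready = deque()

        def create_task(self, coro):
            self.ready.append(coro)

        def run(self):
            while self.ready:
                coro = self.ready.popleft()
                try:
                    coro.send(None)          # run to the next suspension point
                except StopIteration:
                    continue                 # this coroutine is finished
                self.ready.append(coro)      # naively assume it's ready again

Presumably the real complexity is in deciding *when* a suspended coroutine is ready to run again, which is where the IO (or GUI event) machinery would come in.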
So while I agree that if you don't need an event driven model, it can seem like pointless complexity, I *also* think that the pure callback approach to event driven code is what feels "obvious" to most people. It's maybe not the easiest model to code with, but it is the easiest one to think about - and mentally making the link between callbacks and async/await isn't straightforward. So even though people can understand event-driven problems, they can't, without experience, see how async/await *addresses* that problem.

Paul

From pierre.quentel at gmail.com  Thu Jul 2 12:17:57 2015
From: pierre.quentel at gmail.com (Pierre Quentel)
Date: Thu, 2 Jul 2015 12:17:57 +0200
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To:
References: <851tgrrqt5.fsf@benfinney.id.au>
Message-ID:

2015-07-02 9:32 GMT+02:00 Andrew Barnert via Python-ideas <python-ideas at python.org>:

> On Jul 2, 2015, at 00:12, Nathaniel Smith <njs at pobox.com> wrote:
>
> On Jul 1, 2015 23:56, "Ben Finney" wrote:
> >
> > Pierre Quentel writes:
> >
> > > To achieve the same thing in Python we currently can't use range()
> > > because it increments by an integer (the argument "step"). An option
> > > is to build a generator like :
> > >
> > > def gen(N):
> > >     i = 1
> > >     while i<=N:
> > >         yield i
> > >         i *= 2
> > >
> > > then we can iterate on gen(N).
> >
> > Generators can be defined in expressions, of course::
> >
> >     ((x * 2) for x in range(n))
> >
> > So the full function definition above is misleading for this example.
> >
> > Your single example defines the "step" function in-line as a lambda
> > expression::
> >
> > > Iterating on the powers of 2 below N would be done by :
> > >
> > > for i in Range(1, N, lambda x:x*2)
> >
> > So why not define the generator as an expression::
> >
> >     for i in ((x * 2) for x in range(n)):
> >
> > That seems quite clear given existing syntax.
>
> I believe the original example was actually
>
>     for i in (2 ** (x + 1) for x in range(int(log2(n)))):
>
> or similar... which is clearly making some sort of argument about clarity
> but I'm not sure what.
>
> This isn't going to work for range() anyway though AFAICT because range
> isn't an iterator, it's an iterable that offers O(1) membership tests.
>
> I could see an argument for putting something along these lines in
> itertools. itertools.orbit, maybe. But I've never felt an urgent need for
> such a thing myself.
>
> You can already do this with accumulate; you just have to write
> lambda x, _: x*2.
>
> Of course it doesn't include the built-in bounds, but I don't think you'd
> want that anyway. With accumulate, you can bound on the domain by passing
> range instead of count for the input, bound on the range with takewhile, or
> generate an infinite iterator, or anything else you think might be useful.
>
> Or one more of the various combinations of things you can trivially build
> out of these pieces might be useful as a recipe ("irange"?) and/or in the
> third-party more-itertools.
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

I am not saying that you can't find other ways to get the same result, just that using a function (usually a lambda) is easier to code and to understand. The proposal would bring to Python one of the few iteration patterns that is currently simpler to write in Java or Javascript than in Python - I had the idea from this discussion on reddit : https://www.reddit.com/r/Python/comments/3bj5dh/for_loop_with_iteratively_doubling_range/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com  Thu Jul 2 14:15:35 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 2 Jul 2015 22:15:35 +1000
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To:
References: <20150628185249.61624b82@x230> <559071BC.8090603@gmail.com> <20150629094439.4f8a8efa@x230>
Message-ID:

On 2 July 2015 at 19:57, Paul Moore wrote:
> So while I agree that if you don't need an event driven model, it can
> seem like pointless complexity, I *also* think that the pure callback
> approach to event driven code is what feels "obvious" to most people.
> It's maybe not the easiest model to code with, but it is the easiest
> one to think about - and mentally making the link between callbacks
> and async/await isn't straightforward. So even though people can
> understand event-driven problems, they can't, without experience, see
> how async/await *addresses* that problem.

If an operation doesn't need to wait for IO itself, then it can respond immediately using a normal callback (just as a generator is useful for implementing iterators, but would be pointless for a normal function call). async/await is more useful for multi-step processes, and for persistent monitoring of a data source in an infinite loop (e.g. listening for push notifications from a server process).

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Thu Jul 2 16:02:26 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 3 Jul 2015 00:02:26 +1000
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To:
References: <851tgrrqt5.fsf@benfinney.id.au>
Message-ID:

On 2 July 2015 at 17:12, Nathaniel Smith <njs at pobox.com> wrote:
>
> This isn't going to work for range() anyway though AFAICT because range
> isn't an iterator, it's an iterable that offers O(1) membership tests.

Right, Python 3's range() is best thought of as a memory-efficient tuple, rather than as an iterator like Python 2's xrange().

As far as the original request goes, srisyadasti's answer to the Reddit thread highlights one reasonable answer: encapsulating the non-standard iteration logic in a reusable generator, rather than making it easier to define non-standard logic inline. That feature wasn't copied from C into Python for loops, so there's no reason to copy it from Java either.

In addition to writing a custom generator, or nesting a generator expression, it's also fairly straightforward to address the OP's request by way of map() and changing the expression of the limiting factor (to be the final input value rather than the final output value):

    def calc_value(x):
        return 2 ** (x + 1)

    for i in map(calc_value, range(10)):
        ...

Depending on the problem being solved, "calc_value" could hopefully be given a more self-documenting name.

Given the kinds of options available, the appropriate design is likely to come down to the desired "unit of reusability".

* iteration pattern reusable as a whole? Write a custom generator (see the sketch below)
* derivation of iteration value from loop index reusable? Write a derivation function and use map
* one-shot operation? Use an inline generator expression or a break inside the loop
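For the first of those options, a sketch of the doubling example as a reusable generator (untested):

    def doubling_range(start, stop):
        """Yield start, 2*start, 4*start, ... while below stop."""
        i = start
        while i < stop:
            yield i
            i *= 2

    for i in doubling_range(1, 100):
        ...  # 1, 2, 4, 8, 16, 32, 64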
Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From pierre.quentel at gmail.com  Thu Jul 2 17:16:01 2015
From: pierre.quentel at gmail.com (Pierre Quentel)
Date: Thu, 2 Jul 2015 17:16:01 +0200
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To:
References: <851tgrrqt5.fsf@benfinney.id.au>
Message-ID:

2015-07-02 16:02 GMT+02:00 Nick Coghlan <ncoghlan at gmail.com>:

> On 2 July 2015 at 17:12, Nathaniel Smith wrote:
> >
> > This isn't going to work for range() anyway though AFAICT because range
> > isn't an iterator, it's an iterable that offers O(1) membership tests.
>
> Right, Python 3's range() is best thought of as a memory-efficient
> tuple, rather than as an iterator like Python 2's xrange().
>
> As far as the original request goes, srisyadasti's answer to the
> Reddit thread highlights one reasonable answer: encapsulating the
> non-standard iteration logic in a reusable generator, rather than
> making it easier to define non-standard logic inline. That feature
> wasn't copied from C into Python for loops, so there's no reason to
> copy it from Java either.
>
> In addition to writing a custom generator, or nesting a generator
> expression, it's also fairly straightforward to address the OP's
> request by way of map() and changing the expression of the limiting
> factor (to be the final input value rather than the final output
> value):
>
>     def calc_value(x):
>         return 2 ** (x + 1)
>
>     for i in map(calc_value, range(10)):
>         ...

Again, this does not address the original problem : it produces the first 10 powers of 2, not the powers of 2 lower than a "stop" value.

The logic of range(start, stop, step) is to produce the integers starting at "start", incremented by "step", until the integer is >= "stop" (or <= stop if stop < start).

> Depending on the problem being solved, "calc_value" could hopefully be
> given a more self-documenting name.
>
> Given the kinds of options available, the appropriate design is likely
> to come down to the desired "unit of reusability".
>
> * iteration pattern reusable as a whole? Write a custom generator
> * derivation of iteration value from loop index reusable? Write a
> derivation function and use map
> * one-shot operation? Use an inline generator expression or a break
> inside the loop
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ethan at stoneleaf.us  Thu Jul 2 17:23:05 2015
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 02 Jul 2015 08:23:05 -0700
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To:
References: <851tgrrqt5.fsf@benfinney.id.au>
Message-ID: <55955759.3020502@stoneleaf.us>

On 07/02/2015 08:16 AM, Pierre Quentel wrote:

> Again, this does not address the original problem : it produces the first
> 10 powers of 2, not the powers of 2 lower than a "stop" value.
>
> The logic of range(start, stop, step) is to produce the integers starting
> at "start", incremented by "step", until the integer is >= "stop" (or <=
> stop if stop < start).

The other logic of range is to be able to say:

    some_value in range(start, stop, step)

If step is an integer it is easy to calculate whether some_value is in the range; if step is a function, it becomes impossible to figure out without iterating through (possibly all) the values of range.
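The integer case is plain modular arithmetic, along the lines of this sketch of what range.__contains__ can do for ints:

    def in_range(value, start, stop, step):
        # O(1): no iteration needed when step is a fixed integer
        if step > 0:
            return start <= value < stop and (value - start) % step == 0
        else:
            return stop < value <= start and (value - start) % step == 0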
--
~Ethan~

From pierre.quentel at gmail.com  Thu Jul 2 17:53:00 2015
From: pierre.quentel at gmail.com (Pierre Quentel)
Date: Thu, 2 Jul 2015 17:53:00 +0200
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To: <55955759.3020502@stoneleaf.us>
References: <851tgrrqt5.fsf@benfinney.id.au> <55955759.3020502@stoneleaf.us>
Message-ID:

2015-07-02 17:23 GMT+02:00 Ethan Furman <ethan at stoneleaf.us>:

> On 07/02/2015 08:16 AM, Pierre Quentel wrote:
>
>> Again, this does not address the original problem : it produces the first
>> 10 powers of 2, not the powers of 2 lower than a "stop" value.
>>
>> The logic of range(start, stop, step) is to produce the integers starting
>> at "start", incremented by "step", until the integer is >= "stop" (or <=
>> stop if stop < start).
>
> The other logic of range is to be able to say:
>
>     some_value in range(start, stop, step)
>
> If step is an integer it is easy to calculate whether some_value is in the
> range; if step is a function, it becomes impossible to figure out without
> iterating through (possibly all) the values of range.

It's true, but testing that an integer is in a range is very rare : the pattern "if X in range(Y)" is only found once in all the Python 3.4 standard library (in Lib/test/test_genexps.py), and "assert X in range(Y)" nowhere, whereas "for X in range(Y)" is everywhere. So even if __contains__ must iterate on all the items if the argument of step() is a function, I don't see it as a big problem.

> --
> ~Ethan~
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rosuav at gmail.com  Thu Jul 2 17:58:18 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 3 Jul 2015 01:58:18 +1000
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To:
References: <851tgrrqt5.fsf@benfinney.id.au> <55955759.3020502@stoneleaf.us>
Message-ID:

On Fri, Jul 3, 2015 at 1:53 AM, Pierre Quentel <pierre.quentel at gmail.com> wrote:
> It's true, but testing that an integer is in a range is very rare : the pattern
> "if X in range(Y)" is only found once in all the Python 3.4 standard library
> (in Lib/test/test_genexps.py), and "assert X in range(Y)" nowhere, whereas
> "for X in range(Y)" is everywhere.

That proves that testing for membership of "range literals" (if I may call them that) is rare - which I would expect. What if the range object is created in one place, and probed in another? Harder to find, but possibly monkey-patching builtins.range to report on __contains__ and then running the Python test suite would show something up.

ChrisA

From ron3200 at gmail.com  Thu Jul 2 18:21:54 2015
From: ron3200 at gmail.com (Ron Adam)
Date: Thu, 02 Jul 2015 12:21:54 -0400
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To:
References: <20150628185249.61624b82@x230> <559071BC.8090603@gmail.com> <20150629094439.4f8a8efa@x230>
Message-ID:

On 07/02/2015 05:57 AM, Paul Moore wrote:
> So even though people can
> understand event-driven problems, they can't, without experience, see
> how async/await *addresses* that problem.

Yes, I think there are some parts to it that are difficult to understand still. That could be a documentation thing.

Consider a routine roughly organised like this:

    event_loop:
        item_loop1:
            action1    <-- wait for event1
        item_loop2:
            action2    <-- wait for event2
        other_things_loop:
            ...
            sleep      # continue event_loop

It's not clear to me how to write that with asyncio yet. But I don't think I'm alone.
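The closest I've come up with is something like this (untested, and I'm not sure it's idiomatic; action1 and action2 are just stand-ins for real handlers):

    import asyncio

    def action1(): print('action1')
    def action2(): print('action2')

    async def item_loop1(event1):
        while True:
            await event1.wait()        # wait for event1
            event1.clear()
            action1()

    async def item_loop2(event2):
        while True:
            await event2.wait()        # wait for event2
            event2.clear()
            action2()

    async def other_things_loop():
        while True:
            ...
            await asyncio.sleep(1)     # yield back to the event loop

    loop = asyncio.get_event_loop()
    event1, event2 = asyncio.Event(), asyncio.Event()
    loop.run_until_complete(asyncio.gather(
        item_loop1(event1), item_loop2(event2), other_things_loop()))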
Cheers,
   Ron

From breamoreboy at yahoo.co.uk  Thu Jul 2 19:02:50 2015
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Thu, 02 Jul 2015 18:02:50 +0100
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To:
References:
Message-ID:

On 02/07/2015 07:30, Pierre Quentel wrote:
> In languages such as Javascript, the incrementation of a for loop
> counter can be done by an operation, for instance :
>
> for(i=1; i<=N; i*=2)
>
> would iterate on the powers of 2 lesser than N.
>
> To achieve the same thing in Python we currently can't use range()
> because it increments by an integer (the argument "step"). An option is
> to build a generator like :
>
> def gen(N):
>     i = 1
>     while i<=N:
>         yield i
>         i *= 2
>
> then we can iterate on gen(N).
>
> My proposal is that besides an integer, range() would accept a function
> as the "step" argument, taking the current counter as its argument and
> returning the new counter value. Here is a basic pure-Python
> implementation :
>
> import operator
>
> class Range:
>
>     def __init__(self, start, stop, incrementor):
>         self.start, self.stop = start, stop
>         self.incrementor = incrementor
>         # Function to compare current counter and stop value : <= or >=
>         self.comp = operator.ge if self.stop > self.start else operator.le
>         self.counter = None
>
>     def __iter__(self):
>         return self
>
>     def __next__(self):
>         if self.counter is None:
>             self.counter = self.start
>         else:
>             self.counter = self.incrementor(self.counter)
>         if self.comp(self.counter, self.stop):
>             raise StopIteration
>         return self.counter
>
> Iterating on the powers of 2 below N would be done by :
>
> for i in Range(1, N, lambda x: x*2)
>
> I haven't seen this discussed before, but I may not have searched enough.
>
> Any opinions ?

-1 from me. I don't like the idea as it doesn't fit in with my concept of what range() is about. A step is fixed and that's it. Changing it so the output has variable increments is a recipe for confusion in my mind, especially for newbies. I suppose we could have uneven_range() with uneven_step, but there must be millions of these implementations in existence in all sorts of applications and libraries, with all sorts of names, so why implement it in Python now?

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence

From steve at pearwood.info  Thu Jul 2 19:14:29 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 3 Jul 2015 03:14:29 +1000
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To:
References:
Message-ID: <20150702171429.GK10773@ando.pearwood.info>

On Thu, Jul 02, 2015 at 08:30:53AM +0200, Pierre Quentel wrote:

> In languages such as Javascript, the incrementation of a for loop counter
> can be done by an operation, for instance :
>
> for(i=1; i<=N; i*=2)
>
> would iterate on the powers of 2 lesser than N.
>
> To achieve the same thing in Python we currently can't use range() because
> it increments by an integer (the argument "step"). An option is to build a
> generator like :
>
> def gen(N):
>     i = 1
>     while i<=N:
>         yield i
>         i *= 2

Given how simple generators are in Python, that's all you need for the most part.
If you find yourself needing to do the above many times, with different expressions, you can write a factory:

    def gen(start, end, func):
        def inner():
            i = start
            while i < end:
                yield i
                i = func(i)
        return inner()

    for i in gen(1, 100, lambda x: 3*x + 1):
        for j in gen(-1, 5, lambda x: x**2 + 1):
            for k in gen(1000, 20000, lambda x: x*3):
                pass

> My proposal is that besides an integer, range() would accept a function as
> the "step" argument, taking the current counter as its argument and
> returning the new counter value. Here is a basic pure-Python implementation

Why add this functionality to range? It has little to do with range, except that range happens to sometimes be used as the sequence in for-loops. range has nice simple and clean semantics; for something as uncommon as this request, I think a user-defined generator is fine.

range is not a tool for generating arbitrary sequences. It is a tool for generating sequences with equal-spaced values. Let's not complicate it.

--
Steven

From p.f.moore at gmail.com  Thu Jul 2 21:01:17 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 2 Jul 2015 20:01:17 +0100
Subject: [Python-ideas] Are there asynchronous generators?
In-Reply-To:
References: <20150628185249.61624b82@x230> <559071BC.8090603@gmail.com> <20150629094439.4f8a8efa@x230>
Message-ID:

On 2 July 2015 at 17:21, Ron Adam wrote:
> On 07/02/2015 05:57 AM, Paul Moore wrote:
>>
>> So even though people can
>> understand event-driven problems, they can't, without experience, see
>> how async/await *addresses* that problem.
>
> Yes, I think there are some parts to it that are difficult to understand
> still. That could be a documentation thing.
>
> Consider a routine roughly organised like this:
>
>     event_loop:
>         item_loop1:
>             action1    <-- wait for event1
>         item_loop2:
>             action2    <-- wait for event2
>         other_things_loop:
>             ...
>             sleep      # continue event_loop
>
> It's not clear to me how to write that with asyncio yet. But I don't think
> I'm alone.

Precisely. Thanks for explaining it better than I did.

Paul

From tjreedy at udel.edu  Thu Jul 2 21:53:40 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 2 Jul 2015 15:53:40 -0400
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To:
References:
Message-ID:

On 7/2/2015 2:30 AM, Pierre Quentel wrote:

> In languages such as Javascript, the incrementation of a for loop
> counter can be done by an operation, for instance :
>
> for(i=1; i<=N; i*=2)
>
> would iterate on the powers of 2 lesser than N.
>
> To achieve the same thing in Python we currently can't use range()
> because it increments by an integer (the argument "step"). An option is
> to build a generator like :
>
> def gen(N):
>     i = 1
>     while i<=N:
>         yield i
>         i *= 2
>
> then we can iterate on gen(N).
>
> My proposal is that besides an integer, range() would accept a function
> as the "step" argument, taking the current counter as its argument and
> returning the new counter value.
> Here is a basic pure-Python
> implementation :
>
> import operator
>
> class Range:
>
>     def __init__(self, start, stop, incrementor):
>         self.start, self.stop = start, stop
>         self.incrementor = incrementor
>         # Function to compare current counter and stop value : <= or >=
>         self.comp = operator.ge if self.stop > self.start else operator.le
>         self.counter = None
>
>     def __iter__(self):
>         return self
>
>     def __next__(self):
>         if self.counter is None:
>             self.counter = self.start
>         else:
>             self.counter = self.incrementor(self.counter)
>         if self.comp(self.counter, self.stop):
>             raise StopIteration
>         return self.counter

The idea of iterating by non-constant steps is valid. Others have given multiple options for doing so.

The idea of stuffing this into range is not valid. It does not fit into what 3.x range actually is. The Range class above is a one-use iterator. This post and your counter-responses seem to miss what others have alluded to: the 3.x range class is something quite different -- a reusable collections.abc.Sequence class, with a separate range_iterator class.

The range class has the following sequence methods with efficient O(1) implementations: __contains__, __getitem__, __iter__, __len__, __reversed__, count, and index. Having such efficient methods is part of the design goal for 3.x range. Your proposal would break all of these except __iter__ (and make that slower too), in the sense that the replacements would require the O(n) calculation of list(self), whereas avoiding this is part of the purpose of range.

While some of these methods are rarely used, __reversed__ is certainly not rare. People depend on the fact that the often easy to write reversed(up-range(...)) is equivalent in output *and speed* to the often harder to write iter(down-range(...)).

A trivial case is counting down from n to 0:

    for i in reversed(range(n+1)):

versus

    for i in range(n, -1, -1):

People do not always get the latter correct.

Now consider a more general case, such as

    r = range(11, 44000000000, 1335)
    r1 = reversed(r)

versus the equivalent

    r2 = iter(range(43999999346, 10, -1335))

43999999346 is r[-1], which uses __getitem__. Using this is much easier than figuring out the following (which __reversed__ has built in).

    def reversed_start(start, stop, step):
        rem = (stop - start) % step
        return stop - (rem if rem else step)
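To make the costs concrete, each of these is O(1) on a 3.x range (session sketch):

    >>> r = range(11, 44000000000, 1335)
    >>> r[-1]                 # __getitem__
    43999999346
    >>> 43999999346 in r      # __contains__
    True
    >>> next(reversed(r))     # __reversed__
    43999999346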
--
Terry Jan Reedy

From pierre.quentel at gmail.com  Thu Jul 2 22:20:16 2015
From: pierre.quentel at gmail.com (Pierre Quentel)
Date: Thu, 2 Jul 2015 22:20:16 +0200
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To:
References:
Message-ID:

@Steven, Mark

The definition of range() in Python docs says :

Python 2.7 : "This is a versatile function to create lists containing arithmetic progressions. It is most often used in for loops."

Python 3.4 : "The range type represents an immutable sequence of numbers and is commonly used for looping a specific number of times in for loops."

Both stress that range is most often used in a for loop (it doesn't just "happen to sometimes be used" in for loops, and is rarely used for membership testing). Python 2.7 limited its definition to arithmetic progressions, but Python 3.4 has a more general definition (an immutable sequence of numbers).

I really don't think that the proposal would change the general idea behind range : a suite of integers, where each item is built from the previous one following a specific pattern, and stopping when a "stop" value is reached.

@Terry

If the argument "step" is an integer, all the algorithms used in the mentioned methods would remain the same, so performance would not be affected for existing code.

If the argument is a function, you are right: the object returned can't support some of these methods, or only with an excessive performance penalty; it would support __iter__ and not much more.

I agree that this is a blocking issue : as far as I know all Python built-in functions return objects of a given type, regardless of their arguments.

Thank you all for your time.

Pierre

2015-07-02 21:53 GMT+02:00 Terry Reedy <tjreedy at udel.edu>:

> On 7/2/2015 2:30 AM, Pierre Quentel wrote:
>
>> In languages such as Javascript, the incrementation of a for loop
>> counter can be done by an operation, for instance :
>>
>> for(i=1; i<=N; i*=2)
>>
>> would iterate on the powers of 2 lesser than N.
>>
>> To achieve the same thing in Python we currently can't use range()
>> because it increments by an integer (the argument "step"). An option is
>> to build a generator like :
>>
>> def gen(N):
>>     i = 1
>>     while i<=N:
>>         yield i
>>         i *= 2
>>
>> then we can iterate on gen(N).
>>
>> My proposal is that besides an integer, range() would accept a function
>> as the "step" argument, taking the current counter as its argument and
>> returning the new counter value. Here is a basic pure-Python
>> implementation :
>>
>> import operator
>>
>> class Range:
>>
>>     def __init__(self, start, stop, incrementor):
>>         self.start, self.stop = start, stop
>>         self.incrementor = incrementor
>>         # Function to compare current counter and stop value : <= or >=
>>         self.comp = operator.ge if self.stop > self.start else operator.le
>>         self.counter = None
>>
>>     def __iter__(self):
>>         return self
>>
>>     def __next__(self):
>>         if self.counter is None:
>>             self.counter = self.start
>>         else:
>>             self.counter = self.incrementor(self.counter)
>>         if self.comp(self.counter, self.stop):
>>             raise StopIteration
>>         return self.counter
>
> The idea of iterating by non-constant steps is valid. Others have given
> multiple options for doing so.
>
> The idea of stuffing this into range is not valid. It does not fit into
> what 3.x range actually is. The Range class above is a one-use iterator.
> This post and your counter-responses seem to miss what others have alluded
> to: the 3.x range class is something quite different -- a reusable
> collections.abc.Sequence class, with a separate range_iterator class.
>
> The range class has the following sequence methods with efficient O(1)
> implementations: __contains__, __getitem__, __iter__, __len__,
> __reversed__, count, and index. Having such efficient methods is
> part of the design goal for 3.x range. Your proposal would break all of
> these except __iter__ (and make that slower too), in the sense that the
> replacements would require the O(n) calculation of list(self), whereas
> avoiding this is part of the purpose of range.
>
> While some of these methods are rarely used, __reversed__ is certainly not
> rare. People depend on the fact that the often easy to write
> reversed(up-range(...)) is equivalent in output *and speed* to the often
> harder to write iter(down-range(...)).
>
> A trivial case is counting down from n to 0:
>
>     for i in reversed(range(n+1)):
>
> versus
>
>     for i in range(n, -1, -1):
>
> People do not always get the latter correct.
>
> Now consider a more general case, such as
>
>     r = range(11, 44000000000, 1335)
>     r1 = reversed(r)
>
> versus the equivalent
>
>     r2 = iter(range(43999999346, 10, -1335))
>
> 43999999346 is r[-1], which uses __getitem__.
> Using this is much easier than figuring out the following (which
> __reversed__ has built in).
>
>     def reversed_start(start, stop, step):
>         rem = (stop - start) % step
>         return stop - (rem if rem else step)
>
> --
> Terry Jan Reedy
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mistersheik at gmail.com  Thu Jul 2 22:25:09 2015
From: mistersheik at gmail.com (Neil Girdhar)
Date: Thu, 2 Jul 2015 13:25:09 -0700 (PDT)
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To: <555E90D1.7060404@gmail.com>
References: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com> <555E90D1.7060404@gmail.com>
Message-ID:

Why would it require "a lot of extra memory"? A program's text size is measured in megabytes, and the AST is typically more compact than the code as text. A few megabytes is nothing.

Best,

Neil
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tjreedy at udel.edu  Fri Jul 3 00:01:07 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 2 Jul 2015 18:01:07 -0400
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To:
References:
Message-ID:

On 7/2/2015 4:20 PM, Pierre Quentel wrote:
>
> I agree that this is a blocking issue : as far as I know all Python
> built-in functions return objects of a given type, regardless of their
> arguments.

That is generally true. But classes always return instances of the class when called, and range is a class, not a function.

--
Terry Jan Reedy

From ben+python at benfinney.id.au  Fri Jul 3 01:33:49 2015
From: ben+python at benfinney.id.au (Ben Finney)
Date: Fri, 03 Jul 2015 09:33:49 +1000
Subject: [Python-ideas] Pass a function as the argument "step" of range()
References: <851tgrrqt5.fsf@benfinney.id.au>
Message-ID: <85si96qgma.fsf@benfinney.id.au>

Pierre Quentel <pierre.quentel at gmail.com> writes:

> I am not saying that you can't find other ways to get the same result,
> just that using a function (usually a lambda) is easier to code and to
> understand.

That's not something I can accept in the abstract. Can you please find and present some existing real-world Python code that you are confident would be improved by the changes you're proposing?

So far, the only example you have presented is both contrived (no harm in that, but also not compelling) and does not demonstrate your point.

--
 \     “You can't have everything; where would you put it?” —Steven |
  `\                                                         Wright |
_o__)                                                               |
Ben Finney

From rymg19 at gmail.com  Fri Jul 3 02:36:09 2015
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Thu, 02 Jul 2015 19:36:09 -0500
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To:
References:
Message-ID:

I like the idea, but this would be nicer in a language with implicit currying so that your example could be:

    for i in Range(1, N, (*2)):
        ...

Or in Haskell:

    range' v a b f | a == b    = []
                   | otherwise = f a : range' v (a + v) b f

    range a b | a > b     = range' (-1) a b
              | a < b     = range' 1 a b
              | otherwise = const []

Since Python doesn't have this, the generator forms others showed would likely end up more concise.

On July 2, 2015 1:30:53 AM CDT, Pierre Quentel <pierre.quentel at gmail.com> wrote:
> In languages such as Javascript, the incrementation of a for loop
> counter can be done by an operation, for instance :
>
> for(i=1; i<=N; i*=2)
>
> would iterate on the powers of 2 lesser than N.
> To achieve the same thing in Python we currently can't use range()
> because it increments by an integer (the argument "step"). An option is
> to build a generator like :
>
> def gen(N):
>     i = 1
>     while i<=N:
>         yield i
>         i *= 2
>
> then we can iterate on gen(N).
>
> My proposal is that besides an integer, range() would accept a function
> as the "step" argument, taking the current counter as its argument and
> returning the new counter value. Here is a basic pure-Python
> implementation :
>
> import operator
>
> class Range:
>
>     def __init__(self, start, stop, incrementor):
>         self.start, self.stop = start, stop
>         self.incrementor = incrementor
>         # Function to compare current counter and stop value : <= or >=
>         self.comp = operator.ge if self.stop > self.start else operator.le
>         self.counter = None
>
>     def __iter__(self):
>         return self
>
>     def __next__(self):
>         if self.counter is None:
>             self.counter = self.start
>         else:
>             self.counter = self.incrementor(self.counter)
>         if self.comp(self.counter, self.stop):
>             raise StopIteration
>         return self.counter
>
> Iterating on the powers of 2 below N would be done by :
>
> for i in Range(1, N, lambda x: x*2)
>
> I haven't seen this discussed before, but I may not have searched enough.
>
> Any opinions ?
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wes.turner at gmail.com  Fri Jul 3 02:55:10 2015
From: wes.turner at gmail.com (Wes Turner)
Date: Thu, 2 Jul 2015 19:55:10 -0500
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To: <042AA2E2-6FC8-480A-8C2E-A42AE941C5BA@yahoo.com>
References: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com> <555E8FAD.1060100@canterbury.ac.nz> <042AA2E2-6FC8-480A-8C2E-A42AE941C5BA@yahoo.com>
Message-ID:

On Thu, May 21, 2015 at 9:22 PM, Andrew Barnert via Python-ideas <python-ideas at python.org> wrote:

> > On May 21, 2015, at 19:08, Greg wrote:
> >
> >> On 22/05/2015 1:51 p.m., Andrew Barnert via Python-ideas wrote:
> >> Or just use MacroPy, which
> >> wraps up all the hard stuff (especially 2.x compatibility) and
> >> provides a huge framework of useful tools. What do you want to do
> >> that can't be done that way?
> >
> > You might not want to drag in a huge framework just to
> > do one thing.
>
> But "all kinds of LINQ-style things, like ORMs" isn't just one thing. If
> you're going to build a huge framework, why not build it on top of another
> framework that does the hard part of the work for you?

* MacroPy looks interesting
* PonyORM -> SQLAlchemy

http://dask.pydata.org/en/latest/array-blaze.html#why-use-blaze

"""These different projects (Blaze -> dask.array -> NumPy -> Numba) act as different stages in a compiler. They start at abstract syntax trees, move to task DAGs, then to in-core computations, finally to LLVM and beyond. For simple problems you may only need to think about the middle of this chain (NumPy, dask.array) but as you require more performance optimizations you extend your interest to the outer edges (Blaze, Numba)."""
http://continuum.io/blog/blaze """Once a graph is evaluated, Blaze attempts to gather all available type > and metadata available from the user input to inform better computation > selection and scheduling. The compiler converts expressions graph objects > into an intermediate form called ATerm, drawn from the StrategoXT project. > This intermediate form is roughly a subset of Python expressions but allows > the explicit annotation of type and metadata information directly on the > AST. The ATerm IR forms the meeting point where both Numba and Blaze can > come together to code generation and graph rewriting to produce more > efficient kernels.""" > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri Jul 3 05:16:54 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 2 Jul 2015 20:16:54 -0700 Subject: [Python-ideas] Are there asynchronous generators? In-Reply-To: References: <20150628185249.61624b82@x230> <559071BC.8090603@gmail.com> <20150629094439.4f8a8efa@x230> Message-ID: <76786595-AFFC-4BFA-A93E-78498FCD9481@yahoo.com> One way for a GUI developer to get the hang of the point of asyncio is to imagine this problem: I've written a synchronous, modal, menu-driven app. I want to turn that into a GUI app, maybe using a wizard-like design, with a pane for each menu. To do that, I basically have to turn my control-flow inside-out. But if, at each step, I could just put up the next pane and "await" the user's response, my code would look much the same as the original CLI code. And if I wanted to let the user fire off multiple "wizards" in parallel (an MDI interface, treating each one as a document), it would just work, because each wizard is a separate coroutine that spends most of its time blocked on the event loop. The difference between a server and an MDI app is that you usually need hundreds or thousands of connections as opposed to a handful of documents, but the control flow for each is usually more linear, so the wizard-like design is a much more obvious choice. > On Jul 2, 2015, at 02:57, Paul Moore wrote: > >> On 1 July 2015 at 06:56, Nick Coghlan wrote: >> For folks that primarily deal with algorithmic problems where inputs >> are converted to outputs, the event driven model addresses a kind of >> problem that *they don't have*, so it can seem entirely pointless. >> However, there really are significant categories of problems (such as >> network service development) where the event driven model is a >> genuinely superior design tool. Like array oriented programming (and >> even object-oriented and functional programming), the benefits can >> unfortunately be hard to explain to folks that haven't personally >> experienced the problems these tools address, so folks end up having >> to take it on faith that we're applying the "Complex is better than >> complicated" line from the Zen of Python when introducing new >> modelling techniques into the core language. > > Hmm, I see what you're getting at here, but my "event driven model" > background is with GUI event loops, not with event driven IO. The > async/await approach still gives me problems, because I can't map the > details of the approach onto the problem domain I'm familiar with. 
>
> What I can't quite work out is whether that's simply because asyncio
> is fundamentally designed around the IO problem (the module name
> suggests that might be the case, but a lot of the module content
> around tasks, etc, doesn't seem to be), and so doesn't offer any sort
> of mental model for understanding how GUI event loop code based on
> async/await would work, or if it's because the async/await design
> genuinely doesn't map well onto GUI event loop problems.
>
> I've been poking in the background at trying to decouple the "IO"
> aspects of asyncio from the "general" ones, but honestly, I'm not
> getting very far yet. I think what I need to do is to work out how to
> write a GUI event loop that drives async/await style coroutines, and
> see if that helps me understand the model better. But there aren't
> many examples of event loops around to work from (the asyncio one is
> pretty complex, and it's hard to know how much of that complexity is
> needed, and how much is because it was developed before async/await
> were available).
>
> So while I agree that if you don't need an event driven model, it can
> seem like pointless complexity, I *also* think that the pure callback
> approach to event driven code is what feels "obvious" to most people.
> It's maybe not the easiest model to code with, but it is the easiest
> one to think about - and mentally making the link between callbacks
> and async/await isn't straightforward. So even though people can
> understand event-driven problems, they can't, without experience, see
> how async/await *addresses* that problem.
>
> Paul
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From abarnert at yahoo.com Fri Jul 3 05:20:33 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 2 Jul 2015 20:20:33 -0700
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To:
References: <851tgrrqt5.fsf@benfinney.id.au>
Message-ID: <7967A1DC-1692-432E-9A84-28B33C609313@yahoo.com>

On Jul 2, 2015, at 03:17, Pierre Quentel wrote:
>
> 2015-07-02 9:32 GMT+02:00 Andrew Barnert via Python-ideas :
>> On Jul 2, 2015, at 00:12, Nathaniel Smith
>> You can already do this with accumulate; you just have to write lambda x,
>> _: x*2.
>>
>> Of course it doesn't include the built-in bounds, but I don't think you'd
>> want that anyway. With accumulate, you can bound on the domain by passing
>> range instead of count for the input, bound on the range with takewhile, or
>> generate an infinite iterator, or anything else you think might be useful.
>>
>> Or one more of the various combinations of things you can trivially build
>> out of these pieces might be useful as a recipe ("irange"?) and/or in the
>> third-party more-itertools.
>
> I am not saying that you can't find other ways to get the same result,
> just that using a function (usually a lambda) is easier to code and to
> understand.

I don't understand how using a function is easier to code and understand
than using a function. Or how passing it to range is any simpler than
passing it to accumulate, or to a recipe function built on top of
accumulate.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From pierre.quentel at gmail.com Fri Jul 3 08:07:20 2015
From: pierre.quentel at gmail.com (Pierre Quentel)
Date: Fri, 3 Jul 2015 08:07:20 +0200
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To:
References:
Message-ID:

2015-07-03 0:01 GMT+02:00 Terry Reedy :

> On 7/2/2015 4:20 PM, Pierre Quentel wrote:
>
>>
>> I agree that this is a blocking issue : as far as I know all Python
>> built-in functions return objects of a given type, regardless of its
>> arguments.
>>
>
> That is generally true. But classes always return instances of the class
> when called, and range is a class, not a function.

>>> range
<class 'range'>
>>> class A(range):pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: type 'range' is not an acceptable base type
>>>

So yes, range is a class, but a strange one

>
>
> --
> Terry Jan Reedy
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From pierre.quentel at gmail.com Fri Jul 3 08:28:55 2015
From: pierre.quentel at gmail.com (Pierre Quentel)
Date: Fri, 3 Jul 2015 08:28:55 +0200
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To: <7967A1DC-1692-432E-9A84-28B33C609313@yahoo.com>
References: <851tgrrqt5.fsf@benfinney.id.au>
 <7967A1DC-1692-432E-9A84-28B33C609313@yahoo.com>
Message-ID:

2015-07-03 5:20 GMT+02:00 Andrew Barnert :

> On Jul 2, 2015, at 03:17, Pierre Quentel wrote:
>
> 2015-07-02 9:32 GMT+02:00 Andrew Barnert via Python-ideas <
> python-ideas at python.org>:
>
>> On Jul 2, 2015, at 00:12, Nathaniel Smith
>> You can already do this with accumulate; you just have to write lambda x,
>> _: x*2.
>>
>> Of course it doesn't include the built-in bounds, but I don't think you'd
>> want that anyway. With accumulate, you can bound on the domain by passing
>> range instead of count for the input, bound on the range with takewhile, or
>> generate an infinite iterator, or anything else you think might be useful.
>>
>> Or one more of the various combinations of things you can trivially build
>> out of these pieces might be useful as a recipe ("irange"?) and/or in the
>> third-party more-itertools.
>>
>>
> I am not saying that you can't find other ways to get the same result,
> just that using a function (usually a lambda) is easier to code and to
> understand.
>
>
> I don't understand how using a function is easier to code and understand
> than using a function. Or how passing it to range is any simpler than
> passing it to accumulate, or to a recipe function built on top of
> accumulate.
>

With the proposed addition to range(), the list of powers of 2 lower than
100 would be :

list(range(1, 100, lambda x:x*2))

How do you code the same with accumulate ? I tried, but I'm stuck with
"stop when the element is >= 100"
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From pierre.quentel at gmail.com Fri Jul 3 09:30:15 2015
From: pierre.quentel at gmail.com (Pierre Quentel)
Date: Fri, 3 Jul 2015 09:30:15 +0200
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To:
References:
Message-ID:

2015-07-02 8:30 GMT+02:00 Pierre Quentel :

> In languages such as Javascript, the incrementation of a for loop counter
> can be done by an operation, for instance :
>
> for(i=1; i<N; i*=2)
>
> would iterate on the powers of 2 lesser than N.
>
> To achieve the same thing in Python we currently can't use range() because
> it increments by an integer (the argument "step"). An option is to build a
> generator like :
>
> def gen(N):
>     i = 1
>     while i<=N:
>         yield i
>         i *= 2
>
> then we can iterate on gen(N).
>
> My proposal is that besides an integer, range() would accept a function as
> the "step" argument, taking the current counter as its argument and
> returning the new counter value. Here is a basic pure-Python implementation
> :
>
> import operator
>
> class Range:
>
>     def __init__(self, start, stop, incrementor):
>         self.start, self.stop = start, stop
>         self.incrementor = incrementor
>         # Function to compare current counter and stop value : <= or >=
>         self.comp = operator.ge if self.stop>self.start else operator.le
>         self.counter = None
>
>     def __iter__(self):
>         return self
>
>     def __next__(self):
>         if self.counter is None:
>             self.counter = self.start
>         else:
>             self.counter = self.incrementor(self.counter)
>         if self.comp(self.counter, self.stop):
>             raise StopIteration
>         return self.counter
>
> Iterating on the powers of 2 below N would be done by :
>
> for i in Range(1, N, lambda x:x*2)
>
> I haven't seen this discussed before, but I may not have searched enough.
>
> Any opinions ?
>
>
With the proposed Range class, here is an implementation of the Fibonacci
sequence, limited to 2000 :

previous = 0
def fibo(last):
    global previous
    _next, previous = previous+last, last
    return _next

print(list(Range(1, 2000, fibo)))
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From __peter__ at web.de Fri Jul 3 11:54:28 2015
From: __peter__ at web.de (Peter Otten)
Date: Fri, 03 Jul 2015 11:54:28 +0200
Subject: [Python-ideas] Pass a function as the argument "step" of range()
References:
Message-ID:

Pierre Quentel wrote:

> With the proposed Range class, here is an implementation of the Fibonacci
> sequence, limited to 2000 :
>
> previous = 0
> def fibo(last):
>     global previous
>     _next, previous = previous+last, last
>     return _next
>
> print(list(Range(1, 2000, fibo)))

How would you make

print(list(Range(1000, 2000, fibo)))

work? Without that and with the `previous` global under the rug that
doesn't look range-like.
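As an aside, the "start partway in" behaviour Peter asks about falls out
naturally from a plain generator plus itertools; a sketch (the bounds and
names here are illustrative only, not part of any proposal):

from itertools import dropwhile, takewhile

def fib():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Fibonacci numbers n with 1000 <= n < 2000, reached without any
# global state and without restarting the caller's code
window = takewhile(lambda n: n < 2000,
                   dropwhile(lambda n: n < 1000, fib()))
print(list(window))   # [1597]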
From rosuav at gmail.com Fri Jul 3 11:56:03 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 3 Jul 2015 19:56:03 +1000
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To:
References:
Message-ID:

On Fri, Jul 3, 2015 at 5:30 PM, Pierre Quentel wrote:
> With the proposed Range class, here is an implementation of the Fibonacci
> sequence, limited to 2000 :
>
> previous = 0
> def fibo(last):
>     global previous
>     _next, previous = previous+last, last
>     return _next
>
> print(list(Range(1, 2000, fibo)))

Without the proposed Range class, here's an equivalent that doesn't
use global state:

def fibo(top):
    a, b = 0, 1
    while a < top:
        yield a
        a, b = a + b, a

print(list(fibo(2000)))

I'm not seeing this as an argument for a variable-step range
func/class, especially since you need to use a global - or to
construct a dedicated callable whose sole purpose is to maintain one
integer of state. Generators are extremely expressive and flexible.

ChrisA

From ncoghlan at gmail.com Fri Jul 3 12:08:26 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 3 Jul 2015 20:08:26 +1000
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To:
References: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com>
 <555E8FAD.1060100@canterbury.ac.nz>
 <042AA2E2-6FC8-480A-8C2E-A42AE941C5BA@yahoo.com>
Message-ID:

On 3 July 2015 at 10:55, Wes Turner wrote:
> * MacroPy looks interesting
> * PonyORM -> SQLAlchemy

Wes, it would be helpful if you could provide some context and
rationale for links and cryptic bullet points when you post them,
rather than expecting us all to follow precisely how you believe they
relate to the topic of discussion.

In this particular case, I think I can *personally* guess what you
meant, but I'm also already familiar with all of the projects you
mentioned. For folks without that background, such brief notes are
going to be much harder to interpret :)

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From pierre.quentel at gmail.com Fri Jul 3 12:10:22 2015
From: pierre.quentel at gmail.com (Pierre Quentel)
Date: Fri, 3 Jul 2015 12:10:22 +0200
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To:
References:
Message-ID:

2015-07-03 11:54 GMT+02:00 Peter Otten <__peter__ at web.de>:

> Pierre Quentel wrote:
>
> > With the proposed Range class, here is an implementation of the
> > Fibonacci
> > sequence, limited to 2000 :
> >
> > previous = 0
> > def fibo(last):
> >     global previous
> >     _next, previous = previous+last, last
> >     return _next
> >
> > print(list(Range(1, 2000, fibo)))
>
> How would you make
>
> print(list(Range(1000, 2000, fibo)))
>
> work?

I wouldn't, the Fibonacci sequence starts with (0, 1), not with (0, 1000)

> Without that and with the `previous` global under the rug that doesn't
> look range-like.
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
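For completeness, the "dedicated callable whose sole purpose is to
maintain one integer of state" that Chris mentions can at least avoid the
global by holding its state in a closure; a sketch driven by a plain loop
(the helper name and the 2000 bound are just for illustration):

def make_fibo():
    previous = 0
    def fibo(last):
        # same step rule as Pierre's version, but the one integer of
        # state lives in the enclosing scope instead of a global
        nonlocal previous
        _next, previous = previous + last, last
        return _next
    return fibo

step = make_fibo()
value, out = 1, []
while value < 2000:
    out.append(value)
    value = step(value)
print(out)   # [1, 1, 2, 3, 5, ..., 1597]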
From wes.turner at gmail.com Fri Jul 3 12:16:09 2015
From: wes.turner at gmail.com (Wes Turner)
Date: Fri, 3 Jul 2015 05:16:09 -0500
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To:
References: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com>
 <555E8FAD.1060100@canterbury.ac.nz>
 <042AA2E2-6FC8-480A-8C2E-A42AE941C5BA@yahoo.com>
Message-ID:

On Jul 3, 2015 5:08 AM, "Nick Coghlan" wrote:
>
> On 3 July 2015 at 10:55, Wes Turner wrote:
> > * MacroPy looks interesting
> > * PonyORM -> SQLAlchemy

That would be a thread summary.

> Wes, it would be helpful if you could provide some context and
> rationale for links and cryptic bullet points when you post them,
> rather than expecting us all to follow precisely how you believe they
> relate to the topic of discussion.
>
> In this particular case, I think I can *personally* guess what you
> meant, but I'm also already familiar with all of the projects you
> mentioned. For folks without that background, such brief notes are
> going to be much harder to interpret :)

I must've been inferring a different use case than compiling Python AST
to pandas, SQLAlchemy, Spark.

>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From pierre.quentel at gmail.com Fri Jul 3 12:17:34 2015
From: pierre.quentel at gmail.com (Pierre Quentel)
Date: Fri, 3 Jul 2015 12:17:34 +0200
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To:
References:
Message-ID:

2015-07-03 11:56 GMT+02:00 Chris Angelico :

> On Fri, Jul 3, 2015 at 5:30 PM, Pierre Quentel
> wrote:
> > With the proposed Range class, here is an implementation of the Fibonacci
> > sequence, limited to 2000 :
> >
> > previous = 0
> > def fibo(last):
> >     global previous
> >     _next, previous = previous+last, last
> >     return _next
> >
> > print(list(Range(1, 2000, fibo)))
>
> Without the proposed Range class, here's an equivalent that doesn't
> use global state:
>
> def fibo(top):
>     a, b = 0, 1
>     while a < top:
>         yield a
>         a, b = a + b, a
>
> print(list(fibo(2000)))
>
> I'm not seeing this as an argument for a variable-step range
> func/class, especially since you need to use a global - or to
> construct a dedicated callable whose sole purpose is to maintain one
> integer of state. Generators are extremely expressive and flexible.
>

Of course there are lots of ways to produce the Fibonacci sequence, and
generators are perfect for this purpose. This was intended as an example
of how to use the proposed range(), always with the same logic : build a
sequence of integers from a starting point, use a function to build the
next item, and stop when a "stop" value is reached.

> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com Fri Jul 3 12:20:14 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 3 Jul 2015 20:20:14 +1000
Subject: [Python-ideas] Enabling access to the AST for Python code
In-Reply-To:
References: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com>
 <555E90D1.7060404@gmail.com>
Message-ID:

On 3 July 2015 at 06:25, Neil Girdhar wrote:
> Why would it require "a lot of extra memory"? A program text size is
> measured in megabytes, and the AST is typically more compact than the code
> as text. A few megabytes is nothing.

It's more complicated than that.

What happens when we multiply that "nothing" by 10,000 concurrent
processes across multiple servers? Is it still nothing? How about
10,000,000?

What does keeping the extra data around do to our CPU level cache
efficiency? Is there a key data structure we're adding a new pointer
to? What does *that* do to our performance?

Where are the AST objects being kept? Do they become part of the
serialised form of the affected object? If yes, what does that do to
the wire protocol overhead for inter-process communication, or to the
size of cached bytecode files? If no, does that mean these objects may
be missing the AST data when deserialised?

When we're talking about sufficiently central data structures, a few
*bytes* can end up counting as "a lot". Code and function objects
aren't quite *that* central (unlike, say, tuple instances), but adding
things to them can still have a significant impact (hence the ability
to avoid creating docstrings).

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From abarnert at yahoo.com Fri Jul 3 12:17:24 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 3 Jul 2015 03:17:24 -0700
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To:
References: <851tgrrqt5.fsf@benfinney.id.au>
 <7967A1DC-1692-432E-9A84-28B33C609313@yahoo.com>
Message-ID: <3EEE665E-CE9C-4A67-B494-647D5A32A3C5@yahoo.com>

On Jul 2, 2015, at 23:28, Pierre Quentel wrote:

> 2015-07-03 5:20 GMT+02:00 Andrew Barnert :
>
>> On Jul 2, 2015, at 03:17, Pierre Quentel
>> wrote:
>>
>> 2015-07-02 9:32 GMT+02:00 Andrew Barnert via Python-ideas <
>> python-ideas at python.org>:
>>
>>> On Jul 2, 2015, at 00:12, Nathaniel Smith
>>> You can already do this with accumulate; you just have to write lambda
>>> x, _: x*2.
>>>
>>> Of course it doesn't include the built-in bounds, but I don't think
>>> you'd want that anyway. With accumulate, you can bound on the domain by
>>> passing range instead of count for the input, bound on the range with
>>> takewhile, or generate an infinite iterator, or anything else you think
>>> might be useful.
>>>
>>> Or one more of the various combinations of things you can trivially
>>> build out of these pieces might be useful as a recipe ("irange"?) and/or in
>>> the third-party more-itertools.
>>>
>>>
>> I am not saying that you can't find other ways to get the same result,
>> just that using a function (usually a lambda) is easier to code and to
>> understand.
>>
>>
>> I don't understand how using a function is easier to code and understand
>> than using a function. Or how passing it to range is any simpler than
>> passing it to accumulate, or to a recipe function built on top of
>> accumulate.
>>
>
> With the proposed addition to range(), the list of powers of 2 lower than
> 100 would be :
>
> list(range(1, 100, lambda x:x*2))
>
> How do you code the same with accumulate ? I tried, but I'm stuck with
> "stop when the element is >= 100"

A genexpr, a generator function, or a takewhile call.

I already explained how you could write an "irange" function in two lines
out of count, accumulate, and takewhile (along with a variety of other
useful things). I also suggested that if this isn't obvious enough, it
could be a handy recipe in the docs and/or a useful addition to the
third-party more-itertools package.
So, given that recipe, you'd write it as: list(irange(1, 100, lambda x:x*2)) There's no need to add a new itertools.orbit (with a custom C implementation), much less to change range into something that's sometimes a Sequence and sometimes not, when a two-line recipe (that's also an instructive sample) does it just as well. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Fri Jul 3 12:23:28 2015 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 3 Jul 2015 20:23:28 +1000 Subject: [Python-ideas] Pass a function as the argument "step" of range() In-Reply-To: References: Message-ID: On Fri, Jul 3, 2015 at 8:10 PM, Pierre Quentel wrote: > 2015-07-03 11:54 GMT+02:00 Peter Otten <__peter__ at web.de>: >> >> Pierre Quentel wrote: >> >> > With the proposed Range class, here is an implementation of the >> > Fibonacci >> > sequence, limited to 2000 : >> > >> > previous = 0 >> > def fibo(last): >> > global previous >> > _next, previous = previous+last, last >> > return _next >> > >> > print(list(Range(1, 2000, fibo))) >> >> How would you make >> >> print(list(Range(1000, 2000, fibo))) >> >> work? > > > I wouldn't, the Fibonacci sequence starts with (0, 1), not with (0, 1000) The whole numbers start with (0, 1) too (pun intended), but you can ask for a range that starts part way into that sequence: >>> list(range(10,20)) [10, 11, 12, 13, 14, 15, 16, 17, 18, 19] You can create "chunked ranges" by simply migrating your previous second argument into your new first argument, and picking a new second argument. Or you can handle an HTTP parameter "?page=4" by multiplying 4 by your chunk size and using that as your start, adding another chunk and making that your end. While it's fairly easy to ask for Fibonacci numbers up to 2000, it's rather harder to ask for only the ones 1000 and greater. Your range *must* start at the very beginning - a very good place to start, don't get me wrong, but it's a little restrictive if that's _all_ you can do. ChrisA From abarnert at yahoo.com Fri Jul 3 12:29:35 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 3 Jul 2015 03:29:35 -0700 Subject: [Python-ideas] Pass a function as the argument "step" of range() In-Reply-To: References: <851tgrrqt5.fsf@benfinney.id.au> <55955759.3020502@stoneleaf.us> Message-ID: <340F8101-A3E0-4CCE-8AA4-CB68271E91F0@yahoo.com> On Jul 2, 2015, at 08:53, Pierre Quentel wrote: > > It's true, but testing that an integer is a range is very rare : the pattern "if X in range(Y)" is only found once in all the Python 3.4 standard library Given that most of the stdlib predates Python 3.2, and modules are rarely rewritten to take advantage of new features just for the hell of it, this isn't very surprising, or very meaningful. Similarly, you'll find that most of the stdlib doesn't use yield from expressions, and many things that could be written in terms of singledispatch instead use type switching, and so on. This doesn't mean yield from or singledispatch are useless or rarely used. 
From pierre.quentel at gmail.com Fri Jul 3 12:23:48 2015
From: pierre.quentel at gmail.com (Pierre Quentel)
Date: Fri, 3 Jul 2015 12:23:48 +0200
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To: <3EEE665E-CE9C-4A67-B494-647D5A32A3C5@yahoo.com>
References: <851tgrrqt5.fsf@benfinney.id.au>
 <7967A1DC-1692-432E-9A84-28B33C609313@yahoo.com>
 <3EEE665E-CE9C-4A67-B494-647D5A32A3C5@yahoo.com>
Message-ID:

2015-07-03 12:17 GMT+02:00 Andrew Barnert :

> On Jul 2, 2015, at 23:28, Pierre Quentel wrote:
>
> 2015-07-03 5:20 GMT+02:00 Andrew Barnert :
>
>> On Jul 2, 2015, at 03:17, Pierre Quentel
>> wrote:
>>
>> 2015-07-02 9:32 GMT+02:00 Andrew Barnert via Python-ideas <
>> python-ideas at python.org>:
>>
>>> On Jul 2, 2015, at 00:12, Nathaniel Smith
>>> You can already do this with accumulate; you just have to write lambda
>>> x, _: x*2.
>>>
>>> Of course it doesn't include the built-in bounds, but I don't think
>>> you'd want that anyway. With accumulate, you can bound on the domain by
>>> passing range instead of count for the input, bound on the range with
>>> takewhile, or generate an infinite iterator, or anything else you think
>>> might be useful.
>>>
>>> Or one more of the various combinations of things you can trivially
>>> build out of these pieces might be useful as a recipe ("irange"?) and/or in
>>> the third-party more-itertools.
>>>
>>>
>> I am not saying that you can't find other ways to get the same result,
>> just that using a function (usually a lambda) is easier to code and to
>> understand.
>>
>>
>> I don't understand how using a function is easier to code and understand
>> than using a function. Or how passing it to range is any simpler than
>> passing it to accumulate, or to a recipe function built on top of
>> accumulate.
>>
>
> With the proposed addition to range(), the list of powers of 2 lower than
> 100 would be :
>
> list(range(1, 100, lambda x:x*2))
>
> How do you code the same with accumulate ? I tried, but I'm stuck with
> "stop when the element is >= 100"
>
>
> A genexpr, a generator function, or a takewhile call.
>
> I already explained how you could write an "irange" function in two lines
> out of count, accumulate, and takewhile (along with a variety of other
> useful things). I also suggested that if this isn't obvious enough, it
> could be a handy recipe in the docs and/or a useful addition to the
> third-party more-itertools package. So, given that recipe, you'd write it
> as:
>
> list(irange(1, 100, lambda x:x*2))
>
> There's no need to add a new itertools.orbit (with a custom C
> implementation), much less to change range into something that's sometimes
> a Sequence and sometimes not, when a two-line recipe (that's also an
> instructive sample) does it just as well.
>

Well, you still haven't given an example with accumulate, have you ? I
expected that. There is a good answer in the reddit thread :

from itertools import count, takewhile
list(takewhile(lambda n: n <= 100, (2**n for n in count())))

My point here is that even with this simple example, it is not easy, even
for people who know itertools well, to remember which function to use.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
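Putting Andrew's accumulate call and the reddit takewhile together does
give the two-line recipe in question; a sketch (the name "irange" and the
range-style half-open upper bound are assumptions of this illustration):

from itertools import accumulate, count, takewhile

def irange(start, stop, stepfunc):
    # accumulate yields the first element of its input unchanged, so
    # count(start) seeds the sequence with `start`; the later values
    # from count() are ignored by the lambda, which simply feeds each
    # result back through stepfunc
    values = accumulate(count(start), lambda acc, _: stepfunc(acc))
    return takewhile(lambda x: x < stop, values)

print(list(irange(1, 100, lambda x: x * 2)))   # [1, 2, 4, 8, 16, 32, 64]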
In-Reply-To: <76786595-AFFC-4BFA-A93E-78498FCD9481@yahoo.com> References: <20150628185249.61624b82@x230> <559071BC.8090603@gmail.com> <20150629094439.4f8a8efa@x230> <76786595-AFFC-4BFA-A93E-78498FCD9481@yahoo.com> Message-ID: On 3 July 2015 at 13:16, Andrew Barnert wrote: > > The difference between a server and an MDI app is that you usually need hundreds or thousands of connections as opposed to a handful of documents, but the control flow for each is usually more linear, so the wizard-like design is a much more obvious choice. Ah, thank you - yes, the "stepping through a wizard" case is a good example, as it hits the same kind multi-step process that causes problems with network applications. Simple request-response cases are easy to handle with callbacks: "event A happens, invoke callback B, which will trigger action C". If things stop there, you're fine. Things get messier when they start looking like this: "event A happens, invoking callback B, which triggers action C after setting up callback D to wait for event E, which triggers action F after setting up callback G to wait for event H and finally trigger action I" This is where coroutines help, as that second case instead becomes: "event A happens, invoking coroutine B, which triggers action C, then waits for event E, then triggers action F, then waits for event H, then triggers the final action I" Rather than having to create a new callback to handle each new action-event pair, you can instead have a single coroutine which triggers an action, and then waits for the corresponding event, and may do this multiple times before terminating. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From abarnert at yahoo.com Fri Jul 3 12:33:31 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 3 Jul 2015 03:33:31 -0700 Subject: [Python-ideas] Pass a function as the argument "step" of range() In-Reply-To: References: <851tgrrqt5.fsf@benfinney.id.au> <7967A1DC-1692-432E-9A84-28B33C609313@yahoo.com> <3EEE665E-CE9C-4A67-B494-647D5A32A3C5@yahoo.com> Message-ID: On Jul 3, 2015, at 03:23, Pierre Quentel wrote: > > Well, you still haven't given an example with accumulate, have you ? If you seriously can't figure out how to put my accumulate(count(1), lambda n, _: n*2) and the takewhile from the reddit example together, then that's a good argument for making it a recipe. But it's still not a good argument for breaking the range type. From p.f.moore at gmail.com Fri Jul 3 12:54:05 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 3 Jul 2015 11:54:05 +0100 Subject: [Python-ideas] Are there asynchronous generators? In-Reply-To: References: <20150628185249.61624b82@x230> <559071BC.8090603@gmail.com> <20150629094439.4f8a8efa@x230> <76786595-AFFC-4BFA-A93E-78498FCD9481@yahoo.com> Message-ID: On 3 July 2015 at 11:36, Nick Coghlan wrote: > On 3 July 2015 at 13:16, Andrew Barnert wrote: >> >> The difference between a server and an MDI app is that you usually need hundreds or thousands of connections as opposed to a handful of documents, but the control flow for each is usually more linear, so the wizard-like design is a much more obvious choice. > > Ah, thank you - yes, the "stepping through a wizard" case is a good > example, as it hits the same kind multi-step process that causes > problems with network applications. Yes, that is a very good example of a (non-IO) use case for the new async capabilities. 
From pierre.quentel at gmail.com Fri Jul 3 13:11:56 2015
From: pierre.quentel at gmail.com (Pierre Quentel)
Date: Fri, 3 Jul 2015 13:11:56 +0200
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To:
References:
Message-ID:

2015-07-03 12:23 GMT+02:00 Chris Angelico :

> On Fri, Jul 3, 2015 at 8:10 PM, Pierre Quentel
> wrote:
> > 2015-07-03 11:54 GMT+02:00 Peter Otten <__peter__ at web.de>:
> >>
> >> Pierre Quentel wrote:
> >>
> >> > With the proposed Range class, here is an implementation of the
> >> > Fibonacci
> >> > sequence, limited to 2000 :
> >> >
> >> > previous = 0
> >> > def fibo(last):
> >> >     global previous
> >> >     _next, previous = previous+last, last
> >> >     return _next
> >> >
> >> > print(list(Range(1, 2000, fibo)))
> >>
> >> How would you make
> >>
> >> print(list(Range(1000, 2000, fibo)))
> >>
> >> work?
> >
> >
> > I wouldn't, the Fibonacci sequence starts with (0, 1), not with (0, 1000)
>
> The whole numbers start with (0, 1) too (pun intended), but you can
> ask for a range that starts part way into that sequence:
>
> >>> list(range(10,20))
> [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
>
> You can create "chunked ranges" by simply migrating your previous
> second argument into your new first argument, and picking a new second
> argument. Or you can handle an HTTP parameter "?page=4" by multiplying
> 4 by your chunk size and using that as your start, adding another
> chunk and making that your end. While it's fairly easy to ask for
> Fibonacci numbers up to 2000, it's rather harder to ask for only the
> ones 1000 and greater.

No, that's very easy, just rewrite the function :

previous = 987
def fibo(last):
    global previous
    _next, previous = previous+last, last
    return _next

print(list(Range(1597, 10000, fibo)))

and erase the browser history ;-)

More seriously, you can't produce this sequence without starting from the
first two items (0, 1), no matter how you implement it

> Your range *must* start at the very beginning -
> a very good place to start, don't get me wrong, but it's a little
> restrictive if that's _all_ you can do.
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com Fri Jul 3 13:23:49 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 3 Jul 2015 21:23:49 +1000
Subject: [Python-ideas] Pass a function as the argument "step" of range()
In-Reply-To:
References:
Message-ID:

On 3 July 2015 at 06:20, Pierre Quentel wrote:
> @Steven, Mark
> The definition of range() in Python docs says :
>
> Python 2.7 : "This is a versatile function to create lists containing
> arithmetic progressions. It is most often used in for loops."
>
> Python 3.4 : "The range type represents an immutable sequence of numbers
> and is commonly used for looping a specific number of times in for loops."

Pierre, I *wrote* the Python 3 range docs. I know what they say.
Functionality for generating an arbitrary numeric series isn't going
into range().

Now, it may be that there's value in having a way to neatly express a
potentially infinite mathematical series, and further value in having
a way to terminate iteration of that series based on the values it
produces.
The key question you need to ask yourself is whether or not you can come up with a proposal that is easier to read than writing an appropriately named custom iterator for whatever iteration problem you need to solve, or using a generator expression with itertools.takewhile and itertools.count: from itertools import takewhile, count for i in takewhile((lambda i: i < N), (2**x for x in count())): ... Outside obfuscated code contests, there aren't any prizes for expressing an idea in the fewest characters possible, but there are plenty of rewards to be found in expressing ideas in such a way that future maintainers can understand not only what the code actually does, but what it was intended to do, and that the computer can also execute at an acceptable speed. Assuming you're able to come up with such a proposal, the second question would then be whether that solution even belongs in the standard library, let alone in the builtins. What are the real world problems that the construct solves that itertools doesn't already cover? Making it easier to translate homework assignments written to explore features of other programming languages rather than features of Python doesn't count. > Both stress that range is most often used in a for loop (it doesn't "happens > to sometimes be used" in for loops, and is rarely used for membership > testing). Python 2.7 limited its definition to arithmetic progressions, but > Python 3.4 has a more general definition (an immutable sequence of numbers). > I really don't think that the proposal would change the general idea behind > range : a suite of integers, where each item is built from the previous > following a specific pattern, and stopping when a "stop" value is reached. There isn't a "general idea" behind Python 3's range type, there's a precise, formal definition. For starters, the contents are defined to meet a specific formula: =================== For a positive step, the contents of a range r are determined by the formula r[i] = start + step*i where i >= 0 and r[i] < stop. For a negative step, the contents of the range are still determined by the formula r[i] = start + step*i, but the constraints are i >= 0 and r[i] > stop. =================== If you're tempted to respond "we can change the formula to use an arbitrary element value calculation algorithm", we make some *very* specific performance and behavioural promises for range objects, like: =================== Range objects implement the collections.abc.Sequence ABC, and provide features such as containment tests, element index lookup, slicing and support for negative indices =================== Testing range objects for equality with == and != compares them as sequences. That is, two range objects are considered equal if they represent the same sequence of values. (Note that two range objects that compare equal might have different start, stop and step attributes, for example range(0) == range(2, 1, 3) or range(0, 3, 2) == range(0, 4, 2).) =================== Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tjreedy at udel.edu Fri Jul 3 21:33:46 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 3 Jul 2015 15:33:46 -0400 Subject: [Python-ideas] Enabling access to the AST for Python code In-Reply-To: References: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com> <555E90D1.7060404@gmail.com> Message-ID: As I remember, the proposal is or would have to be to give code objects a new attribute -- co_ast. 
This would require an addition to marshal to compress and uncompress asts. It would expand both on-disk .pyc files and even more, in-memory code objects. On 7/2/2015 4:25 PM, Neil Girdhar wrote: > Why would it require "a lot of extra memory"? > A program text size is measured in megabytes, > and the AST is typically more compact than the code as text. Why do you think that? Each text token becomes an node object that is a minimun 56 bytes (on my 64-bit Win7 3.5). For instance, a one-byte '+' (in all-ascii code) balloons to at least 56 bytes in the ast and compiled back down to 1 byte in the byte code. I expect the uncompressed in-memory size of asts to be several times the current size of corresponding code objects. -- Terry Jan Reedy From pierre.quentel at gmail.com Fri Jul 3 21:37:58 2015 From: pierre.quentel at gmail.com (Pierre Quentel) Date: Fri, 3 Jul 2015 21:37:58 +0200 Subject: [Python-ideas] Pass a function as the argument "step" of range() In-Reply-To: References: Message-ID: 2015-07-03 13:23 GMT+02:00 Nick Coghlan : > On 3 July 2015 at 06:20, Pierre Quentel wrote: > > @Steven, Mark > > The definition of range() in Python docs says : > > > > Python 2.7 : "This is a versatile function to create lists containing > > arithmetic progressions. It is most often used in for loops." > > > > Python 3.4 : "The range type represents an immutable sequence of numbers > and > > is commonly used for looping a specific number of times in for loops." > > Pierre, I *wrote* the Python 3 range docs. I know what they say. > Functionality for generating an arbitrary numeric series isn't going > into range(). > > Now, it may be that there's value in having a way to neatly express a > potentially infinite mathematical series, and further value in having > a way to terminate iteration of that series based on the values it > produces. > > The key question you need to ask yourself is whether or not you can > come up with a proposal that is easier to read than writing an > appropriately named custom iterator for whatever iteration problem you > need to solve, or using a generator expression with > itertools.takewhile and itertools.count: > > from itertools import takewhile, count > for i in takewhile((lambda i: i < N), (2**x for x in count())): > ... > > Outside obfuscated code contests, there aren't any prizes for > expressing an idea in the fewest characters possible, but there are > plenty of rewards to be found in expressing ideas in such a way that > future maintainers can understand not only what the code actually > does, but what it was intended to do, and that the computer can also > execute at an acceptable speed. > > Assuming you're able to come up with such a proposal, the second > question would then be whether that solution even belongs in the > standard library, let alone in the builtins. What are the real world > problems that the construct solves that itertools doesn't already > cover? Making it easier to translate homework assignments written to > explore features of other programming languages rather than features > of Python doesn't count. > > > Both stress that range is most often used in a for loop (it doesn't > "happens > > to sometimes be used" in for loops, and is rarely used for membership > > testing). Python 2.7 limited its definition to arithmetic progressions, > but > > Python 3.4 has a more general definition (an immutable sequence of > numbers). 
> > I really don't think that the proposal would change the general idea
> > behind range : a suite of integers, where each item is built from the
> > previous following a specific pattern, and stopping when a "stop" value
> > is reached.
>
> There isn't a "general idea" behind Python 3's range type, there's a
> precise, formal definition.
>
> For starters, the contents are defined to meet a specific formula:
> ===================
> For a positive step, the contents of a range r are determined by the
> formula r[i] = start + step*i where i >= 0 and r[i] < stop.
>
> For a negative step, the contents of the range are still determined by
> the formula r[i] = start + step*i, but the constraints are i >= 0 and
> r[i] > stop.
> ===================
>
> If you're tempted to respond "we can change the formula to use an
> arbitrary element value calculation algorithm", we make some *very*
> specific performance and behavioural promises for range objects, like:
>
> ===================
> Range objects implement the collections.abc.Sequence ABC, and provide
> features such as containment tests, element index lookup, slicing and
> support for negative indices
> ===================
> Testing range objects for equality with == and != compares them as
> sequences. That is, two range objects are considered equal if they
> represent the same sequence of values. (Note that two range objects
> that compare equal might have different start, stop and step
> attributes, for example range(0) == range(2, 1, 3) or range(0, 3, 2)
> == range(0, 4, 2).)
> ===================
>
> Regards,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>

Nick,

Thanks for taking the time to explain. I am conscious that my proposal is
not well received (understatement), and I respect the opinion of all those
who expressed concerns with it.

For me Terry's objection is the most serious : with a function instead of
a fixed increment, many current methods of range() can't be implemented,
or only with a serious performance penalty. This pretty much kills the
discussion.

Nevertheless I will try to answer your questions. The proposal is (was) to
extend the incrementation algorithm used to produce items from range() :
from an addition of a fixed step to the last item, to an arbitrary
function on the last item. The best I can do is rewriting the first lines
of the documentation of range() :

###
class range(stop)
class range(start, stop[, step])

The arguments start and stop to the range constructor must be integers
(either built-in int or any object that implements the __index__ special
method). If the start argument is omitted, it defaults to 0. The step
argument can be an integer or a function. If it is omitted, it defaults
to 1. If step is zero, ValueError is raised.

If step is a positive integer, the contents of a range r are determined
by the formula r[i] = start + step*i where i >= 0 and r[i] < stop.

If step is a negative integer, the contents of the range are still
determined by the formula r[i] = start + step*i, but the constraints are
i >= 0 and r[i] > stop.

If step is a function, the contents of the range are determined by the
formulas r[0] = start, r[i] = step(r[i-1]) where i >= 1. If stop > start,
the iteration stops when r[i] >= stop; if stop < start, when r[i] <= stop.
###

The advantage over specific generators or functions in itertools is the
generality of the construct. For the example with the sequence of powers
of 2, I find that

for i in range(1, N, lambda x:x*2):
    ...
is more readable than : from itertools import takewhile, count for i in takewhile((lambda i: i < N), (2**x for x in count())): ... It is not because it is shorter : I hate obscure one-liners as much as anyone. It is for two main reasons : - it makes it clear that we start at 1, we stop at N, and the incrementation is done by multiplying the previous item by 2. - the second form requires mastering the functions in itertools, which is not the case of all Python developers - after all, itertools is a module, its functions are not built-in. Even those who do hesitate between count() and accumulate(). Moreover, the construct applies to any incrementing function ; with itertools, you need to translate the function using the appropriate function(s) in the module. Of course this doesn't solve any problem that can't be solved any other way. But for that matter, there is nothing comprehensions can do that can't be done with loops - not that I compare my proposal to comprehensions in any way, it's just to say that this argument is not an absolute killer. Once again thank you all for your time. Pierre -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri Jul 3 22:33:54 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 3 Jul 2015 13:33:54 -0700 Subject: [Python-ideas] Pass a function as the argument "step" of range() In-Reply-To: References: Message-ID: <868F3729-3F83-4691-8F4C-A3EC85177279@yahoo.com> On Jul 3, 2015, at 12:37, Pierre Quentel wrote: > > - the second form requires mastering the functions in itertools, which is not the case of all Python developers - after all, itertools is a module, its functions are not built-in. Even those who do hesitate between count() and accumulate(). Nobody "hesitates" between count and accumulate. They do completely different things. And I think everyone who's answered you, and everyone who's read any of the answers, understands that. It's only because you described "powers of two" analytically in text, but "multiply the last value by two" iteratively in pseudocode, that there's a question of which one to use. That won't happen in any real-life cases. Of course people who haven't "mastered" itertools and aren't used to thinking in higher-level terms might not think of accumulate and takewhile here; they might instead write something like this: def iterate(func, start): while True: yield start start = func(start) def irange(start, stop, stepfunc): for value in iterate(stepfunc, start): if value >= stop: break yield value for powerof2 in irange(1, 1000, lambda n:n*2): print(powerof2) But so what? It's a couple lines longer and maybe a tiny bit slower (at least in CPython; I wouldn't be too surprised if it's actually faster in PyPy...), but it's perfectly readable, and almost certainly efficient enough. And it's abstracted into a pair of simple, reusable functions, which you can always micro-optimize later if that turns out to be necessary. People on places like Reddit or StackOverflow like to debate about what's the absolute best implementation for any idea, but if the naive implementation that a novice would come up with on his own is good enough, those debates aren't relevant except as a fun little challenge, or a way to explore different parts of the language; the good enough code is good enough as-is. So, this just falls into the "not every 3-line function needs to be in the stdlib" category. 
From mistersheik at gmail.com Fri Jul 3 22:42:55 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 3 Jul 2015 16:42:55 -0400 Subject: [Python-ideas] Enabling access to the AST for Python code In-Reply-To: References: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com> <555E90D1.7060404@gmail.com> Message-ID: On Fri, Jul 3, 2015 at 6:20 AM, Nick Coghlan wrote: > On 3 July 2015 at 06:25, Neil Girdhar wrote: > > Why would it require "a lot of extra memory"? A program text size is > > measured in megabytes, and the AST is typically more compact than the > code > > as text. A few megabytes is nothing. > > It's more complicated than that. > > What happens when we multiply that "nothing" by 10,000 concurrent > processes across multiple servers. Is it still nothing? How about > 10,000,000? > I guess we find a way to share data between the processes? > > What does keeping the extra data around do to our CPU level cache > efficiency? Is there a key data structure we're adding a new pointer > to? What does *that* do to our performance? > Why would a few megabytes of data affect your CPU level cache? If I have a Python program that generates a data structure that's a few megabytes, does it slow down the rest of the program? > > Where are the AST objects being kept? Do they become part of the > serialised form of the affected object? If yes, what does that do to > the wire protocol overhead for inter-process communication, or to the > size of cached bytecode files? If no, does that mean these objects may > be missing the AST data when deserialised? > When do you send code objects on the wire? I'm not even sure if pickle supports that yet. When we're talking about sufficiently central data structures, a few > *bytes* can end up counting as "a lot". Code and function objects > aren't quite *that* central (unlike, say, tuple instances), but adding > things to them can still have a significant impact (hence the ability > to avoid creating docstrings). > Thanks, I'm interested in learning more about this. There are a lot of messages in this discussion. Was there a final consensus about how the AST for a given code object should be calculated? Was it re-parsing the source? Was it an import hook? Something else? I want to do this with a personal project. I realize we may not get the AST by default, but it would be nice to know how I should best determine it myself. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Fri Jul 3 22:44:36 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 3 Jul 2015 16:44:36 -0400 Subject: [Python-ideas] Enabling access to the AST for Python code In-Reply-To: References: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com> <555E90D1.7060404@gmail.com> Message-ID: On Fri, Jul 3, 2015 at 3:33 PM, Terry Reedy wrote: > As I remember, the proposal is or would have to be to give code objects a > new attribute -- co_ast. This would require an addition to marshal to > compress and uncompress asts. It would expand both on-disk .pyc files and > even more, in-memory code objects. > > On 7/2/2015 4:25 PM, Neil Girdhar wrote: > >> Why would it require "a lot of extra memory"? >> > > A program text size is measured in megabytes, > > and the AST is typically more compact than the code as text. > > Why do you think that? Each text token becomes an node object that is a > minimun 56 bytes (on my 64-bit Win7 3.5). 
For instance, a one-byte '+' (in > all-ascii code) balloons to at least 56 bytes in the ast and compiled back > down to 1 byte in the byte code. I expect the uncompressed in-memory size > of asts to be several times the current size of corresponding code objects. Yes, but in fairness whitespace disappears, and there are some optimizations to the AST that could be made so that nodes with a single child for example are elided. > > > -- > Terry Jan Reedy > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/F4KEYEd6Cs0/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Fri Jul 3 23:24:57 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 3 Jul 2015 17:24:57 -0400 Subject: [Python-ideas] Pass a function as the argument "step" of range() In-Reply-To: References: Message-ID: On 7/3/2015 7:23 AM, Nick Coghlan wrote: > On 3 July 2015 at 06:20, Pierre Quentel wrote: >> @Steven, Mark >> The definition of range() in Python docs says : >> >> Python 2.7 : "This is a versatile function to create lists containing >> arithmetic progressions. It is most often used in for loops." >> >> Python 3.4 : "The range type represents an immutable sequence of numbers and >> is commonly used for looping a specific number of times in for loops." > > Pierre, I *wrote* the Python 3 range docs. I think deleting 'arithmetic' was a mistake. Would you mind changing 'immutable sequence' to 'immutable arithmetic sequence'? Also, I think 'numbers' should be narrowed to 'integers' (or whatever is actually accepted). The idea of allowing floats has been proposed at least once, probably more, and rejected. ... > There isn't a "general idea" behind Python 3's range type, there's a > precise, formal definition. 'predictable finite increasing or decreasing arithmetic sequence of integers, efficiently implemented' Making step an arbitrary function removes all the adjectives except (maybe) 'of integers', leaving 'sequence of integers'. There are several ways to generate unrestricted or lightly restricted sequences. > For starters, the contents are defined to meet a specific formula: > =================== > For a positive step, the contents of a range r are determined by the > formula r[i] = start + step*i where i >= 0 and r[i] < stop. > > For a negative step, the contents of the range are still determined by > the formula r[i] = start + step*i, but the constraints are i >= 0 and > r[i] > stop. For a 0 step, which properly is neither + nor -, a ValueError is raised. range() looks at the value of step to decide whether to raise or return. Something must look at the sign of step to decide whether stop is a max or min, and what comparison to use. Since the sign is constant, this determination need only be done once, though I do not know where or when it is currently done. Given that a function has neither a value nor sign, and each call can have not only a different value, but a different sign, a step function is a very messy fit to an API with a stop parameter whose meaning depends on the sign of step. For many sequences, one would want an explicit max or min or both. Range could have had a number-of-items parameter instead of the max-or-min stop parameter. Indeed, this would be easier in some situations, and some languages slice with start and number rather than start and stop. But range is intentionally consistent with python slicing, which uses start, a max-or-min stop, and a + or - step. -- Terry Jan Reedy From ron3200 at gmail.com Sat Jul 4 00:24:04 2015 From: ron3200 at gmail.com (Ron Adam) Date: Fri, 03 Jul 2015 18:24:04 -0400 Subject: [Python-ideas] Pass a function as the argument "step" of range() In-Reply-To: References: Message-ID: On 07/02/2015 02:30 AM, Pierre Quentel wrote: > > Iterating on the powers of 2 below N would be done by : > > for i in Range(1, N, lambda x:x*2) > > I haven't seen this discussed before, but I may not have searched enough. > > Any opinions ? I'm surprised no one mentioned this!? >>> for i in map(lambda x:2**x, range(1, 10)): ... print(i) ... 2 4 8 16 32 64 128 256 512 It looks like map returns a map object, which is a lazy iterator. You just need to use the power of 2 formula rather than accumulate it. That's actually a more flexible solution as your range can start some place other than 1. >>> for i in map(lambda x:2**x, range(5, 10)): ... print(i) ... 32 64 128 256 512 Cheers, Ron From abarnert at yahoo.com Sat Jul 4 01:53:25 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 3 Jul 2015 16:53:25 -0700 Subject: [Python-ideas] Pass a function as the argument "step" of range() In-Reply-To: References: Message-ID: <17A8EA79-711C-407C-84BB-C7AC53337FFA@yahoo.com> On Jul 3, 2015, at 15:24, Ron Adam wrote: > >> On 07/02/2015 02:30 AM, Pierre Quentel wrote: >> >> Iterating on the powers of 2 below N would be done by : >> >> for i in Range(1, N, lambda x:x*2) >> >> I haven't seen this discussed before, but I may not have searched enough. >> >> Any opinions ? > > I'm surprised no one mentioned this!? > > >>> for i in map(lambda x:2**x, range(1, 10)): Probably because this map call is equivalent to (2**x for x in range(1, 10)), which someone did mention, and is less readable. If you already have a function lying around that does what you want, passing it to map tends to be more readable than wrapping it in a function call expression with a meaningless argument name just so you can stick it in a genexpr. But, by the same token, if you have an expression, and don't have a function lying around, using it in a genexpr tends to be more readable than wrapping it in a lambda expression with a meaningless parameter name just so you can pass it to map. Also, this has the same problem as many of the other proposed solutions, in that it assumes that you can transform the iterative n*2 into an analytic 2**n, and that you can work out the maximum domain value (10) in your head from the maximum range value (1000), and that both of those transformations will be obvious to the readers of the code. In this particular trivial case, that's true, but it's hard to imagine any real-life case where it would be. 
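To make that trade-off concrete, here is a rough sketch (the irange helper and the x*x + 1 recurrence are invented for illustration, not an existing API); the iterative form only needs the stop bound checked as values are produced, with no closed form or inverse in sight:

    def irange(start, stop, stepfunc):
        # Iterative form: check the stop bound as each value is produced.
        value = start
        while value < stop:
            yield value
            value = stepfunc(value)

    # x -> x*x + 1 has no convenient closed form, so there is no obvious
    # map(f, range(n)) rewrite, but the iterative version handles it:
    print(list(irange(2, 10**6, lambda x: x * x + 1)))
    # -> [2, 5, 26, 677, 458330]

The doubling example works the same way: list(irange(1, 1000, lambda x: x * 2)) gives the powers of 2 below 1000 without anyone having to derive the 2**n form or the domain bound 10 in their head.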
From ncoghlan at gmail.com Sat Jul 4 02:17:10 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 4 Jul 2015 10:17:10 +1000 Subject: [Python-ideas] Pass a function as the argument "step" of range() In-Reply-To: References: Message-ID: On 4 Jul 2015 7:25 am, "Terry Reedy" wrote: > > On 7/3/2015 7:23 AM, Nick Coghlan wrote: >> >> On 3 July 2015 at 06:20, Pierre Quentel wrote: >>> >>> @Steven, Mark >>> The definition of range() in Python docs says : >>> >>> Python 2.7 : "This is a versatile function to create lists containing >>> arithmetic progressions. It is most often used in for loops." >>> >>> Python 3.4 : "The range type represents an immutable sequence of numbers and >>> is commonly used for looping a specific number of times in for loops." >> >> >> Pierre, I *wrote* the Python 3 range docs. > > > I think deleting 'arithmetic' was a mistake. Would you mind changing 'immutable sequence' to 'immutable arithmetic sequence'? Sure, clarifications aren't a problem - getting "arithmetic progression" back in the docs somewhere will be useful to folks familiar with the mathematical terminology for how range works. > Also, I think 'numbers' should be narrowed to 'integers' (or whatever is actually accepted). The idea of allowing floats has been proposed at least once, probably more, and rejected. Unfortunately, we don't have a great word for "implements __index__", as "integer" at least arguably implies specifically "int". >> There isn't a "general idea" behind Python 3's range type, there's a >> precise, formal definition. > > > 'predictable finite increasing or decreasing arithmetic sequence of integers, efficiently implemented' Ah, I like that. > range() looks at the value of step to decide whether to raise or return. Something must look at the sign of step to decide whether stop is a max or min, and what comparison to use. Since the sign is constant, this determination need only be done once, though I do not know where or when it is currently done. > > Given that a function has neither a value nor sign, and each call can have not only a different value, but a different sign, a step function is a very messy fit to an API with a stop parameter whose meaning depends on the sign of step. For many sequences, one would want an explicit max or min or both. Yeah, I started trying to think of how to do this generically, and I think it would need to be considered in terms of upper and lower bounds, rather than a single stop value. That is, there'd be 6 characterising values for a general purpose computed sequence: * domain start * domain stop * domain step * range lower bound * range upper bound * item calculation However, you fundamentally can't make len(obj) an O(1) operation in that model, since you don't know the output range without calling the function. So the general purpose form could be an iterator like: def within_bounds(iterable, lower, upper): for x in iterable: if not lower <= x < upper: break yield x Positive & negative infinity would likely suffice as defaults for the lower & upper bounds. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Jul 4 02:28:59 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 4 Jul 2015 10:28:59 +1000 Subject: [Python-ideas] Pass a function as the argument "step" of range() In-Reply-To: References: Message-ID: On Sat, Jul 4, 2015 at 10:17 AM, Nick Coghlan wrote: >> Also, I think 'numbers' should be narrowed to 'integers' (or whatever is >> actually accepted). 
The idea of allowing floats has been proposed at least >> once, probably more, and rejected. > > Unfortunately, we don't have a great word for "implements __index__", as > "integer" at least arguably implies specifically "int". I'm not sure that actually matters - even if the parameters can be any objects that implement __index__, the range still represents a sequence of ints: >>> class X: ... def __index__(self): return 4 ... >>> range(X()) range(0, 4) >>> list(range(X())) [0, 1, 2, 3] >>> [type(n) for n in range(X())] [<class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>] Saying "sequence of integers" seems fine to me. ChrisA From ron3200 at gmail.com Sat Jul 4 02:56:42 2015 From: ron3200 at gmail.com (Ron Adam) Date: Fri, 03 Jul 2015 20:56:42 -0400 Subject: [Python-ideas] Pass a function as the argument "step" of range() In-Reply-To: <17A8EA79-711C-407C-84BB-C7AC53337FFA@yahoo.com> References: <17A8EA79-711C-407C-84BB-C7AC53337FFA@yahoo.com> Message-ID: On 07/03/2015 07:53 PM, Andrew Barnert via Python-ideas wrote: > On Jul 3, 2015, at 15:24, Ron Adam wrote: >> > >>> >>On 07/02/2015 02:30 AM, Pierre Quentel wrote: >>> >> >>> >>Iterating on the powers of 2 below N would be done by : >>> >> >>> >>for i in Range(1, N, lambda x:x*2) >>> >> >>> >>I haven't seen this discussed before, but I may not have searched enough. >>> >> >>> >>Any opinions ? >> > >> >I'm surprised no one mentioned this!? >> > >>>>> > >>>for i in map(lambda x:2**x, range(1, 10)): > Probably because this map call is equivalent to (2**x for x in range(1, 10)), which someone did mention, and is less readable. I missed that one. But you are correct, it is equivalent. > If you already have a function lying around that does what you want, > passing it to map tends to be more readable than wrapping it in a > function call expression with a meaningless argument name just so you > can stick it in a genexpr. Which was what I was thinking of. for i in map(fn, range(start, stop)): ... > But, by the same token, if you have an expression, and don't have a > function lying around, using it in a genexpr tends to be more readable > than wrapping it in a lambda expression with a meaningless parameter > name just so you can pass it to map. Agree. > Also, this has the same problem as many of the other proposed solutions, > in that it assumes that you can transform the iterative n*2 into an > analytic 2**n, and that you can work out the maximum domain value (10) > in your head from the maximum range value (1000), and that both of those > transformations will be obvious to the readers of the code. In this > particular trivial case, that's true, but it's hard to imagine any > real-life case where it would be. Ah.. this is the part I missed. I would most likely rewrite it as a while loop myself. Cheers, Ron From steve at pearwood.info Sat Jul 4 05:58:39 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 4 Jul 2015 13:58:39 +1000 Subject: [Python-ideas] Pass a function as the argument "step" of range() In-Reply-To: References: Message-ID: <20150704035839.GQ10773@ando.pearwood.info> On Thu, Jul 02, 2015 at 10:20:16PM +0200, Pierre Quentel wrote: > Both stress that range is most often used in a for loop (it doesn't > "happens to sometimes be used" in for loops, and is rarely used for > membership testing). You have misunderstood me. I'm not saying that range necessarily has many widespread and common uses outside of for-loops, but that for-loops only sometimes use range. Most loops iterate directly over the iterable; they don't use a range object at all. 
You started this thread with an example from Javascript. For loops in Javascript can be extremely general: js> for(var i=1,j=0,k=2; i < 100; j=2*i, k-=1, i+=j+k){print([i,j,k])} 1,0,2 4,2,1 12,8,0 35,24,-1 Why try to force all that generality into the range function? There are really two issues here: (1) Is there a problem with Python that it cannot easily or reasonably perform certain for-loops that Javascript makes easy? (2) Is modifying range() the right way to solve that problem? I don't think you have actually demonstrated the existence of a problem yet. True, Javascript gives you a nice, compact, readable syntax for some very general loops, but Python has its own way of doing those same loops which may not be quite as compact but are probably still quite acceptable. But even if we accept that Javascript for-loops are more powerful, more readable, and more flexible, and that Python lacks an acceptable way of performing certain for-loops that Javascript makes easy, changing range does not seem to be the right way to fix that lack. -- Steve From pierre.quentel at gmail.com Sat Jul 4 08:15:24 2015 From: pierre.quentel at gmail.com (Pierre Quentel) Date: Sat, 4 Jul 2015 08:15:24 +0200 Subject: [Python-ideas] Pass a function as the argument "step" of range() In-Reply-To: References: Message-ID: 2015-07-04 0:24 GMT+02:00 Ron Adam : > > > On 07/02/2015 02:30 AM, Pierre Quentel wrote: > >> >> Iterating on the powers of 2 below N would be done by : >> >> for i in Range(1, N, lambda x:x*2) >> >> I haven't seen this discussed before, but I may not have searched enough. >> >> Any opinions ? >> > > I'm surprised no one mentioned this!? > > >>> for i in map(lambda x:2**x, range(1, 10)): > ... print(i) > ... > 2 > 4 > 8 > 16 > 32 > 64 > 128 > 256 > 512 > It's not the same as Range(1, N, lambda x:x*2) : your loop is executed a fixed number of times (9 here), regardless of the values produced, but Range stops once the value reaches or exceeds N > > It looks like map returns a map object, which is a lazy iterator. You just > need to use the power of 2 formula rather than accumulate it. That's > actually a more flexible solution as your range can start some place other > than 1. > > >>> for i in map(lambda x:2**x, range(5, 10)): > ... print(i) > ... > 32 > 64 > 128 > 256 > 512 > > > Cheers, > Ron > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sat Jul 4 09:13:31 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 4 Jul 2015 00:13:31 -0700 Subject: [Python-ideas] Enabling access to the AST for Python code In-Reply-To: References: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com> <555E90D1.7060404@gmail.com> Message-ID: On Jul 3, 2015, at 13:42, Neil Girdhar wrote: > > There are a lot of messages in this discussion. Was there a final consensus about how the AST for a given code object should be calculated? I think it depends on what exactly you're trying to do, but using an import hook means you can call compile or ast.parse once and keep it around as well as using it for the compile, so that seems like it should be a good solution for most uses. 
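For reference, a rough sketch of such an import hook (the AstCachingLoader class and the AST_REGISTRY dict are invented names for illustration, not an existing API); it parses each imported module's source once, keeps the tree, and compiles the same tree to bytecode:

    import ast
    import sys
    from importlib.machinery import FileFinder, SourceFileLoader, SOURCE_SUFFIXES

    AST_REGISTRY = {}  # module name -> AST, purely illustrative

    class AstCachingLoader(SourceFileLoader):
        def source_to_code(self, data, path, *, _optimize=-1):
            tree = ast.parse(data, filename=path)   # parse once...
            AST_REGISTRY[self.name] = tree          # ...keep the AST around...
            return compile(tree, path, 'exec',      # ...and compile from it
                           dont_inherit=True, optimize=_optimize)

    # Install ahead of the standard path hooks; modules imported after this
    # leave their ASTs behind in AST_REGISTRY.
    sys.path_hooks.insert(0, FileFinder.path_hook((AstCachingLoader, SOURCE_SUFFIXES)))
    sys.path_importer_cache.clear()

Note that a fresh bytecode cache bypasses source_to_code entirely, so a real tool would also need to decide what to do when a module is loaded from a cached .pyc rather than from source.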
From mistersheik at gmail.com Sat Jul 4 09:46:15 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 4 Jul 2015 03:46:15 -0400 Subject: [Python-ideas] Enabling access to the AST for Python code In-Reply-To: References: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com> <555E90D1.7060404@gmail.com> Message-ID: Thank you. Has anyone implemented this solution yet? On Sat, Jul 4, 2015 at 3:13 AM, Andrew Barnert wrote: > On Jul 3, 2015, at 13:42, Neil Girdhar wrote: > > > > There are a lot of messages in this discussion. Was there a final > consensus about how the AST for a given code object should be calculated? > > I think it depends on what exactly you're trying to do, but using an > import hook means you can call compile or ast.parse once and keep it around > as well as using it for the compile, so that seems like it should be a good > solution for most uses. -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Sat Jul 4 10:56:52 2015 From: toddrjen at gmail.com (Todd) Date: Sat, 4 Jul 2015 10:56:52 +0200 Subject: [Python-ideas] Should iPython Notebook replace Idle In-Reply-To: References: <6a451676-8487-4482-a903-4ad21b3f2470@googlegroups.com> Message-ID: On Jul 4, 2015 5:16 AM, "Jason Swails" wrote: > > > > On Fri, Jul 3, 2015 at 10:01 PM, Sayth Renshaw wrote: >> >> In future releases of Python should ipython Notebooks replace idle as the default tool for new users to learn python? >> >> >> This would as I see it have many benefits? >> >> 1. A nicer more usual web interface for new users. >> 2. Would allow the python documentation and tutorials to be distributed as ipython notebooks which would allow new users to play and interact with the tutorials as they proceed. No download separate code retyping just edit run and play. >> 3. Would allow teachers to setup notebooks knowing that all users have the same default environment, no need for setting up virtualenvs etc. >> 4. Strengthen the learning base and for new python developers as a whole. >> >> Thoughts? > > > IPython and IDLE are different. IPython is *just* an interactive Python interpreter with a ton of tweaks and enhancements. IDLE, by contrast, is both an upscale interpreter (not *nearly* as feature-complete as IPython), but it's also an IDE. AFAICT, IPython does not do this. > > Also, look at the IPython dependencies for its core functionalities: > > - jinja2 > - sphinx > - pyzmq > - pygments > - tornado > - PyQt | PySide > > None of these are part of the Python standard library. By contrast, IDLE is built entirely with stdlib components (tkinter for the GUI). AFAIK, nothing in the stdlib depends on anything outside of it. And addition to the Python stdlib imposes some pretty serious restrictions on a library. If the IPython team agreed to release their tools with the stdlib instead of IDLE, they'd have to give up a lot of control over their project: > > - License > - Release schedule > - Development environment > > Everything gets swallowed into Python. I can't imagine this ever happening. > It is certainly true that IDLE and IPython do not cover the same use-cases, and it is almost certainly true that putting the IPython notebook into the standard library is infeasible. That being said, one thing that IPython and other shells have shown is that it is possible to make a much more powerful python shell. So I don't think it is out of the realm of possibility to take a hard look at the current python shell and see where and how it can be made more useful. 
The IPython shell is one of many places we could look for ideas. More out-there, but it probably isn't completely impossible for python to provide some sort of native notebook-like interface, or at least some sort of interface that makes it convenient for third parties to make such notebook interfaces. -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Sat Jul 4 12:56:47 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 4 Jul 2015 11:56:47 +0100 Subject: [Python-ideas] Pass a function as the argument "step" of range() In-Reply-To: <17A8EA79-711C-407C-84BB-C7AC53337FFA@yahoo.com> References: <17A8EA79-711C-407C-84BB-C7AC53337FFA@yahoo.com> Message-ID: On 4 July 2015 at 00:53, Andrew Barnert via Python-ideas wrote: > Also, this has the same problem as many of the other proposed solutions, in that it assumes that you can transform the iterative n*2 into an analytic 2**n, and that you can work out the maximum domain value (10) in your head from the maximum range value (1000), and that both of those transformations will be obvious to the readers of the code. In this particular trivial case, that's true, but it's hard to imagine any real-life case where it would be. One thing that I have kept stumbling over when I've been reading this discussion is that I keep expecting there to be a "simple" (i.e., builtin, or in a relatively obvious module) way of generating repeated applications of a single-argument function: def iterate(fn, start): while True: yield start start = fn(start) ... and yet I can't find one. Am I missing it, or does it not exist? It's not hard to write, as shown above, so I'm not claiming it "needs to be a builtin" necessarily, it just seems like a useful building block. Paul From ncoghlan at gmail.com Sat Jul 4 15:26:29 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 4 Jul 2015 23:26:29 +1000 Subject: [Python-ideas] Should iPython Notebook replace Idle In-Reply-To: References: <6a451676-8487-4482-a903-4ad21b3f2470@googlegroups.com> Message-ID: On 4 July 2015 at 18:56, Todd wrote: > That being said, one thing that IPython and other shells have shown is that > it is possible to make a much more powerful python shell. So I don't think > it is out of the realm of possibility to take a hard look at the current > python shell and see where and how it can be made more useful. The IPython > shell is one of many places we could look for ideas. Software Carpentry already recommend the IPython Notebook to research scientists and data analysts learning Python (understandably so, since IPython Notebook is built by and for research scientists and data analysts). The needs for programming education are different, and the Raspberry Pi Foundation are starting to look at the available options in that space (including asking the question of whether or not there should be a "Python for Education" edition that bundles additional third party libraries that don't make sense to include in the default installation). There's certainly scope for improving IDLE itself (within the constraints of "no dependencies outside the standard library"), but part of that includes refactoring IDLE to make it easier to work on and test. idle-dev is the appropriate list to find out more about the options there. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Jul 4 15:48:48 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 4 Jul 2015 23:48:48 +1000 Subject: [Python-ideas] Pass a function as the argument "step" of range() In-Reply-To: References: <17A8EA79-711C-407C-84BB-C7AC53337FFA@yahoo.com> Message-ID: On 4 July 2015 at 20:56, Paul Moore wrote: > On 4 July 2015 at 00:53, Andrew Barnert via Python-ideas > wrote: >> Also, this has the same problem as many of the other proposed solutions, in that it assumes that you can transform the iterative n*2 into an analytic 2**n, and that you can work out the maximum domain value (10) in your head from the maximum range value (1000), and that both of those transformations will be obvious to the readers of the code. In this particular trivial case, that's true, but it's hard to imagine any real-life case where it would be. > > One thing that I have kept stumbling over when I've been reading this > discussion is that I keep expecting there to be a "simple" (i.e., > builtin, or in a relatively obvious module) way of generating repeated > applications of a single-argument function: > > def iterate(fn, start): > while True: > yield start > start = fn(start) > > ... and yet I can't find one. Am I missing it, or does it not exist? > It's not hard to write, as shown above, so I'm not claiming it "needs > to be a builtin" necessarily, it just seems like a useful building > block. It's a particular way of using itertools.accumulate: def infinite_series(fn, start): def series_step(last_output, __): return fn(last_output) return accumulate(repeat(start), series_step) Due to the closure, that can be generalised to tracking multiple past values fairly easily (e.g. using deque), and the use of repeat() means you can adapt it to define finite iterators as well. This is covered in the accumulate docs (https://docs.python.org/3/library/itertools.html#itertools.accumulate) under the name "recurrence relation", but it may be worth extracting the infinite series example as a new recipe in the recipes section. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From bussonniermatthias at gmail.com Sat Jul 4 19:11:42 2015 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Sat, 4 Jul 2015 10:11:42 -0700 Subject: [Python-ideas] Should iPython Notebook replace Idle In-Reply-To: References: <6a451676-8487-4482-a903-4ad21b3f2470@googlegroups.com> Message-ID: <11A1F4D8-E5A2-465D-A152-53251C9B26EE@gmail.com> > On Jul 4, 2015, at 06:26, Nick Coghlan wrote: > > Software Carpentry already recommend the IPython Notebook to research > scientists and data analysts learning Python (understandably so, since > IPython Notebook is built by and for research scientists and data > analysts). I just want to add that the IPython team has no plan to request, or try to have, IPython incorporated into the standard library. We are happy and our users seem to be happy with it as an external package. This also allows us to do quicker releases and change our dependencies (and our dependencies are growing); basically it leaves us more freedom. It would also be funny to require nodejs to build the Javascript that would need to be shipped for the notebook... And despite some people loving IPython and the Notebook, we are the first to admit that the notebook is not the best tool for everything. We are still using vi/emacs/nano/ed/$EDITOR/$IDE to program, and it is better suited for a lot of tasks. 
Whereas notebook files are hard to edit in a text editor[1]. That being said, we would love to get some IPython-shell features in the interactive Python shell, like numbered prompts by default, and a way to have the help syntax with `?` easier to hook into, instead of doing regex transforms on the input. With the ever-increasing pain of readline on Windows, we are also looking into ptpython[2] and enabling it by default in the IPython shell; there are a lot of really good improvement ideas for the plain shell that can be taken from that too. -- M [1] Yes I am aware of IPymd, but still you cannot break a class in between cells. [2] https://github.com/jonathanslenders/ptpython -------------- next part -------------- An HTML attachment was scrubbed... URL: From benhoyt at gmail.com Sat Jul 4 19:47:19 2015 From: benhoyt at gmail.com (Ben Hoyt) Date: Sat, 4 Jul 2015 13:47:19 -0400 Subject: [Python-ideas] Enabling access to the AST for Python code In-Reply-To: References: <8AEF5EC5-1DC1-435D-BC32-EF776FA2CC8A@yahoo.com> <555E90D1.7060404@gmail.com> Message-ID: On Sat, Jul 4, 2015 at 3:46 AM, Neil Girdhar wrote: > Thank you. Has anyone implemented this solution yet? > > On Sat, Jul 4, 2015 at 3:13 AM, Andrew Barnert wrote: > >> On Jul 3, 2015, at 13:42, Neil Girdhar wrote: >> > >> > There are a lot of messages in this discussion. Was there a final >> consensus about how the AST for a given code object should be calculated? >> >> I think it depends on what exactly you're trying to do, but using an >> import hook means you can call compile or ast.parse once and keep it around >> as well as using it for the compile, so that seems like it should be a good >> solution for most uses. > > Yes, the MacroPy library uses this approach (import hooks to get and modify the AST): https://github.com/lihaoyi/macropy -Ben -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Sat Jul 4 21:00:05 2015 From: wes.turner at gmail.com (Wes Turner) Date: Sat, 4 Jul 2015 14:00:05 -0500 Subject: [Python-ideas] Should iPython Notebook replace Idle In-Reply-To: <11A1F4D8-E5A2-465D-A152-53251C9B26EE@gmail.com> References: <6a451676-8487-4482-a903-4ad21b3f2470@googlegroups.com> <11A1F4D8-E5A2-465D-A152-53251C9B26EE@gmail.com> Message-ID: On Sat, Jul 4, 2015 at 12:11 PM, Matthias Bussonnier < bussonniermatthias at gmail.com> wrote: > > On Jul 4, 2015, at 06:26, Nick Coghlan wrote: > > Software Carpentry already recommend the IPython Notebook to research > scientists and data analysts learning Python (understandably so, since > IPython Notebook is built by and for research scientists and data > analysts). > > > I just want to add that the IPython team has no plan to request, or try to > have, IPython incorporated into the standard library. We are happy and our users seem > to be happy > with it as an external package. This also allows us to do quicker releases and > change > our dependencies (and our dependencies are growing); basically it leaves us > more > freedom. 
> Pros: * already installed * (C)Python community review Cons: * CPython release schedule * CPython dependencies do not include the IPython dependencies * CPython Makefile * https://hg.python.org/cpython/file/tip/Lib * Core Developers For the hypothetical case that IPython and dependencies all decide to migrate their projects to the CPython source tree: Docs: https://docs.python.org/devguide/stdlibchanges.html#adding-a-new-module Docs: https://docs.python.org/devguide/stdlibchanges.html#adding-to-a-pre-existing-module > > It would also be funny to require nodejs to build the Javascript that would > need to be shipped for the notebook... > And TypeScript. > > And despite some people loving IPython and the Notebook, we are the first > to admit > that the notebook is not the best tool for everything. We are still using > vi/emacs/nano/ed/$EDITOR/$IDE to program, and it is better suited for a > lot of tasks. > Whereas notebook files are hard to edit in a text editor[1]. > > %logstart -o input_and_output_before_here_n_forward.py !cp io.log scriptname.py %ed ./scriptname.py > That being said, we would love to get some IPython-shell features in > the interactive Python shell, like numbered prompts by default, and a > way to have the help syntax with `?` easier to hook into, instead of doing > regex transforms on the input. > %doctest_mode > > With the ever-increasing pain of readline on Windows, we are also > looking > into ptpython[2] and enabling it by default in the IPython shell; there are > a lot of really good > improvement ideas for the plain shell that can be taken from that too. > > -- > M > > [1] Yes I am aware of IPymd, but still you cannot break a class in between > cells. > [2] https://github.com/jonathanslenders/ptpython > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Sat Jul 4 21:09:12 2015 From: wes.turner at gmail.com (Wes Turner) Date: Sat, 4 Jul 2015 14:09:12 -0500 Subject: [Python-ideas] Should iPython Notebook replace Idle In-Reply-To: References: <6a451676-8487-4482-a903-4ad21b3f2470@googlegroups.com> Message-ID: On Sat, Jul 4, 2015 at 8:26 AM, Nick Coghlan wrote: > On 4 July 2015 at 18:56, Todd wrote: > > That being said, one thing that IPython and other shells have shown is > that > > it is possible to make a much more powerful python shell. So I don't > think > > it is out of the realm of possibility to take a hard look at the current > > python shell and see where and how it can be made more useful. The > IPython > > shell is one of many places we could look for ideas. > > Software Carpentry already recommend the IPython Notebook to research > scientists and data analysts learning Python (understandably so, since > IPython Notebook is built by and for research scientists and data > analysts). 
* http://software-carpentry.org/blog/2012/03/the-ipython-notebook.html * http://software-carpentry.org/blog/2013/03/using-notebook-as-a-teaching-tool.html * https://software-carpentry.org/v5/novice/python/06-cmdline.html (IPython) > > The needs for programming education are different, and the Raspberry > Pi Foundation are starting to look at the available options in that > space (including asking the question of whether or not there should be > a "Python for Education" edition that bundles additional third party > libraries that don't make sense to include in the default > installation). > I believe it's possible to run Docker and LXC on a Raspberry Pi (ARM arch). https://github.com/ipython/ipython/wiki/Install:-Docker * IPython, Scipy Stack (CPython, Anaconda conda packages, pip packages) https://wrdrd.com/docs/consulting/education-technology#jupyter-and-learning * There are extensions to include which packages/modules are/were installed/necessary for a given notebook (watermark, version_information) #jupyter-and-reproducibility > > There's certainly scope for improving IDLE itself (within the > constraints of "no dependencies outside the standard library"), but > part of that includes refactoring IDLE to make it easier to work on > and test. idle-dev is the appropriate list to find out more about the > options there. > Some of my first lines of Python were in IDLE (with diveintopython 2). > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Sat Jul 4 21:32:36 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 4 Jul 2015 19:32:36 +0000 (UTC) Subject: [Python-ideas] Should iPython Notebook replace Idle References: Message-ID: <48206410457730373.474886sturla.molden-gmail.com@news.gmane.org> Todd wrote: > More out-there, but it probably isn't completely impossible for python to > provide some sort of native notebook-like interface, or at least some sort > of interface that makes it convenient for third parties to make such > notebook interfaces. It will not be possible because of this: https://jupyter.org Having Jupyter in the Python standard library would screw over the Julia and R users. Also, more extensive Python distros like Anaconda and Enthought Canopy include IPython/Jupyter by default. I think it is more important to start to direct users to select these than to include Jupyter in the standard lib. This is particularly important because of the growing number of essential packages that are not in the standard lib (e.g. NumPy, Cython, Numba, matplotlib, wxPython, Twisted, pyzmq, SQLAlchemy, etc.) and the complex package dependencies. Another question is whether python.org should provide links to these installers, or even maintain a similar Python stack with a package manager (I am not sure if pip can handle all the complexities sufficiently). 
Sturla From bussonniermatthias at gmail.com Sat Jul 4 22:16:32 2015 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Sat, 4 Jul 2015 13:16:32 -0700 Subject: [Python-ideas] Should iPython Notebook replace Idle In-Reply-To: <48206410457730373.474886sturla.molden-gmail.com@news.gmane.org> References: <48206410457730373.474886sturla.molden-gmail.com@news.gmane.org> Message-ID: <028E0DB2-1407-4353-B7CB-B89EB1E4EC26@gmail.com> > On Jul 4, 2015, at 12:32, Sturla Molden wrote: > > Todd wrote: > >> More out-there, but it probably isn't completely impossible for python to >> provide some sort of native notebook-like interface, or at least some sort >> of interface that makes it convenient for third parties to make such >> notebook interfaces. > > It will not be possible because of this: > > https://jupyter.org > > Having Jupyter in the Python standard library would screw over the Julia > and R users. Well they still have to install CPython to install Jupyter and even Python3 if they want multi-user... so I'm not sure that would 'screw' them. (or I don't get what you mean by screw them). > Also, more extensive Python distros like Anaconda and Enthought Canopy > include IPython/Jupyter by default. I think it is more important to start > to direct users to select these than to include Jupyter in the standard > lib. This is particularly important because of the growing number of > essential packages that are not in the standard lib (e.g. NumPy, Cython, > Numba, matplotlib, wxPython, Twisted, pyzmq, SQLAlchemy, etc.) and the > complex package dependencies. Another question is whether python.org should > provide links to these installers, or even maintain a similar Python stack > with a package manager (I am not sure if pip can handle all the > complexities sufficiently). One possible, simpler thing would be to ship a minimal python kernel [1] that 'only' requires PyZMQ. Though, now that pip 'Just works'[2], I'm not sure it is worth it either. -- M [2]: offer subject to conditions [1]: https://github.com/dsblank/simple_kernel/blob/master/simple_kernel.py # simple_kernel.py # by Doug Blank # # This sample kernel is meant to be able to demonstrate using zmq for # implementing a language backend (called a kernel) for IPython. It is # written in the most straightforward manner so that it can be easily # translated into other programming languages. It doesn't use any code # from IPython, but only standard Python libraries and zmq. # # It is also designed to be able to run, showing the details of the # message handling system. > > > Sturla > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Sat Jul 4 22:33:43 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 4 Jul 2015 21:33:43 +0100 Subject: [Python-ideas] Pass a function as the argument "step" of range() In-Reply-To: References: <17A8EA79-711C-407C-84BB-C7AC53337FFA@yahoo.com> Message-ID: On 4 July 2015 at 14:48, Nick Coghlan wrote: > It's a particular way of using itertools.accumulate: > > def infinite_series(fn, start): > def series_step(last_output, __): > return fn(last_output) > return accumulate(repeat(start), series_step) > > Due to the closure, that can be generalised to tracking multiple past > values fairly easily (e.g. using deque), and the use of repeat() means > you can adapt it to define finite iterators as well. > > This is covered in the accumulate docs > (https://docs.python.org/3/library/itertools.html#itertools.accumulate) > under the name "recurrence relation", but it may be worth extracting > the infinite series example as a new recipe in the recipes section. Ah, thanks. I hadn't thought of using accumulate with a function that ignored its second argument. Although the above seems noticeably less obvious than my hand-coded def iterate(fn, start): while True: yield start start = fn(start) Unless the accumulate version is substantially faster (I haven't tested), I can't imagine ever wanting to use it in preference. Paul From sturla.molden at gmail.com Sat Jul 4 22:55:46 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 04 Jul 2015 22:55:46 +0200 Subject: [Python-ideas] Should iPython Notebook replace Idle In-Reply-To: <028E0DB2-1407-4353-B7CB-B89EB1E4EC26@gmail.com> References: <48206410457730373.474886sturla.molden-gmail.com@news.gmane.org> <028E0DB2-1407-4353-B7CB-B89EB1E4EC26@gmail.com> Message-ID: On 04/07/15 22:16, Matthias Bussonnier wrote: > Well they still have to install CPython to install Jupyter > and even Python3 if they want multi-user... so I'm not sure that would > 'screw' > them. (or I don't get what you mean by screw them). Release schedules and development would be swallowed into Python. If they need a bugfix for Julia or R, they have to wait for the next release of Python. There would also be Julia and R related issues in the CPython bugtracker. > One possible, simpler thing would be to ship a minimal python kernel [1] > that 'only' requires PyZMQ. Only? This would require a C++ compiler. Currently we can build CPython with just a C compiler. A language change is not a minor detail. Sturla From bussonniermatthias at gmail.com Sat Jul 4 23:04:48 2015 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Sat, 4 Jul 2015 14:04:48 -0700 Subject: [Python-ideas] Should iPython Notebook replace Idle In-Reply-To: References: <48206410457730373.474886sturla.molden-gmail.com@news.gmane.org> <028E0DB2-1407-4353-B7CB-B89EB1E4EC26@gmail.com> Message-ID: > On Jul 4, 2015, at 13:55, Sturla Molden wrote: > > On 04/07/15 22:16, Matthias Bussonnier wrote: > >> Well they still have to install CPython to install Jupyter >> and even Python3 if they want multi-user... so I'm not sure that would >> 'screw' >> them. (or I don't get what you mean by screw them). > > Release schedules and development would be swallowed into Python. If they need a bugfix for Julia or R, they have to wait for the next release of Python. There would also be Julia and R related issues in the CPython bug tracker. Ah, in that sense... I read 'screwed' in the sense of 'they could not use it anymore'. Yes, the release schedule would be annoying I guess. >> One possible, simpler thing would be to ship a minimal python kernel [1] >> that 'only' requires PyZMQ. > > Only? > > This would require a C++ compiler. Currently we can build CPython with just a C compiler. A language change is not a minor detail. Hence the quotes around my 'only'. I should have marked them more clearly. 
-- M From pierre.quentel at gmail.com Sun Jul 5 09:49:54 2015 From: pierre.quentel at gmail.com (Pierre Quentel) Date: Sun, 5 Jul 2015 09:49:54 +0200 Subject: [Python-ideas] Pass a function as the argument "step" of range() In-Reply-To: <868F3729-3F83-4691-8F4C-A3EC85177279@yahoo.com> References: <868F3729-3F83-4691-8F4C-A3EC85177279@yahoo.com> Message-ID: Thank you Andrew. This would be a good argument for those who think that there's nothing you can do with itertools that can't be done in a more readable and as efficient way without it. Good argument, but that could be improved with the obvious (and more complete, you forgot the case stop < start) import operator def irange(value, stop, func): comp = operator.ge if stop>value else operator.le while True: if comp(value, stop): break yield value value = func(value) 2015-07-03 22:33 GMT+02:00 Andrew Barnert : > On Jul 3, 2015, at 12:37, Pierre Quentel wrote: > > > > - the second form requires mastering the functions in itertools, which > is not the case of all Python developers - after all, itertools is a > module, its functions are not built-in. Even those who do hesitate between > count() and accumulate(). > > Nobody "hesitates" between count and accumulate. They do completely > different things. And I think everyone who's answered you, and everyone > who's read any of the answers, understands that. It's only because you > described "powers of two" analytically in text, but "multiply the last > value by two" iteratively in pseudocode, that there's a question of which > one to use. That won't happen in any real-life cases. > > Of course people who haven't "mastered" itertools and aren't used to > thinking in higher-level terms might not think of accumulate and takewhile > here; they might instead write something like this: > > def iterate(func, start): > while True: > yield start > start = func(start) > > def irange(start, stop, stepfunc): > for value in iterate(stepfunc, start): > if value >= stop: break > yield value > > for powerof2 in irange(1, 1000, lambda n:n*2): > print(powerof2) > > But so what? It's a couple lines longer and maybe a tiny bit slower (at > least in CPython; I wouldn't be too surprised if it's actually faster in > PyPy...), but it's perfectly readable, and almost certainly efficient > enough. And it's abstracted into a pair of simple, reusable functions, > which you can always micro-optimize later if that turns out to be necessary. > > People on places like Reddit or StackOverflow like to debate about what's > the absolute best implementation for any idea, but if the naive > implementation that a novice would come up with on his own is good enough, > those debates aren't relevant except as a fun little challenge, or a way to > explore different parts of the language; the good enough code is good > enough as-is. So, this just falls into the "not every 3-line function needs > to be in the stdlib" category. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Jul 5 11:26:50 2015 From: guido at python.org (Guido van Rossum) Date: Sun, 5 Jul 2015 11:26:50 +0200 Subject: [Python-ideas] Should iPython Notebook replace Idle In-Reply-To: <48206410457730373.474886sturla.molden-gmail.com@news.gmane.org> References: <48206410457730373.474886sturla.molden-gmail.com@news.gmane.org> Message-ID: Basically agreeing with what's been said already: IPython is an application that has Python (and some other things) as a dependency. So is IDLE. 
Python itself should ideally not be bundled with any applications -- it's a historical accident that IDLE is in the stdlib. Of course there should be a basic command prompt, but I think that the existing one based on GNU readline is fine for that purpose. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Jul 5 13:53:04 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 5 Jul 2015 21:53:04 +1000 Subject: [Python-ideas] Should iPython Notebook replace Idle In-Reply-To: References: <48206410457730373.474886sturla.molden-gmail.com@news.gmane.org> Message-ID: On 5 July 2015 at 19:26, Guido van Rossum wrote: > Basically agreeing with what's been said already: IPython is an application > that has Python (and some other things) as a dependency. So is IDLE. Python > itself should ideally not be bundled with any applications -- it's a > historical accident that IDLE is in the stdlib. Of course there should be a > basic command prompt, but I think that the existing one based on GNU > readline is fine for that purpose. This does raise an interesting question though: should we perhaps update the "getting & installing" parts of https://docs.python.org/3/using/index.html to cover more of the available options? For users interested in Python for research and data analysis, for example, the SciPy page on installing Python would likely be a better starting point than our upstream guide: http://www.scipy.org/install.html Similarly, someone looking for a more sophisticated IDE than IDLE would do well to explore PyCharm, Komodo, Wingware, Visual Studio Community Edition, or one of the other options listed at https://wiki.python.org/moin/IntegratedDevelopmentEnvironments It may also be worth our while to update https://docs.python.org/3/tutorial/index.html and/or https://docs.python.org/3/tutorial/interpreter.html to include a cross-reference to the usage guide for more detailed installation instructions (it took me a moment to remember where the platform specific installation guides were myself, so it wouldn't surprise me if someone reading the tutorial with no other context also had trouble finding them). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From pierre.quentel at gmail.com Sun Jul 5 18:15:09 2015 From: pierre.quentel at gmail.com (Pierre Quentel) Date: Sun, 5 Jul 2015 18:15:09 +0200 Subject: [Python-ideas] Pass a function as the argument "step" of range() In-Reply-To: <340F8101-A3E0-4CCE-8AA4-CB68271E91F0@yahoo.com> References: <851tgrrqt5.fsf@benfinney.id.au> <55955759.3020502@stoneleaf.us> <340F8101-A3E0-4CCE-8AA4-CB68271E91F0@yahoo.com> Message-ID: 2015-07-03 12:29 GMT+02:00 Andrew Barnert : > On Jul 2, 2015, at 08:53, Pierre Quentel wrote: > > > > It's true, but testing that an integer is a range is very rare : the > pattern "if X in range(Y)" is only found once in all the Python 3.4 > standard library > > Given that most of the stdlib predates Python 3.2, and modules are rarely > rewritten to take advantage of new features just for the hell of it, this > isn't very surprising, or very meaningful. 
> Then most of the stdlib also predates Python 1.5.2 (the version I started with) : Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> 4 in range(10) 1 >>> > > Similarly, you'll find that most of the stdlib doesn't use yield from > expressions, and many things that could be written in terms of > singledispatch instead use type switching, and so on. This doesn't mean > yield from or singledispatch are useless or rarely used. -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg at krypto.org Mon Jul 6 00:54:08 2015 From: greg at krypto.org (Gregory P. Smith) Date: Sun, 05 Jul 2015 22:54:08 +0000 Subject: [Python-ideas] Should iPython Notebook replace Idle In-Reply-To: References: <48206410457730373.474886sturla.molden-gmail.com@news.gmane.org> Message-ID: On Sun, Jul 5, 2015 at 5:01 AM Nick Coghlan wrote: > On 5 July 2015 at 19:26, Guido van Rossum wrote: > > Basically agreeing with what's been said already: IPython is an > application > > that has Python (and some other things) as a dependency. So is IDLE. > Python > > itself should ideally not be bundled with any applications -- it's a > > historical accident that IDLE is in the stdlib. Of course there should > be a > > basic command prompt, but I think that the existing one based on GNU > > readline is fine for that purpose. > > This does raise an interesting question though: should we perhaps > update the "getting & installing" parts of > https://docs.python.org/3/using/index.html to cover more of the > available options? > I think we should. The only reason anyone uses IDLE is that they found it as one of the included batteries (sometimes referred to by tutorials) and misinterpret that to think that it is good. It works, but it is mostly no frills with some annoying limitations (see the bug tracker). I doubt you'll find any core developers using IDLE to get work done. [this is where someone will pipe up and respond "hey!"] > > For users interested in Python for research and data analysis, for > example, the SciPy page on installing Python would likely be a better > starting point than our upstream guide: > http://www.scipy.org/install.html > > Similarly, someone looking for a more sophisticated IDE than IDLE > would do well to explore PyCharm, Komodo, Wingware, Visual Studio > Community Edition, or one of the other options listed at > https://wiki.python.org/moin/IntegratedDevelopmentEnvironments > > It may also be worth our while to update > https://docs.python.org/3/tutorial/index.html and/or > https://docs.python.org/3/tutorial/interpreter.html to include a > cross-reference to the usage guide for more detailed installation > instructions (it took me a moment to remember where the platform > specific installation guides were myself, so it wouldn't surprise me > if someone reading the tutorial with no other context also had trouble > finding them). > +1 -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Mon Jul 6 02:29:45 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 5 Jul 2015 20:29:45 -0400 Subject: [Python-ideas] Should iPython Notebook replace Idle In-Reply-To: References: <48206410457730373.474886sturla.molden-gmail.com@news.gmane.org> Message-ID: On 7/5/2015 6:54 PM, Gregory P. 
Smith wrote: > The only reason anyone uses IDLE is that they found > it as one of the included batteries (sometimes referred to by tutorials) > and misinterpret that to think that it is good. It works, but it is > mostly no frills with some annoying limitations (see the bug tracker). I > doubt you'll find any core developers using IDLE to get work done. > [this is where someone will pipe up and respond "hey!"] Is that an invitation? It is true that I started using Idle because it came with CPython, and kept using it because it is much, much, much better than using Notepad + python.exe in Windows Command Prompt, and now because it has some 'frills' that I, as a Python-only programmer, consider essential. Two main ones are: 1. Running a file from an editor a. with one button press b. with immediate syntax check (without starting an external process) c. with minimal startup time (inviting repeated testing) d. in -i mode (allowing post-run experiments) e. in a shell that understands tracebacks enough to parse out file and line information, open the file, and jump to the line (rt click in Idle). 2. Grep (Find in Files) that runs from the editor a. defaulting to the directory of the file being edited b. putting output in another edit window c. with ability to easily open the file and jump to the line of hits. Shell and editor aside, it has occurred to me that it would be nice to have a tutorial-notebook-interactive-doc system of some sort that runs on tkinter and comes with the stdlib. Thinking about it more, I believe that something basic that alternated between canned text, pre-written code, and a live prompt, could be added to the Idle Shell. -- Terry Jan Reedy From ncoghlan at gmail.com Mon Jul 6 04:28:43 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 6 Jul 2015 12:28:43 +1000 Subject: [Python-ideas] Should iPython Notebook replace Idle In-Reply-To: References: <48206410457730373.474886sturla.molden-gmail.com@news.gmane.org> Message-ID: On 6 July 2015 at 10:29, Terry Reedy wrote: > Shell and editor aside, it has occurred to me that it would be nice to have > a tutorial-notebook-interactive-doc system of some sort that runs on tkinter > and comes with the stdlib. Thinking about it more, I believe that something > basic that alternated between canned text, pre-written code, and a live > prompt, could be added to the Idle Shell. It's worth looking at some of the features of PyCharm Educational Edition in that regard. Tangentially related, something I would *love* to see at some point is IPython's "obj?" and "obj??" syntax extensions elevated to formal Python Enhancement Proposals. Concept sketch: obj? calls sys.__dochook__(obj) obj?? calls sys.__helphook__(obj) Default implementations: def __dochook__(obj): import pydoc print(pydoc.getdoc(obj)) def __helphook__(obj): help(obj) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From wes.turner at gmail.com Mon Jul 6 06:23:24 2015 From: wes.turner at gmail.com (Wes Turner) Date: Sun, 5 Jul 2015 23:23:24 -0500 Subject: [Python-ideas] Should iPython Notebook replace Idle In-Reply-To: References: <48206410457730373.474886sturla.molden-gmail.com@news.gmane.org> Message-ID: On Jul 5, 2015 9:29 PM, "Nick Coghlan" wrote: > > On 6 July 2015 at 10:29, Terry Reedy wrote: > > Shell and editor aside, it has occurred to me that it would be nice to have > > a tutorial-notebook-interactive-doc system of some sort that runs on tkinter > > and comes with the stdlib. 
Thinking about it more, I believe that something > > basic that alternated between canned text, pre-written code, and a live > > prompt, could be added to the Idle Shell. > > It's worth looking at some of the features of PyCharm Educational > Edition in that regard. PyCharm is great (and now has IPython Notebook integration) Spyder is great, and FOSS, has an IPython drawer (and now has pandas.DataFrame display support). Possibly more relevant for education and learning are the "test until green" features of a given IDE (or vim `:make`) > > Tangentially related, something I would *love* to see at some point is > IPython's "obj?" and "obj??" syntax extensions elevated to formal > Python Enhancement Proposals. > > Concept sketch: > > obj? calls sys.__dochook__(obj) > obj?? calls sys.__helphook__(obj) > > Default implementations: > > def __dochook__(obj): > import pydoc > print(pydoc.getdoc(obj)) > > def __helphook__(obj): > help(obj) > * [ ] ? and ?? support would be great (though you can also just define functions in e.g .pythonrc) * Python readline tab completion (without IDLE or IPython or bpython): http://pymotw.com/2/rlcompleter/ ... I'm sure there's a reason why this is not the default > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Mon Jul 6 07:00:10 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 5 Jul 2015 22:00:10 -0700 Subject: [Python-ideas] Pass a function as the argument "step" of range() In-Reply-To: References: <851tgrrqt5.fsf@benfinney.id.au> <55955759.3020502@stoneleaf.us> <340F8101-A3E0-4CCE-8AA4-CB68271E91F0@yahoo.com> Message-ID: On Jul 5, 2015, at 09:15, Pierre Quentel wrote: > > 2015-07-03 12:29 GMT+02:00 Andrew Barnert : >> On Jul 2, 2015, at 08:53, Pierre Quentel wrote: >> > >> > It's true, but testing that an integer is a range is very rare : the pattern "if X in range(Y)" is only found once in all the Python 3.4 standard library >> >> Given that most of the stdlib predates Python 3.2, and modules are rarely rewritten to take advantage of new features just for the hell of it, this isn't very surprising, or very meaningful. > > Then most of the stdlib also predates Python 1.5.2 (the version I started with) : > > Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> 4 in range(10) > 1 > >>> Well, a good chunk of the stdlib does predate 1.5.2, but not nearly as much as predates 3.2... At any rate, as I'm sure you know, that works in 1.5.2 because range returns a list. Try it with range(1000000000) and you may not be quite as happy with the result--but in 3.2+, it returns instantly, without using more than a few dozen bytes of memory. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Mon Jul 6 07:09:16 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 6 Jul 2015 15:09:16 +1000 Subject: [Python-ideas] Should iPython Notebook replace Idle In-Reply-To: References: <48206410457730373.474886sturla.molden-gmail.com@news.gmane.org> Message-ID: On 6 July 2015 at 14:23, Wes Turner wrote: > * Python readline tab completion (without IDLE or IPython or bpython): > http://pymotw.com/2/rlcompleter/ ... I'm sure there's a reason why this is > not the default It's been the default since 3.4: https://docs.python.org/dev/whatsnew/3.4.html#sys Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Jul 6 07:17:13 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 6 Jul 2015 15:17:13 +1000 Subject: [Python-ideas] Pass a function as the argument "step" of range() In-Reply-To: References: <851tgrrqt5.fsf@benfinney.id.au> <55955759.3020502@stoneleaf.us> <340F8101-A3E0-4CCE-8AA4-CB68271E91F0@yahoo.com> Message-ID: On 6 July 2015 at 15:00, Andrew Barnert via Python-ideas wrote: > At any rate, as I'm sure you know, that works in 1.5.2 because range returns > a list. Try it with range(1000000000) and you may not be quite as happy with > the result--but in 3.2+, it returns instantly, without using more than a few > dozen bytes of memory. One kinda neat trick with Python 3 ranges is that you can actually work with computed ranges with a size that exceeds 2**64 (and hence can't be handled by len()): >>> 50 in range(-10**1000, 10**1000) True >>> len(range(-10**1000, 10**1000)) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: Python int too large to convert to C ssize_t Conveniently, this also means attempting to convert them to a concrete list fails immediately, rather than eating up all your memory before falling over. Those particular bounds are so large they exceed the range of even a C double: >>> float(10**1000) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: int too large to convert to float Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From abarnert at yahoo.com Mon Jul 6 08:03:54 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 5 Jul 2015 23:03:54 -0700 Subject: [Python-ideas] Pass a function as the argument "step" of range() In-Reply-To: References: <851tgrrqt5.fsf@benfinney.id.au> <55955759.3020502@stoneleaf.us> <340F8101-A3E0-4CCE-8AA4-CB68271E91F0@yahoo.com> Message-ID: On Jul 5, 2015, at 22:17, Nick Coghlan wrote: > > On 6 July 2015 at 15:00, Andrew Barnert via Python-ideas > wrote: >> At any rate, as I'm sure you know, that works in 1.5.2 because range returns >> a list. Try it with range(1000000000) and you may not be quite as happy with >> the result--but in 3.2+, it returns instantly, without using more than a few >> dozen bytes of memory. > > One kinda neat trick with Python 3 ranges is that you can actually > work with computed ranges with a size that exceeds 2**64 (and hence > can't be handled by len()): I didn't realize that worked; nifty. Also, I'm not sure why this message triggered this idea but... The OP's example, or any geometric sequence, or anything that can be analytically integrated into an invertible function, can actually provide all the same features as range, without that much code. And that seems like a perfect example for demonstrating how to build a collections.abc.Sequence that's not like a tuple. Maybe a "collections cookbook" would be a useful thing to have in the HOWTO docs? 
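A rough sketch of what such a cookbook entry might look like (GeometricRange is an invented name, and the float-based log used for the length can be off by one at exact powers of the ratio, so treat it as an approximation):

    import math
    from collections.abc import Sequence

    class GeometricRange(Sequence):
        # start, start*ratio, start*ratio**2, ... while < stop
        def __init__(self, start, stop, ratio):
            if start <= 0 or ratio <= 1:
                raise ValueError("sketch assumes start > 0 and ratio > 1")
            self._start, self._ratio = start, ratio
            # Invert start * ratio**i < stop analytically for the length;
            # beware that float log can misbehave at exact powers of ratio.
            self._len = max(0, math.ceil(math.log(stop / start, ratio)))
        def __len__(self):
            return self._len
        def __getitem__(self, i):  # integer indices only; slicing omitted
            if i < 0:
                i += self._len
            if not 0 <= i < self._len:
                raise IndexError(i)
            return self._start * self._ratio ** i

    print(list(GeometricRange(1, 1000, 2)))   # [1, 2, 4, ..., 512]
    print(512 in GeometricRange(1, 1000, 2))  # True

Everything else - iteration, membership testing, index() and count() - falls out of the collections.abc.Sequence mixin methods for free, which is the point of the exercise.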
(The OrderedSet recipe could also be moved there from ActiveState; I can't think of any other ideas that aren't overkill like a binary tree as a sorted Mapping or something.) From bussonniermatthias at gmail.com Mon Jul 6 16:16:54 2015 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Mon, 6 Jul 2015 09:16:54 -0500 Subject: [Python-ideas] Should iPython Notebook replace Idle In-Reply-To: References: <48206410457730373.474886sturla.molden-gmail.com@news.gmane.org> Message-ID: > Concept sketch: > > obj? calls sys.__dochook__(obj) > obj?? calls sys.__helphook__(obj) > > Default implementations: > > def __dochook__(obj): > import pydoc > print(pydoc.getdoc(obj)) > > def __helphook__(obj): > help(obj) That would be great, there are (some) things we should to be careful with. prevent `obj?` in a loop is the first I can thing of. I know there are other gotchas. The second things, it would be nice to allow object to return different mimetype for their help, which I would like to see in the spec of __dochook__ and __helphook__ and not as a convention that depends on the shell you use. One of the things we want to play with in IPython (and in particular notebook), is to have run-able docs (yes with all the security concern that could imply), for which we would need richer data. I also want to point out that `?` also allow to do search for object name matching regex, which I like too. In [7]: ?numpy.*str* numpy.__str__ numpy.add_docstring numpy.array2string numpy.array_str numpy.datetime_as_string numpy.fromstring numpy.set_string_function numpy.str numpy.str0 numpy.str_ numpy.string_ -- M On Mon, Jul 6, 2015 at 12:09 AM, Nick Coghlan wrote: > On 6 July 2015 at 14:23, Wes Turner wrote: >> * Python readline tab completion (without IDLE or IPython or bpython): >> http://pymotw.com/2/rlcompleter/ ... I'm sure there's a reason why this is >> not the default > > It's been the default since 3.4: > https://docs.python.org/dev/whatsnew/3.4.html#sys > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From alan.cristh at gmail.com Mon Jul 6 16:32:21 2015 From: alan.cristh at gmail.com (Alan Cristhian) Date: Mon, 6 Jul 2015 11:32:21 -0300 Subject: [Python-ideas] Should iPython Notebook replace Idle In-Reply-To: References: <48206410457730373.474886sturla.molden-gmail.com@news.gmane.org> Message-ID: what do you think about this project? https://github.com/asweigart/idle-reimagined -------------- next part -------------- An HTML attachment was scrubbed... URL: From kale at thekunderts.net Mon Jul 6 16:39:58 2015 From: kale at thekunderts.net (Kale Kundert) Date: Mon, 06 Jul 2015 07:39:58 -0700 Subject: [Python-ideas] OrderedDict.peekitem() Message-ID: <559A933E.4090407@thekunderts.net> Today I was trying to use collections.OrderedDict to manage a LIFO queue, and I was surprised to realize that OrderedDict doesn't provide a way to look at its first or last item. There is an OrderedDict.popitem() method, which removes and returns either the first or last item, but it's not hard to imagine cases where you would want to see what's on the queue without popping it right away. My proposal is to add a peekitem() method to OrderedDict. 
This method would have the same signature and would return the same thing as popitem(), it just wouldn't modify the data structure. -Kale P.S. There is already a way to peek at the last item an OrderedDict, but it hides the intent of the code and you wouldn't think of it if you weren't familiar with python: next(reversed(ordered_dict)) From toddrjen at gmail.com Mon Jul 6 16:56:22 2015 From: toddrjen at gmail.com (Todd) Date: Mon, 6 Jul 2015 16:56:22 +0200 Subject: [Python-ideas] OrderedDict.peekitem() In-Reply-To: <559A933E.4090407@thekunderts.net> References: <559A933E.4090407@thekunderts.net> Message-ID: On Jul 6, 2015 4:49 PM, "Kale Kundert" wrote: > > Today I was trying to use collections.OrderedDict to manage a LIFO queue, and I > was surprised to realize that OrderedDict doesn't provide a way to look at its > first or last item. There is an OrderedDict.popitem() method, which removes and > returns either the first or last item, but it's not hard to imagine cases where > you would want to see what's on the queue without popping it right away. > > My proposal is to add a peekitem() method to OrderedDict. This method would > have the same signature and would return the same thing as popitem(), it just > wouldn't modify the data structure. > > -Kale > > P.S. There is already a way to peek at the last item an OrderedDict, but it > hides the intent of the code and you wouldn't think of it if you weren't > familiar with python: next(reversed(ordered_dict)) What about just making OrderedDict.values support indexing? -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmiscml at gmail.com Mon Jul 6 17:38:29 2015 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Mon, 6 Jul 2015 18:38:29 +0300 Subject: [Python-ideas] OrderedDict.peekitem() In-Reply-To: <559A933E.4090407@thekunderts.net> References: <559A933E.4090407@thekunderts.net> Message-ID: <20150706183829.61ed59df@x230> Hello, On Mon, 06 Jul 2015 07:39:58 -0700 Kale Kundert wrote: > Today I was trying to use collections.OrderedDict to manage a LIFO > queue, and I was surprised to realize that OrderedDict doesn't > provide a way to look at its first or last item. There is an > OrderedDict.popitem() method, which removes and returns either the > first or last item, but it's not hard to imagine cases where you > would want to see what's on the queue without popping it right away. What's wrong with using list, collections.Deque, heapq to manage a LIFO queue? -- Best regards, Paul mailto:pmiscml at gmail.com From abarnert at yahoo.com Mon Jul 6 22:39:35 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 6 Jul 2015 13:39:35 -0700 Subject: [Python-ideas] OrderedDict.peekitem() In-Reply-To: References: <559A933E.4090407@thekunderts.net> Message-ID: On Jul 6, 2015, at 07:56, Todd wrote: > > On Jul 6, 2015 4:49 PM, "Kale Kundert" wrote: > > > > Today I was trying to use collections.OrderedDict to manage a LIFO queue, and I > > was surprised to realize that OrderedDict doesn't provide a way to look at its > > first or last item. There is an OrderedDict.popitem() method, which removes and > > returns either the first or last item, but it's not hard to imagine cases where > > you would want to see what's on the queue without popping it right away. > > > > My proposal is to add a peekitem() method to OrderedDict. This method would > > have the same signature and would return the same thing as popitem(), it just > > wouldn't modify the data structure. > > > > -Kale > > > > P.S. 
There is already a way to peek at the last item an OrderedDict, but it > > hides the intent of the code and you wouldn't think of it if you weren't > > familiar with python: next(reversed(ordered_dict)) > > What about just making OrderedDict.values support indexing? > Then you can't use integers as keys. Also, would a novice really expect d[0] to return an item (key, value pair) rather than a value? I think this pretty much has to be a nonstandard function, not an ambiguous reuse of subscripting. Or, maybe better, OrderedDict could have extended keys/values/items views that return something that's tuple-like as well as set-like (because indexing doesn't have any meaning on the standard views to conflict with, and nobody would expect mutability, and it would be obvious whether you're getting keys, values, or items). -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmludo at gmail.com Mon Jul 6 23:01:32 2015 From: gmludo at gmail.com (Ludovic Gasc) Date: Mon, 6 Jul 2015 23:01:32 +0200 Subject: [Python-ideas] Fwd: [Python-Dev] An yocto change proposal in logging module to simplify structured logs support In-Reply-To: References: <1110534133.1837705.1432584526401.JavaMail.yahoo@mail.yahoo.com> Message-ID: 2015-06-29 3:09 GMT+02:00 Nick Coghlan : > > On 29 Jun 2015 1:50 am, "Ludovic Gasc" wrote: > > In fact, the issue shouldn't be our brains, but it was clearly a time > consuming task, and we have too much directly paid-work to take care. > > > > Don't be wrong: I don't say that ELK doesn't work, only it's time > consuming with a high level of logs. > > I'm pretty sure that a lot of people are happy with ELK, it's cool for > them ;-) > > > > It's like Oracle and PostgreSQL databases: Where with Oracle you need a > full-time DBA, with PostgreSQL: apt-get install postgresql > > With this last sentence, I'm totally caricatural, but only to show where > I see an issue that should be fixed, at least for us. > > (FYI, in a previous professional life, I've maintained Oracle, MySQL and > PostgreSQL servers for several clients, I know a little bit the subject). > > This discrepancy in manageability between services like PostgreSQL & more > complex setups like the ELK stack is why Red Hat started working on > Nulecule as part of Project Atomic: > http://rhelblog.redhat.com/2015/06/23/announcing-yum-rpm-for-containerized-applications-nulecule-atomic-app/ > > Thanks for the link, I didn't know, I'll keep an eye on that. > There's still some work to be done making sure the related tools support > Debian and derivatives properly, but "the ELK stack is too hard to install > & maintain" is a distro level software management problem to be solved, > rather than something to try to work around at the language level. > As you can imagine, I want to finish to dig around journald before to change the strategy. Thanks for your time and your remarks. Have a nice day. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From breamoreboy at yahoo.co.uk Mon Jul 6 23:06:14 2015 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Mon, 06 Jul 2015 22:06:14 +0100 Subject: [Python-ideas] OrderedDict.peekitem() In-Reply-To: <20150706183829.61ed59df@x230> References: <559A933E.4090407@thekunderts.net> <20150706183829.61ed59df@x230> Message-ID: On 06/07/2015 16:38, Paul Sokolovsky wrote: > Hello, > > On Mon, 06 Jul 2015 07:39:58 -0700 > Kale Kundert wrote: > >> Today I was trying to use collections.OrderedDict to manage a LIFO >> queue, and I was surprised to realize that OrderedDict doesn't >> provide a way to look at its first or last item. There is an >> OrderedDict.popitem() method, which removes and returns either the >> first or last item, but it's not hard to imagine cases where you >> would want to see what's on the queue without popping it right away. > > What's wrong with using list, collections.Deque, heapq to manage a LIFO > queue? > Or queue.LifoQueue or even multiprocessing.Queue depending on the precise requirements? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From mistersheik at gmail.com Mon Jul 6 23:30:36 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Mon, 6 Jul 2015 14:30:36 -0700 (PDT) Subject: [Python-ideas] OrderedDict.peekitem() In-Reply-To: <559A933E.4090407@thekunderts.net> References: <559A933E.4090407@thekunderts.net> Message-ID: <3c4f0379-9b0a-49a5-b3d2-a583b527a99c@googlegroups.com> SortedDict (http://www.grantjenks.com/docs/sortedcontainers/sorteddict.html) manages to support indexing. Can OrderedDict do the same thing? On Monday, July 6, 2015 at 10:49:44 AM UTC-4, Kale Kundert wrote: > > Today I was trying to use collections.OrderedDict to manage a LIFO queue, > and I > was surprised to realize that OrderedDict doesn't provide a way to look at > its > first or last item. There is an OrderedDict.popitem() method, which > removes and > returns either the first or last item, but it's not hard to imagine cases > where > you would want to see what's on the queue without popping it right away. > > My proposal is to add a peekitem() method to OrderedDict. This method > would > have the same signature and would return the same thing as popitem(), it > just > wouldn't modify the data structure. > > -Kale > > P.S. There is already a way to peek at the last item an OrderedDict, but > it > hides the intent of the code and you wouldn't think of it if you weren't > familiar with python: next(reversed(ordered_dict)) > > _______________________________________________ > Python-ideas mailing list > Python... at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From grant.jenks at gmail.com Mon Jul 6 23:51:22 2015 From: grant.jenks at gmail.com (Grant Jenks) Date: Mon, 6 Jul 2015 14:51:22 -0700 Subject: [Python-ideas] OrderedDict.peekitem() In-Reply-To: <3c4f0379-9b0a-49a5-b3d2-a583b527a99c@googlegroups.com> References: <559A933E.4090407@thekunderts.net> <3c4f0379-9b0a-49a5-b3d2-a583b527a99c@googlegroups.com> Message-ID: On Mon, Jul 6, 2015 at 2:30 PM, Neil Girdhar wrote: > SortedDict (http://www.grantjenks.com/docs/sortedcontainers/sorteddict.html) > manages to support indexing. Can OrderedDict do the same thing? OrderedDict doesn't currently do the same thing. But you could use a SortedDict to implement an OrderedDict and have that feature. 
The use case:

```
In [2]: o = OrderedDict([(1, 2), (2, 3), (3, 4)])

In [3]: o.index[0]
Out[3]: 1

In [4]: o.index[-1]
Out[4]: 3
```

A little benchmarking put it at half the speed of collections.OrderedDict.

Here's the recipe:

```
from itertools import count
from collections.abc import MutableMapping

from sortedcontainers import SortedDict


class OrderedDictIndex(object):
    """Read-only view supporting positional indexing into the dict's order."""

    def __init__(self, nums):
        self._nums = nums

    def __len__(self):
        return len(self._nums)

    def __getitem__(self, index):
        # ``iloc`` is the older sortedcontainers API; newer releases
        # spell this ``self._nums.keys()[index]``.
        num = self._nums.iloc[index]
        return self._nums[num]


class OrderedDict(MutableMapping):
    def __init__(self, *args, **kwargs):
        self._dict = {}
        self._keys = {}            # key -> insertion counter
        self._nums = SortedDict()  # insertion counter -> key
        self._count = count()
        self.index = OrderedDictIndex(self._nums)
        self.update(*args, **kwargs)

    def __getitem__(self, key):
        return self._dict[key]

    def __setitem__(self, key, value):
        if key not in self._dict:
            num = next(self._count)
            self._keys[key] = num
            self._nums[num] = key
        # Assigning to an existing key updates the value but keeps the
        # key's original position, matching collections.OrderedDict.
        self._dict[key] = value

    def __delitem__(self, key):
        del self._dict[key]
        num = self._keys.pop(key)
        del self._nums[num]

    def __len__(self):
        return len(self._dict)

    def __iter__(self):
        return iter(self._nums.values())
```

From ericsnowcurrently at gmail.com  Mon Jul  6 23:59:55 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Mon, 6 Jul 2015 15:59:55 -0600
Subject: [Python-ideas] OrderedDict.peekitem()
In-Reply-To: <3c4f0379-9b0a-49a5-b3d2-a583b527a99c@googlegroups.com>
References: <559A933E.4090407@thekunderts.net>
 <3c4f0379-9b0a-49a5-b3d2-a583b527a99c@googlegroups.com>
Message-ID:

On Mon, Jul 6, 2015 at 3:30 PM, Neil Girdhar wrote:
> SortedDict (http://www.grantjenks.com/docs/sortedcontainers/sorteddict.html)
> manages to support indexing. Can OrderedDict do the same thing?

Don't forget that, as the docs describe, an "OrderedDict is a dict
that remembers the order that keys were first inserted". While
obviously there's an implicit sequence for that order, the focus is
still on dict-ness with the sequence exposed through the normal
mapping approach (iteration).
If you want to get sequence semantics > then first unpack the order into a sequence type like list or tuple. > Or use some other type than OrderedDict. > > Note that OrderedDict's view types are essentially just dict's view > types with custom iteration. Adding indexing to the views would > complicate things and certainly would not be O(1) like you would > expect indexing to be. > > -eric > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Tue Jul 7 04:10:31 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 07 Jul 2015 11:10:31 +0900 Subject: [Python-ideas] [Python-Dev] PEP 493: Redistributor guidance for Python 2.7 HTTPS In-Reply-To: References: <20150706122123.195c4c57@fsol> Message-ID: <87615wiup4.fsf@uwakimon.sk.tsukuba.ac.jp> Cross-posted to redirect discussion. Replies directed to Python-Ideas. Erik Bray writes on Python-Dev: > On Mon, Jul 6, 2015 at 6:21 AM, Antoine Pitrou wrote: > > On Mon, 6 Jul 2015 14:22:46 +1000, Nick Coghlan wrote: > >> > >> The main change from the last version discussed on python-ideas > > > > Was it discussed there? That list has become totally useless, I've > > stopped following it. > > Considering that a useful discussion of a useful PEP occurred there > (not to mention other occasionally useful discussions) I'd say that > such a value judgment is not only unnecessary but also inaccurate. As you point out, the words "totally" and "useless" were unnecessary and inaccurate respectively. However, the gist of his post, that the S/N on Python-Ideas has become substantially lower in the last few months, seems accurate to me. At least two recent threads could have been continued on Python-List, where they would have benefited a lot more users, and they didn't seem profitable on Python-Ideas since it was quite evident that Those Who Know About Python were adamantly opposed to the idea as discussed in the thread, while the proponent kept pushing on that brick wall rather than seeking a way around it. I myself continue to follow Python-Ideas, Nick and other committers are posting here daily, and even Guido manages to pop up occasionally, so that may be no problem (or even a good thing if it results in educating and inviting new committers in the long run). But I think it's worth considering whether it we should cultivate a bit more discipline here. Again, discussion on Python-Ideas, please. From grant.jenks at gmail.com Tue Jul 7 04:51:55 2015 From: grant.jenks at gmail.com (Grant Jenks) Date: Mon, 6 Jul 2015 19:51:55 -0700 Subject: [Python-ideas] Indexable dict and set (was: OrderedDict.peekitem()) Message-ID: >>> On Mon, Jul 6, 2015 at 3:30 PM, Neil Girdhar >>> wrote: >>> SortedDict >>> (http://www.grantjenks.com/docs/sortedcontainers/sorteddict.html) >>> manages to support indexing. Can OrderedDict do the same thing? >> >> On Mon, Jul 6, 2015 at 5:59 PM, Eric Snow >> wrote: >> >> Don't forget that, as the docs describe, an "OrderedDict is a dict >> that remembers the order that keys were first inserted". While >> obviously there's an implicit sequence for that order, the focus is >> still on dict-ness with the sequence exposed through the normal >> mapping approach (iteration). If you want to get sequence semantics >> then first unpack the order into a sequence type like list or tuple. >> Or use some other type than OrderedDict. 
> On Mon, Jul 6, 2015 at 3:04 PM, Neil Girdhar wrote:
> You can do indexing, insertion, and removal all in logarithmic time (which
> is basically constant) by using a B-tree as the underlying data structure.
> (See e.g. the blist package.)

If you want a `dict` or `set` that is indexable, it's quite easy using the
sortedcontainers module. Just use the hash function as the key:

```python
from sortedcontainers import SortedDict, SortedSet


class DictWithIndex(SortedDict):
    def __init__(self, *args, **kwargs):
        # SortedDict accepts an optional key function as its first
        # argument; ordering by hash() makes any hashable key usable.
        super(DictWithIndex, self).__init__(hash, *args, **kwargs)


class SetWithIndex(SortedSet):
    def __init__(self, *args, **kwargs):
        super(SetWithIndex, self).__init__(*args, key=hash, **kwargs)
```

The ordering can be quasi-random but the indexing can still be useful.
Example usage:

```
In [2]: d = DictWithIndex(enumerate('abcde'))

In [3]: d.iloc[4]
Out[3]: 4

In [4]: d.iloc[-2]
Out[4]: 3

In [5]: s = SetWithIndex('abcde')

In [6]: s[4]
Out[6]: 'e'

In [7]: s[-2]
Out[7]: 'd'
```

The likelihood that the hash collides is low and when it does so, the
order will be based on insertion order.

From abarnert at yahoo.com  Tue Jul  7 06:08:09 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 6 Jul 2015 21:08:09 -0700
Subject: [Python-ideas] OrderedDict.peekitem()
In-Reply-To: <3c4f0379-9b0a-49a5-b3d2-a583b527a99c@googlegroups.com>
References: <559A933E.4090407@thekunderts.net>
 <3c4f0379-9b0a-49a5-b3d2-a583b527a99c@googlegroups.com>
Message-ID: <132FBF34-EEDE-439A-B096-4C6662DA2CCC@yahoo.com>

On Jul 6, 2015, at 14:30, Neil Girdhar wrote:
>
> SortedDict (http://www.grantjenks.com/docs/sortedcontainers/sorteddict.html) manages to support indexing.

Only by having a special view, accessed as .index. If it just took indices
as subscripts, that would be ambiguous with integer keys.

> Can OrderedDict do the same thing?

It's worth noting that most of the various different sorted containers on
PyPI support something equivalent, but all with different interfaces.

More importantly, they all use logarithmic data structures (binary trees,
b-trees, skip lists, the hybrid thing blist uses, ...), which give you
O(log N) indexing, and some of them can do even better by giving you
O(log N) to find a slice and O(1) within that slice; OrderedDict uses a
linked list, so it would be O(N).

>
>> On Monday, July 6, 2015 at 10:49:44 AM UTC-4, Kale Kundert wrote:
>> Today I was trying to use collections.OrderedDict to manage a LIFO queue, and I
>> was surprised to realize that OrderedDict doesn't provide a way to look at its
>> first or last item. There is an OrderedDict.popitem() method, which removes and
>> returns either the first or last item, but it's not hard to imagine cases where
>> you would want to see what's on the queue without popping it right away.
>>
>> My proposal is to add a peekitem() method to OrderedDict. This method would
>> have the same signature and would return the same thing as popitem(), it just
>> wouldn't modify the data structure.
>>
>> -Kale
>>
>> P.S. There is already a way to peek at the last item an OrderedDict, but it
>> hides the intent of the code and you wouldn't think of it if you weren't
>> familiar with python: next(reversed(ordered_dict))
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python...
at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Tue Jul 7 06:15:44 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 6 Jul 2015 21:15:44 -0700 Subject: [Python-ideas] OrderedDict.peekitem() In-Reply-To: References: <559A933E.4090407@thekunderts.net> <3c4f0379-9b0a-49a5-b3d2-a583b527a99c@googlegroups.com> Message-ID: On Jul 6, 2015, at 15:04, Neil Girdhar wrote: > > You can do indexing, insertion, and removal all in logarithmic time (which is basically constant) If you're dealing with dicts of, say, millions of items, then it's "basically constant" with a multiplier of 6-20x worse than the current implementation, and only as long as you never need to scale larger (which is a pretty major concern if you're writing a general-purpose library or something). That's not the same thing as actually constant. There's a reason all the scripting languages have hash-based containers, and that the systems languages that started off with only tree-based containers later added hash-based containers as well. Personally, I think it would be great if Python had tree-based sorted containers alongside the existing hash-based arbitrary-ordered ones. Then it would be trivial to add a tree-based insertion-order container to replace the hash-based insertion-order container when it's more appropriate to a specific use (e.g., when you need to efficiently random-access-index it). But, unless you're doing government work or something else that has no access to PyPI, that's already true today, so there's not much to wish for. > by using a B-tree as the underlying data structure. (See e.g. the blist package.) > >> On Mon, Jul 6, 2015 at 5:59 PM, Eric Snow wrote: >> On Mon, Jul 6, 2015 at 3:30 PM, Neil Girdhar wrote: >> > SortedDict (http://www.grantjenks.com/docs/sortedcontainers/sorteddict.html) >> > manages to support indexing. Can OrderedDict do the same thing? >> >> Don't forget that, as the docs describe, an "OrderedDict is a dict >> that remembers the order that keys were first inserted". While >> obviously there's an implicit sequence for that order, the focus is >> still on dict-ness with the sequence exposed through the normal >> mapping approach (iteration). If you want to get sequence semantics >> then first unpack the order into a sequence type like list or tuple. >> Or use some other type than OrderedDict. >> >> Note that OrderedDict's view types are essentially just dict's view >> types with custom iteration. Adding indexing to the views would >> complicate things and certainly would not be O(1) like you would >> expect indexing to be. >> >> -eric > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mistersheik at gmail.com Tue Jul 7 06:23:14 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Tue, 7 Jul 2015 00:23:14 -0400 Subject: [Python-ideas] OrderedDict.peekitem() In-Reply-To: References: <559A933E.4090407@thekunderts.net> <3c4f0379-9b0a-49a5-b3d2-a583b527a99c@googlegroups.com> Message-ID: This thread is not about hash tables. This thread is about indexing into an ordered dictionary when you need an ordered dictionary. Someone pointed out that people expect indexing to be constant time. I agree that no one expects indexing to be linear time. My point was that logarithmic-time indexing is reasonable and possible. On Tue, Jul 7, 2015 at 12:15 AM, Andrew Barnert wrote: > On Jul 6, 2015, at 15:04, Neil Girdhar wrote: > > You can do indexing, insertion, and removal all in logarithmic time (which > is basically constant) > > > If you're dealing with dicts of, say, millions of items, then it's > "basically constant" with a multiplier of 6-20x worse than the current > implementation, and only as long as you never need to scale larger (which > is a pretty major concern if you're writing a general-purpose library or > something). That's not the same thing as actually constant. There's a > reason all the scripting languages have hash-based containers, and that the > systems languages that started off with only tree-based containers later > added hash-based containers as well. > > Personally, I think it would be great if Python had tree-based sorted > containers alongside the existing hash-based arbitrary-ordered ones. Then > it would be trivial to add a tree-based insertion-order container to > replace the hash-based insertion-order container when it's more appropriate > to a specific use (e.g., when you need to efficiently random-access-index > it). But, unless you're doing government work or something else that has no > access to PyPI, that's already true today, so there's not much to wish for. > > by using a B-tree as the underlying data structure. (See e.g. the blist > package.) > > On Mon, Jul 6, 2015 at 5:59 PM, Eric Snow > wrote: > >> On Mon, Jul 6, 2015 at 3:30 PM, Neil Girdhar >> wrote: >> > SortedDict ( >> http://www.grantjenks.com/docs/sortedcontainers/sorteddict.html) >> > manages to support indexing. Can OrderedDict do the same thing? >> >> Don't forget that, as the docs describe, an "OrderedDict is a dict >> that remembers the order that keys were first inserted". While >> obviously there's an implicit sequence for that order, the focus is >> still on dict-ness with the sequence exposed through the normal >> mapping approach (iteration). If you want to get sequence semantics >> then first unpack the order into a sequence type like list or tuple. >> Or use some other type than OrderedDict. >> >> Note that OrderedDict's view types are essentially just dict's view >> types with custom iteration. Adding indexing to the views would >> complicate things and certainly would not be O(1) like you would >> expect indexing to be. >> >> -eric >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mistersheik at gmail.com Tue Jul 7 06:27:45 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Tue, 7 Jul 2015 00:27:45 -0400 Subject: [Python-ideas] OrderedDict.peekitem() In-Reply-To: <132FBF34-EEDE-439A-B096-4C6662DA2CCC@yahoo.com> References: <559A933E.4090407@thekunderts.net> <3c4f0379-9b0a-49a5-b3d2-a583b527a99c@googlegroups.com> <132FBF34-EEDE-439A-B096-4C6662DA2CCC@yahoo.com> Message-ID: On Tue, Jul 7, 2015 at 12:08 AM, Andrew Barnert wrote: > On Jul 6, 2015, at 14:30, Neil Girdhar wrote: > > SortedDict ( > http://www.grantjenks.com/docs/sortedcontainers/sorteddict.html) manages > to support indexing. > > > Only by having a special view, accessed as .index. If it just took indices > as subscripts, that would be ambiguous with integer keys. > > Can OrderedDict do the same thing? > > > It's worth noting that most of the various different sorted containers on > PyPI support something equivalent, but all with different interfaces. > 2015 might not be the right year to add sorted containers to Python. However, it might be a good idea to standardize the interface of a "sorted map" and "sorted set" in a PEP. > > More importantly, they all use logarithmic data structures (binary trees, > b-trees, skip lists, the hybrid thing blist uses, ...), which give you > O(log N) indexing, and some of them can do even better by giving you O(log > N) to find a slice and O(1) within that slice; OrderedDict uses a linked > list, so it would be O(N). > > > On Monday, July 6, 2015 at 10:49:44 AM UTC-4, Kale Kundert wrote: >> >> Today I was trying to use collections.OrderedDict to manage a LIFO queue, >> and I >> was surprised to realize that OrderedDict doesn't provide a way to look >> at its >> first or last item. There is an OrderedDict.popitem() method, which >> removes and >> returns either the first or last item, but it's not hard to imagine cases >> where >> you would want to see what's on the queue without popping it right away. >> >> My proposal is to add a peekitem() method to OrderedDict. This method >> would >> have the same signature and would return the same thing as popitem(), it >> just >> wouldn't modify the data structure. >> >> -Kale >> >> P.S. There is already a way to peek at the last item an OrderedDict, but >> it >> hides the intent of the code and you wouldn't think of it if you weren't >> familiar with python: next(reversed(ordered_dict)) >> >> _______________________________________________ >> Python-ideas mailing list >> Python... at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kale at thekunderts.net Tue Jul 7 08:56:42 2015 From: kale at thekunderts.net (Kale Kundert) Date: Mon, 06 Jul 2015 23:56:42 -0700 Subject: [Python-ideas] OrderedDict.peekitem() In-Reply-To: References: <559A933E.4090407@thekunderts.net> <3c4f0379-9b0a-49a5-b3d2-a583b527a99c@googlegroups.com> Message-ID: <559B782A.2080405@thekunderts.net> I didn't even mean for this thread to be about arbitrarily indexing into an OrderedDict. I meant for it to be about accessing the first and last items in an OrderedDict. 
Given that a method already exists to access and remove these items, I find it hard to understand why there isn't a method to simply access them. This should be a constant-time operation if OrderedDict employs a doubly-linked list under the hood. -Kale On 07/06/2015 09:23 PM, Neil Girdhar wrote: > This thread is not about hash tables. This thread is about indexing into an > ordered dictionary when you need an ordered dictionary. Someone pointed out > that people expect indexing to be constant time. I agree that no one expects > indexing to be linear time. My point was that logarithmic-time indexing is > reasonable and possible. From mistersheik at gmail.com Tue Jul 7 08:59:10 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Tue, 7 Jul 2015 02:59:10 -0400 Subject: [Python-ideas] OrderedDict.peekitem() In-Reply-To: <559B782A.2080405@thekunderts.net> References: <559A933E.4090407@thekunderts.net> <3c4f0379-9b0a-49a5-b3d2-a583b527a99c@googlegroups.com> <559B782A.2080405@thekunderts.net> Message-ID: What's wrong with "next(iter(o))" and "next(reversed(o))"? On Tue, Jul 7, 2015 at 2:56 AM, Kale Kundert wrote: > I didn't even mean for this thread to be about arbitrarily indexing into an > OrderedDict. I meant for it to be about accessing the first and last > items in > an OrderedDict. Given that a method already exists to access and remove > these > items, I find it hard to understand why there isn't a method to simply > access > them. This should be a constant-time operation if OrderedDict employs a > doubly-linked list under the hood. > > -Kale > > On 07/06/2015 09:23 PM, Neil Girdhar wrote: > > This thread is not about hash tables. This thread is about indexing > into an > > ordered dictionary when you need an ordered dictionary. Someone pointed > out > > that people expect indexing to be constant time. I agree that no one > expects > > indexing to be linear time. My point was that logarithmic-time indexing > is > > reasonable and possible. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kale at thekunderts.net Tue Jul 7 09:59:07 2015 From: kale at thekunderts.net (Kale Kundert) Date: Tue, 07 Jul 2015 00:59:07 -0700 Subject: [Python-ideas] OrderedDict.peekitem() In-Reply-To: References: <559A933E.4090407@thekunderts.net> <3c4f0379-9b0a-49a5-b3d2-a583b527a99c@googlegroups.com> <559B782A.2080405@thekunderts.net> Message-ID: <559B86CB.5050209@thekunderts.net> Three things: 1. Someone who is not all that familiar with python would never think to do either of those things. I've been using python for years and it didn't occur to me that I could peek at the ends of an OrderedDict like that until I saw someone on stack overflow suggest it. 2. Furthermore, if you don't have a good understanding of iterators in python, you could be excused for thinking that 'next(reversed(o))' creates a temporary list and is O(n) in time and memory. Those expressions read like they're doing a lot more work than they actually are, and that's confusing. 3. Readability counts, and those expressions hide the intent of the programmer. You wouldn't use 'next(iter(o))' to access the first element of a list, because that would be confusing and obfuscated. On 07/06/2015 11:59 PM, Neil Girdhar wrote: > What's wrong with "next(iter(o))" and "next(reversed(o))"? > > On Tue, Jul 7, 2015 at 2:56 AM, Kale Kundert > wrote: > > I didn't even mean for this thread to be about arbitrarily indexing into an > OrderedDict. 
I meant for it to be about accessing the first and last items in > an OrderedDict. Given that a method already exists to access and remove these > items, I find it hard to understand why there isn't a method to simply access > them. This should be a constant-time operation if OrderedDict employs a > doubly-linked list under the hood. > > -Kale > > On 07/06/2015 09:23 PM, Neil Girdhar wrote: > > This thread is not about hash tables. This thread is about indexing into an > > ordered dictionary when you need an ordered dictionary. Someone pointed out > > that people expect indexing to be constant time. I agree that no one expects > > indexing to be linear time. My point was that logarithmic-time indexing is > > reasonable and possible. > > From mistersheik at gmail.com Tue Jul 7 10:09:20 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Tue, 7 Jul 2015 04:09:20 -0400 Subject: [Python-ideas] OrderedDict.peekitem() In-Reply-To: <559B86CB.5050209@thekunderts.net> References: <559A933E.4090407@thekunderts.net> <3c4f0379-9b0a-49a5-b3d2-a583b527a99c@googlegroups.com> <559B782A.2080405@thekunderts.net> <559B86CB.5050209@thekunderts.net> Message-ID: Good points. I will do my best to address them: 3. The reason I like "next(iter(" and "reversed(iter", which is also probably the reason the discussion drifted into indexing is that right after someone asks for peek, someone else asks for peek_next and so on. If you're unhappy with logarithmic indexing time as you suggested, then the best you can do is linear time with respect to peek depth. That is just iteration. The most readable way to express one step of iteration is to call next on an iterator. The most readable way to produce an iterator from an iterable is with iter or reversed. 1/2. If you don't have a good understanding of Python, I think that that is something that should be remedied by learning about Python? There should ideally be one way of doing things after all. Maybe the documentation could be extended with a mention of how to peek? I don't know. But I disagree with adding peek to OrderedDict for the same reason that I disagree with adding peek to list and to heapq (as "heapq.peek(h)"). Best, Neil On Tue, Jul 7, 2015 at 3:59 AM, Kale Kundert wrote: > Three things: > > 1. Someone who is not all that familiar with python would never think to do > either of those things. I've been using python for years and it didn't > occur to > me that I could peek at the ends of an OrderedDict like that until I saw > someone > on stack overflow suggest it. > > 2. Furthermore, if you don't have a good understanding of iterators in > python, > you could be excused for thinking that 'next(reversed(o))' creates a > temporary > list and is O(n) in time and memory. Those expressions read like they're > doing > a lot more work than they actually are, and that's confusing. > > 3. Readability counts, and those expressions hide the intent of the > programmer. > You wouldn't use 'next(iter(o))' to access the first element of a list, > because > that would be confusing and obfuscated. > > On 07/06/2015 11:59 PM, Neil Girdhar wrote: > > What's wrong with "next(iter(o))" and "next(reversed(o))"? > > > > On Tue, Jul 7, 2015 at 2:56 AM, Kale Kundert > > wrote: > > > > I didn't even mean for this thread to be about arbitrarily indexing > into an > > OrderedDict. I meant for it to be about accessing the first and > last items in > > an OrderedDict. 
Given that a method already exists to access and
> remove these items, I find it hard to understand why there isn't a method
> to simply access them. This should be a constant-time operation if
> OrderedDict employs a doubly-linked list under the hood.
>
> -Kale
>
> On 07/06/2015 09:23 PM, Neil Girdhar wrote:
> > This thread is not about hash tables. This thread is about indexing into an
> > ordered dictionary when you need an ordered dictionary. Someone pointed out
> > that people expect indexing to be constant time. I agree that no one expects
> > indexing to be linear time. My point was that logarithmic-time indexing is
> > reasonable and possible.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From masklinn at masklinn.net  Tue Jul  7 10:33:30 2015
From: masklinn at masklinn.net (Masklinn)
Date: Tue, 7 Jul 2015 10:33:30 +0200
Subject: [Python-ideas] OrderedDict.peekitem()
In-Reply-To:
References: <559A933E.4090407@thekunderts.net>
 <3c4f0379-9b0a-49a5-b3d2-a583b527a99c@googlegroups.com>
Message-ID: <01A622FD-75D4-4991-823D-13F2282C98D3@masklinn.net>

On 2015-07-07, at 06:23 , Neil Girdhar wrote:
> This thread is not about hash tables. This thread is about indexing into
> an ordered dictionary when you need an ordered dictionary. Someone pointed
> out that people expect indexing to be constant time. I agree that no one
> expects indexing to be linear time. My point was that logarithmic-time
> indexing is reasonable and possible.

Constant time indexing would be possible by changing the OrderedDict
implementation to Raymond Hettinger's compact dictionaries[0] with a delete
operation recompacting the entries array rather than just nulling the item
(it would make removals on "early" keys of large dictionaries more
expensive though, delete would become O(n) with n the number of "living"
entries added after the one being removed).

[0] https://mail.python.org/pipermail/python-dev/2012-December/123028.html

From mistersheik at gmail.com  Tue Jul  7 10:37:00 2015
From: mistersheik at gmail.com (Neil Girdhar)
Date: Tue, 7 Jul 2015 04:37:00 -0400
Subject: [Python-ideas] OrderedDict.peekitem()
In-Reply-To: <01A622FD-75D4-4991-823D-13F2282C98D3@masklinn.net>
References: <559A933E.4090407@thekunderts.net>
 <3c4f0379-9b0a-49a5-b3d2-a583b527a99c@googlegroups.com>
 <01A622FD-75D4-4991-823D-13F2282C98D3@masklinn.net>
Message-ID:

Yeah, but logarithmic time everything should be good enough for nearly all
problems. After years of programming competitions, one thing I remember is
how hard it is to craft a problem such that a linear solution is accepted,
but a linearithmic one is not.

On Tue, Jul 7, 2015 at 4:33 AM, Masklinn wrote:

> On 2015-07-07, at 06:23 , Neil Girdhar wrote:
>
> > This thread is not about hash tables. This thread is about indexing
> into an ordered dictionary when you need an ordered dictionary. Someone
> pointed out that people expect indexing to be constant time. I agree that
> no one expects indexing to be linear time. My point was that
> logarithmic-time indexing is reasonable and possible.
>
> Constant time indexing would be possible by changing the OrderedDict
> implementation to Raymond Hettinger's compact dictionaries[0] with a delete
> operation recompacting the entries array rather than just nulling the item
> (it would make removals on "early" keys of large dictionaries more
> expensive though, delete would become O(n) with n the number of "living"
> entries added after the one being removed).
> > [0] https://mail.python.org/pipermail/python-dev/2012-December/123028.html > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/gDc4Ez6Z4MQ/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Jul 7 10:45:53 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Jul 2015 10:45:53 +0200 Subject: [Python-ideas] [Python-Dev] PEP 493: Redistributor guidance for Python 2.7 HTTPS In-Reply-To: <87615wiup4.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20150706122123.195c4c57@fsol> <87615wiup4.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: I do see all subjects in python-ideas, but I often mute unproductive threads. I also take my cues from the responses of others. I also spend an inordinate amount skimming email in general, and I don't expect others to do so (in fact, I wish some folks wouldn't spend so much time writing long emails -- or would spend more time making them shorter :-). On Tue, Jul 7, 2015 at 4:10 AM, Stephen J. Turnbull wrote: > Cross-posted to redirect discussion. Replies directed to Python-Ideas. > > Erik Bray writes on Python-Dev: > > On Mon, Jul 6, 2015 at 6:21 AM, Antoine Pitrou > wrote: > > > On Mon, 6 Jul 2015 14:22:46 +1000, Nick Coghlan > wrote: > > >> > > >> The main change from the last version discussed on python-ideas > > > > > > Was it discussed there? That list has become totally useless, I've > > > stopped following it. > > > > Considering that a useful discussion of a useful PEP occurred there > > (not to mention other occasionally useful discussions) I'd say that > > such a value judgment is not only unnecessary but also inaccurate. > > As you point out, the words "totally" and "useless" were unnecessary > and inaccurate respectively. > > However, the gist of his post, that the S/N on Python-Ideas has become > substantially lower in the last few months, seems accurate to me. At > least two recent threads could have been continued on Python-List, > where they would have benefited a lot more users, and they didn't seem > profitable on Python-Ideas since it was quite evident that Those Who > Know About Python were adamantly opposed to the idea as discussed in > the thread, while the proponent kept pushing on that brick wall rather > than seeking a way around it. > > I myself continue to follow Python-Ideas, Nick and other committers > are posting here daily, and even Guido manages to pop up occasionally, > so that may be no problem (or even a good thing if it results in > educating and inviting new committers in the long run). But I think > it's worth considering whether it we should cultivate a bit more > discipline here. > > Again, discussion on Python-Ideas, please. 
> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Tue Jul 7 14:14:00 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 7 Jul 2015 05:14:00 -0700 Subject: [Python-ideas] OrderedDict.peekitem() In-Reply-To: References: <559A933E.4090407@thekunderts.net> <3c4f0379-9b0a-49a5-b3d2-a583b527a99c@googlegroups.com> <132FBF34-EEDE-439A-B096-4C6662DA2CCC@yahoo.com> Message-ID: <1E198AE0-1B3A-4014-921E-984219F3FE57@yahoo.com> On Jul 6, 2015, at 21:27, Neil Girdhar wrote: > >> On Tue, Jul 7, 2015 at 12:08 AM, Andrew Barnert wrote: >>> On Jul 6, 2015, at 14:30, Neil Girdhar wrote: >>> >>> SortedDict (http://www.grantjenks.com/docs/sortedcontainers/sorteddict.html) manages to support indexing. >> >> Only by having a special view, accessed as .index. If it just took indices as subscripts, that would be ambiguous with integer keys. >> >>> Can OrderedDict do the same thing? >> >> It's worth noting that most of the various different sorted containers on PyPI support something equivalent, but all with different interfaces. > > 2015 might not be the right year to add sorted containers to Python. However, it might be a good idea to standardize the interface of a "sorted map" and "sorted set" in a PEP. I suggested that before, but apparently 2013 was not the right time to suggest standardizing the interface without adding an implementation to the stdlib; maybe things are different enough in 2015. Thanks for raising the idea. I'll go back and try to dig up other past proposals and re-survey the different options to see if that seems plausible, and meanwhile I'll stop derailing this thread with it. :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Jul 7 14:18:12 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Jul 2015 14:18:12 +0200 Subject: [Python-ideas] [Python-Dev] PEP 493: Redistributor guidance for Python 2.7 HTTPS In-Reply-To: References: <20150706122123.195c4c57@fsol> <87615wiup4.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Tue, Jul 7, 2015 at 11:24 AM, Mark Lawrence wrote: > From https://mail.python.org/mailman/listinfo/python-ideas > > > This list is to contain discussion of speculative language ideas for > Python for possible inclusion into the language. If an idea gains traction > it can then be discussed and honed to the point of becoming a solid > proposal to put to python-dev as appropriate. > > > Relative to the above I believe that far too many proposals are for > trivial ideas, mainly targetted at the stdlib, that would be better suited > to the main python list. > > As for gaining traction, it's often the complete opposite, flogging a dead > horse is an understatement for some threads. Gently putting the OP down > with a firm but polite "it ain't gonna happen" would save a lot of time all > around. > > Just my ?0.02p worth. > Agreed. It's also probably easier to just ignore an obviously bad or poorly thought-out idea than to try to engage the OP in lengthy explanations of why that is so. After all there's a huge amount of subjectivity -- we won't change our minds, but it takes forever to explain to someone who's new to Python core development. 
--
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From steve at pearwood.info  Tue Jul  7 14:34:06 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 7 Jul 2015 22:34:06 +1000
Subject: [Python-ideas] OrderedDict.peekitem()
In-Reply-To: <559B86CB.5050209@thekunderts.net>
References: <559A933E.4090407@thekunderts.net>
 <3c4f0379-9b0a-49a5-b3d2-a583b527a99c@googlegroups.com>
 <559B782A.2080405@thekunderts.net> <559B86CB.5050209@thekunderts.net>
Message-ID: <20150707123405.GC10773@ando.pearwood.info>

On Tue, Jul 07, 2015 at 12:59:07AM -0700, Kale Kundert wrote:
> On 07/06/2015 11:59 PM, Neil Girdhar wrote:
> > What's wrong with "next(iter(o))" and "next(reversed(o))"?
>
> Three things:
>
> 1. Someone who is not all that familiar with python would never think to do
> either of those things. I've been using python for years and it didn't occur to
> me that I could peek at the ends of an OrderedDict like that until I saw someone
> on stack overflow suggest it.
>
> 2. Furthermore, if you don't have a good understanding of iterators in python,
> you could be excused for thinking that 'next(reversed(o))' creates a temporary
> list and is O(n) in time and memory. Those expressions read like they're doing
> a lot more work than they actually are, and that's confusing.

Do we know that OrderedDict.__reversed__ *doesn't* do that? I assume it
won't, but I'm not sure it's a guarantee.

> 3. Readability counts, and those expressions hide the intent of the programmer.
> You wouldn't use 'next(iter(o))' to access the first element of a list, because
> that would be confusing and obfuscated.

I agree with all of those objections, especially the third. Which is why
the reasonable and simple solution is to write:

    def first(od):
        return next(iter(od))

    def last(od):
        return next(reversed(od))

then call first(o), last(o). Add comments, documentation and tests to
taste. Or subclass OrderedDict and add first() last() methods.

I am unconvinced that peeking at the first and last items of a dict
(ordered, sorted or otherwise), let alone O(1) indexed access to items,
is a good fit for the OrderedDict API. If I were the OD maintainer, I
would want to understand your use-case (why are you using an OD as a
queue, instead of a list, deque, or queue?), and possibly hear two or
three additional uses, before agreeing to add it to the API.

--
Steve

From ncoghlan at gmail.com  Tue Jul  7 14:55:39 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 7 Jul 2015 22:55:39 +1000
Subject: [Python-ideas] [Python-Dev] PEP 493: Redistributor guidance for
 Python 2.7 HTTPS
In-Reply-To:
References: <20150706122123.195c4c57@fsol>
 <87615wiup4.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On 7 July 2015 at 18:45, Guido van Rossum wrote:
> I do see all subjects in python-ideas, but I often mute unproductive
> threads.

I personally had to give up on actually *reading* all the threads quite
some time ago, so if it's a topic where the outcome doesn't matter to me,
I'll tune it out. I may occasionally dive back into a long-lived thread
if I get curious as to how it has spun on for so long :)

> I also take my cues from the responses of others. I also spend an
> inordinate amount skimming email in general, and I don't expect others to do
> so (in fact, I wish some folks wouldn't spend so much time writing long
> emails -- or would spend more time making them shorter :-).
Aye, I know I'm prone to the latter problem - to the point where I'll sometimes include an opening TL;DR on emails in a work context :P However, I also tend to assume everyone treats python-ideas as a "when I have time, and the thread seems interesting" activity (since any actual proposed changes will still need to go through python-dev or the issue tracker), so I'm a bit more self-indulgent here. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From breamoreboy at yahoo.co.uk Tue Jul 7 15:10:55 2015 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Tue, 07 Jul 2015 14:10:55 +0100 Subject: [Python-ideas] OrderedDict.peekitem() In-Reply-To: <20150707123405.GC10773@ando.pearwood.info> References: <559A933E.4090407@thekunderts.net> <3c4f0379-9b0a-49a5-b3d2-a583b527a99c@googlegroups.com> <559B782A.2080405@thekunderts.net> <559B86CB.5050209@thekunderts.net> <20150707123405.GC10773@ando.pearwood.info> Message-ID: On 07/07/2015 13:34, Steven D'Aprano wrote: > > I am unconvinced that peeking at the first and last items of a dict > (ordered, sorted or otherwise), let alone O(1) indexed access to items, > is a good fit for the OrderedDict API. If I were the OD maintainer, I > would want to understand your use-case (why are you using an OD as a > queue, instead of a list, deque, or queue?), and possibly hear two or > three additional uses, before agreeing to add it to the API. > That's not what the original message said:- Today I was trying to use collections.OrderedDict to manage a LIFO queue, and I was surprised to realize that OrderedDict doesn't provide a way to look at its first or last item. I've no idea what the use case is for managing a LIFO queue with anything. Only the OP can tell us what he's trying to achieve, so over to you Kale :) -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From oscar.j.benjamin at gmail.com Tue Jul 7 16:35:47 2015 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 07 Jul 2015 14:35:47 +0000 Subject: [Python-ideas] OrderedDict.peekitem() In-Reply-To: <20150707123405.GC10773@ando.pearwood.info> References: <559A933E.4090407@thekunderts.net> <3c4f0379-9b0a-49a5-b3d2-a583b527a99c@googlegroups.com> <559B782A.2080405@thekunderts.net> <559B86CB.5050209@thekunderts.net> <20150707123405.GC10773@ando.pearwood.info> Message-ID: On Tue, 7 Jul 2015 at 13:34 Steven D'Aprano wrote: > On Tue, Jul 07, 2015 at 12:59:07AM -0700, Kale Kundert wrote: > > > 3. Readability counts, and those expressions hide the intent of the > programmer. > > You wouldn't use 'next(iter(o))' to access the first element of a list, > because > > that would be confusing and obfuscated. > > I agree with all of those objections, especially the third. Which is why > the reasonable and simple solution is to write: > > def first(od): > return next(iter(od)) > > def last(od): > return next(reversed(od)) > > then call first(o), last(o). Add comments, documentation and tests to > taste. Or subclass OrderedDict and add first() last() methods. > If the OrderedDict is empty both of the above will raise StopIteration. This can have unintended consequences if the OrderedDict is called from another iterator/generator etc. 
So it should probably be:

def first(od):
    try:
        return next(iter(od))
    except StopIteration:
        raise KeyError

> I am unconvinced that peeking at the first and last items of a dict
> (ordered, sorted or otherwise), let alone O(1) indexed access to items,
> is a good fit for the OrderedDict API. If I were the OD maintainer, I
> would want to understand your use case (why are you using an OD as a
> queue, instead of a list, deque, or queue?), and possibly hear two or
> three additional uses, before agreeing to add it to the API.

I once wanted to be able to peek at the last item. My use case was
something to do with traversing a graph. Perhaps a depth-first search
where I wanted a stack of the vertices I was traversing and an efficient
way to check for a vertex in the stack (to avoid traversing cycles). I
think the keys were the vertices and the values were iterators over the
neighbours of the corresponding vertices. After traversing all
neighbours of a vertex I want to pop that vertex off and continue
traversing the vertex preceding it on the stack. I can retrieve the
vertex from the top of the stack with popitem. However I want the
returned vertex to remain in the stack while traversing its other
neighbours so I pop it off and then push it back on again, akin to:

def peekitem(od):
    key, val = od.popitem()
    od[key] = val
    return key, val

-- 
Oscar

From rosuav at gmail.com  Tue Jul  7 16:58:43 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 8 Jul 2015 00:58:43 +1000
Subject: [Python-ideas] OrderedDict.peekitem()
In-Reply-To:
References: <559A933E.4090407@thekunderts.net>
 <3c4f0379-9b0a-49a5-b3d2-a583b527a99c@googlegroups.com>
 <559B782A.2080405@thekunderts.net> <559B86CB.5050209@thekunderts.net>
 <20150707123405.GC10773@ando.pearwood.info>
Message-ID:

On Wed, Jul 8, 2015 at 12:35 AM, Oscar Benjamin wrote:
> If the OrderedDict is empty both of the above will raise StopIteration. This
> can have unintended consequences if the OrderedDict is called from another
> iterator/generator etc. So it should probably be:
>
> def first(od):
>     try:
>         return next(iter(od))
>     except StopIteration:
>         raise KeyError

Agreed, though it's worth noting that as of Python 3.5, generators can
be made not-vulnerable to this problem (and as of 3.7ish, they
automatically will be protected). See PEP 479 for details. But yes,
turning that into KeyError is the right thing to do, and is exactly
why something like this wants to be a published recipe rather than
simply dismissed as "see how easy it is?".

Does it belong in a collection like more-itertools?

ChrisA

From rosuav at gmail.com  Tue Jul  7 16:59:54 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 8 Jul 2015 00:59:54 +1000
Subject: [Python-ideas] OrderedDict.peekitem()
In-Reply-To:
References: <559A933E.4090407@thekunderts.net>
 <3c4f0379-9b0a-49a5-b3d2-a583b527a99c@googlegroups.com>
 <559B782A.2080405@thekunderts.net> <559B86CB.5050209@thekunderts.net>
 <20150707123405.GC10773@ando.pearwood.info>
Message-ID:

On Wed, Jul 8, 2015 at 12:58 AM, Chris Angelico wrote:
> On Wed, Jul 8, 2015 at 12:35 AM, Oscar Benjamin
> wrote:
>> If the OrderedDict is empty both of the above will raise StopIteration. This
>> can have unintended consequences if the OrderedDict is called from another
>> iterator/generator etc.
>> So it should probably be:
>>
>> def first(od):
>>     try:
>>         return next(iter(od))
>>     except StopIteration:
>>         raise KeyError
>
> Agreed, though it's worth noting that as of Python 3.5, generators can
> be made not-vulnerable to this problem (and as of 3.7ish, they
> automatically will be protected). See PEP 479 for details. But yes,
> turning that into KeyError is the right thing to do, and is exactly
> why something like this wants to be a published recipe rather than
> simply dismissed as "see how easy it is?".
>
> Does it belong in a collection like more-itertools?
>
> ChrisA

Doh. Next time, Chris, look first, *then* post.

https://github.com/erikrose/more-itertools/blob/master/more_itertools/more.py#L40

ChrisA

From ncoghlan at gmail.com  Wed Jul  8 07:53:44 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 8 Jul 2015 15:53:44 +1000
Subject: [Python-ideas] Reporting unmatched parentheses in SyntaxError
 messages?
Message-ID:

One of the more opaque error messages new Python users can encounter
is a syntax error due to unmatched parentheses:

    File "/home/me/myfile.py", line 11
        data = func()
            ^
    SyntaxError: invalid syntax

While I have no idea how we could implement it, I'm wondering if that
might be clearer if the error message instead looked more like this:

    File "/home/me/myfile.py", line 11
        data = func()
            ^
    SyntaxError: invalid syntax (Unmatched '(' on line 10)

Or, similarly,

    SyntaxError: invalid syntax (Unmatched '[' on line 10)
    SyntaxError: invalid syntax (Unmatched '{' on line 10)

I'm not sure it would be feasible though - we generate syntax errors
from a range of locations where we don't have access to the original
token data any more :(

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From me at the-compiler.org  Wed Jul  8 08:09:20 2015
From: me at the-compiler.org (Florian Bruhin)
Date: Wed, 8 Jul 2015 08:09:20 +0200
Subject: [Python-ideas] Reporting unmatched parentheses in SyntaxError
 messages?
In-Reply-To:
References:
Message-ID: <20150708060920.GG22364@tonks>

* Nick Coghlan [2015-07-08 15:53:44 +1000]:
> One of the more opaque error messages new Python users can encounter
> is a syntax error due to unmatched parentheses:
>
>     File "/home/me/myfile.py", line 11
>         data = func()
>             ^
>     SyntaxError: invalid syntax
>
> While I have no idea how we could implement it, I'm wondering if that
> might be clearer if the error message instead looked more like this:
>
>     File "/home/me/myfile.py", line 11
>         data = func()
>             ^
>     SyntaxError: invalid syntax (Unmatched '(' on line 10)
>
> [...]

I can't comment on the implementation - but I can confirm this comes
up a lot in the #python IRC channel. People usually paste that line
and ask what is wrong with it, and the answer usually is "look on the
previous line".

I can imagine having this error message instead would help people
understand what's going on, without having to *know* they should look
at the previous line when there's a SyntaxError which "makes no sense".

Florian

-- 
http://www.the-compiler.org | me at the-compiler.org (Mail/XMPP)
GPG: 916E B0C8 FD55 A072 | http://the-compiler.org/pubkey.asc
I love long mails! | http://email.is-not-s.ms/

From abarnert at yahoo.com  Wed Jul  8 09:30:16 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 8 Jul 2015 00:30:16 -0700
Subject: [Python-ideas] Reporting unmatched parentheses in SyntaxError
 messages?
In-Reply-To:
References:
Message-ID: <6A269371-CC59-4607-8088-C13A1A38627C@yahoo.com>

On Jul 7, 2015, at 22:53, Nick Coghlan wrote:
>
> One of the more opaque error messages new Python users can encounter
> is a syntax error due to unmatched parentheses:
>
>     File "/home/me/myfile.py", line 11
>         data = func()
>             ^
>     SyntaxError: invalid syntax
>
> While I have no idea how we could implement it, I'm wondering if that
> might be clearer if the error message instead looked more like this:
>
>     File "/home/me/myfile.py", line 11
>         data = func()
>             ^
>     SyntaxError: invalid syntax (Unmatched '(' on line 10)
>
> Or, similarly,
>
>     SyntaxError: invalid syntax (Unmatched '[' on line 10)
>     SyntaxError: invalid syntax (Unmatched '{' on line 10)
>
> I'm not sure it would be feasible though - we generate syntax errors
> from a range of locations where we don't have access to the original
> token data any more :(

Do we know that there's an unbalanced paren (or bracket or brace) even
when we don't know where it is? I think this would still be sufficient:

    File "/home/me/myfile.py", line 11
        data = func()
            ^
    SyntaxError: invalid syntax (Unmatched '(', possibly on a previous line)

Really, just telling people to look at a previous line for unmatched
pairs is sufficient to solve the problem every time it comes up on
#python, StackOverflow, etc.

(Of course this would take away a great opportunity to explain to
novices why they might want to use a better editor than Notepad,
something which can show them mismatched parens automatically.)

From breamoreboy at yahoo.co.uk  Wed Jul  8 09:50:16 2015
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Wed, 08 Jul 2015 08:50:16 +0100
Subject: [Python-ideas] Reporting unmatched parentheses in SyntaxError
 messages?
In-Reply-To: <6A269371-CC59-4607-8088-C13A1A38627C@yahoo.com>
References: <6A269371-CC59-4607-8088-C13A1A38627C@yahoo.com>
Message-ID:

On 08/07/2015 08:30, Andrew Barnert via Python-ideas wrote:
> On Jul 7, 2015, at 22:53, Nick Coghlan wrote:
>
>> One of the more opaque error messages new Python users can encounter
>> is a syntax error due to unmatched parentheses:
>>
>>     File "/home/me/myfile.py", line 11
>>         data = func()
>>             ^
>>     SyntaxError: invalid syntax
>>
>> While I have no idea how we could implement it, I'm wondering if that
>> might be clearer if the error message instead looked more like this:
>>
>>     File "/home/me/myfile.py", line 11
>>         data = func()
>>             ^
>>     SyntaxError: invalid syntax (Unmatched '(' on line 10)
>>
>> Or, similarly,
>>
>>     SyntaxError: invalid syntax (Unmatched '[' on line 10)
>>     SyntaxError: invalid syntax (Unmatched '{' on line 10)
>>
>> I'm not sure it would be feasible though - we generate syntax errors
>> from a range of locations where we don't have access to the original
>> token data any more :(
>
> Do we know that there's an unbalanced paren (or bracket or brace) even
> when we don't know where it is?
I think this would still be sufficient: > > File "/home/me/myfile.py", line 11 > data = func() > ^ > SyntaxError: invalid syntax (Unmatched '(', possibly on a previous line) > > Really, just telling people to look at a previous line for unmatched > pairs is sufficient to solve the problem every time it comes up on > #python, StackOverflow, etc. > > (Of course this would take away a great opportunity to explain to > novices why they might want to use a better editor than Notepad, > something which can show them mismatched parens automatically.) > Add something here https://docs.python.org/3/tutorial/errors.html#syntax-errors taking into account both of the above paragraphs? Put it in the FAQs, if it's there already I missed it :) -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From stephen at xemacs.org Wed Jul 8 10:02:24 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 08 Jul 2015 17:02:24 +0900 Subject: [Python-ideas] Reporting unmatched parentheses in SyntaxError messages? In-Reply-To: References: Message-ID: <87pp43gjqn.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > One of the more opaque error messages new Python users can encounter > is a syntax error due to unmatched parentheses: > > File "/home/me/myfile.py", line 11 > data = func() > ^ > SyntaxError: invalid syntax Agreed. Could be worse, though, it could be Lisp! > While I have no idea how we could implement it, I'm wondering if that > might be clearer if the error message instead looked more like this: > > File "/home/me/myfile.py", line 11 > data = func() > ^ > SyntaxError: invalid syntax (Unmatched '(' on line 10) I think I would prefer "Expected ')'". I think that typos like a = ((1, "one"), (2, "two)", (3, "three")) data = func() are likely to be fairly common (I make them often enough!), but I don't see how you're going to get the parser to identify the line containing "couple #2" as the source of the error (without a *really* dubious heuristic). > I'm not sure it would be feasible though - we generate syntax errors > from a range of locations where we don't have access to the original > token data any more :( Is the problem that we don't know which line the unmatched parenthesis was on, or that we don't even know that the syntax error is an unmatched parenthesis? From toddrjen at gmail.com Wed Jul 8 11:03:42 2015 From: toddrjen at gmail.com (Todd) Date: Wed, 8 Jul 2015 11:03:42 +0200 Subject: [Python-ideas] Reporting unmatched parentheses in SyntaxError messages? In-Reply-To: <87pp43gjqn.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87pp43gjqn.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Jul 8, 2015 10:02 AM, "Stephen J. Turnbull" wrote: > > Nick Coghlan writes: > > > While I have no idea how we could implement it, I'm wondering if that > > might be clearer if the error message instead looked more like this: > > > > File "/home/me/myfile.py", line 11 > > data = func() > > ^ > > SyntaxError: invalid syntax (Unmatched '(' on line 10) > > I think I would prefer "Expected ')'". I think that typos like > > a = ((1, "one"), > (2, "two)", > (3, "three")) > data = func() > > are likely to be fairly common (I make them often enough!), but I > don't see how you're going to get the parser to identify the line > containing "couple #2" as the source of the error (without a *really* > dubious heuristic). True, but we can definitely say it occurs on or after the first line in your example. 
So could we do something like:

    File "/home/me/myfile.py", line 11
        data = func()
            ^
    SyntaxError: Unmatched '(' starting somewhere after line 7

That would at least allow you to narrow down where to look for the
problem. Of course in cases where the starting point is the current line
or the previous line you could, in principle, have simpler exception
messages. It may not be worth the increased complexity or decreased
consistency, though.

> > I'm not sure it would be feasible though - we generate syntax errors
> > from a range of locations where we don't have access to the original
> > token data any more :(
>
> Is the problem that we don't know which line the unmatched parenthesis
> was on, or that we don't even know that the syntax error is an
> unmatched parenthesis?

I think there are two problems. First, it isn't clear what the problem
is. Second, it is misleading about where the problem occurs. So I think
the goal would be to have an exception that states what the problem is
and doesn't give the wrong place to look for the problem. It doesn't
have to give the right place, but it can't give the wrong place.

From steve at pearwood.info  Wed Jul  8 13:33:59 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 8 Jul 2015 21:33:59 +1000
Subject: [Python-ideas] Reporting unmatched parentheses in SyntaxError
 messages?
In-Reply-To:
References: <87pp43gjqn.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <20150708113359.GG10773@ando.pearwood.info>

On Wed, Jul 08, 2015 at 11:03:42AM +0200, Todd wrote:
> On Jul 8, 2015 10:02 AM, "Stephen J. Turnbull" wrote:
> >
> > Nick Coghlan writes:
> >
> > > While I have no idea how we could implement it, I'm wondering if that
> > > might be clearer if the error message instead looked more like this:
> > >
> > >     File "/home/me/myfile.py", line 11
> > >         data = func()
> > >             ^
> > >     SyntaxError: invalid syntax (Unmatched '(' on line 10)
> >
> > I think I would prefer "Expected ')'".  I think that typos like
> >
> >     a = ((1, "one"),
> >          (2, "two)",
> >          (3, "three"))
> >     data = func()
> >
> > are likely to be fairly common (I make them often enough!), but I
> > don't see how you're going to get the parser to identify the line
> > containing "couple #2" as the source of the error (without a *really*
> > dubious heuristic).
>
> True, but we can definitely say it occurs on or after the first line in
> your example. So could we do something like:
>
>     File "/home/me/myfile.py", line 11
>         data = func()
>             ^
>     SyntaxError: Unmatched '(' starting somewhere after line 7
>
> That would at least allow you to narrow down where to look for the problem.

Even if you can't report a line number, you could report "on this or a
previous line".

The way I see it, if the parser knows enough to point the ^ before the
first token on the line, it can report that there is a missing ) on a
previous line, otherwise it may have to hedge.

SyntaxError: Unmatched '(' before this line

SyntaxError: Unmatched '(' on this or a previous line

I believe that this would be a big help to beginners and casual users
such as sys admins. Experienced programmers have learned the hard way
that a SyntaxError may mean an unmatched bracket of some kind, but I
think it would help even experienced coders to be explicit about the
error.
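To show that the book-keeping involved is fairly modest, here is a
rough, untested sketch of a stand-alone fence checker built on the
tokenize module. This is purely illustrative - it is not how CPython's
parser tracks brackets, and the helper name is made up:

import io
import tokenize

OPENERS = {'(', '[', '{'}
CLOSERS = {')', ']', '}'}

def find_unmatched_opener(source):
    # Return (line, col, char) for the innermost unclosed opener, or None.
    stack = []
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type == tokenize.OP:
                if tok.string in OPENERS:
                    stack.append(tok)
                elif tok.string in CLOSERS and stack:
                    stack.pop()
    except tokenize.TokenError:
        # Raised at EOF inside an open bracket; the stack has the culprit.
        pass
    if stack:
        tok = stack[-1]
        return (tok.start[0], tok.start[1], tok.string)
    return None

Given source text with an unclosed '(', this reports the line and
column where that bracket was opened - exactly the information the
proposed message needs.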
-- Steve From rosuav at gmail.com Wed Jul 8 13:42:03 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 8 Jul 2015 21:42:03 +1000 Subject: [Python-ideas] Reporting unmatched parentheses in SyntaxError messages? In-Reply-To: <20150708113359.GG10773@ando.pearwood.info> References: <87pp43gjqn.fsf@uwakimon.sk.tsukuba.ac.jp> <20150708113359.GG10773@ando.pearwood.info> Message-ID: On Wed, Jul 8, 2015 at 9:33 PM, Steven D'Aprano wrote: > I believe that this would be a big help to beginners and casual users > such as sys admins. Experienced programmers have learned the hard way > that a SyntaxError may mean an unmatched bracket of some kind, but I > think it would help even experienced coders to be explicit about the > error. It's worth noting that this isn't peculiar to Python. *Any* error that gives a location can potentially have been caused by a misinterpretation of a previous line. So if it's too hard to say exactly where the likely problem is, I think it'd still be of value to suggest looking a line or two above for the actual problem. ChrisA From toddrjen at gmail.com Wed Jul 8 14:28:58 2015 From: toddrjen at gmail.com (Todd) Date: Wed, 8 Jul 2015 14:28:58 +0200 Subject: [Python-ideas] Reporting unmatched parentheses in SyntaxError messages? In-Reply-To: <20150708113359.GG10773@ando.pearwood.info> References: <87pp43gjqn.fsf@uwakimon.sk.tsukuba.ac.jp> <20150708113359.GG10773@ando.pearwood.info> Message-ID: On Wed, Jul 8, 2015 at 1:33 PM, Steven D'Aprano wrote: > On Wed, Jul 08, 2015 at 11:03:42AM +0200, Todd wrote: > > On Jul 8, 2015 10:02 AM, "Stephen J. Turnbull" > wrote: > > > > > > Nick Coghlan writes: > > > > > > > While I have no idea how we could implement it, I'm wondering if > that > > > > might be clearer if the error message instead looked more like this: > > > > > > > > File "/home/me/myfile.py", line 11 > > > > data = func() > > > > ^ > > > > SyntaxError: invalid syntax (Unmatched '(' on line 10) > > > > > > I think I would prefer "Expected ')'". I think that typos like > > > > > > a = ((1, "one"), > > > (2, "two)", > > > (3, "three")) > > > data = func() > > > > > > are likely to be fairly common (I make them often enough!), but I > > > don't see how you're going to get the parser to identify the line > > > containing "couple #2" as the source of the error (without a *really* > > > dubious heuristic). > > > > True, but we can definitely say it occurs on or after the first line in > > your example. So could we do something like: > > > > File "/home/me/myfile.py", line 11 > > data = func() > > ^ > > SyntaxError: Unmatched '(' starting somewhere after line 7 > > > > That would at least allow you to narrow down where to look for the > problem. > > Even if you can't report a line number, you could report "on this or a > previous line". > > The way I see it, if the parser knows enough to point the ^ before the > first token on the line, it can report that there is a missing ) on a > previous line, otherwise it may have to hedge. > > SyntaxError: Unmatched '(' before this line > > SyntaxError: Unmatched '(' on this or a previous line > > I believe that this would be a big help to beginners and casual users > such as sys admins. Experienced programmers have learned the hard way > that a SyntaxError may mean an unmatched bracket of some kind, but I > think it would help even experienced coders to be explicit about the > error. > > I think it should always be possible to report a range of line numbers within which the problem must occur. 
The start would be the outermost unclosed parenthesis. The end would be
the first statement that cannot be inside parentheses, or the end of the
file, whichever comes first (the latter is currently listed as the
location of the exception). Although we can't say exactly where the
problem occurs in this range, I think we can say that it must be
somewhere in this range.

So it should be possible to do something like this (I don't like the
error message, this is just an example):

    File "/home/me/myfile.py", line 8:12
        a = ((1, "one"),
            ^
        ...
        data = func()
                     ^
    SyntaxError: Unmatched '(' in line range

However, I don't know if this exception message structure could be a
problem. Hence my original proposal, which would keep a simpler
exception message.

From bussonniermatthias at gmail.com  Wed Jul  8 16:19:27 2015
From: bussonniermatthias at gmail.com (Matthias Bussonnier)
Date: Wed, 8 Jul 2015 09:19:27 -0500
Subject: [Python-ideas] Reporting unmatched parentheses in SyntaxError
 messages?
In-Reply-To:
References:
Message-ID: <3F6DB3E3-4235-4A49-A3B4-19047F2D1E75@gmail.com>

Hi,

> On Jul 8, 2015, at 00:53, Nick Coghlan wrote:
>
>     File "/home/me/myfile.py", line 11
>         data = func()
>             ^
>     SyntaxError: invalid syntax (Unmatched '(' on line 10)
>
> Or, similarly,
>
>     SyntaxError: invalid syntax (Unmatched '[' on line 10)
>     SyntaxError: invalid syntax (Unmatched '{' on line 10)

That would be great!

In [1]: for i in range(10)
   ...:     print(i)
  File "", line 1
    for i in range(10)
                      ^
SyntaxError: invalid syntax

Adding 'Missing colon' here would also be super-helpful for education.
Same for ifs, elses...

-- M

From ron3200 at gmail.com  Wed Jul  8 17:25:36 2015
From: ron3200 at gmail.com (Ron Adam)
Date: Wed, 08 Jul 2015 11:25:36 -0400
Subject: [Python-ideas] Reporting unmatched parentheses in SyntaxError
 messages?
In-Reply-To:
References:
Message-ID:

On 07/08/2015 01:53 AM, Nick Coghlan wrote:
> Or, similarly,
>
>     SyntaxError: invalid syntax (Unmatched '[' on line 10)
>     SyntaxError: invalid syntax (Unmatched '{' on line 10)

I don't think "invalid syntax" is needed here. SyntaxError is enough.

> I'm not sure it would be feasible though - we generate syntax errors
> from a range of locations where we don't have access to the original
> token data any more :(

Possibly another way to do this is to create a "SyntaxError token" in
the parser with the needed information, then raise it if it's found in
a later step.

These aren't always found at the end of the file; they can come up when
a brace or parenthesis is mismatched. Currently those generate the
syntax error at the end location, but they could say why, and where the
other brace is.

    SyntaxError: found ] instead of )

I think it would be better if the messages did not contain the
location, and that part was moved to the traceback instead. Having a
more general, non-location-dependent error message is helpful for
comparing similar exceptions without having to filter out the numbers,
which can change between edits.

    File "/home/me/myfile.py", line 10 to 11     <----- # here
        data = func()
            ^
    SyntaxError: unmatched '('    <---- not here

Cheers,
Ron

From tjreedy at udel.edu  Thu Jul  9 00:03:11 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 8 Jul 2015 18:03:11 -0400
Subject: [Python-ideas] Reporting unmatched parentheses in SyntaxError messages?
In-Reply-To:
References:
Message-ID:

On 7/8/2015 1:53 AM, Nick Coghlan wrote:
> One of the more opaque error messages new Python users can encounter
> is a syntax error due to unmatched parentheses:
>
>     File "/home/me/myfile.py", line 11
>         data = func()
>             ^
>     SyntaxError: invalid syntax

> I'm not sure it would be feasible though - we generate syntax errors
> from a range of locations where we don't have access to the original
> token data any more :(

Could that be changed?

An alternate approach is a separate fence-matcher function. Before I
switched to Python 17+ years ago, I wrote a table-driven
finite-state-machine matcher in C and a complete table for K&R/C89 C,
which included info that openers were to be ignored within comments and
strings. It reported the line and column of unclosed openers. I wrote
it for my own use because I was frustrated by poor C compiler error
messages.

I have occasionally thought about developing a table for Python (and
rewriting in Python), but indents and dedents are not trivial. (Even
tokenize.py does not handle \t indents correctly.) Maybe I should think
a bit harder. Idle has an option to syntax-check a module without
running it. If compile messages are not improved, it would certainly be
sensible to run a separate fence-checker at least when check-only is
requested, for better error messages. These could potentially include
'missing :' when a header 'opened' by
for/while/if/elif/else/class/def/with is not closed by ':'.

-- 
Terry Jan Reedy

From ncoghlan at gmail.com  Thu Jul  9 13:58:58 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 9 Jul 2015 21:58:58 +1000
Subject: [Python-ideas] Reporting unmatched parentheses in SyntaxError
 messages?
In-Reply-To:
References:
Message-ID:

On 9 July 2015 at 08:03, Terry Reedy wrote:
> On 7/8/2015 1:53 AM, Nick Coghlan wrote:
>> One of the more opaque error messages new Python users can encounter
>> is a syntax error due to unmatched parentheses:
>>
>>     File "/home/me/myfile.py", line 11
>>         data = func()
>>             ^
>>     SyntaxError: invalid syntax
>
>> I'm not sure it would be feasible though - we generate syntax errors
>> from a range of locations where we don't have access to the original
>> token data any more :(
>
> Could that be changed?

I think we're already down to only having four places where they can
be thrown (tokeniser, parser, symbol table analysis, byte code
generator), so reducing it further seems unlikely.

> I have occasionally thought about developing a table for Python (and
> rewriting in Python), but indents and dedents are not trivial. (Even
> tokenize.py does not handle \t indents correctly.) Maybe I should think a
> bit harder. Idle has an option to syntax-check a module without running it.
> If compile messages are not improved, it would certainly be sensible to run
> a separate fence-checker at least when check-only is requested, for better
> error messages. These could potentially include 'missing :' when a header
> 'opened' by for/while/if/elif/else/class/def/with is not closed by ':'.

That sounds like a plausible direction, as it turned out the particular
case that prompted this thread wasn't due to missing parentheses at
all, it was a block of code like:

    try:
        ....
    statement dedented early
    except ...:
        ...
I think Stephen Turnbull may also be on to something: we don't
necessarily need to tell the user what fenced token was unmatched from
earlier; it may be enough to tell them what *would* have been acceptable
as the next token where the caret is pointing, so they have something
more specific to consider than "invalid syntax". For example, in the
case I was attempting to help debug remotely, the error message might
have been:

    File "/home/me/myfile.py", line 11
        data = func()
            ^
    SyntaxError: expected "except" or "finally"

Other fence errors would then be:

    SyntaxError: expected ":"
    SyntaxError: expected ")"
    SyntaxError: expected "]"
    SyntaxError: expected "}"
    SyntaxError: expected "import"  # from ... import ...
    SyntaxError: expected "else"    # ... if ... else ...
    SyntaxError: expected "in"      # for ... in ...

And once 'async' is a proper keyword:

    SyntaxError: expected "def", "with" or "for"  # async ...

The currently problematic cases are those in
https://docs.python.org/3/reference/grammar.html where seeing "foo" at
one point in the token stream sets up the expectation in the parser
that "bar" must appear a bit further along. At the moment, the parser
bails out saying "I wasn't expecting this!", and doesn't answer the
obvious follow-on question "Well, what *were* you expecting?".

Strings would also qualify for a similar kind of treatment, as the
current error message doesn't tell us whether the parser was looking
for closing single or double quotes:

$ python3 -c "'"
  File "", line 1
    '
    ^
SyntaxError: EOL while scanning string literal

$ python3 -c "'''"
  File "", line 1
    '''
      ^
SyntaxError: EOF while scanning triple-quoted string literal

$ python3 -c '"'
  File "", line 1
    "
    ^
SyntaxError: EOL while scanning string literal

$ python3 -c '"""'
  File "", line 1
    """
      ^
SyntaxError: EOF while scanning triple-quoted string literal

This discussion has headed into a part of the compiler chain that I
don't actually know myself, though - the only thing I've ever had to do
with the parser is modifying the grammar file and adding the brute
force error message override when someone leaves out the parentheses on
print() and exec() calls.

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From rymg19 at gmail.com  Thu Jul  9 21:11:36 2015
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Thu, 09 Jul 2015 14:11:36 -0500
Subject: [Python-ideas] Rename python-dev?
Message-ID:

This is a crazy idea that popped into my head recently.

What if python-dev were to be renamed to something like
python-internals? (Yeah, that's a bad name, but my naming skills
suck...) Basically, something that gets across the idea that it's for
development of Python, not in Python.

Python-dev would visibly disappear, but mails sent to python-dev would
be redirected to the new name.

BTW, in reality, it's not an uncommon mistake. I know of a few mailing
lists built for development *with* a tool that end in -dev, such as
asmjit-dev.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

From tjreedy at udel.edu  Thu Jul  9 21:56:03 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 9 Jul 2015 15:56:03 -0400
Subject: [Python-ideas] Rename python-dev?
In-Reply-To:
References:
Message-ID:

On 7/9/2015 3:11 PM, Ryan Gonzalez wrote:
> This is a crazy idea that popped into my head recently.
>
> What if python-dev were to be renamed to something like
> python-internals? (Yeah, that's a bad name, but my naming skills
> suck...)
> Basically, something that gets across the idea that it's for
> development of Python, not in Python.

Changing the list would take more effort than several years of
redirecting people. (The name bikeshedding alone ...)

The current description is "Python core developers". Is that ambiguous?
Think of anything better. "Development of the Python language and
CPython implementation"?

> Python-dev would visibly disappear, but mails sent to python-dev would
> be redirected to the new name.
>
> BTW, in reality, it's not an uncommon mistake. I know of a few mailing
> lists built for development *with* a tool that end in -dev, such as
> asmjit-dev.

-- 
Terry Jan Reedy

From tjreedy at udel.edu  Thu Jul  9 22:10:02 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 9 Jul 2015 16:10:02 -0400
Subject: [Python-ideas] Reporting unmatched parentheses in SyntaxError
 messages?
In-Reply-To:
References:
Message-ID:

On 7/9/2015 7:58 AM, Nick Coghlan wrote:
[... message might have been]
>     File "/home/me/myfile.py", line 11
>         data = func()
>             ^
>     SyntaxError: expected "except" or "finally"

The opposite problem is a closer without an opener.

>>> ]
SyntaxError: invalid syntax
>>> except KeyError:
SyntaxError: invalid syntax

In both cases, "unexpected {}".format(obj) would be better, even
without giving the missing opener.

-- 
Terry Jan Reedy

From g.rodola at gmail.com  Thu Jul  9 23:34:19 2015
From: g.rodola at gmail.com (Giampaolo Rodola')
Date: Thu, 9 Jul 2015 23:34:19 +0200
Subject: [Python-ideas] @contextlib.contextmanager(with_context=True) -
 passing a context to contextmanager
Message-ID:

contextlib.contextmanager in Python 3 added the ability to define
functions which work both as decorators and context managers, which is
great and one of the features I appreciate the most about Python 3
(seriously). I am facing a use case where I find
contextlib.contextmanager has some limitations though.

I have a decorator which looks like this:

def wrap_exceptions(fun):
    """Decorator which translates bare OSError and IOError exceptions
    into NoSuchProcess and AccessDenied.
    """
    @functools.wraps(fun)
    def wrapper(self, *args, **kwargs):
        # here "self" is the Process class instance
        try:
            return fun(self, *args, **kwargs)
        except EnvironmentError as err:
            if err.errno in (errno.ENOENT, errno.ESRCH):
                raise NoSuchProcess(self.pid, self._name)
            if err.errno in (errno.EPERM, errno.EACCES):
                raise AccessDenied(self.pid, self._name)
            raise
    return wrapper

...and I use it like this:

class Process:
    @wrap_exceptions
    def exe(self):
        ...

The two key things about this decorator are:
- it's designed to be used with class methods
- it has a reference to the method's class instance (self)

I would like to push this a bit further and make wrap_exceptions() work
also as a context manager, which is what contextlib.contextmanager
should allow me to do in an easy way. As contextlib.contextmanager
stands right now it won't allow my use case though, as there's no way
to pass a reference of the class instance (Process) to wrap_exceptions.

So here is my proposal: what if we add a new "with_context" argument to
contextlib.contextmanager? The resulting code would look like this:

@contextlib.contextmanager(with_context=True)
def wrap_exceptions(ctx):
    # ctx is the Process class instance
    try:
        yield
    except EnvironmentError as err:
        pid = ctx.pid
        name = ctx.name
        if err.errno in (errno.ENOENT, errno.ESRCH):
            raise NoSuchProcess(pid, name)
        if err.errno in (errno.EPERM, errno.EACCES):
            raise AccessDenied(pid, name)
        raise

class Process:
    @wrap_exceptions()
    def exe(self):
        ...
class Process:
    def exe(self):
        with wrap_exceptions(self):
            ...

It must be noted that:
- when with_context=True and wrap_exceptions is used as a decorator, it
  can only be used to decorate class methods and not regular functions
- when used as a decorator, "self" is automatically passed as the first
  argument for wrap_exceptions

I'm not sure if this is actually possible as I haven't gone through
contextlib.contextmanager in detail (it's quite magical).

Thoughts?

-- 
Giampaolo - http://grodola.blogspot.com

From solipsis at pitrou.net  Thu Jul  9 23:56:46 2015
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 9 Jul 2015 23:56:46 +0200
Subject: [Python-ideas] Rename python-dev?
References:
Message-ID: <20150709235646.374e70b0@fsol>

On Thu, 09 Jul 2015 14:11:36 -0500
Ryan Gonzalez wrote:
> This is a crazy idea that popped into my head recently.
>
> What if python-dev were to be renamed to something like
> python-internals? (Yeah, that's a bad name, but my naming skills
> suck...) Basically, something that gets across the idea that it's for
> development of Python, not in Python.

We don't really get many people making the mistake, so I think it's ok
to keep it as-is.

Regards

Antoine.

From skip.montanaro at gmail.com  Fri Jul 10 00:46:10 2015
From: skip.montanaro at gmail.com (Skip Montanaro)
Date: Thu, 9 Jul 2015 17:46:10 -0500
Subject: [Python-ideas] Rename python-dev?
In-Reply-To: <20150709235646.374e70b0@fsol>
References: <20150709235646.374e70b0@fsol>
Message-ID:

On Thu, Jul 9, 2015 at 4:56 PM, Antoine Pitrou wrote:

> We don't really get many people making the mistake ...

And I've yet to see anyone make that mistake twice.

Skip

From srkunze at mail.de  Fri Jul 10 00:53:39 2015
From: srkunze at mail.de (Sven R. Kunze)
Date: Fri, 10 Jul 2015 00:53:39 +0200
Subject: [Python-ideas] Concurrency Modules
Message-ID: <559EFB73.5050606@mail.de>

Hi,

that's a follow-up on the discussion started on python-dev ('The
importance of the async keyword') and this issue
http://bugs.python.org/issue24571 .

After discussing the whole topic and reading it up further, it became
clear to me what's actually missing in Python. That is a definitive
guide of why/when a certain concurrency module is supposed to be used:

Currently, I can name 4 modules of which I know that they more or less
deal with the topic:
- concurrent
- threading
- asyncio
- multiprocessing

In order to make a sound decision for the question: "Which one(s) do I
use?", at least the following items should be somehow defined clearly
for these modules:

1) relationship between the modules
2) NON-overlapping usage scenarios
3) future development intentions
4) ease of usage of the modules => future syntax
5) examples

Remarks to the items:

1)
For the basic understanding
Do they complement each other?
Differences in behavior?
Do they overlap from the perspective of the programmer?
They mostly do not care about internal details; they need to get things
done (threads <-> processes) as long as the result is the same.

2) Extremely important to make the decision fast

3)
Will asyncio incorporate all concepts of the other modules in a
seamless way?
Or are they just complementary?
4) Closely related to 3)

5) Maybe in close correlation with 2) and 1)

Cheers,
Chuck

From g.rodola at gmail.com  Fri Jul 10 01:51:00 2015
From: g.rodola at gmail.com (Giampaolo Rodola')
Date: Fri, 10 Jul 2015 01:51:00 +0200
Subject: [Python-ideas] Concurrency Modules
In-Reply-To: <559EFB73.5050606@mail.de>
References: <559EFB73.5050606@mail.de>
Message-ID:

On Fri, Jul 10, 2015 at 12:53 AM, Sven R. Kunze wrote:

> Hi,
>
> that's a follow-up on the discussion started on python-dev ('The
> importance of the async keyword') and this issue
> http://bugs.python.org/issue24571 .
>
> After discussing the whole topic and reading it up further, it became
> clear to me what's actually missing in Python. That is a definitive guide
> of why/when a certain concurrency module is supposed to be used:
>
> Currently, I can name 4 modules of which I know that they more or less
> deal with the topic:
> - concurrent
> - threading
> - asyncio
> - multiprocessing

+1 on the overall idea.
Technically there's also asyncore and asynchat but they are deprecated.
It might also be worth it to add a section listing the main third-party
modules (twisted, tornado, gevent come to mind).

-- 
Giampaolo - http://grodola.blogspot.com

From ianlee1521 at gmail.com  Fri Jul 10 03:06:34 2015
From: ianlee1521 at gmail.com (Ian Lee)
Date: Thu, 9 Jul 2015 18:06:34 -0700
Subject: [Python-ideas] Concurrency Modules
In-Reply-To: <559EFB73.5050606@mail.de>
References: <559EFB73.5050606@mail.de>
Message-ID:

On Thursday, July 9, 2015, Sven R. Kunze wrote:

> In order to make a sound decision for the question: "Which one(s) do I
> use?", at least the following items should be somehow defined clearly for
> these modules:
>
> 1) relationship between the modules
> 2) NON-overlapping usage scenarios
> 3) future development intentions
> 4) ease of usage of the modules => future syntax
> 5) examples

+1, and specifically with regard to the examples: where there is overlap
between different modules, show equivalent approaches to performing the
same task.
-- 
~ Ian Lee | IanLee1521 at gmail.com

From tjreedy at udel.edu  Fri Jul 10 04:01:59 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 9 Jul 2015 22:01:59 -0400
Subject: [Python-ideas] Concurrency Modules
In-Reply-To:
References: <559EFB73.5050606@mail.de>
Message-ID:

On 7/9/2015 7:51 PM, Giampaolo Rodola' wrote:
> +1 on the overall idea.
> Technically there's also asyncore and asynchat but they are deprecated.

They should be listed as deprecated, with pointers to what superseded
them.

> It might also be worth it to add a section listing the main third-party
> modules (twisted, tornado, gevent come to mind).

-- 
Terry Jan Reedy

From rosuav at gmail.com  Fri Jul 10 04:09:10 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 10 Jul 2015 12:09:10 +1000
Subject: [Python-ideas] Concurrency Modules
In-Reply-To: <559EFB73.5050606@mail.de>
References: <559EFB73.5050606@mail.de>
Message-ID:

On Fri, Jul 10, 2015 at 8:53 AM, Sven R. Kunze wrote:
> After discussing the whole topic and reading it up further, it became
> clear to me what's actually missing in Python. That is a definitive
> guide of why/when a certain concurrency module is supposed to be used

I'm not sure how easy the decisions will be in all cases, but
certainly some broad guidelines would be awesome. (The exact analysis
of "when should I use threads and when should I use processes" is a
big enough one that there've been a few million blog posts on the
subject, and I doubt that asyncio will shrink that.) A basic summary
would be hugely helpful. "Here's four similar modules, and why they
all exist in the standard library."

ChrisA

From russell at keith-magee.com  Fri Jul 10 04:53:02 2015
From: russell at keith-magee.com (Russell Keith-Magee)
Date: Fri, 10 Jul 2015 10:53:02 +0800
Subject: [Python-ideas] Rename python-dev?
In-Reply-To:
References:
Message-ID:

On Fri, Jul 10, 2015 at 3:11 AM, Ryan Gonzalez wrote:

> This is a crazy idea that popped into my head recently.
>
> What if python-dev were to be renamed to something like
> python-internals? (Yeah, that's a bad name, but my naming skills
> suck...) Basically, something that gets across the idea that it's for
> development of Python, not in Python.
>
> Python-dev would visibly disappear, but mails sent to python-dev would
> be redirected to the new name.
>
> BTW, in reality, it's not an uncommon mistake. I know of a few mailing
> lists built for development *with* a tool that end in -dev, such as
> asmjit-dev.

You can't solve a social problem with technology, and I have evidence
to back up this specific case.

Django has a -dev and -users mailing list (following Python's example).
We have exactly the same problem of people posting "how do I" questions
to -dev, and the same recurring theme of posts that claim "if we just
rename the group, the problem will go away".
You can't solve a social problem with technology, and I have evidence to back up this specific case. Django has a -dev and -users mailing list (following Python's example). We have exactly the same problem of people posting "how do I" questions to -dev, and the same recurring theme of posts that claim "if we just rename the group, the problem will go away". Our mailing lists are on Google Groups, so we have the option of setting a public name for the group that is different to the mail alias. About a year ago, we changed the name of the group from "Django Developers" to "Django developers (Contributions to Django itself)". When you sign up for the mailing list, you see that title, and confirm that this is the group you want to sign up for. Less than 2 days after the rename took effect, we had our first -dev post that should have been posted to -users. Since then, the rate of incorrectly addressed posts hasn't significantly changed from before the name change. We have a similar problem with the DSF contact page: https://www.djangoproject.com/contact/foundation/ The top of that page has a series of instructions indicating that the contact form is for the legal and fundraising arm of the project, and suggests several places to post inquiries about technical matters. And yet, we get 2-3 contact requests a week for technical assistance. The same is also true of the page to create new tickets on the Django bug tracker. https://code.djangoproject.com/newticket Despite the notices, we still get security reports and requests for help lodged as tickets. The moral of the story: evidence shows that no matter what the name, or the instructions given, people will get it wrong. Yes, these posts are annoying - but it's a teachable moment for people you are hoping to incorporate into your community. The best option (IMHO) is to politely redirect their question to -users, possibly with a link to a wiki page or documentation entry that describes the sources of help that are available for newcomers. It's also worth taking the time to work out what funnel has led people to post to the "wrong place". *Something* has led them to believe that posting to -dev is the right solution to their problem - how can the website or other resources be changed to alter that perception? Some deeper analytics on the path people have taken to get to the -dev signup page might help here. Yours, Russ Magee %-) -------------- next part -------------- An HTML attachment was scrubbed... 
From ncoghlan at gmail.com  Fri Jul 10 09:09:03 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 10 Jul 2015 17:09:03 +1000
Subject: [Python-ideas] @contextlib.contextmanager(with_context=True) -
 passing a context to contextmanager
In-Reply-To:
References:
Message-ID:

On 10 July 2015 at 07:34, Giampaolo Rodola' wrote:
>
> @contextlib.contextmanager(with_context=True)
> def wrap_exceptions(ctx):
>     # ctx is the Process class instance
>     try:
>         yield
>     except EnvironmentError as err:
>         pid = ctx.pid
>         name = ctx.name
>         if err.errno in (errno.ENOENT, errno.ESRCH):
>             raise NoSuchProcess(pid, name)
>         if err.errno in (errno.EPERM, errno.EACCES):
>             raise AccessDenied(pid, name)
>         raise

There isn't anything in this syntax which says to me "designed to be
used as a class method decorator", nor in the invocation that says
"this CM gets access to the class or instance object" :)

The mechanism underlying the context-manager-or-decorator behaviour is
actually
https://docs.python.org/3/library/contextlib.html#contextlib.ContextDecorator,
and that's entirely unaware of both the function being decorated *and*
its runtime arguments. This means it is unaware of the details of
method invocations as well.

If you want a method-aware context manager, you'll likely want to write
a decorator factory that accepts the relevant CM as a parameter. For
example (untested):

import functools

def with_cm(cm):
    def decorator(f):
        @functools.wraps(f)
        def wrapper(self, *args, **kwargs):
            with cm(self):
                return f(self, *args, **kwargs)
        return wrapper
    return decorator

Used as:

class Process:
    @with_cm(wrap_exceptions)
    def exe(self):
        ...

Regardless, I'd advise against trying to hide the fact that there's an
extra step going on in order to make the function being wrapped and/or
one or more of its arguments available to the context manager, as that
doesn't happen by default.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Fri Jul 10 09:18:22 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 10 Jul 2015 17:18:22 +1000
Subject: [Python-ideas] Concurrency Modules
In-Reply-To:
References: <559EFB73.5050606@mail.de>
Message-ID:

On 10 July 2015 at 12:09, Chris Angelico wrote:
> On Fri, Jul 10, 2015 at 8:53 AM, Sven R. Kunze wrote:
>> After discussing the whole topic and reading it up further, it became
>> clear to me what's actually missing in Python. That is a definitive
>> guide of why/when a certain concurrency module is supposed to be used
>
> I'm not sure how easy the decisions will be in all cases, but
> certainly some broad guidelines would be awesome. (The exact analysis
> of "when should I use threads and when should I use processes" is a
> big enough one that there've been a few million blog posts on the
> subject, and I doubt that asyncio will shrink that.) A basic summary
> would be hugely helpful. "Here's four similar modules, and why they
> all exist in the standard library."

Q: Why are there four different modules?
A: Because they solve different problems.
Q: What are those problems?
A: How long have you got?

Choosing an appropriate concurrency model for a problem is one of the
hardest tasks in software architecture design. The only way to make it
appear simple is to focus in on a specific class of problems where
there *is* a single clearly superior answer for that problem domain :)

That said, I think there may be a way to make the boundary between
synchronous and asynchronous execution easier to conceptualise, so I'll
put up a thread about that.

Cheers,
Nick.
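P.S. To give Sven's item 5 (examples) a concrete starting point, here's
the kind of minimal side-by-side sketch I have in mind: the same two
blocking calls run serially, then dispatched through concurrent.futures.
This is an untested illustration only, with made-up loader names and
time.sleep() standing in for blocking IO calls:

import time
from concurrent import futures

def load_data_set1():
    time.sleep(1)  # stand-in for a blocking IO call
    return [1, 2, 3]

def load_data_set2():
    time.sleep(1)
    return [4, 5, 6]

# Serial version: takes ~2 seconds
data1, data2 = load_data_set1(), load_data_set2()

# Thread pool version: takes ~1 second
with futures.ThreadPoolExecutor(max_workers=2) as pool:
    future1 = pool.submit(load_data_set1)
    future2 = pool.submit(load_data_set2)
    data1, data2 = future1.result(), future2.result()

The point such examples need to make clear is what changes (the
dispatch and wait steps) and what stays the same (the overall inputs ->
outputs shape of the operation).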
-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Fri Jul 10 12:49:31 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 10 Jul 2015 20:49:31 +1000
Subject: [Python-ideas] Learning from the shell in supporting asyncio
 background calls
Message-ID:

Hi folks,

Based on the recent discussions Sven kicked off regarding the
complexity of interacting with asyncio from otherwise synchronous code,
I came up with an API design that I like, inspired by the way
background and foreground tasks in the POSIX shell work.

My blog post about this design is at
http://www.curiousefficiency.org/posts/2015/07/asyncio-background-calls.html,
but the essential components are the following two APIs:

    def run_in_background(target, *, loop=None):
        """Schedules target as a background task

        Returns the scheduled task.

        If target is a future or coroutine, equivalent to
        asyncio.ensure_future
        If target is a callable, it is scheduled in the default executor
        """
        ...

    def run_in_foreground(task, *, loop=None):
        """Runs event loop in current thread until the given task completes

        Returns the result of the task.
        For more complex conditions, combine with asyncio.wait()
        To include a timeout, combine with asyncio.wait_for()
        """
        ...

run_in_background is akin to invoking a shell command with a trailing
"&" - it puts the operation into the background, leaving the current
thread to move on to the next operation (or wait for input at the
REPL).
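To make the shell analogy concrete, here is a minimal sketch of how
these two helpers might be implemented on top of the existing event
loop APIs (illustrative only - no error handling, and untested):

import asyncio

def run_in_background(target, *, loop=None):
    if loop is None:
        loop = asyncio.get_event_loop()
    if asyncio.iscoroutine(target) or isinstance(target, asyncio.Future):
        # Scheduled on the loop, but only runs while the loop is running
        return asyncio.ensure_future(target, loop=loop)
    if callable(target):
        # Dispatched to the default executor, so it starts immediately
        return loop.run_in_executor(None, target)
    raise TypeError("expected a future, coroutine or callable")

def run_in_foreground(task, *, loop=None):
    if loop is None:
        loop = asyncio.get_event_loop()
    # Run the loop in the current thread until this particular task is done
    return loop.run_until_complete(asyncio.ensure_future(task, loop=loop))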
When coroutines are scheduled, they won't start running until > you start a foreground task, while callables delegated to the default > executor will start running immediately. > > To actually get the *results* of that task, you have to run it in the > foreground of the current thread using run_in_foreground - this is > akin to bringing a background process to the foreground of a shell > session using "fg". > > To relate this idea back to some of the examples Sven was discussing, > here's how translating some old serialised synchronous code to use > those APIs might look in practice: > > # Serial synchronous data loading > def load_and_process_data(): > data1 = load_remote_data_set1() > data2 = load_remote_data_set2() > return process_data(data1, data2) > > # Parallel asynchronous data loading > def load_and_process_data(): > future1 = asyncio.run_in_background(load_remote_data_set1_async()) > future2 = asyncio.run_in_background(load_remote_data_set2_async()) > data1 = asyncio.run_in_foreground(future1) > data2 = asyncio.run_in_foreground(future2) > return process_data(data1, data2) Why is that better than something like: data1, data2 = asyncio.run([future1, future2]) IIUC your proposal is that run_in_background adds the tasks to an implicit global variable. Then the first call to run_in_foreground runs both tasks returning when future1 is ready. At that point it suspends future2 if incomplete? Then the second call to run_in_foreground returns immediately if future2 is ready or otherwise runs that task until complete? -- Oscar From guido at python.org Fri Jul 10 13:51:37 2015 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Jul 2015 13:51:37 +0200 Subject: [Python-ideas] Learning from the shell in supporting asyncio background calls In-Reply-To: References: Message-ID: As I wrote on the issue, I'm -1 on this proposal. Not only does this API encourage beginners to ignore the essential difference between synchronous functions meant to run in a thread (using synchronous I/O and pre-emptive CPU scheduling) and asyncio coroutines/tasks (which use overlapped I/O and require explicit scheduling), it also encourages avoiding the "await" primitive (formerly "yield from") in favor of a function call which cannot be used from within a coroutine/task. This particular spelling moreover introduces a "similarity" between foreground and background tasks that doesn't actually exist. The example suggests that this should really be a pair of convenience functions in collections.futures, as it does not make any use of asyncio. On Fri, Jul 10, 2015 at 12:49 PM, Nick Coghlan wrote: > Hi folks, > > Based on the recent discussions Sven kicked off regarding the > complexity of interacting with asyncio from otherwise synchronous > code, I came up with an API design that I like inspired by the way > background and foreground tasks in the POSIX shell work. > > My blog post about this design is at > > http://www.curiousefficiency.org/posts/2015/07/asyncio-background-calls.html > , > but the essential components are the following two APIs: > > def run_in_background(target, *, loop=None): > """Schedules target as a background task > > Returns the scheduled task. > > If target is a future or coroutine, equivalent to > asyncio.ensure_future > If target is a callable, it is scheduled in the default executor > """ > ... > > def run_in_foreground(task, *, loop=None): > """Runs event loop in current thread until the given task completes > > Returns the result of the task. 
On Fri, Jul 10, 2015 at 12:49 PM, Nick Coghlan wrote:

> Hi folks,
>
> Based on the recent discussions Sven kicked off regarding the
> complexity of interacting with asyncio from otherwise synchronous
> code, I came up with an API design that I like inspired by the way
> background and foreground tasks in the POSIX shell work.
>
> My blog post about this design is at
> http://www.curiousefficiency.org/posts/2015/07/asyncio-background-calls.html
> ,
> but the essential components are the following two APIs:
>
> def run_in_background(target, *, loop=None):
>     """Schedules target as a background task
>
>     Returns the scheduled task.
>
>     If target is a future or coroutine, equivalent to
> asyncio.ensure_future
>     If target is a callable, it is scheduled in the default executor
>     """
>     ...
>
> def run_in_foreground(task, *, loop=None):
>     """Runs event loop in current thread until the given task completes
>
>     Returns the result of the task.
>     For more complex conditions, combine with asyncio.wait()
>     To include a timeout, combine with asyncio.wait_for()
>     """
>     ...
>
> run_in_background is akin to invoking a shell command with a trailing
> "&" - it puts the operation into the background, leaving the current
> thread to move on to the next operation (or wait for input at the
> REPL). When coroutines are scheduled, they won't start running until
> you start a foreground task, while callables delegated to the default
> executor will start running immediately.
>
> To actually get the *results* of that task, you have to run it in the
> foreground of the current thread using run_in_foreground - this is
> akin to bringing a background process to the foreground of a shell
> session using "fg".
>
> To relate this idea back to some of the examples Sven was discussing,
> here's how translating some old serialised synchronous code to use
> those APIs might look in practice:
>
> # Serial synchronous data loading
> def load_and_process_data():
>     data1 = load_remote_data_set1()
>     data2 = load_remote_data_set2()
>     return process_data(data1, data2)
>
> # Parallel asynchronous data loading
> def load_and_process_data():
>     future1 = asyncio.run_in_background(load_remote_data_set1_async())
>     future2 = asyncio.run_in_background(load_remote_data_set2_async())
>     data1 = asyncio.run_in_foreground(future1)
>     data2 = asyncio.run_in_foreground(future2)
>     return process_data(data1, data2)
>
> The application remains fundamentally synchronous, but the asyncio
> event loop is exploited to obtain some local concurrency in waiting
> for client IO operations.
>
> Regards,
> Nick.
>
> P.S. time.sleep() and asyncio.sleep() are rather handy as standins for
> blocking and non-blocking IO operations. I wish I'd remembered that
> earlier :)
>
> --
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Fri Jul 10 14:16:02 2015 From: wes.turner at gmail.com (Wes Turner) Date: Fri, 10 Jul 2015 07:16:02 -0500 Subject: [Python-ideas] Rename python-dev? In-Reply-To: References: Message-ID: On Jul 9, 2015 9:53 PM, "Russell Keith-Magee" wrote: > > > On Fri, Jul 10, 2015 at 3:11 AM, Ryan Gonzalez wrote: >> >> This is a crazy idea that popped into my head recently. >> >> What if python-dev were to be renamed to something like python-internals? (Yeah, that's a bad name, but my naming skills suck...) Basically, something that gets across the idea that it's for development of Python, not in Python. >> >> Python-dev would visibly disappear, but mails sent to python-dev would be redirected to the new name. >> >> BTW, in reality, it's not an uncommon mistake. I know of a few mailing lists built for development *with* a tool that end in -dev, such as asmjit-dev. > > > You can't solve a social problem with technology, and I have evidence to back up this specific case. > > Django has a -dev and -users mailing list (following Python's example). We have exactly the same problem of people posting "how do I" questions to -dev, and the same recurring theme of posts that claim "if we just rename the group, the problem will go away". 
Either: * not responded to * redirected (project src, builds,docs, issues links) * request for more information (to narrow scope; information gain; "-keywords TypeError") * "is there a coredump / stack trace uploader?" * in scope; ensuing discussion * answered/solved > > Our mailing lists are on Google Groups, so we have the option of setting a public name for the group that is different to the mail alias. About a year ago, we changed the name of the group from "Django Developers" to "Django developers (Contributions to Django itself)". When you sign up for the mailing list, you see that title, and confirm that this is the group you want to sign up for. > > Less than 2 days after the rename took effect, we had our first -dev post that should have been posted to -users. Since then, the rate of incorrectly addressed posts hasn't significantly changed from before the name change. Is there an ASCII block of text with each of the project links that could be appropriately pasted/helpfully suggested according to e.g. a #keywords and/or natural language patterns? https://westurner.org/wiki/ideas#open-source-mailing-list-extractor ("Looks like a mailing list optimization request; here are the relevant project links") ... The workflow is always "find similar from ( defined set, wider set ) [go fish]", link/crossref/clarify; test; write tests/docs; build (test); backtrack to traceable issue identifier (#6, urn:x-:org/proj/6) and follow-up (with a link for traceability). Is this at all relevant? * No, there is no justifiable reason to rename the mailing list (because I label them all .l.py) * #MailingListTriage > > We have a similar problem with the DSF contact page: > > https://www.djangoproject.com/contact/foundation/ > > The top of that page has a series of instructions indicating that the contact form is for the legal and fundraising arm of the project, and suggests several places to post inquiries about technical matters. And yet, we get 2-3 contact requests a week for technical assistance. > > The same is also true of the page to create new tickets on the Django bug tracker. > > https://code.djangoproject.com/newticket > > Despite the notices, we still get security reports and requests for help lodged as tickets. > > The moral of the story: evidence shows that no matter what the name, or the instructions given, people will get it wrong. Yes, these posts are annoying - but it's a teachable moment for people you are hoping to incorporate into your community. The best option (IMHO) is to politely redirect their question to -users, possibly with a link to a wiki page or documentation entry that describes the sources of help that are available for newcomers. > > It's also worth taking the time to work out what funnel has led people to post to the "wrong place". *Something* has led them to believe that posting to -dev is the right solution to their problem - how can the website or other resources be changed to alter that perception? Some deeper analytics on the path people have taken to get to the -dev signup page might help here. > > Yours, > Russ Magee %-) > > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Fri Jul 10 22:45:26 2015 From: srkunze at mail.de (Sven R. 
Kunze) Date: Fri, 10 Jul 2015 22:45:26 +0200 Subject: [Python-ideas] Concurrency Modules In-Reply-To: References: <559EFB73.5050606@mail.de> Message-ID: <55A02EE6.3080709@mail.de>

That'll be great, Nick. I look forward to your proposal.

Alongside your proposal, there might be capable guys who would like to contribute on the questions I raised in my initial mail on this list. This also might help Nick to hammer out a good proposal.

On 10.07.2015 09:18, Nick Coghlan wrote:
> On 10 July 2015 at 12:09, Chris Angelico wrote:
>> On Fri, Jul 10, 2015 at 8:53 AM, Sven R. Kunze wrote:
>>> After discussing the whole topic and reading it up further, it became clear
>>> to me what's actually missing in Python. That is a definitive guide of
>>> why/when a certain concurrency module is supposed to be used
>> I'm not sure how easy the decisions will be in all cases, but
>> certainly some broad guidelines would be awesome. (The exact analysis
>> of "when should I use threads and when should I use processes" is a
>> big enough one that there've been a few million blog posts on the
>> subject, and I doubt that asyncio will shrink that.) A basic summary
>> would be hugely helpful. "Here's four similar modules, and why they
>> all exist in the standard library."
> Q: Why are there four different modules?
> A: Because they solve different problems
> Q: What are those problems?
> A: How long have you got?
>
> Choosing an appropriate concurrency model for a problem is one of the
> hardest tasks in software architecture design. The only way to make it
> appear simple is to focus in on a specific class of problems where
> there *is* a single clearly superior answer for that problem domain :)
>
> That said, I think there may be a way to make the boundary between
> synchronous and asynchronous execution easier to conceptualise, so
> I'll put up a thread about that.
>
> Cheers,
> Nick.
>

From srkunze at mail.de Fri Jul 10 22:57:51 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Fri, 10 Jul 2015 22:57:51 +0200 Subject: [Python-ideas] Concurrency Modules In-Reply-To: References: <559EFB73.5050606@mail.de> Message-ID: <55A031CF.9050002@mail.de>

Seems like many people agree with the general idea of having a standard explanation and guideline of when to use which concurrency module. Nice!

On the difference of threads and processes: it is an interesting topic (IMHO) but:

*1) both processes and threads are just means to an end*
2) I would like to interchange them when necessary/one appears to be better than another
3) I would like to use the *same API* for both (for the major use cases), and switch from threads to processes if necessary or the other way round

Regarding asyncio:

1) I do not know what its purpose really is COMPARED to all the other modules; that really needs clarification first before anything else
2) sometimes, I get the feeling people understand it as a third way to do concurrency (along with processes and threads) but then Guido and others tell me it makes no sense to use asyncio for stuff that can be done with threading or multiprocessing

Let us see where these questions lead us. (the following two weeks, I will not be able to contribute thoughts here, as I am on a tour; I am curious what the guys of python-ideas will post here :) )

Regards, Sven

On 10.07.2015 04:09, Chris Angelico wrote:
> On Fri, Jul 10, 2015 at 8:53 AM, Sven R. Kunze wrote:
>> After discussing the whole topic and reading it up further, it became clear
>> to me what's actually missing in Python.
That is a definitive guide of >> why/when a certain concurrency module is supposed to be used > I'm not sure how easy the decisions will be in all cases, but > certainly some broad guidelines would be awesome. (The exact analysis > of "when should I use threads and when should I use processes" is a > big enough one that there've been a few million blog posts on the > subject, and I doubt that asyncio will shrink that.) A basic summary > would be hugely helpful. "Here's four similar modules, and why they > all exist in the standard library." > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael at hewitts.us Sat Jul 11 00:31:28 2015 From: michael at hewitts.us (Michael Hewitt) Date: Fri, 10 Jul 2015 15:31:28 -0700 Subject: [Python-ideas] Mitigating 'self.' Method Pollution Message-ID: Last night I made a post to the neopythonic blog proposing a Python 3.x feature that Guido asked me to forward to this alias. For the full background, see the link to my post below. For brevity, I will simply submit the proposal here. The specific problem I am addressing is the pollution of Python methods by 'self.' to reference fields. Here is the proposal: The name of the first parameter to a method can be used to scope subsequent variable references similar to the behavior of 'global'. Here are some examples: class Foo: def method_a(self) self x # subsequent 'x' references are scoped to 'self' x = 5 # same as self.x = 5 def method_b(this) this x, y # subsequent 'x' & 'y' refs are scoped to 'this' x = y # same as this.x = this.y def method_c(its) its.x = 5 # still works just like it used to This suggestion is fully backward compatible with existing Python code, but would eliminate the need to pollute future Python methods with copious 'self.' prefixes, thereby improving both readability and maintainabilty. Thank you for your consideration. Michael Hewitt Original Post: http://neopythonic.blogspot.com/2008/10/why-explicit-self-has-to-stay.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From joejev at gmail.com Sat Jul 11 01:28:14 2015 From: joejev at gmail.com (Joseph Jevnik) Date: Fri, 10 Jul 2015 19:28:14 -0400 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: Message-ID: How does this interact with descriptors, are they looked up everytime x and y are referenced or are they cached at the `this x, y` statement? On Fri, Jul 10, 2015 at 6:31 PM, Michael Hewitt wrote: > Last night I made a post to the neopythonic blog proposing a Python 3.x > feature that Guido asked me to forward to this alias. For the full > background, see the link to my post below. For brevity, I will simply > submit the proposal here. The specific problem I am addressing is the > pollution of Python methods by 'self.' to reference fields. Here is the > proposal: > > The name of the first parameter to a method can be used to scope > subsequent variable references similar to the behavior of 'global'. 
> > > Here are some examples: > > class Foo: > > def method_a(self) > > self x # subsequent 'x' references are scoped to 'self' > > x = 5 # same as self.x = 5 > > def method_b(this) > > this x, y # subsequent 'x' & 'y' refs are scoped to 'this' > > x = y # same as this.x = this.y > > def method_c(its) > > its.x = 5 # still works just like it used to > > > This suggestion is fully backward compatible with existing Python code, > but would eliminate the need to pollute future Python methods with copious > 'self.' prefixes, thereby improving both readability and maintainabilty. > > Thank you for your consideration. > > Michael Hewitt > > Original Post: > http://neopythonic.blogspot.com/2008/10/why-explicit-self-has-to-stay.html > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From breamoreboy at yahoo.co.uk Sat Jul 11 01:49:34 2015 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sat, 11 Jul 2015 00:49:34 +0100 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: Message-ID: On 10/07/2015 23:31, Michael Hewitt wrote: > Last night I made a post to the neopythonic blog proposing a Python 3.x > feature that Guido asked me to forward to this alias. For the full > background, see the link to my post below. For brevity, I will simply > submit the proposal here. The specific problem I am addressing is the > pollution of Python methods by 'self.' to reference fields. Here is the > proposal: > > The name of the first parameter to a method can be used to scope > subsequent variable references similar to the behavior of 'global'. > > > Here are some examples: > > class Foo: > > def method_a(self) > > self x # subsequent 'x' references are scoped to 'self' > > x = 5 # same as self.x = 5 > > def method_b(this) > > this x, y # subsequent 'x' & 'y' refs are scoped to 'this' > > x = y # same as this.x = this.y > > def method_c(its) > > its.x = 5 # still works just like it used to > > > This suggestion is fully backward compatible with existing Python code, > but would eliminate the need to pollute future Python methods with > copious 'self.' prefixes, thereby improving both readability and > maintainabilty. I disagree completely. When I see:- self.x = 1 I currently know exactly what I'm looking at. All I see with this proposal is more work for my MKI eyeballs, which are already knackered, and more work for the people who maintain our static analysis tools, as you can still forget to properly scope your variables. So -1. > > Thank you for your consideration. > > Michael Hewitt > > Original Post: > http://neopythonic.blogspot.com/2008/10/why-explicit-self-has-to-stay.html > -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From alexander.belopolsky at gmail.com Sat Jul 11 02:00:15 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 10 Jul 2015 20:00:15 -0400 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: Message-ID: On Fri, Jul 10, 2015 at 6:31 PM, Michael Hewitt wrote: > Here is the proposal: > > The name of the first parameter to a method can be used to scope > subsequent variable references similar to the behavior of 'global'. 
> > > Here are some examples: > > class Foo: > > def method_a(self) > > self x # subsequent 'x' references are scoped to 'self' > > x = 5 # same as self.x = 5 > > I will give it a month after something like this goes into the language for style guides to appear that will recommend prefixing all member variables with a triple underscore to distinguish them from the locals. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmiscml at gmail.com Sat Jul 11 02:01:23 2015 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Sat, 11 Jul 2015 03:01:23 +0300 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: Message-ID: <20150711030123.4ce6c15c@x230> Hello, On Fri, 10 Jul 2015 15:31:28 -0700 Michael Hewitt wrote: > Last night I made a post to the neopythonic blog proposing a Python > 3.x feature that Guido asked me to forward to this alias. For the > full background, see the link to my post below. For brevity, I will > simply submit the proposal here. The specific problem I am > addressing is the pollution of Python methods by 'self.' to reference > fields. Here is the proposal: > > The name of the first parameter to a method can be used to scope > subsequent variable references similar to the behavior of 'global'. > > > Here are some examples: > > class Foo: > > def method_a(self) > > self x # subsequent 'x' references are scoped to 'self' > > x = 5 # same as self.x = 5 > > def method_b(this) > > this x, y # subsequent 'x' & 'y' refs are scoped to 'this' > > x = y # same as this.x = this.y > > def method_c(its) > > its.x = 5 # still works just like it used to > > > This suggestion is fully backward compatible with existing Python > code, but would eliminate the need to pollute future Python methods > with copious 'self.' prefixes, thereby improving both readability and > maintainabilty. Even the language which is bag of warts - JavaScript - has essentially banned usage of its "with" statement: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/with . And the reason for this is (as the red box on that page says) that it interferes with readability, maintainability, and performance. And yet you propose to add essentially the same to Python. If you have troubles explaining why "self" is needed to kids, I'd suggest going along the lines of why there's capital letter at the start of sentence, full stop at the end, and wh hmn lnggs hv vwls, vn thgh ts pssbl t rd wtht thm. > > Thank you for your consideration. > > Michael Hewitt > > Original Post: > http://neopythonic.blogspot.com/2008/10/why-explicit-self-has-to-stay.html -- Best regards, Paul mailto:pmiscml at gmail.com From ethan at stoneleaf.us Sat Jul 11 02:05:03 2015 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 10 Jul 2015 17:05:03 -0700 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: Message-ID: <55A05DAF.9050402@stoneleaf.us> On 07/10/2015 03:31 PM, Michael Hewitt wrote: > This suggestion is fully backward compatible with existing Python code, but would eliminate the need to pollute future Python methods with copious 'self.' prefixes, thereby improving both readability > and maintainabilty. I disagree. Having a method with nothing but plain variables, plus a "scoping" line towards the top, will cause more work in keeping track of which are instance variables and which are local variables. Besides that, what happens when you create a function outside a class, and then add it to a class? 
class Blah: pass def a_func(self, a, b): self c, d ... Will that work? - Yes seems silly in the case of non-method functions - No seems artificially restrictive -- ~Ethan~ From ethan at stoneleaf.us Sat Jul 11 02:06:39 2015 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 10 Jul 2015 17:06:39 -0700 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: <20150711030123.4ce6c15c@x230> References: <20150711030123.4ce6c15c@x230> Message-ID: <55A05E0F.2010005@stoneleaf.us> On 07/10/2015 05:01 PM, Paul Sokolovsky wrote: > If you have troubles explaining why "self" is needed to kids, I'd > suggest going along the lines of why there's capital letter at the > start of sentence, full stop at the end, and wh hmn lnggs hv vwls, vn > thgh ts pssbl t rd wtht thm. +1 QotW :) -- ~Ethan~ From rosuav at gmail.com Sat Jul 11 03:04:07 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 11 Jul 2015 11:04:07 +1000 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: <55A05E0F.2010005@stoneleaf.us> References: <20150711030123.4ce6c15c@x230> <55A05E0F.2010005@stoneleaf.us> Message-ID: On Sat, Jul 11, 2015 at 10:06 AM, Ethan Furman wrote: > On 07/10/2015 05:01 PM, Paul Sokolovsky wrote: > >> If you have troubles explaining why "self" is needed to kids, I'd >> suggest going along the lines of why there's capital letter at the >> start of sentence, full stop at the end, and wh hmn lnggs hv vwls, vn >> thgh ts pssbl t rd wtht thm. > > > +1 QotW :) > Hebrew and Arabic manage without vowels. English manages (almost) without diacriticals. Hungarian (I think) manages without verb tenses. Chinese and Japanese can manage without punctuation, although these days they usually use it. And Python manages without declared variables, which is why 'self.x = self.y' is completely different from C++ and family. But that also means that it's different from the JavaScript 'with' block, because the "self x, y" declaration is explicit about exactly which names are scoped that way. This wouldn't have that problem. However, I'm not sure it would be all that useful. Also, the current definition means that the first parameter magically becomes a keyword during the parsing of that function, which is extremely odd. I'm not seeing huge value in this, given that the declaration means yet another place to repeat all your attribute names; the only way I could imagine this being smoother is if you could make your declaration at class scope, and then it applies to _all_ the functions defined in it - but that wouldn't solve the problems either. ChrisA From steve at pearwood.info Sat Jul 11 03:21:29 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 11 Jul 2015 11:21:29 +1000 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: Message-ID: <20150711012129.GA27268@ando.pearwood.info> On Fri, Jul 10, 2015 at 03:31:28PM -0700, Michael Hewitt wrote: > Last night I made a post to the neopythonic blog proposing a Python 3.x > feature that Guido asked me to forward to this alias. For the full > background, see the link to my post below. For brevity, I will simply > submit the proposal here. The specific problem I am addressing is the > pollution of Python methods by 'self.' to reference fields. Here is the > proposal: > > The name of the first parameter to a method can be used to scope subsequent > variable references similar to the behavior of 'global'. 
At first, I was all "not another poorly thought out implicit self proposal!", but then I read it and thought, "you know, of all the implicit self proposals I've read, this is the first one that I might be able to get behind". It's clever. Possibly too clever. You cleverly avoid needing to introduce a new keyword to the language by using the name of the first parameter as the pseudo-keyword. I like that, but I think it may be a little to radical for the conservatives in the Python community. We tend to suffer from a sort of anti-"not invented here" syndrome: unless something has been tried and tested in at least one other language, and preferably a dozen, we don't want to know about it. It would help your case if you can show a language that already has this feature. Personally, I don't think of "self" as pollution at all. It's nice and explicit and helps readability. But if people wanted to avoid using it, and I can think of the odd case here and there where I might want to do so, I think an explicit declaration is a decent way to go. > Here are some examples: > > class Foo: > def method_a(self) > self x # subsequent 'x' references are scoped to 'self' > x = 5 # same as self.x = 5 You say "same as", I presume that this is a compile-time transformation that you write "x = 5" and the compiler treats it as "self.x = 5". So it doesn't matter whether x is a property or other descriptor, it will resolve the same as it currently does. I can see one (horrible) corner case: suppose you have code like this: def spam(self): self self self = self My guess is that this ought to be banned, as it's not clear when self refers to the self local variable and when it refers to the implicit self.self. It might also help your case to convince people that implicit attribute references are not as bad as they might seem. Can you give a brief survey of languages with implicit attribute references, and how they deal with the ambiguity of "attribute or variable"? -- Steve From steve at pearwood.info Sat Jul 11 03:34:13 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 11 Jul 2015 11:34:13 +1000 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: Message-ID: <20150711013412.GB27268@ando.pearwood.info> On Fri, Jul 10, 2015 at 07:28:14PM -0400, Joseph Jevnik wrote: > How does this interact with descriptors, are they looked up everytime x and > y are referenced or are they cached at the `this x, y` statement? My understanding is that the actual execution of Python code does not change. Today, if you write: def method(self): self.spam = self.eggs + 1 it compiles to something like this (in Python 2.7, different versions may use different bytecode): 2 0 LOAD_FAST 0 (self) 3 LOAD_ATTR 0 (eggs) 6 LOAD_CONST 1 (1) 9 BINARY_ADD 10 LOAD_FAST 0 (self) 13 STORE_ATTR 1 (spam) 16 LOAD_CONST 0 (None) 19 RETURN_VALUE There's no caching of attribute look-ups, if spam or eggs are descriptors (methods, properties, etc.) the descriptor protocol gets called each time LOAD_ATTR or STORE_ATTR is called. If you re-write that to: def method(self): self spam, eggs spam = eggs + 1 it should compile to *exactly the same byte code*. It's just a compiler directive, telling the compiler to treat any "spam" and "eggs" tokens as if they were actually "self.spam" and "self.eggs". That would, of course, mean that it's impossible to have a local variable with the same name as the declared attribute. 
I often write micro-optimized code like this: spam = self.spam = {} for x in big_loop: # avoid the repeated look-ups of ".spam" spam[x] = something If I declared "self spam", this wouldn't work (or rather, it would work, but it wouldn't have the effect I am looking for). The solution is simple: don't declare "self spam" in this case. -- Steve From mertz at gnosis.cx Sat Jul 11 03:34:24 2015 From: mertz at gnosis.cx (David Mertz) Date: Fri, 10 Jul 2015 18:34:24 -0700 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: Message-ID: -1000 This looks like an absolutely terrible anti-feature. It makes all code less readable, and violates the principle of locality (albeit, `global` and `nonlocal` can also be declared relatively far away from the use of those scoped variables too). On Fri, Jul 10, 2015 at 3:31 PM, Michael Hewitt wrote: > Last night I made a post to the neopythonic blog proposing a Python 3.x > feature that Guido asked me to forward to this alias. For the full > background, see the link to my post below. For brevity, I will simply > submit the proposal here. The specific problem I am addressing is the > pollution of Python methods by 'self.' to reference fields. Here is the > proposal: > > The name of the first parameter to a method can be used to scope > subsequent variable references similar to the behavior of 'global'. > > > Here are some examples: > > class Foo: > > def method_a(self) > > self x # subsequent 'x' references are scoped to 'self' > > x = 5 # same as self.x = 5 > > def method_b(this) > > this x, y # subsequent 'x' & 'y' refs are scoped to 'this' > > x = y # same as this.x = this.y > > def method_c(its) > > its.x = 5 # still works just like it used to > > > This suggestion is fully backward compatible with existing Python code, > but would eliminate the need to pollute future Python methods with copious > 'self.' prefixes, thereby improving both readability and maintainabilty. > > Thank you for your consideration. > > Michael Hewitt > > Original Post: > http://neopythonic.blogspot.com/2008/10/why-explicit-self-has-to-stay.html > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Jul 11 03:38:04 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 11 Jul 2015 11:38:04 +1000 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: <20150711013412.GB27268@ando.pearwood.info> References: <20150711013412.GB27268@ando.pearwood.info> Message-ID: On Sat, Jul 11, 2015 at 11:34 AM, Steven D'Aprano wrote: > That would, of course, mean that it's impossible to have a local > variable with the same name as the declared attribute. I often write > micro-optimized code like this: > > spam = self.spam = {} > for x in big_loop: > # avoid the repeated look-ups of ".spam" > spam[x] = something > > If I declared "self spam", this wouldn't work (or rather, it would work, > but it wouldn't have the effect I am looking for). 
The solution is > simple: don't declare "self spam" in this case. That's not a killer; you'd just have to do something like "_spam = self.spam = {}", which would have the exact same effect. You abbreviate "self." to "_" and cache the lookup. ChrisA From ben+python at benfinney.id.au Sat Jul 11 04:20:33 2015 From: ben+python at benfinney.id.au (Ben Finney) Date: Sat, 11 Jul 2015 12:20:33 +1000 Subject: [Python-ideas] Mitigating 'self.' Method Pollution References: <20150711012129.GA27268@ando.pearwood.info> Message-ID: <85io9rxwni.fsf@benfinney.id.au> Steven D'Aprano writes: > Personally, I don't think of "self" as pollution at all. It's nice and > explicit and helps readability. +1. All the complaints that I see about ?pollution? or ?extra work? or etc., seem IME to be accompanied by a dislike of Python's ?explicit is better than implicit? principle. Since I think that principle is essential to Python's readability and maintainability, I think that removing explicit ?self? would tangibly *harm* readability and maintainability of Python code. -- \ ?I bet one legend that keeps recurring throughout history, in | `\ every culture, is the story of Popeye.? ?Jack Handey | _o__) | Ben Finney From rymg19 at gmail.com Sat Jul 11 04:42:03 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Fri, 10 Jul 2015 21:42:03 -0500 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: Message-ID: <105918D6-6C15-44D5-915C-5825C56922EE@gmail.com> -1e200. I have to deal with implicit 'this' enough in C++ at the risk of writing "bad style" or something like that. It has bitten before. Until I started following C++ style guides, I always put 'this->' in front of every variable but still had to deal with the accidental member reference when you meant something else. I also like the fact that Python lets you name constructor arguments the same way as members, e.g.: class X: def __init__(self, x, y): self.x = x self.y = y You can do the same in C++, but compilers with -Wall will complain unless you do this: class X { public: X(int x_, int y_): x{x_}, y{y_} {} ... } which is just ugly. On July 10, 2015 5:31:28 PM CDT, Michael Hewitt wrote: >Last night I made a post to the neopythonic blog proposing a Python 3.x >feature that Guido asked me to forward to this alias. For the full >background, see the link to my post below. For brevity, I will simply >submit the proposal here. The specific problem I am addressing is the >pollution of Python methods by 'self.' to reference fields. Here is >the >proposal: > >The name of the first parameter to a method can be used to scope >subsequent >variable references similar to the behavior of 'global'. > > >Here are some examples: > >class Foo: > >def method_a(self) > >self x # subsequent 'x' references are scoped to 'self' > >x = 5 # same as self.x = 5 > >def method_b(this) > >this x, y # subsequent 'x' & 'y' refs are scoped to 'this' > >x = y # same as this.x = this.y > >def method_c(its) > >its.x = 5 # still works just like it used to > > >This suggestion is fully backward compatible with existing Python code, >but >would eliminate the need to pollute future Python methods with copious >'self.' prefixes, thereby improving both readability and >maintainabilty. > >Thank you for your consideration. 
> >Michael Hewitt > >Original Post: >http://neopythonic.blogspot.com/2008/10/why-explicit-self-has-to-stay.html > > >------------------------------------------------------------------------ > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >https://mail.python.org/mailman/listinfo/python-ideas >Code of Conduct: http://python.org/psf/codeofconduct/ -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Steve.Dower at microsoft.com Sat Jul 11 01:42:17 2015 From: Steve.Dower at microsoft.com (Steve Dower) Date: Fri, 10 Jul 2015 23:42:17 +0000 Subject: [Python-ideas] Concurrency Modules In-Reply-To: <55A031CF.9050002@mail.de> References: <559EFB73.5050606@mail.de> <55A031CF.9050002@mail.de> Message-ID: Sven R. Kunze wrote: > 1) I do not know what its purposes really is COMPARED to all the other modules; > that really needs clarification first before anything else > 2) sometimes, I get the feeling people understand it as a third way to do > concurrency (along with processes and threads) but then Guido and others tell me > it makes no sense to use asyncio for stuff that can be done with threading or > multiprocesses I'm going to dive into an analogy here. Hopefully it holds up better than most... Let's say you are making a cake. There are two high-level steps involved: 1. Gather all the ingredients 2. Mix all the ingredients 3. Bake it in the oven You are personally required to do steps 1 and 2 ("hands-on"). They takes all of your time and attention and you can't do anything else simultaneously. For step 3, you hand off the work to the oven. While the oven is baking, you are basically free to do other things. In this analogy, "you" are the main thread and the oven is another thread. (Thread and process are interchangeable here in the general sense - the GIL in Python is practicality that makes processes preferable, but that doesn't affect the concepts.) Steps 1 and 2 are CPU bound (as far as "you" the main thread are concerned), and step 3 is IO bound from "your" (the main thread's) point-of-view. Step 3 requires you to wait until it is complete: * You can do a synchronous wait, by sitting and staring at the oven until it's done. * You can poll, by occasionally interrupting yourself to walk over to the oven and see if it's done yet. * You can use a signal/interrupt, where the oven is going to make some noise and interrupt you when you're ready (but note: you know that the oven is done without having to walk over and check it). * Or you can use asyncio, where you occasionally interrupt yourself and, when you do, the oven will make some noise if it has finished. (and if you never interrupt yourself, the oven never makes a sound) This last option is most efficient for you, because you aren't interrupted at awkward times (i.e. greatly reduced need for locking on shared state) but you also don't have to walk all the way over to the oven to check whether it is done. You pause, listen, and get straight back to work if the oven is still going. That's the core feature of asyncio - not the networking or subprocess support - the ability to be notified efficiently that a task is complete without being interrupted by that notification. Now let's expand this to making 3 cakes in parallel to see how "parallelism" works. Since there's so much going on, we'll create a TODO list: 1. Make cake #1 2. Make cake #2 3. 
Make cake #3 (This means we've started three tasks to the current event loop. It's likely these are three external requests from clients, such as HTTP requests. It is possible, though not common in my experience, for production software to explicitly start with multiple tasks like this. More common is to have one task and a UI event loop that injects UI events as necessary.) Task 1 is the obvious place to start, so we take that off the TODO list and start working on it. The steps to make cake #1 are: * Gather ingredients for cake #1 * Mix ingredients for cake #1 * Bake cake #1 Gathering ingredients is a synchronous operation (`def gather_ingredients()`) so we do that until we've gathered everything. Mixing ingredients is a long, interruptible operation (`async def mix_ingredients()`, with occasional explicit `await yield()` or whatever syntax was chosen for this), so we start mixing and then pause. When we pause, we put our current task on the TODO list: 1. Make cake #2 2. Make cake #3 3. Continue mixing cake #1 We see that our next task is to make cake #2, so we repeat the steps above and eventually pause while we're mixing. Now the TODO list looks like: 1. Make cake #3 2. Continue mixing cake #1 3. Continue mixing cake #2 And this continues. (Note that selecting which task to continue with is a detail of the event loop you're using. Check the spec to see whether some tasks have a higher priority or what order tasks are continued in. And bear in mind that so far, we've only used explicit yields - "I'm ready to do something else now if something needs doing".) Eventually we will finish mixing one of the cakes, let's say it's cake #1. We will put it in the oven (`await put_in_oven()`) and then check the TODO list for what we should do next. There's nothing for us to do with cake #1, so our TODO list looks like: 1. Continue mixing cake #2 2. Continue mixing cake #3 Eventually, the oven will finish baking cake #1 and will add its own item to the TODO list: 1. Continue mixing cake #2 2. Continue mixing cake #3 3. Cake #1 is ready When we take a break from mixing cake #2, we will continue mixing cake #3 (again, depending on your event loop's policy with regards to prioritisation). When we take a break from mixing cake #3, "Cake #1 is ready" will be the top of our TODO list and so we will continue with the statement following where we awaited it (it probably looked like `await put_in_oven(); remove_from_oven()` or maybe `baked_cake = await put_in_oven(mixed_ingredients)`). Eventually our TODO list will be empty, and so we will sit there waiting for something to appear on it (such as another incoming request, or an oven adding a "remove cake" item). Processes and threads only really enter into asyncio as a "thing that can post messages back to my TODO list/event loop", while asyncio provides an efficient mechanism for interleaving (not parallelising) multiple tasks throughout an entire application (or a very significant self-contained piece of it). The parallelism only comes when all the main thread has to do for a particular task is wait, because another thread/process/service/device/etc. is doing the actual work. Hopefully that helps clear things up for some people. 
No example is perfect for everyone, ultimately, so the more we put out there the more likely Cheers, Steve From ben+python at benfinney.id.au Sat Jul 11 05:10:39 2015 From: ben+python at benfinney.id.au (Ben Finney) Date: Sat, 11 Jul 2015 13:10:39 +1000 Subject: [Python-ideas] Concurrency HOWTO for Python (was: Concurrency Modules) References: <559EFB73.5050606@mail.de> <55A031CF.9050002@mail.de> Message-ID: <854mlbxuc0.fsf_-_@benfinney.id.au> Steve Dower writes: > I'm going to dive into an analogy here. Hopefully it holds up better > than most... > [?] > > * You can do a synchronous wait, by sitting and staring at the oven > until it's done. > > * You can poll, by occasionally interrupting yourself to walk over to > the oven and see if it's done yet. > > * You can use a signal/interrupt, where the oven is going to make some > noise and interrupt you when you're ready (but note: you know that the > oven is done without having to walk over and check it). > > * Or you can use asyncio, where you occasionally interrupt yourself > and, when you do, the oven will make some noise if it has finished. > (and if you never interrupt yourself, the oven never makes a sound) > [?] > Hopefully that helps clear things up for some people. No example is > perfect for everyone, ultimately, so the more we put out there the > more likely Thank you, Steve! That is the clearest explanation of different concurrency models I've ever read. I now feel I have a firm understanding of how they're different and their relative merits. I hope that analogy can be worked into a putative ?Concurrency HOWTO? at . -- \ ?If consumers even know there's a DRM, what it is, and how it | `\ works, we've already failed.? ?Peter Lee, Disney corporation, | _o__) 2005 | Ben Finney From ncoghlan at gmail.com Sat Jul 11 06:33:24 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 11 Jul 2015 14:33:24 +1000 Subject: [Python-ideas] Learning from the shell in supporting asyncio background calls In-Reply-To: References: Message-ID: On 10 July 2015 at 21:48, Oscar Benjamin wrote: > Why is that better than something like: > > data1, data2 = asyncio.run([future1, future2]) > > IIUC your proposal is that run_in_background adds the tasks to an > implicit global variable. It just schedules them normally using asyncio.get_event_loop().create_task() (or run_in_executor if you pass in a callable). For folks that don't want to click through to the blog post (which has a full worked example using asynchronous timers), the full implementations of the two functions (with a slight tweak to run_in_background to make the executor configurable as well) are: def run_in_background(target, *, loop=None, executor=None): if loop is None: loop = asyncio.get_event_loop() try: return asyncio.ensure_future(target, loop=loop) except TypeError: pass if callable(target): return loop.run_in_executor(executor, target) raise TypeError("background task must be future, coroutine or " "callable, not {!r}".format(type(target))) def run_in_foreground(task, *, loop=None): if loop is None: loop = asyncio.get_event_loop() return loop.run_until_complete(asyncio.ensure_future(task)) > Then the first call to run_in_foreground > runs both tasks returning when future1 is ready. At that point it > suspends future2 if incomplete? Then the second call to > run_in_foreground returns immediately if future2 is ready or otherwise > runs that task until complete? 
No, it's all driven by the main asyncio event loop - the suggested functions are relatively thin shims designed to let people make effective use of asyncio with just the POSIX shell foreground & background task mental model, rather than having to learn how asyncio *really* works in order to benefit from it. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From gokoproject at gmail.com Sat Jul 11 06:49:18 2015 From: gokoproject at gmail.com (John Wong) Date: Sat, 11 Jul 2015 00:49:18 -0400 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: <105918D6-6C15-44D5-915C-5825C56922EE@gmail.com> References: <105918D6-6C15-44D5-915C-5825C56922EE@gmail.com> Message-ID: Hi. Steven D'Aprano writes: > Personally, I don't think of "self" as pollution at all. It's nice and > explicit and helps readability. + 1 as well I am definitely in the conservative camp (I don't know who else but I am). Let's stick to Zen of Python. * I see no real use case why I would write self x,y instead of self.x = 1, self.y = 2. This reminds me of Javascript var x,y. * While this is not exactly like Javascript's "this", I can see the resemblance, and no thanks. * This change will be confusing to everyone reading Python code. We have to stick to real readability, not some rare . We should not introduce this kind of change to say "we can do X in two ways, and most people will write in the old convention ANYWAY but be hold there is a second geeky way to do the same thing." Let's reserve the idea that self x where now x is scoped to self .... reserve this idea to just nonlocal and global - I personally haven't found a use case for nonlocal and global in my own work but that's just my opinion. Thanks. John On Fri, Jul 10, 2015 at 10:42 PM, Ryan Gonzalez wrote: > -1e200. > > I have to deal with implicit 'this' enough in C++ at the risk of writing > "bad style" or something like that. It has bitten before. Until I started > following C++ style guides, I always put 'this->' in front of every > variable but still had to deal with the accidental member reference when > you meant something else. > > I also like the fact that Python lets you name constructor arguments the > same way as members, e.g.: > > class X: > def __init__(self, x, y): > self.x = x > self.y = y > > You can do the same in C++, but compilers with -Wall will complain unless > you do this: > > class X { > public: > X(int x_, int y_): x{x_}, y{y_} {} > ... > } > > which is just ugly. > > > On July 10, 2015 5:31:28 PM CDT, Michael Hewitt > wrote: > >> Last night I made a post to the neopythonic blog proposing a Python 3.x >> feature that Guido asked me to forward to this alias. For the full >> background, see the link to my post below. For brevity, I will simply >> submit the proposal here. The specific problem I am addressing is the >> pollution of Python methods by 'self.' to reference fields. Here is the >> proposal: >> >> The name of the first parameter to a method can be used to scope >> subsequent variable references similar to the behavior of 'global'. 
>> >> >> Here are some examples: >> >> class Foo: >> >> def method_a(self) >> >> self x # subsequent 'x' references are scoped to 'self' >> >> x = 5 # same as self.x = 5 >> >> def method_b(this) >> >> this x, y # subsequent 'x' & 'y' refs are scoped to 'this' >> >> x = y # same as this.x = this.y >> >> def method_c(its) >> >> its.x = 5 # still works just like it used to >> >> >> This suggestion is fully backward compatible with existing Python code, >> but would eliminate the need to pollute future Python methods with copious >> 'self.' prefixes, thereby improving both readability and maintainabilty. >> >> Thank you for your consideration. >> >> Michael Hewitt >> >> Original Post: >> http://neopythonic.blogspot.com/2008/10/why-explicit-self-has-to-stay.html >> >> ------------------------------ >> >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> > -- > Sent from my Android device with K-9 Mail. Please excuse my brevity. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Jul 11 07:04:22 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 11 Jul 2015 15:04:22 +1000 Subject: [Python-ideas] Learning from the shell in supporting asyncio background calls In-Reply-To: References: Message-ID: On 10 July 2015 at 21:51, Guido van Rossum wrote: > As I wrote on the issue, I'm -1 on this proposal. Not only does this API > encourage beginners to ignore the essential difference between synchronous > functions meant to run in a thread (using synchronous I/O and pre-emptive > CPU scheduling) and asyncio coroutines/tasks (which use overlapped I/O and > require explicit scheduling), it also encourages avoiding the "await" > primitive (formerly "yield from") in favor of a function call which cannot > be used from within a coroutine/task. My apologies for the confusion - the revised proposal focuses on coroutines, not threads. With the benefit of hindight, leaving the implementation details out of the python-ideas post was clearly a mistake, as my previous posts had been more focused on threads. I've added the full implementation details in my reply to Oscar, which will hopefully make the revised proposal clearer. The blog post goes into detail on this - it specifically takes a synchronous function, replaces it with an asynchronous coroutine using the await syntax, and then uses run_in_background() to manipulate the asynchronous version from the REPL. The main operation I use with "run_in_foreground" in the post is actually asyncio.sleep, as "run_in_foreground(asyncio.sleep(0))" was the simplest way I found to single step the event loop, and it also allows you to trivially say "run the event loop for 5 seconds", etc. Concatenating some of the example code from the post together gives this demonstration of the basic UX: >>> async def ticker(): ... for i in itertools.count(): ... print(i) ... await asyncio.sleep(1) ... >>> ticker1 = run_in_background(ticker()) >>> ticker1 :1>> >>> run_in_foreground(asyncio.sleep(5)) 0 1 2 3 4 If there isn't a coroutine currently running in the foreground, then background coroutines don't run either. 
All of the currently running tasks can be interrogated through the existing asyncio.Task.all_tasks() class method. > This particular spelling moreover introduces a "similarity" between > foreground and background tasks that doesn't actually exist. The concept behind the revised proposal is layering the simpler foreground/background task representational model on top of the full complexity of the asyncio implementation model. The "run_in_foreground" naming is technically a lie - what actually gets run in the foreground is the current thread's event loop. However, I think it's an acceptable and useful lie, as what it does is run the event loop in the current thread until the supplied future produces a result, which means the current thread isn't going to be doing anything other than running the event loop until the specified operation is completed. This approach *doesn't* expose the full power of asyncio and native coroutines, but it exposes a lot of it, and it should be relatively easy to grasp for anyone that's already familiar with background processes in POSIX shell environments. > The example suggests that this should really be a pair of convenience > functions in collections.futures, as it does not make any use of asyncio. While that was true of the previous proposal (which always used the executor), this new proposal only falls back to using run_in_executor if asyncio.ensure_future fails with TypeError and the supplied background task target is a callable. Regards, Nick. P.S. If anyone reading this isn't already familiar with the concept of representational models vs implementation models, then I highly recommend http://www.uxpassion.com/blog/implementation-mental-representation-models-ux-user-experience/ as a good introduction to the idea -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From njs at pobox.com Sat Jul 11 07:16:04 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 11 Jul 2015 00:16:04 -0500 Subject: [Python-ideas] Learning from the shell in supporting asyncio background calls In-Reply-To: References: Message-ID: On Jul 11, 2015 12:04 AM, "Nick Coghlan" wrote: > [...] > > Concatenating some of the example code from the post together gives > this demonstration of the basic UX: > > >>> async def ticker(): > ... for i in itertools.count(): > ... print(i) > ... await asyncio.sleep(1) > ... > >>> ticker1 = run_in_background(ticker()) > >>> ticker1 > :1>> > >>> run_in_foreground(asyncio.sleep(5)) > 0 > 1 > 2 > 3 > 4 > > If there isn't a coroutine currently running in the foreground, then > background coroutines don't run either. All of the currently running > tasks can be interrogated through the existing > asyncio.Task.all_tasks() class method. For what it's worth, I find it extraordinarily confusing that background tasks don't run in the background and foreground tasks don't run in the foreground. The functionality doesn't strike me as obviously unreasonable, it just doesn't at all match my expectations from encountering these words in the shell context. Maybe start_suspended(...) and unsuspend_all_until(task)? -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Sat Jul 11 09:04:09 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 11 Jul 2015 17:04:09 +1000 Subject: [Python-ideas] Learning from the shell in supporting asyncio background calls In-Reply-To: References: Message-ID: On 11 July 2015 at 15:04, Nick Coghlan wrote: > On 10 July 2015 at 21:51, Guido van Rossum wrote: >> As I wrote on the issue, I'm -1 on this proposal. Not only does this API >> encourage beginners to ignore the essential difference between synchronous >> functions meant to run in a thread (using synchronous I/O and pre-emptive >> CPU scheduling) and asyncio coroutines/tasks (which use overlapped I/O and >> require explicit scheduling), it also encourages avoiding the "await" >> primitive (formerly "yield from") in favor of a function call which cannot >> be used from within a coroutine/task. > > My apologies for the confusion - the revised proposal focuses on > coroutines, not threads. With the benefit of hindight, leaving the > implementation details out of the python-ideas post was clearly a > mistake, as my previous posts had been more focused on threads. I've > added the full implementation details in my reply to Oscar, which will > hopefully make the revised proposal clearer. I wrote a second post about this foreground/background task idea, which presents a hopefully more compelling example: setting up two TCP echo servers from the interactive prompt, and then interacting with them using asynchronous clients, including an example using run_in_background, run_in_foreground and asyncio.wait to run parallel client commands. This is all done using the main thread in the REPL, and takes advantage of the fact that it's all running in the same thread to dynamically allocate the server ports and pass that information to the demonstration clients. I believe this particular example effectively demonstrates the power of asyncio to dramatically simplify the testing of both network clients and network servers, as you don't need to mess about with synchronising across threads or processes. The full post is at http://www.curiousefficiency.org/posts/2015/07/asyncio-tcp-echo-server.html, but I'll include the examples of usage inline. The code for setting up the servers and retrieving their chosen ports looks like: >>> make_server = asyncio.start_server(handle_tcp_echo, '127.0.0.1') >>> server = run_in_foreground(make_server) >>> port = server.sockets[0].getsockname()[1] >>> make_server2 = asyncio.start_server(handle_tcp_echo, '127.0.0.1') >>> server2 = run_in_foreground(make_server2) >>> port2 = server2.sockets[0].getsockname()[1] This is an effectively synchronous operation, so it could be readily encapsulated in a normal function call for invocation from a test suite to set up local test servers, and report the port number to connect to. The code for running parallel clients looks like: >>> echo1 = run_in_background(tcp_echo_client('Hello World!', port)) >>> echo2 = run_in_background(tcp_echo_client('Hello World!', port2)) >>> run_in_foreground(asyncio.wait([echo1, echo2])) >>> echo1.result() 'Hello World!' >>> echo2.result() 'Hello World!' While I don't go into it in the post, blocking clients could also be tested in much the same way, by using run_in_background's callable support to run them as call-and-response operations through the default executor, while running the asynchronous server components in the main thread. Regards, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From michael at hewitts.us Sat Jul 11 09:25:39 2015 From: michael at hewitts.us (Michael Hewitt) Date: Sat, 11 Jul 2015 00:25:39 -0700 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: Message-ID: Let's compare two versions of a method taken from some code that my 11 year old son wrote yesterday: *Current* global height def keep_moving_gravity(self): self.y += self.gravity self.y = max(self.y, 0) self.y = min(self.y, height - 1) *Proposed* global height def keep_moving_gravity(self): self y, gravity y += gravity y = max(y, 0) y = min(y, height - 1) Is anyone actually going to argue that the first version is cleaner and more readable than the second? All I see when I read the first is 'self', 'self', 'self' -- my son's exact words to me last night. As far as maintainability, the author of the first version must repeatedly make the same decisions over and over regarding how to scope each variable reference as he/she types 'y', 'gravity', and 'height'. The author of the second code makes these decisions exactly once at the top of the method and then is free to refer to each variable naturally without the mental overhead of prefixing each 'y' and 'gravity' with 'self.', but God forbid - not 'height'. I can tell you that the mental overhead of this is taxing my son & is the cause of many painful mistakes -- just forgetting a single 'self.' prefix on one of the above field references can waste a significant amount of time. As far as static analysis tools, this should honestly not be a lot of extra work, since the tools must already handle 'global' in a very similar fashion. If the 'self.' prefix really does make code clearer, then we should do away with the 'global' scope declaration as well as automatic local scoping and require prefixing of all Python variables with 'self.', 'global.' or 'local.'. My mind becomes numb thinking about writing such code. To me, the existence of the keyword 'global' for automatically scoping subsequent variable references is a strong argument that a similar 'self' scoping mechanism is called for as well. And, for folks who still prefer to prefix all their field references with 'self.', the proposal in no way prevents them from doing so. It merely allows the rest of us to be a bit less wordy and more pithy in our code. Mike On Friday, July 10, 2015, Mark Lawrence wrote: > On 10/07/2015 23:31, Michael Hewitt wrote: > >> Last night I made a post to the neopythonic blog proposing a Python 3.x >> feature that Guido asked me to forward to this alias. For the full >> background, see the link to my post below. For brevity, I will simply >> submit the proposal here. The specific problem I am addressing is the >> pollution of Python methods by 'self.' to reference fields. Here is the >> proposal: >> >> The name of the first parameter to a method can be used to scope >> subsequent variable references similar to the behavior of 'global'. >> >> >> Here are some examples: >> >> class Foo: >> >> def method_a(self) >> >> self x # subsequent 'x' references are scoped to 'self' >> >> x = 5 # same as self.x = 5 >> >> def method_b(this) >> >> this x, y # subsequent 'x' & 'y' refs are scoped to 'this' >> >> x = y # same as this.x = this.y >> >> def method_c(its) >> >> its.x = 5 # still works just like it used to >> >> >> This suggestion is fully backward compatible with existing Python code, >> but would eliminate the need to pollute future Python methods with >> copious 'self.' 
prefixes, thereby improving both readability and >> maintainability. >> > I disagree completely. When I see:- > > self.x = 1 > > I currently know exactly what I'm looking at. All I see with this > proposal is more work for my MKI eyeballs, which are already knackered, and > more work for the people who maintain our static analysis tools, as you can > still forget to properly scope your variables. So -1. > > >> Thank you for your consideration. >> >> Michael Hewitt >> >> Original Post: >> http://neopythonic.blogspot.com/2008/10/why-explicit-self-has-to-stay.html >> >> > > > -- > My fellow Pythonistas, ask not what our language can do for you, ask > what you can do for our language. > > Mark Lawrence > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From rosuav at gmail.com Sat Jul 11 09:36:32 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 11 Jul 2015 17:36:32 +1000 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: Message-ID: On Sat, Jul 11, 2015 at 5:25 PM, Michael Hewitt wrote: > Let's compare two versions of a method taken from some code that my 11 year > old son wrote yesterday: > > Current > > global height > def keep_moving_gravity(self): > self.y += self.gravity > self.y = max(self.y, 0) > self.y = min(self.y, height - 1) > > > Proposed > > global height > def keep_moving_gravity(self): > self y, gravity > y += gravity > y = max(y, 0) > y = min(y, height - 1) Alternate proposal: # You could in-line this if you want to. def limit(low, n, high): """Limit a number to be within certain bounds""" return min(high, max(n, low)) def keep_moving_gravity(self): self.y = limit(0, self.y + self.gravity, height - 1) There are precisely two references to self.y (one reading, one writing), and one reference to self.gravity. The global identifier height needs no declaration, because it's not being assigned to. Instead of doing three operations (increment, then check against zero, then check against height), it simply does one: a bounded increment by gravity. Bad code is not itself an argument for a language change. Sometimes there's an even better alternative :) ChrisA From michael at hewitts.us Sat Jul 11 09:49:37 2015 From: michael at hewitts.us (Michael Hewitt) Date: Sat, 11 Jul 2015 00:49:37 -0700 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: Message-ID: Let me state my argument more succinctly. If 'self.' is a good thing, then *all* Python variable references should always be explicitly prefixed by their scope -- 'global.', 'local.', and 'self.'. Why does it make more sense to not require explicit scoping of global variables, which are far more dangerous, while requiring explicit class scoping when nothing is more natural than for a method to refer to variables within its own class? This is totally backwards. If anything, the opposite should be true. 'global.' should be required for *all* global references, of which there should be very few in well-written, modular code, and 'self.' should not be required for class references because of course methods are going to naturally refer to variables within their class.
Keep in mind that the proposed design explicitly specifies the class variable scoping at the top of each method, so that anyone reading the method can simply look at the top of the method to find out which variables are class variables. On Sat, Jul 11, 2015 at 12:25 AM, Michael Hewitt wrote: > Let's compare two versions of a method taken from some code that my 11 > year old son wrote yesterday: > > *Current* > > global height > def keep_moving_gravity(self): > self.y += self.gravity > self.y = max(self.y, 0) > self.y = min(self.y, height - 1) > > > *Proposed* > > global height > def keep_moving_gravity(self): > self y, gravity > y += gravity > y = max(y, 0) > y = min(y, height - 1) > > > Is anyone actually going to argue that the first version is cleaner and > more readable than the second? All I see when I read the first is 'self', > 'self', 'self' -- my son's exact words to me last night. > > As far as maintainability, the author of the first version must repeatedly > make the same decisions over and over regarding how to scope each variable > reference as he/she types 'y', 'gravity', and 'height'. The author of the > second code makes these decisions exactly once at the top of the method and > then is free to refer to each variable naturally without the mental > overhead of prefixing each 'y' and 'gravity' with 'self.', but God forbid - > not 'height'. I can tell you that the mental overhead of this is taxing my > son & is the cause of many painful mistakes -- just forgetting a single > 'self.' prefix on one of the above field references can waste a significant > amount of time. > > As far as static analysis tools, this should honestly not be a lot of > extra work, since the tools must already handle 'global' in a very similar > fashion. > > If the 'self.' prefix really does make code clearer, then we should do > away with the 'global' scope declaration as well as automatic local scoping > and require prefixing of all Python variables with 'self.', 'global.' or > 'local.'. My mind becomes numb thinking about writing such code. To me, > the existence of the keyword 'global' for automatically scoping subsequent > variable references is a strong argument that a similar 'self' scoping > mechanism is called for as well. > > And, for folks who still prefer to prefix all their field references with > 'self.', the proposal in no way prevents them from doing so. It merely > allows the rest of us to be a bit less wordy and more pithy in our code. > > Mike > > > On Friday, July 10, 2015, Mark Lawrence wrote: > >> On 10/07/2015 23:31, Michael Hewitt wrote: >> >>> Last night I made a post to the neopythonic blog proposing a Python 3.x >>> feature that Guido asked me to forward to this alias. For the full >>> background, see the link to my post below. For brevity, I will simply >>> submit the proposal here. The specific problem I am addressing is the >>> pollution of Python methods by 'self.' to reference fields. Here is the >>> proposal: >>> >>> The name of the first parameter to a method can be used to scope >>> subsequent variable references similar to the behavior of 'global'. 
>>> >>> >>> Here are some examples: >>> >>> class Foo: >>> >>> def method_a(self) >>> >>> self x # subsequent 'x' references are scoped to 'self' >>> >>> x = 5 # same as self.x = 5 >>> >>> def method_b(this) >>> >>> this x, y # subsequent 'x' & 'y' refs are scoped to 'this' >>> >>> x = y # same as this.x = this.y >>> >>> def method_c(its) >>> >>> its.x = 5 # still works just like it used to >>> >>> >>> This suggestion is fully backward compatible with existing Python code, >>> but would eliminate the need to pollute future Python methods with >>> copious 'self.' prefixes, thereby improving both readability and >>> maintainabilty. >>> >> >> I disagree completely. When I see:- >> >> self.x = 1 >> >> I currently know exactly what I'm looking at. All I see with this >> proposal is more work for my MKI eyeballs, which are already knackered, and >> more work for the people who maintain our static analysis tools, as you can >> still forget to properly scope your variables. So -1. >> >> >>> Thank you for your consideration. >>> >>> Michael Hewitt >>> >>> Original Post: >>> >>> http://neopythonic.blogspot.com/2008/10/why-explicit-self-has-to-stay.html >>> >>> >> >> >> -- >> My fellow Pythonistas, ask not what our language can do for you, ask >> what you can do for our language. >> >> Mark Lawrence >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Jul 11 10:02:42 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 11 Jul 2015 18:02:42 +1000 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: Message-ID: On Sat, Jul 11, 2015 at 5:49 PM, Michael Hewitt wrote: > This is totally backwards. If anything, the opposite should be true. > 'global.' should be required for *all* global references, of which there > should be very few in well-written, modular code, and 'self.' should not be > required for class references because of course methods are going to > naturally refer to variables within their class. How many times do you call on standard library functions? Those are almost always global references. Either you're looking for something directly from the builtins (eg len, max, range), or you import a module at the top of your file and then use that (eg math.sin, logging.info), in which case the module reference is itself a global of your own module. So, no. Globals are extremely common - just not often reassigned. ChrisA From ncoghlan at gmail.com Sat Jul 11 10:13:58 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 11 Jul 2015 18:13:58 +1000 Subject: [Python-ideas] Concurrency Modules In-Reply-To: References: <559EFB73.5050606@mail.de> <55A031CF.9050002@mail.de> Message-ID: On 11 July 2015 at 09:42, Steve Dower wrote: > Processes and threads only really enter into asyncio as a "thing that can post messages back to my TODO list/event loop", while asyncio provides an efficient mechanism for interleaving (not parallelising) multiple tasks throughout an entire application (or a very significant self-contained piece of it). The parallelism only comes when all the main thread has to do for a particular task is wait, because another thread/process/service/device/etc. is doing the actual work. > > Hopefully that helps clear things up for some people. 
> No example is perfect for everyone, ultimately, so the more we put out there the more likely ... I really like this example, so I had a go at expressing it in the foreground/background terms I use in http://www.curiousefficiency.org/posts/2015/07/asyncio-tcp-echo-server.html For folks that already know asyncio, the relevant semantic details of that proposal for this example are: run_in_foreground -> convenience wrapper for run_until_complete run_in_background(coroutine) -> convenience wrapper for ensure_future run_in_background(callable) -> convenience wrapper for run_in_executor I quite like the end result: # We'll need the concept of an oven class Oven: # There's a shared pool of ovens we can use @classmethod async def get_oven(cls): ... # An oven can only have one set of current settings def configure(self, recipe): ... # An oven can only cook one thing at a time def bake(self, mixture): ... # We stay focused on this task def gather_ingredients(recipe): ... return ingredients # Helper to indicate readiness to switch tasks def survey_the_kitchen(): return asyncio.sleep(0) # This task may be interleaved with other activities async def mix_ingredients(recipe, ingredients): mixture = CakeMixture(recipe) for ingredient in ingredients: mixture.add(ingredient) await survey_the_kitchen() return mixture # This task may be interleaved with other activities async def make_cake(recipe): # First, we gather and start mixing the ingredients ingredients = gather_ingredients(recipe) mixture = await mix_ingredients(recipe, ingredients) # We wait for a free oven, then configure it for our recipe oven = await Oven.get_oven() oven.configure(recipe) # Baking is synchronous for the *oven*, but *we* don't # want to sit around waiting for it the entire time bake_cake = functools.partial(oven.bake, mixture) return await run_in_background(bake_cake) # We have three cakes to make, each scheduled as a background task make_sponge = run_in_background(make_cake("sponge")) make_madeira = run_in_background(make_cake("madeira")) make_chocolate = run_in_background(make_cake("chocolate")) # Which we'll try to do concurrently run_in_foreground(asyncio.wait([make_sponge, make_madeira, make_chocolate])) sponge_cake = make_sponge.result() madeira_cake = make_madeira.result() chocolate_cake = make_chocolate.result() Now, to upgrade this to full event driven programming: imagine you're modeling a professional bakery, accepting cake orders from customers. Then you would need to define a server process that turns orders from customers into cake making requests, and completed cake notifications into delivery orders, and your main thread becomes devoted to running that server, rather than specifying a pre-selected set of cakes to make. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ben+python at benfinney.id.au Sat Jul 11 10:33:19 2015 From: ben+python at benfinney.id.au (Ben Finney) Date: Sat, 11 Jul 2015 18:33:19 +1000 Subject: [Python-ideas] Mitigating 'self.' Method Pollution References: Message-ID: <85zj33w0ts.fsf@benfinney.id.au> Michael Hewitt writes: > Is anyone actually going to argue that the first version is cleaner > and more readable than the second? All I see when I read the first is > 'self', 'self', 'self' -- my son's exact words to me last night. That's a pity. You can choose a different word if you like. > As far as maintainability, the author of the first version must > repeatedly make the same decisions over and over regarding how to > scope each variable reference as he/she types 'y', 'gravity', and > 'height'.
Another way to say that is that the author must *explicitly communicate*, to the reader, the namespace of each name within the function scope. Don't make the mistake of attempting to optimise time spent typing in code, versus time spent comprehending the code as a reader. The latter far outweighs the former, and is the preferred target for optimisation. > I can tell you that the mental overhead of [writing] this is taxing my > son & is the cause of many painful mistakes -- just forgetting a > single 'self.' prefix on one of the above field references can waste a > significant amount of time. With respect, I doubt that the time spent on that is anywhere near the time which would be spent trying to diagnose bugs caused by implicit namespaces. You're aware of the cost in writing explicit names, because you're paying it now. You are, I testify from painful experience in languages other than Python, saving a much greater amount of time in hunting bugs caused by ambiguously-written code. > If the 'self.' prefix really does make code clearer, then we should do > away with the 'global' scope declaration as well as automatic local > scoping and require prefixing of all Python variables with 'self.', > 'global.' or 'local.'. My mind becomes numb thinking about writing > such code. I share your abhorrence; that would be an unwarranted step. Who is advocating taking that step? No-one in this thread that I can see. If someone advocates "require prefixing of all Python [names] with 'self.', 'global.', or 'local.'" within my earshot, I'll join you in criticising the foolishness of such an idea. > To me, the existence of the keyword 'global' for automatically scoping > subsequent variable references is a strong argument that a similar > 'self' scoping mechanism is called for as well. You haven't presented anything compelling in support of that. The latter doesn't follow at all from the former. > And, for folks who still prefer to prefix all their field references > with 'self.', the proposal in no way prevents them from doing so. It > merely allows the rest of us to be a bit less wordy and more pithy in > our code. Python requires explicit declaration of the namespace for names. That protects me from ambiguities that would otherwise be very common in other people's code. I appreciate that and do not want it threatened. -- \ "I distrust those people who know so well what God wants them | `\ to do to their fellows, because it always coincides with their | _o__) own desires." --Susan Brownell Anthony, 1896 | Ben Finney From rosuav at gmail.com Sat Jul 11 10:38:34 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 11 Jul 2015 18:38:34 +1000 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: <85zj33w0ts.fsf@benfinney.id.au> References: <85zj33w0ts.fsf@benfinney.id.au> Message-ID: On Sat, Jul 11, 2015 at 6:33 PM, Ben Finney wrote: >> And, for folks who still prefer to prefix all their field references >> with 'self.', the proposal in no way prevents them from doing so. It >> merely allows the rest of us to be a bit less wordy and more pithy in >> our code. > > Python requires explicit declaration of the namespace for names. That > protects me from ambiguities that would otherwise be very common in > other people's code. I appreciate that and do not want it threatened. Not quite true; Python requires explicit declaration of names when they're bound to, but is quite happy to do a scope search (local, nonlocal, global, builtin) for references. But the rules are fairly simple.
Aside from the possibility that someone imports your module and sets module.len to shadow a builtin, everything can be worked out lexically by looking for the assignments. ChrisA From ncoghlan at gmail.com Sat Jul 11 10:40:04 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 11 Jul 2015 18:40:04 +1000 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: Message-ID: On 11 July 2015 at 17:25, Michael Hewitt wrote: > Let's compare two versions of a method taken from some code that my 11 year > old son wrote yesterday: > > Current > > global height > def keep_moving_gravity(self): > self.y += self.gravity > self.y = max(self.y, 0) > self.y = min(self.y, height - 1) > > Proposed > > global height > def keep_moving_gravity(self): > self y, gravity > y += gravity > y = max(y, 0) > y = min(y, height - 1) Alternative proposal: def keep_moving_gravity(self): .y += .gravity .y = max(.y, 0) .y = min(.y, height - 1) The main objection folks have to the declarative proposal is the fact that there's nothing obvious at the point of reference to indicate whether we're manipulating a local variable or an instance variable. I can speak from experience in saying that there's a reason the "m_*" prefix notation for C++ member variables is popular: it makes C++ method implementations much easier to read when you can tell at a glance if a line is working with an instance member or a local variable. With the language not providing that separation by default, folks added it by convention. An "implied attribute reference" proposal would be different: it could build on the same mechanism that powers zero-argument super to make it possible to say "if I reference an object attribute in a function without saying which object I'm referring to, then I mean the first parameter". This should work cleanly for standalone functions, class methods, instance methods, etc, as the compiler already keeps track of the necessary details in order to implement PEP 3135 (https://www.python.org/dev/peps/pep-3135/#specification) The main downside is that a leading dot isn't as good a visual indicator as a dot appearing between two other characters. That could potentially be alleviated with a double-dot notation: def keep_moving_gravity(self): ..y += ..gravity ..y = max(..y, 0) ..y = min(..y, height - 1) The downside of *that* is it might make people start thinking in terms of stepping up scopes, rather than referring to the first parameter. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From breamoreboy at yahoo.co.uk Sat Jul 11 11:03:50 2015 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sat, 11 Jul 2015 10:03:50 +0100 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: Message-ID: On 11/07/2015 08:25, Michael Hewitt wrote: > Let's compare two versions of a method taken from some code that my 11 > year old son wrote yesterday: > > *_Current_* > > global height > def keep_moving_gravity(self): > self.y += self.gravity > self.y = max(self.y, 0) > self.y = min(self.y, height - 1) > > > *_Proposed_* > > global height > def keep_moving_gravity(self): > self y, gravity > y += gravity > y = max(y, 0) > y = min(y, height - 1) > > > Is anyone actually going to argue that the first version is cleaner and > more readable than the second? Yes, as I already have.
> As far as maintainability, the author of the first version must > repeatedly make the same decisions over and over regarding how to scope > each variable reference as he/she types 'y', 'gravity', and 'height'. > The author of the second code makes these decisions exactly once at the > top of the method and then is free to refer to each variable naturally > without the mental overhead of prefixing each 'y' and 'gravity' with > 'self.', but God forbid - not 'height'. I can tell you that the mental > overhead of this is taxing my son & is the cause of many painful > mistakes -- just forgetting a single 'self.' prefix on one of the above > field references can waste a significant amount of time. > Wrong, as Ben Finney has already pointed out. You'll also have to wade up and down any large methods to ensure that you distinguish local variables from instance variables. Frankly I see nothing in this proposal at all, so I'm moving from -1 to the -1000 somebody else gave it in the early hours of the morning; I'm sorry, but I forget who that was. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From ncoghlan at gmail.com Sat Jul 11 11:12:22 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 11 Jul 2015 19:12:22 +1000 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: <85zj33w0ts.fsf@benfinney.id.au> References: <85zj33w0ts.fsf@benfinney.id.au> Message-ID: On 11 July 2015 at 18:33, Ben Finney wrote: > Michael Hewitt writes: > >> Is anyone actually going to argue that the first version is cleaner >> and more readable than the second? All I see when I read the first is >> 'self', 'self', 'self' -- my son's exact words to me last night. > > That's a pity. You can choose a different word if you like. > >> As far as maintainability, the author of the first version must >> repeatedly make the same decisions over and over regarding how to >> scope each variable reference as he/she types 'y', 'gravity', and >> 'height'. > > Another way to say that is that the author must *explicitly > communicate*, to the reader, the namespace of each name within the > function scope. > > Don't make the mistake of attempting to optimise time spent typing in > code, versus time spent comprehending the code as a reader. The latter > far outweighs the former, and is the preferred target for optimisation. That's not the way I read Michael's proposal. I read his proposal as aiming to cut down on visual *noise* when *reading*. It's worthwhile to compare the different examples in terms of "characters conveying useful information": def keep_moving_gravity(self): self.y += self.gravity self.y = max(self.y, 0) self.y = min(self.y, height - 1) Here, self is repeated 7 times: 28 characters. Discounting indentation and line breaks, the whole function is only 96 characters long, so almost a third of the function definition consists of the word "self". With Michael's proposal, that changes to: def keep_moving_gravity(self): self y, gravity y += gravity y = max(y, 0) y = min(y, height - 1) Total character length is now 79 characters, with 15 characters devoted to the "self y, gravity" declarative element. That's a worthy reduction in visual scoping noise (to less than 20%), but it's come at the cost of losing actual signal: there's no longer a marker on the references to "y" and "gravity" to say they're instance variables rather than local variables.
That's also still worse than the best current code can already do (without breaking the "self" convention) which is to assign a frequently accessed object to a single character variable name: def keep_moving_gravity(self): s = self s.y += s.gravity s.y = max(s.y, 0) s.y = min(s.y, height - 1) That keeps all of the relevant attribute lookup signal, and is only 14 characters of scoping noise (8 for the "s = self", 6 for the single character object references) out of 86 total characters. Finally, an "implied reference to first parameter" for attribute lookups inside functions offers the most minimalist proposal I can think of that would still be easy to read from a semantic perspective: def keep_moving_gravity(self): .y += .gravity .y = max(.y, 0) .y = min(.y, height - 1) We've reduced the function length to only 72 characters, and it's 100% signal. The only thing we've lost relative to the original is the 6 explicit references to "self", which are now implied by looking up an attribute without specifying a source object. If we *did* do something like that last example (which I'm not convinced is a good idea, given that the "n = name" trick works for *any* frequently referenced variable, whether inside a function or not), we'd need to think seriously about the implications for implicit scopes, like those created for generator expressions and container comprehensions. At the moment, their first argument is the outermost iterable, which would become accessible within the expression body if there was a syntax permitting implied references for attribute lookups. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ben+python at benfinney.id.au Sat Jul 11 11:24:32 2015 From: ben+python at benfinney.id.au (Ben Finney) Date: Sat, 11 Jul 2015 19:24:32 +1000 Subject: [Python-ideas] Mitigating 'self.' Method Pollution References: <85zj33w0ts.fsf@benfinney.id.au> Message-ID: <85vbdrvygf.fsf@benfinney.id.au> Nick Coghlan writes: > I read [Michael Hewitt's] proposal as aiming to cut down on visual > *noise* when *reading*. Okay. In response to that, then: Michael appears to want most of those references to entirely omit the explicit namespace. The desire to entirely remove explicit namespaces to distinguish names in a function, I don't sympathise with. Those explicit namespaces make the code much more readable. I don't know what Michael means by "clean"; if it is a simple "fewer characters", that's not strongly correlated with easier-to-read code. Especially because readable code *also* requires clarity and rapid comprehension, including disambiguation, of the writer's intent. For those purposes, many times the better solution involves writing significantly *more* characters than the strict minimum. So arguments merely leaning on "fewer characters" don't impress me, and I hope they don't impress many others. As for simply choosing a shorter name: > It's worthwhile to compare the different examples in terms of > "characters conveying useful information": Given that "self", while merely a convention, is now such a strong convention that it is hard-coded into many tools, I argue that "self" conveys much *more* information than a different shorter name (such as "s"). The latter will tend to require rather more effort to comprehend the writer's intent, and to that extent is less readable. -- \ "The right to use [strong cryptography] is the right to speak | `\ Navajo." --Eben Moglen | _o__) | Ben Finney From mal at egenix.com Sat Jul 11 11:54:53 2015 From: mal at egenix.com (M.-A.
Lemburg) Date: Sat, 11 Jul 2015 11:54:53 +0200 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: Message-ID: <55A0E7ED.8070209@egenix.com> On 11.07.2015 00:31, Michael Hewitt wrote: > Last night I made a post to the neopythonic blog proposing a Python 3.x feature that Guido asked me > to forward to this alias. For the full background, see the link to my post below. For brevity, I > will simply submit the proposal here. The specific problem I am addressing is the pollution of > Python methods by 'self.' to reference fields. Here is the proposal: > > The name of the first parameter to a method can be used to scope subsequent variable references > similar to the behavior of 'global'. > > Here are some examples: > > class Foo: > > def method_a(self) > > self x # subsequent 'x' references are scoped to 'self' > > x = 5 # same as self.x = 5 -1 on this. How would I know that x is "bound" to self later on in the method without scanning the whole method body for a possible reference to self.x ? As Python programmer, you default to assume that x = 5 refers to a local variable assignment (globals aren't used much, and when they are, the global definition is usually written as first statement in a function/method). If we'd have something like: with self: x = 5 y = 6 as in Pascal, this would be explicit and it's also clear that there's special scoping going on in the body of the with statement block. I still wouldn't like this much, but at least it follows explicit is better than implicit. Overall, I don't believe much in keystroke optimizations - editors provide all the help you need these days to avoid typing too much, while still allowing you to use descriptive identifiers throughout your program. And that results in much better readability than any scoping tricks ;-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From abarnert at yahoo.com Sat Jul 11 12:02:23 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 11 Jul 2015 03:02:23 -0700 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: Message-ID: <7FB34A43-B901-449B-BF4B-B3EFF3FBE29E@yahoo.com> On Jul 11, 2015, at 00:49, Michael Hewitt wrote: > > Let me state my argument more succinctly. If 'self.' is a good thing, then *all* Python variable references should always be explicitly prefixed by their scope -- 'global.', 'local.', and 'self.'. If I were designing a new pythonesque language, I think I might well add something like that for when you need to disambiguate from the default, and then get rid of the global statement. In fact, if it weren't for the obvious huge backward compat problems, I might even go for a suggestion to do that in Python. Even for closure variables, I kind of like the look of "nonlocal.x = 3", although it's not _as_ compelling as "global.x = 3" to me. 
But the idea that self is a more important default than local, and we should therefore have SLEGB instead of LEGB for unqualified lookups, and self for unqualified assignments instead of local? Definitely not. Creating new locals is something I do in half my functions and methods; creating new attributes is something I rarely do outside of __init__ methods. I've used plenty of languages that don't require declarations, and I've never found one whose default rules made me as happy as Python's LEGB/L. It sort of works in languages where classes have a predefined set of attributes, but even there I don't like it as much; I always end up naming one set or the other with an underscore prefix or suffix or some other marker which is just "self." in a disguise that fools my tools. > Why does it make more sense to not require explicit scoping of global variables, which are far more dangerous, while requiring explicit class scoping when nothing is more natural than for a method to refer to variables within its own class? > > This is totally backwards. If anything, the opposite should be true. 'global.' should be required for *all* global references, of which there should be very few in well-written, modular code, and 'self.' should not be required for class references because of course methods are going to naturally refer to variables within their class. > > Keep in mind that the proposed design explicitly specifies the class variable scoping at the top of each method, so that anyone reading the method can simply look at the top of the method to find out which variables are class variables. > > >> On Sat, Jul 11, 2015 at 12:25 AM, Michael Hewitt wrote: >> Let's compare two versions of a method taken from some code that my 11 year old son wrote yesterday: >> >> Current >> >> global height >> def keep_moving_gravity(self): >> self.y += self.gravity >> self.y = max(self.y, 0) >> self.y = min(self.y, height - 1) >> >> Proposed >> >> global height >> def keep_moving_gravity(self): >> self y, gravity >> y += gravity >> y = max(y, 0) >> y = min(y, height - 1) >> >> Is anyone actually going to argue that the first version is cleaner and more readable than the second? All I see when I read the first is 'self', 'self', 'self' -- my son's exact words to me last night. >> >> As far as maintainability, the author of the first version must repeatedly make the same decisions over and over regarding how to scope each variable reference as he/she types 'y', 'gravity', and 'height'. The author of the second code makes these decisions exactly once at the top of the method and then is free to refer to each variable naturally without the mental overhead of prefixing each 'y' and 'gravity' with 'self.', but God forbid - not 'height'. I can tell you that the mental overhead of this is taxing my son & is the cause of many painful mistakes -- just forgetting a single 'self.' prefix on one of the above field references can waste a significant amount of time. >> >> As far as static analysis tools, this should honestly not be a lot of extra work, since the tools must already handle 'global' in a very similar fashion. >> >> If the 'self.' prefix really does make code clearer, then we should do away with the 'global' scope declaration as well as automatic local scoping and require prefixing of all Python variables with 'self.', 'global.' or 'local.'. My mind becomes numb thinking about writing such code. 
To me, the existence of the keyword 'global' for automatically scoping subsequent variable references is a strong argument that a similar 'self' scoping mechanism is called for as well. >> >> And, for folks who still prefer to prefix all their field references with 'self.', the proposal in no way prevents them from doing so. It merely allows the rest of us to be a bit less wordy and more pithy in our code. >> >> Mike >> >> >>> On Friday, July 10, 2015, Mark Lawrence wrote: >>>> On 10/07/2015 23:31, Michael Hewitt wrote: >>>> Last night I made a post to the neopythonic blog proposing a Python 3.x >>>> feature that Guido asked me to forward to this alias. For the full >>>> background, see the link to my post below. For brevity, I will simply >>>> submit the proposal here. The specific problem I am addressing is the >>>> pollution of Python methods by 'self.' to reference fields. Here is the >>>> proposal: >>>> >>>> The name of the first parameter to a method can be used to scope >>>> subsequent variable references similar to the behavior of 'global'. >>>> >>>> >>>> Here are some examples: >>>> >>>> class Foo: >>>> >>>> def method_a(self) >>>> >>>> self x # subsequent 'x' references are scoped to 'self' >>>> >>>> x = 5 # same as self.x = 5 >>>> >>>> def method_b(this) >>>> >>>> this x, y # subsequent 'x' & 'y' refs are scoped to 'this' >>>> >>>> x = y # same as this.x = this.y >>>> >>>> def method_c(its) >>>> >>>> its.x = 5 # still works just like it used to >>>> >>>> >>>> This suggestion is fully backward compatible with existing Python code, >>>> but would eliminate the need to pollute future Python methods with >>>> copious 'self.' prefixes, thereby improving both readability and >>>> maintainabilty. >>> >>> I disagree completely. When I see:- >>> >>> self.x = 1 >>> >>> I currently know exactly what I'm looking at. All I see with this proposal is more work for my MKI eyeballs, which are already knackered, and more work for the people who maintain our static analysis tools, as you can still forget to properly scope your variables. So -1. >>> >>>> >>>> Thank you for your consideration. >>>> >>>> Michael Hewitt >>>> >>>> Original Post: >>>> http://neopythonic.blogspot.com/2008/10/why-explicit-self-has-to-stay.html >>> >>> >>> >>> -- >>> My fellow Pythonistas, ask not what our language can do for you, ask >>> what you can do for our language. >>> >>> Mark Lawrence >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Jul 11 12:13:44 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 11 Jul 2015 20:13:44 +1000 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: <7FB34A43-B901-449B-BF4B-B3EFF3FBE29E@yahoo.com> References: <7FB34A43-B901-449B-BF4B-B3EFF3FBE29E@yahoo.com> Message-ID: On Sat, Jul 11, 2015 at 8:02 PM, Andrew Barnert via Python-ideas wrote: > If I were designing a new pythonesque language, I think I might well add > something like that for when you need to disambiguate from the default, and > then get rid of the global statement. 
In fact, if it weren't for the obvious > huge backward compat problems, I might even go for a suggestion to do that > in Python. Even for closure variables, I kind of like the look of > "nonlocal.x = 3", although it's not _as_ compelling as "global.x = 3" to me. Want it in Python? No problem. You can't have "global.x" because it's a keyword, but... globl = SimpleNamespace() def use_globals(): globl.x += 1 return globl.y If you want it, it's there, but I wouldn't bother with it for constants - including functions, which are globals just the same as any other. We do NOT need this kind of enforced adornment for all names. ChrisA From ncoghlan at gmail.com Sat Jul 11 12:17:22 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 11 Jul 2015 20:17:22 +1000 Subject: [Python-ideas] Learning from the shell in supporting asyncio background calls In-Reply-To: References: Message-ID: On 11 July 2015 at 15:16, Nathaniel Smith wrote: > For what it's worth, I find it extraordinarily confusing that background > tasks don't run in the background and foreground tasks don't run in the > foreground. I think the core problem lies in trying to reduce "the bare minimum you need to know to work effectively with native coroutines" to only two concepts, when there's actually three: * scheduling a coroutine in the event loop without waiting for it * executing a call in a background thread or process * running the event loop in the foreground while waiting for one of the above two operations I'm trying to do this *without* requiring folks to actually know what a future is: I want them to be able to use asyncio *without* learning about all the moving parts first. Once they appreciate what it can do for them, *then* they may have the motivation to tackle the task of figuring out how all the pieces fit together. However, In the design I put together for the blog posts, "run_in_background" currently handles both of the first two tasks, and it likely makes more sense to split them, which would give: def run_in_foreground(task, *, loop=None): """Runs the given event loop in the current thread until the task completes If not given, *loop* defaults to the current thread's event loop Returns the result of the task. For more complex conditions, combine with asyncio.wait() To include a timeout, combine with asyncio.wait_for() """ # Convenience wrapper around get_event_loop + # ensure_future + run_until_complete ... def schedule_coroutine(target, *, loop=None): """Schedules target coroutine in the given event loop If not given, *loop* defaults to the current thread's event loop Returns the scheduled task. Use run_in_foreground to wait for the result. """ # This just means extracting the coroutine part of # asyncio.ensure_future out to its own function. ... def call_in_background(target, *, loop=None, executor=None): """Schedules and starts target callable as a background task If not given, *loop* defaults to the current thread's event loop If not given, *executor* defaults to the loop's default executor Returns the scheduled task. Use run_in_foreground to wait for the result. """ # Convenience wrapper around get_event_loop + run_in_executor ... I'll sleep on that, and if I still like that structure in the morning, I'll look at revising my coroutine posts. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From storchaka at gmail.com Sat Jul 11 12:19:17 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 11 Jul 2015 13:19:17 +0300 Subject: [Python-ideas] Mitigating 'self.' 
Method Pollution In-Reply-To: References: Message-ID: On 11.07.15 11:40, Nick Coghlan wrote: > Alternative proposal: > > def keep_moving_gravity(self): > .y += .gravity > .y = max(.y, 0) > .y = min(.y, height - 1) [...] > The main downside is that a leading dot isn't as good a visual > indicator as a dot appearing between two other characters. I suggest $. It is well known indicator in other languages. :-) def keep_moving_gravity(self): $y += $gravity $y = max($y, 0) $y = min($y, height - 1) From abarnert at yahoo.com Sat Jul 11 12:52:44 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 11 Jul 2015 03:52:44 -0700 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: <7FB34A43-B901-449B-BF4B-B3EFF3FBE29E@yahoo.com> Message-ID: <2D46E80E-919E-45B7-BEE5-069273418CE0@yahoo.com> On Jul 11, 2015, at 03:13, Chris Angelico wrote: > > On Sat, Jul 11, 2015 at 8:02 PM, Andrew Barnert via Python-ideas > wrote: >> If I were designing a new pythonesque language, I think I might well add >> something like that for when you need to disambiguate from the default, and >> then get rid of the global statement. In fact, if it weren't for the obvious >> huge backward compat problems, I might even go for a suggestion to do that >> in Python. Even for closure variables, I kind of like the look of >> "nonlocal.x = 3", although it's not _as_ compelling as "global.x = 3" to me. > > Want it in Python? No problem. You can't have "global.x" because it's > a keyword, but... > > globl = SimpleNamespace() > > def use_globals(): > globl.x += 1 > return globl.y > > If you want it, it's there, but I wouldn't bother with it for > constants But that doesn't give me what I want. The whole point is to continue to use the existing default LEGB rules for lookup (which is much more common), but to mark global assignments (where you want to differ from the default of local assignments) on the assignment statement itself, instead of function-wide. Using a different namespace obviously fails at the former, so it's like using a sledgehammer as a flyswatter. (I suppose you _could_ use global.const instead of const with this change, but there'd never actually be a reason to do so, unless you intentionally shadowed the global with a local and wanted to access both, and that's a silly thing to do and trivial to avoid, even in auto-generated code.) > - including functions, which are globals just the same as > any other. We do NOT need this kind of enforced adornment for all > names. Which is why I suggested it might be good only for global assignments (and nonlocal assignments), not for all names. Global assignments are much rarer than global lookups and local assignments, and marking them at the point of assignment would make it explicit that you're doing something uncommon. If you keep your functions small enough, it's rarely a serious issue that you have to look upward to notice that the x=5 isn't creating a local variable, but I still think it might be better (again, in a new pythonesque language--I don't seriously want to change Python here) to remove the issue. From rosuav at gmail.com Sat Jul 11 13:01:32 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 11 Jul 2015 21:01:32 +1000 Subject: [Python-ideas] Mitigating 'self.' 
Method Pollution In-Reply-To: <2D46E80E-919E-45B7-BEE5-069273418CE0@yahoo.com> References: <7FB34A43-B901-449B-BF4B-B3EFF3FBE29E@yahoo.com> <2D46E80E-919E-45B7-BEE5-069273418CE0@yahoo.com> Message-ID: On Sat, Jul 11, 2015 at 8:52 PM, Andrew Barnert wrote: > But that doesn't give me what I want. The whole point is to continue to use the existing default LEGB rules for lookup (which is much more common), but to mark global assignments (where you want to differ from the default of local assignments) on the assignment statement itself, instead of function-wide. Using a different namespace obviously fails at the former, so it's like using a sledgehammer as a flyswatter. > Oh, I see what you mean. My apologies, I thought you wanted adornment on all usage. There had at one point been a proposal to allow "global NAME = EXPR" to mean both "global NAME" and "NAME = EXPR", but I don't know that it ever took off. What you can do, though, is repeat the global declaration on every assignment: def func(x): if not _cached: global _cached; _cached = expensive() _cached[x] += 1 return _cached[x] Given that there'll often be just the one such assignment anyway (as in this example), it won't be a big problem. This is perfectly legal, although it's worth noting that it does trigger a SyntaxWarning (name used prior to global declaration). It's not perfect, but it does allow you to put the word "global" right next to the assignment. ChrisA From liik.joonas at gmail.com Sat Jul 11 13:53:51 2015 From: liik.joonas at gmail.com (Joonas Liik) Date: Sat, 11 Jul 2015 14:53:51 +0300 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: <55A0E7ED.8070209@egenix.com> References: <55A0E7ED.8070209@egenix.com> Message-ID: On 11 July 2015 at 12:54, M.-A. Lemburg wrote: > > with self: > x = 5 > y = 6 > sounds a lot like JavaScript: the bad parts and has the exact same shortcomings. consider: var somename = "" with (someobject){ somename // is this a member of someobject or a reference to the local variable?? } From steve at pearwood.info Sat Jul 11 16:41:16 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 12 Jul 2015 00:41:16 +1000 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: <7FB34A43-B901-449B-BF4B-B3EFF3FBE29E@yahoo.com> References: <7FB34A43-B901-449B-BF4B-B3EFF3FBE29E@yahoo.com> Message-ID: <20150711144116.GD27268@ando.pearwood.info> On Sat, Jul 11, 2015 at 03:02:23AM -0700, Andrew Barnert via Python-ideas wrote: > But the idea that self is a more important default than local, and we > should therefore have SLEGB instead of LEGB for unqualified lookups, > and self for unqualified assignments instead of local? Definitely not. Fortunately, Michael has not suggested that. Michael has suggested optional syntax to explicitly declare attributes. He has not suggested to add attributes to the existing implicit lookup order. If "spam" is declared with "self spam", then it must be an attribute. Not a local, nonlocal, global or builtin. If "eggs" is *not* likewise declared, then it *cannot* be an attribute, and the same implicit lookup rules that Python already uses will continue to apply. -- Steve From steve at pearwood.info Sat Jul 11 17:07:56 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 12 Jul 2015 01:07:56 +1000 Subject: [Python-ideas] Mitigating 'self.' 
Method Pollution In-Reply-To: References: <85zj33w0ts.fsf@benfinney.id.au> Message-ID: <20150711150756.GE27268@ando.pearwood.info> On Sat, Jul 11, 2015 at 06:38:34PM +1000, Chris Angelico wrote: > On Sat, Jul 11, 2015 at 6:33 PM, Ben Finney wrote: > >> And, for folks who still prefer to prefix all their field references > >> with 'self.', the proposal in no way prevents them from doing so. It > >> merely allows the rest of us to be a bit less wordy and more pithy in > >> our code. > > > > Python requires explicit declaration of the namespace for names. That > > protects me from ambiguities that would otherwise be very common in > > other people's code. I appreciate that and do not want it threatened. > > Not quite true; Python requires explicit declaration of names when > they're bound to, No it doesn't. It only requires an explicit declaration of names when you don't wish to use the default *implicit* rule that any name you bind to is a local. Accessing a local is always implicit. You cannot declare locals at all (apart from function parameters). You just bind to a name, and Python implicitly treats them as local. Accessing nonlocals and globals may be implicit or explicit, depending on whether you use a global or nonlocal declaration. Accessing builtins is implicit, unless you import builtins and write "builtins.len" (say), which we hardly ever do. In Python, most variables (or name bindings, if you prefer) are implicitly scoped. It is rare that they are explicitly scoped as global, and when they are, that's usually a code smell. ("Global variables considered harmful.") The same applies to attribute access: we normally don't explicitly read from a particular scope. That goes against the principle of inheritance. When you look up "self.attr", you don't know whether the attribute returned will come from the instance __dict__, the class, or a superclass; the scope is implied by the history and type of the instance itself. To say nothing of descriptors like property, or methods. That's about as implicit as you can get. A naive reading of the Zen would suggest that, explicit being better than implicit (what, always? under all circumstances?), Python must be a pretty crap language, it has got so many implicit semantics... In contrast, the new rule suggested is *explicit*. You explicitly declare that a name belongs to the scope: self spam, eggs, cheese declares that spam, eggs and cheese are members of self. That's good enough for nonlocals and globals. (By the way, I used to prefix all my global variables with "g", to be explicit that they were global. That stage in my development as a programmer didn't last very long. Fortunately, I was never silly enough to prefix all my locals with "l".) Ben has agreed that it would be harmful to have to explicitly scope each and every name access: if not builtins.len(nonlocals.x + locals.y): raise builtins.ValueError(globals.ERROR_MSG) I trust that everyone agrees with this point. So it seems that explicit is *not* always better than implicit. So why do we behave as if the Zen of Python is some source of ultimate truth? The Zen is not supposed to be a thought-terminating cliche. As Terry Reedy wrote a few years ago: "People sometimes misuse Tim Peters' Zen of Python points. He wrote them to stimulate thought, not serve as a substitute for thought, and certainly not be a pile of mudballs to be used to chase people away."
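To put the "self.attr" lookup point above in concrete terms, here is a small illustrative sketch (the class and attribute names are invented for the illustration):

    class Base:
        gravity = 9.8       # class attribute, found on the superclass

    class Sprite(Base):
        def __init__(self):
            self.y = 0      # instance attribute, found in the instance __dict__

        def report(self):
            # Both lookups are spelled identically, yet they resolve in
            # different places: the instance dict first, then the class tree.
            return self.y, self.gravity

    print(Sprite().report())    # -> (0, 9.8)

The two attribute references are written the same way, but one resolves in the instance namespace and the other in an inherited class namespace: exactly the implicit search described above.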
When you read a line of code in isolation: spam = eggs + cheese there is no hint, no clue, as to which of these are globals, nonlocals, locals, or even builtins. And yet we cope with the ambiguity, and I don't hear people claiming that we need to explicitly tag names with their scope, as we do with attributes. So what makes attributes so special that we have to break the rules that apply to all other variables? No tags needed: builtins, globals, nonlocals, locals. Tags needed: attributes. This isn't a rhetorical question, but this post is long enough, and I daresay half my readers have tuned out a few paragraphs ago, so I'll take my answer to the question to a new post. -- Steve From steve at pearwood.info Sat Jul 11 17:10:04 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 12 Jul 2015 01:10:04 +1000 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: <55A0E7ED.8070209@egenix.com> Message-ID: <20150711151004.GF27268@ando.pearwood.info> On Sat, Jul 11, 2015 at 02:53:51PM +0300, Joonas Liik wrote: > On 11 July 2015 at 12:54, M.-A. Lemburg wrote: > > > > with self: > > x = 5 > > y = 6 > > > > sounds a lot like JavaScript: the bad parts and has the exact same shortcomings. > consider: It's actually taken from Pascal, where it works quite well due to the necessity of explicit declarations of all variables and record fields. There's even a Python FAQ about it. So don't worry, Python won't get this construct. -- Steve From mertz at gnosis.cx Sat Jul 11 17:13:25 2015 From: mertz at gnosis.cx (David Mertz) Date: Sat, 11 Jul 2015 08:13:25 -0700 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: Message-ID: The proposed version feels difficult to read, bug prone, and overwhelmingly non-Pythonic. The door for C++ is right down the hall.... ? On Jul 11, 2015 12:26 AM, "Michael Hewitt" wrote: > Let's compare two versions of a method taken from some code that my 11 > year old son wrote yesterday: > > *Current* > > global height > def keep_moving_gravity(self): > self.y += self.gravity > self.y = max(self.y, 0) > self.y = min(self.y, height - 1) > > > *Proposed* > > global height > def keep_moving_gravity(self): > self y, gravity > y += gravity > y = max(y, 0) > y = min(y, height - 1) > > > Is anyone actually going to argue that the first version is cleaner and > more readable than the second? All I see when I read the first is 'self', > 'self', 'self' -- my son's exact words to me last night. > > As far as maintainability, the author of the first version must repeatedly > make the same decisions over and over regarding how to scope each variable > reference as he/she types 'y', 'gravity', and 'height'. The author of the > second code makes these decisions exactly once at the top of the method and > then is free to refer to each variable naturally without the mental > overhead of prefixing each 'y' and 'gravity' with 'self.', but God forbid - > not 'height'. I can tell you that the mental overhead of this is taxing my > son & is the cause of many painful mistakes -- just forgetting a single > 'self.' prefix on one of the above field references can waste a significant > amount of time. > > As far as static analysis tools, this should honestly not be a lot of > extra work, since the tools must already handle 'global' in a very similar > fashion. > > If the 'self.' 
prefix really does make code clearer, then we should do
> away with the 'global' scope declaration as well as automatic local scoping
> and require prefixing of all Python variables with 'self.', 'global.' or
> 'local.'. My mind becomes numb thinking about writing such code. To me,
> the existence of the keyword 'global' for automatically scoping subsequent
> variable references is a strong argument that a similar 'self' scoping
> mechanism is called for as well.
>
> And, for folks who still prefer to prefix all their field references with
> 'self.', the proposal in no way prevents them from doing so. It merely
> allows the rest of us to be a bit less wordy and more pithy in our code.
>
> Mike
>
> On Friday, July 10, 2015, Mark Lawrence wrote:
>
>> On 10/07/2015 23:31, Michael Hewitt wrote:
>>
>>> Last night I made a post to the neopythonic blog proposing a Python 3.x
>>> feature that Guido asked me to forward to this alias. For the full
>>> background, see the link to my post below. For brevity, I will simply
>>> submit the proposal here. The specific problem I am addressing is the
>>> pollution of Python methods by 'self.' to reference fields. Here is the
>>> proposal:
>>>
>>> The name of the first parameter to a method can be used to scope
>>> subsequent variable references similar to the behavior of 'global'.
>>>
>>> Here are some examples:
>>>
>>> class Foo:
>>>
>>>     def method_a(self):
>>>
>>>         self x  # subsequent 'x' references are scoped to 'self'
>>>         x = 5   # same as self.x = 5
>>>
>>>     def method_b(this):
>>>
>>>         this x, y  # subsequent 'x' & 'y' refs are scoped to 'this'
>>>         x = y      # same as this.x = this.y
>>>
>>>     def method_c(its):
>>>
>>>         its.x = 5  # still works just like it used to
>>>
>>> This suggestion is fully backward compatible with existing Python code,
>>> but would eliminate the need to pollute future Python methods with
>>> copious 'self.' prefixes, thereby improving both readability and
>>> maintainability.
>>
>> I disagree completely. When I see:-
>>
>> self.x = 1
>>
>> I currently know exactly what I'm looking at. All I see with this
>> proposal is more work for my MKI eyeballs, which are already knackered,
>> and more work for the people who maintain our static analysis tools, as
>> you can still forget to properly scope your variables. So -1.
>>
>>> Thank you for your consideration.
>>>
>>> Michael Hewitt
>>>
>>> Original Post:
>>> http://neopythonic.blogspot.com/2008/10/why-explicit-self-has-to-stay.html
>>
>> --
>> My fellow Pythonistas, ask not what our language can do for you, ask
>> what you can do for our language.
>>
>> Mark Lawrence
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rosuav at gmail.com Sat Jul 11 17:18:08 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Sun, 12 Jul 2015 01:18:08 +1000
Subject: [Python-ideas] Mitigating 'self.'
Method Pollution In-Reply-To: <20150711150756.GE27268@ando.pearwood.info> References: <85zj33w0ts.fsf@benfinney.id.au> <20150711150756.GE27268@ando.pearwood.info> Message-ID: On Sun, Jul 12, 2015 at 1:07 AM, Steven D'Aprano wrote: > On Sat, Jul 11, 2015 at 06:38:34PM +1000, Chris Angelico wrote: >> On Sat, Jul 11, 2015 at 6:33 PM, Ben Finney wrote: >> >> And, for folks who still prefer to prefix all their field references >> >> with 'self.', the proposal in no way prevents them from doing so. It >> >> merely allows the rest of us to be a bit less wordy and more pithy in >> >> our code. >> > >> > Python requires explicit declaration of the namespace for names. That >> > protects me from ambiguities that would otherwise be very common in >> > other people's code. I appreciate that and do not want it threatened. >> >> Not quite true; Python requires explicit declaration of names when >> they're bound to, > > No it doesn't. It only requires an explicit declaration of names when > you don't wish to use the default *implicit* rule that any name you bind > to is a local. Sorry, that was sloppily worded. I was making the point that Python does _not_ require declarations when names are referenced, but in the process implied that Python _does_ require them when they're bound to, which as you say is not always the case. But it is still true that Python requires explicit declarations _only_ when names are bound to. In any case, I was arguing against a position which nobody actually held, due to a misunderstanding of a previous post. We're all in agreement that the search path is a Good Thing when referencing names. (Side point: This is true of a lot of other search paths, too. Create explicitly, reference implicitly. The command path on Unix or Windows, the PostgreSQL schema search path, the Python module search directories (sys.path), they're all looked up by unqualified name; but when you create something, you usually have to say exactly where it gets put. Sometimes there's a default (Postgres lets you implicitly put things into the first location on the search path, which is what you most commonly want anyway), sometimes not even that (creating a file without a path name will put it in the current directory, which on Unix is not in $PATH).) ChrisA From Steve.Dower at microsoft.com Sat Jul 11 17:20:36 2015 From: Steve.Dower at microsoft.com (Steve Dower) Date: Sat, 11 Jul 2015 15:20:36 +0000 Subject: [Python-ideas] Concurrency Modules In-Reply-To: References: <559EFB73.5050606@mail.de> <55A031CF.9050002@mail.de> Message-ID: Two minor corrections to my own post: > * You can use a signal/interrupt, where the oven is going to make some noise and interrupt you when you're ready Should be "... when *the oven* is ready, regardless of whether you are ready to handle the interruption" > Hopefully that helps clear things up for some people. No example is perfect for everyone, ultimately, so the more we put out there the more likely ... we'll help everyone get a clear understanding of when and how to use these tools. Cheers, Steve From mertz at gnosis.cx Sat Jul 11 17:23:39 2015 From: mertz at gnosis.cx (David Mertz) Date: Sat, 11 Jul 2015 08:23:39 -0700 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: Message-ID: You are confusing scoping with attribute access. 'self' is not a lexical scope. In a way, if you squint just right, it resembles a dynamic scope (or would under your proposal). But Python isn't elisp, and we don't want to have dynamic scoping. I.e. 
you'd like bare variables to be (sometimes) scoped to the namespace of the instance eventually created somewhere outside of the class definition (quite likely only created under runtime dependent conditions). In contrast, actual Python is far simpler... all variables are lexically local when defined, unless explicitly declared to have a different and specific *lexical* scope. On Jul 11, 2015 12:58 AM, "Michael Hewitt" wrote: > Let me state my argument more succinctly. If 'self.' is a good thing, > then *all* Python variable references should always be explicitly prefixed > by their scope -- 'global.', 'local.', and 'self.'. Why does it make more > sense to not require explicit scoping of global variables, which are far > more dangerous, while requiring explicit class scoping when nothing is more > natural than for a method to refer to variables within its own class? > > This is totally backwards. If anything, the opposite should be true. > 'global.' should be required for *all* global references, of which there > should be very few in well-written, modular code, and 'self.' should not be > required for class references because of course methods are going to > naturally refer to variables within their class. > > Keep in mind that the proposed design explicitly specifies the class > variable scoping at the top of each method, so that anyone reading the > method can simply look at the top of the method to find out which variables > are class variables. > > > On Sat, Jul 11, 2015 at 12:25 AM, Michael Hewitt > wrote: > >> Let's compare two versions of a method taken from some code that my 11 >> year old son wrote yesterday: >> >> *Current* >> >> global height >> def keep_moving_gravity(self): >> self.y += self.gravity >> self.y = max(self.y, 0) >> self.y = min(self.y, height - 1) >> >> >> *Proposed* >> >> global height >> def keep_moving_gravity(self): >> self y, gravity >> y += gravity >> y = max(y, 0) >> y = min(y, height - 1) >> >> >> Is anyone actually going to argue that the first version is cleaner and >> more readable than the second? All I see when I read the first is 'self', >> 'self', 'self' -- my son's exact words to me last night. >> >> As far as maintainability, the author of the first version must >> repeatedly make the same decisions over and over regarding how to scope >> each variable reference as he/she types 'y', 'gravity', and 'height'. The >> author of the second code makes these decisions exactly once at the top of >> the method and then is free to refer to each variable naturally without the >> mental overhead of prefixing each 'y' and 'gravity' with 'self.', but God >> forbid - not 'height'. I can tell you that the mental overhead of this is >> taxing my son & is the cause of many painful mistakes -- just forgetting a >> single 'self.' prefix on one of the above field references can waste a >> significant amount of time. >> >> As far as static analysis tools, this should honestly not be a lot of >> extra work, since the tools must already handle 'global' in a very similar >> fashion. >> >> If the 'self.' prefix really does make code clearer, then we should do >> away with the 'global' scope declaration as well as automatic local scoping >> and require prefixing of all Python variables with 'self.', 'global.' or >> 'local.'. My mind becomes numb thinking about writing such code. To me, >> the existence of the keyword 'global' for automatically scoping subsequent >> variable references is a strong argument that a similar 'self' scoping >> mechanism is called for as well. 
>> And, for folks who still prefer to prefix all their field references
>> with 'self.', the proposal in no way prevents them from doing so. It
>> merely allows the rest of us to be a bit less wordy and more pithy in
>> our code.
>>
>> Mike
>>
>> On Friday, July 10, 2015, Mark Lawrence wrote:
>>
>>> On 10/07/2015 23:31, Michael Hewitt wrote:
>>>
>>>> Last night I made a post to the neopythonic blog proposing a Python 3.x
>>>> feature that Guido asked me to forward to this alias. For the full
>>>> background, see the link to my post below. For brevity, I will simply
>>>> submit the proposal here. The specific problem I am addressing is the
>>>> pollution of Python methods by 'self.' to reference fields. Here is the
>>>> proposal:
>>>>
>>>> The name of the first parameter to a method can be used to scope
>>>> subsequent variable references similar to the behavior of 'global'.
>>>>
>>>> Here are some examples:
>>>>
>>>> class Foo:
>>>>
>>>>     def method_a(self):
>>>>
>>>>         self x  # subsequent 'x' references are scoped to 'self'
>>>>         x = 5   # same as self.x = 5
>>>>
>>>>     def method_b(this):
>>>>
>>>>         this x, y  # subsequent 'x' & 'y' refs are scoped to 'this'
>>>>         x = y      # same as this.x = this.y
>>>>
>>>>     def method_c(its):
>>>>
>>>>         its.x = 5  # still works just like it used to
>>>>
>>>> This suggestion is fully backward compatible with existing Python code,
>>>> but would eliminate the need to pollute future Python methods with
>>>> copious 'self.' prefixes, thereby improving both readability and
>>>> maintainability.
>>>
>>> I disagree completely. When I see:-
>>>
>>> self.x = 1
>>>
>>> I currently know exactly what I'm looking at. All I see with this
>>> proposal is more work for my MKI eyeballs, which are already knackered,
>>> and more work for the people who maintain our static analysis tools, as
>>> you can still forget to properly scope your variables. So -1.
>>>
>>>> Thank you for your consideration.
>>>>
>>>> Michael Hewitt
>>>>
>>>> Original Post:
>>>> http://neopythonic.blogspot.com/2008/10/why-explicit-self-has-to-stay.html
>>>
>>> --
>>> My fellow Pythonistas, ask not what our language can do for you, ask
>>> what you can do for our language.
>>>
>>> Mark Lawrence
>>>
>>> _______________________________________________
>>> Python-ideas mailing list
>>> Python-ideas at python.org
>>> https://mail.python.org/mailman/listinfo/python-ideas
>>> Code of Conduct: http://python.org/psf/codeofconduct/
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mertz at gnosis.cx Sat Jul 11 17:39:59 2015
From: mertz at gnosis.cx (David Mertz)
Date: Sat, 11 Jul 2015 08:39:59 -0700
Subject: [Python-ideas] Mitigating 'self.' Method Pollution
In-Reply-To: References: Message-ID:

On Jul 11, 2015 1:40 AM, "Nick Coghlan" wrote:
> Alternative proposal:
>
> def keep_moving_gravity(self):
>     .y += .gravity
>     .y = max(.y, 0)
>     .y = min(.y, height - 1)

Assuming the parser could be made to handle this (which feels like a big
stipulation), this variation would deal with the non-explicit dynamic
scoping bug magnets.

It might violate the "looks like grit on Tim's screen" principle, but it
is explicit. I'd only be -0 or -0.5 on this idea.
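For what it's worth, much of the brevity is already available with no
new syntax at all, by binding the instance to a one-character local. A
quick sketch (the Ball class and the height global are made up here,
borrowing the y/gravity/height names from Michael's example):

    height = 24  # stand-in for the global in Michael's example

    class Ball:
        def __init__(self):
            self.y = 10
            self.gravity = -1

        def keep_moving_gravity(self):
            s = self            # plain local alias -- no new syntax
            s.y += s.gravity    # still explicit about the namespace
            s.y = max(s.y, 0)
            s.y = min(s.y, height - 1)

    b = Ball()
    b.keep_moving_gravity()
    print(b.y)  # -> 9

That keeps an explicit marker on every attribute access while cutting
the visual noise to a single character.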
Since we have Unicode though, how about:

    ?y += ?gravity

Or

    ?y += ?gravity

Guaranteed no parser ambiguity, and some symbols are bigger than grit.
In fact, maybe we could get the Unicode Consortium to add a glyph that
looked like 'self.' to make it stand out even better.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Nikolaus at rath.org Sat Jul 11 17:00:34 2015
From: Nikolaus at rath.org (Nikolaus Rath)
Date: Sat, 11 Jul 2015 08:00:34 -0700
Subject: [Python-ideas] Concurrency Modules
In-Reply-To: (Nick Coghlan's message of "Fri, 10 Jul 2015 17:18:22 +1000")
References: <559EFB73.5050606@mail.de>
Message-ID: <87615qivsd.fsf@vostro.rath.org>

On Jul 10 2015, Nick Coghlan wrote:
> On 10 July 2015 at 12:09, Chris Angelico wrote:
>> On Fri, Jul 10, 2015 at 8:53 AM, Sven R. Kunze wrote:
>>> After discussing the whole topic and reading it up further, it became clear
>>> to me what's actually missing in Python. That is a definitive guide of
>>> why/when a certain concurrency module is supposed to be used
>>
>> I'm not sure how easy the decisions will be in all cases, but
>> certainly some broad guidelines would be awesome. (The exact analysis
>> of "when should I use threads and when should I use processes" is a
>> big enough one that there've been a few million blog posts on the
>> subject, and I doubt that asyncio will shrink that.) A basic summary
>> would be hugely helpful. "Here's four similar modules, and why they
>> all exist in the standard library."
>
> Q: Why are there four different modules
> A: Because they solve different problems
> Q: What are those problems?
> A: How long have you got?
>
> Choosing an appropriate concurrency model for a problem is one of the
> hardest tasks in software architecture design. The only way to make it
> appear simple is to focus in on a specific class of problems where
> there *is* a single clearly superior answer for that problem domain :)

But even just documenting this subset would already provide a lot of
improvement over the status quo.

If for each module there were an example of a problem that's clearly
best solved with this module rather than any of the others, that's a
perfectly good answer to the question why they all exist.

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             “Time flies like an arrow, fruit flies like a Banana.”

From alexander.belopolsky at gmail.com Sat Jul 11 17:44:47 2015
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sat, 11 Jul 2015 11:44:47 -0400
Subject: [Python-ideas] Mitigating 'self.' Method Pollution
In-Reply-To: References: Message-ID:

On Sat, Jul 11, 2015 at 11:39 AM, David Mertz wrote:
> maybe we could get the Unicode Consortium to add a glyph that looked like
> 'self.'

$ ?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From michael at hewitts.us Sat Jul 11 18:04:53 2015
From: michael at hewitts.us (Michael Hewitt)
Date: Sat, 11 Jul 2015 09:04:53 -0700
Subject: [Python-ideas] Mitigating 'self.' Method Pollution
In-Reply-To: References: Message-ID:

I am officially dropping my original proposal in favor of Nick's. Using
a single '.' prefix to reference class variables is brilliant. It has
the added benefit of scoping auto-completion within IDEs, which is
incredibly helpful for beginners and veterans alike.

As an aside, I have learned that long functions are bad form (reference
Robert Martin's Clean Code series - #3 Functions).
Using short functions with descriptive names has revolutionized code
readability at my company. Incidentally, it also mitigates concerns
about distinguishing locals from class variables. Additionally, I would
caution against allowing a lesser used language feature (decorators) to
drive the design of a more common feature (field references in
methods).

I want to thank you folks for helping to shape one of the most
beautiful languages I have encountered. I apologize for the
hit-and-run, but I can see that being a part of this discussion is not
good for my health, causing me to send emails at 1AM and stew about the
issue all night, so I will need to withdraw. My primary reason for
exploring Python was as a 'first language' for my son and also as a
potential option for future introductory programming courses that I
might teach. Except for 'self' redundancy, it is a nearly ideal first
language. Hopefully my feedback has been useful.

Michael

On Sat, Jul 11, 2015 at 1:40 AM, Nick Coghlan wrote:

> On 11 July 2015 at 17:25, Michael Hewitt wrote:
> > Let's compare two versions of a method taken from some code that my 11 year
> > old son wrote yesterday:
> >
> > Current
> >
> > global height
> > def keep_moving_gravity(self):
> >     self.y += self.gravity
> >     self.y = max(self.y, 0)
> >     self.y = min(self.y, height - 1)
> >
> > Proposed
> >
> > global height
> > def keep_moving_gravity(self):
> >     self y, gravity
> >     y += gravity
> >     y = max(y, 0)
> >     y = min(y, height - 1)
>
> Alternative proposal:
>
> def keep_moving_gravity(self):
>     .y += .gravity
>     .y = max(.y, 0)
>     .y = min(.y, height - 1)
>
> The main objection folks have to the declarative proposal is the fact
> that there's nothing obvious at the point of reference to indicate
> whether we're manipulating a local variable or an instance variable. I
> can speak from experience in saying that there's a reason the "m_*"
> prefix notation for C++ member variables is popular: it makes C++
> method implementations much easier to read when you can tell at a
> glance if a line is working with an instance member or a local
> variable. With the language not providing that separation by default,
> folks added it by convention.
>
> An "implied attribute reference" proposal would be different: it could
> build on the same mechanism that powers zero-argument super to make it
> possible to say "if I reference an object attribute in a function
> without saying which object I'm referring to, then I mean the first
> parameter".
>
> This should work cleanly for standalone functions, class methods,
> instance methods, etc, as the compiler already keeps track of the
> necessary details in order to implement PEP 3135
> (https://www.python.org/dev/peps/pep-3135/#specification)
>
> The main downside is that a leading dot isn't as good a visual
> indicator as a dot appearing between two other characters. That could
> potentially be alleviated with a double-dot notation:
>
> def keep_moving_gravity(self):
>     ..y += ..gravity
>     ..y = max(..y, 0)
>     ..y = min(..y, height - 1)
>
> The downside of *that* is it might make people start thinking in terms
> of stepping up scopes, rather than referring to the first parameter.
>
> Regards,
> Nick.
>
> --
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From steve at pearwood.info Sat Jul 11 18:18:22 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 12 Jul 2015 02:18:22 +1000
Subject: [Python-ideas] Mitigating 'self.'
Method Pollution
In-Reply-To: <20150711150756.GE27268@ando.pearwood.info>
References: <85zj33w0ts.fsf@benfinney.id.au>
 <20150711150756.GE27268@ando.pearwood.info>
Message-ID: <20150711161819.GG27268@ando.pearwood.info>

On Sun, Jul 12, 2015 at 01:07:56AM +1000, Steven D'Aprano wrote:

> So what makes attributes so special that we have to break the rules that
> apply to all other variables?
>
> No tags needed: builtins, globals, nonlocals, locals.
>
> Tags needed: attributes.
>
> This isn't a rhetorical question

The short answer is, sometimes they are special, and sometimes they
aren't.

Implicit self is a much-requested feature, and not just in Python.
Eiffel, for example, uses the "Current" keyword, and it is sometimes
optional. Swift has implicit member access as well:

    var x = MyEnumeration.SomeValue
    x = .AnotherValue

https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/Expressions.html#//apple_ref/swift/grammar/implicit-member-expression

And yet in C++, which has implicit "this", people regularly end up
prefixing their attributes with a pseudo namespace, "m_..." (for
member). I have come to the conclusion that this is not just personal
preference. I now believe that where you stand on this question will,
to some degree, depend on the type of code you are writing.

If you are writing "enterprisey" heavily object oriented code, chances
are you are dealing with lots of state, lots of methods, lots of
classes, and you'll need all the help you can get to keep track of
where that state lives. You will probably want explicit self for every
attribute access, or a pseudo-self naming convention like m_ in C++, to
help keep it straight in your head. *Understanding the code* is harder
than writing it in the first place, so having to write a few extra
selfs is a negligible cost with a big benefit. Attributes are special
because they are state, and you have lots of state.

But look at Michael's example code. That's real code too, it's just not
enterprisey, and there are many Python programmers who don't write
enterprisey code. They're beginners, or sys admins hacking together a
short script, or editing one that already exists. For them, or at least
for some of them, they have only a few classes with a little bit of
state. Their methods tend to be short. Although they don't have much
state, they refer to it over and over again.

For these people, I believe, all those explicit selfs do is add visual
noise to the code, especially if the code is the least bit mathematical
or if there are well-known conventions that are being followed:

    class Vector:
        def abs(self):
            return sqrt(x**2 + y**2 + z**2)
            # versus
            return sqrt(self.x**2 + self.y**2 + self.z**2)

We're told that programs are written firstly for the human readers, and
only incidentally for the computer. I expect that most people, with any
knowledge of vectors, would have understood that the x, y, z in the
first return must refer to coordinates of the vector. What else could
they be? The selfs are just noise. If we are writing for human readers,
sometimes we can be too explicit.

Or to put it another way: If we are writing text for human readers who
will read that text written by us, sometimes we can be too explicit in
our writing for those same readers, causing a decrease in readability,
which is an undesirable outcome.

I don't intend to defend "Do What I Mean" attribute inference. I don't
believe that is practical or desirable in Python.
But there's a middle ground: Michael's suggestion, where we can
explicitly declare that certain names are attributes of self, and
thereby decrease the visual noise in that method.

If you're thinking about enterprisey 50- or 100-line methods, this
solution probably sounds awful. Every time you see a variable name in
the method, you have to scroll back to the top of the method to see
whether it is declared as a self attribute or not. And there are too
many potential attributes to keep track of them all in your head. And I
agree. But that sort of code is not the code that would benefit from
this.

Instead, think of small methods, say, ten or fifteen lines at most, few
enough that you can keep the whole method in view at once. Think of
classes with only a few attributes, but where you refer to them
repeatedly. Maybe *you* don't feel the need to remove those
in-your-face selfs, but can you understand why some people do?

In recent years, Python has gained some nice new features and batteries
aimed at the advanced programmer, such as async etc. This, I believe,
is one which may assist programmers at the less advanced end. They have
no need for awaitables and static type checking, but they sure would
like less visual noise in the methods.

-- Steve

From storchaka at gmail.com Sat Jul 11 18:37:00 2015
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sat, 11 Jul 2015 19:37:00 +0300
Subject: [Python-ideas] Mitigating 'self.' Method Pollution
In-Reply-To: References: Message-ID:

On 11.07.15 18:39, David Mertz wrote:
> On Jul 11, 2015 1:40 AM, "Nick Coghlan" wrote:
> > Alternative proposal:
> >
> > def keep_moving_gravity(self):
> >     .y += .gravity
> >     .y = max(.y, 0)
> >     .y = min(.y, height - 1)
>
> Assuming the parser could be made to handle this (which feels like a big
> stipulation), this variation would deal with the non-explicit dynamic
> scoping bug magnets.
>
> It might violate the "looks like grit on Tim's screen" principle, but it
> is explicit. I'd only be -0 or -0.5 on this idea.
>
> Since we have Unicode though, how about:
>
> ?y += ?gravity
>
> Or
>
> ?y += ?gravity

Brilliant proposal! Unfortunately ? is indistinguishable from ? in many
fonts, and ? can be confused with ? or ? in selected text.

> Guaranteed no parser ambiguity, and some symbols are bigger than grit.
> In fact, maybe we could get the Unicode Consortium to add a glyph that
> looked like 'self.' to make it stand out even better.

Maybe use ⾃ (KANGXI RADICAL SELF)?

From storchaka at gmail.com Sat Jul 11 18:39:01 2015
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sat, 11 Jul 2015 19:39:01 +0300
Subject: [Python-ideas] Mitigating 'self.' Method Pollution
In-Reply-To: <20150711151004.GF27268@ando.pearwood.info>
References: <55A0E7ED.8070209@egenix.com>
 <20150711151004.GF27268@ando.pearwood.info>
Message-ID:

On 11.07.15 18:10, Steven D'Aprano wrote:
> On Sat, Jul 11, 2015 at 02:53:51PM +0300, Joonas Liik wrote:
> > On 11 July 2015 at 12:54, M.-A. Lemburg wrote:
> > >
> > > with self:
> > >     x = 5
> > >     y = 6
> > >
> > sounds a lot like JavaScript: the bad parts and has the exact same
> > shortcomings.
> > consider:
>
> It's actually taken from Pascal, where it works quite well due to the
> necessity of explicit declarations of all variables and record fields.
>
> There's even a Python FAQ about it. So don't worry, Python won't get
> this construct.

To be correct, it should be "without self".
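The FAQ's point fits in a couple of lines. Pascal can resolve a "with"
block because record fields are declared; Python has no declarations to
consult, so a hypothetical "with p:" block could equally mean either
assignment below, and nothing in the block says which (Point here is
just a made-up example class):

    class Point:
        def __init__(self):
            self.x = 0

    p = Point()
    # A Pascal-style "with p: x = 5" would be ambiguous here.
    # It could equally mean either of:
    p.x = 5  # rebind the existing attribute of p
    x = 5    # or create a brand-new local name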
From ron3200 at gmail.com Sat Jul 11 18:59:51 2015
From: ron3200 at gmail.com (Ron Adam)
Date: Sat, 11 Jul 2015 12:59:51 -0400
Subject: [Python-ideas] Mitigating 'self.' Method Pollution
In-Reply-To: References: <85zj33w0ts.fsf@benfinney.id.au>
Message-ID:

On 07/11/2015 04:38 AM, Chris Angelico wrote:
> On Sat, Jul 11, 2015 at 6:33 PM, Ben Finney wrote:
>>> >>And, for folks who still prefer to prefix all their field references
>>> >>with 'self.', the proposal in no way prevents them from doing so. It
>>> >>merely allows the rest of us to be a bit less wordy and more pithy in
>>> >>our code.
>> >
>> >Python requires explicit declaration of the namespace for names. That
>> >protects me from ambiguities that would otherwise be very common in
>> >other people's code. I appreciate that and do not want it threatened.

> Not quite true; Python requires explicit declaration of names when
> they're bound to, but is quite happy to do a scope search (local,
> nonlocal, global, builtin) for references. But the rules are fairly
> simple. Aside from the possibility that someone imports your module
> and sets module.len to shadow a builtin, everything can be worked out
> lexically by looking for the assignments.

I think some don't realise the names in a class block are not part of
the static scope of the methods defined in that same class block. The
methods get the static scope the class is defined in, but that excludes
the names in the class block.

If a class inherits methods defined in another module, those methods
get the static scope where they were defined, and the methods local to
the child class get a completely different static scope.

But usually it's not a problem, thanks to "self". ;-)

Cheers,
Ron

From ron3200 at gmail.com Sat Jul 11 19:16:41 2015
From: ron3200 at gmail.com (Ron Adam)
Date: Sat, 11 Jul 2015 13:16:41 -0400
Subject: [Python-ideas] Mitigating 'self.' Method Pollution
In-Reply-To: References: <85zj33w0ts.fsf@benfinney.id.au>
Message-ID:

On 07/11/2015 05:12 AM, Nick Coghlan wrote:
> With Michael's proposal, that changes to:
>
> def keep_moving_gravity(self):
>     self y, gravity
>     y += gravity
>     y = max(y, 0)
>     y = min(y, height - 1)

This works now...

    def keep_moving_gravity(self):
        y = self.y + self.gravity
        y = max(y, 0)
        self.y = min(y, height - 1)

I don't think the self comes up as frequently as it may seem in well
written code. And in the above, self is written once, but the other
names y and gravity are repeated again. So the gain isn't as great as
it may seem at first.

What would this do?

    def __init__(self, red, blue, green):
        self red, blue, green
        red = red
        blue = blue
        green = green

If the names are long, it would take more characters than the
equivalent version using self.

Cheers,
Ron

From steve at pearwood.info Sat Jul 11 19:21:16 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 12 Jul 2015 03:21:16 +1000
Subject: [Python-ideas] Mitigating 'self.' Method Pollution
In-Reply-To: References: Message-ID: <20150711172116.GI27268@ando.pearwood.info>

On Sat, Jul 11, 2015 at 08:23:39AM -0700, David Mertz wrote:
> You are confusing scoping with attribute access. 'self' is not a lexical
> scope. In a way, if you squint just right, it resembles a dynamic scope (or
> would under your proposal). But Python isn't elisp, and we don't want to
> have dynamic scoping.

I believe you are misunderstanding either dynamic scoping or Michael's
proposal.

In dynamic scoping, the scope of variables depends on the call chain.
So if you write:

    def spam():
        print(x)

and then *call* it from function eggs(), spam gets x from the scope of
eggs. This is nothing like Michael's proposal.

> I.e. you'd like bare variables to be (sometimes) scoped to the namespace of
> the instance eventually created somewhere outside of the class definition
> (quite likely only created under runtime dependent conditions).

self is always an instance created outside of the class definition,
under runtime dependent conditions. You are describing every instance
of every class. You can't create an instance of class Spam *inside the
class definition* of Spam, because the class doesn't exist yet:

    class Spam:
        x = Spam()  # doesn't work

> In
> contrast, actual Python is far simpler... all variables are lexically local
> when defined, unless explicitly declared to have a different and specific
> *lexical* scope.

That's not correct. If it were, this would fail with NameError or
UnboundLocalError:

    s = "hello world"
    def func():
        print(s)
    func()

-- Steve

From steve at pearwood.info Sat Jul 11 19:29:29 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 12 Jul 2015 03:29:29 +1000
Subject: [Python-ideas] Mitigating 'self.' Method Pollution
In-Reply-To: References: <85zj33w0ts.fsf@benfinney.id.au>
Message-ID: <20150711172929.GJ27268@ando.pearwood.info>

On Sat, Jul 11, 2015 at 01:16:41PM -0400, Ron Adam wrote:

> What would this do?
>
> def __init__(self, red, blue, green):
>     self red, blue, green
>     red = red
>     blue = blue
>     green = green

I would expect it to raise a SyntaxError, just like this:

    py> def f(a):
    ...     global a
    ...
      File "<stdin>", line 1
    SyntaxError: name 'a' is local and global

-- Steve

From Nikolaus at rath.org Sat Jul 11 21:10:15 2015
From: Nikolaus at rath.org (Nikolaus Rath)
Date: Sat, 11 Jul 2015 12:10:15 -0700
Subject: [Python-ideas] Mitigating 'self.' Method Pollution
In-Reply-To: <20150711161819.GG27268@ando.pearwood.info> (Steven D'Aprano's
 message of "Sun, 12 Jul 2015 02:18:22 +1000")
References: <85zj33w0ts.fsf@benfinney.id.au>
 <20150711150756.GE27268@ando.pearwood.info>
 <20150711161819.GG27268@ando.pearwood.info>
Message-ID: <877fq67boo.fsf@vostro.rath.org>

On Jul 11 2015, Steven D'Aprano wrote:
> On Sun, Jul 12, 2015 at 01:07:56AM +1000, Steven D'Aprano wrote:
>
>> So what makes attributes so special that we have to break the rules that
>> apply to all other variables?
>>
>> No tags needed: builtins, globals, nonlocals, locals.
>>
>> Tags needed: attributes.
>>
>> This isn't a rhetorical question
>
> The short answer is, sometimes they are special, and sometimes they
> aren't.
[...]
> For these people, I believe, all those explicit selfs do is add visual
> noise to the code, especially if the code is the least bit mathematical
> or if there are well-known conventions that are being followed:
>
> class Vector:
>     def abs(self):
>         return sqrt(x**2 + y**2 + z**2)
>         # versus
>         return sqrt(self.x**2 + self.y**2 + self.z**2)
>
> We're told that programs are written firstly for the human readers, and
> only incidentally for the computer. I expect that most people, with any
> knowledge of vectors, would have understood that the x, y, z in the
> first return must refer to coordinates of the vector. What else could
> they be? The selfs are just noise. If we are writing for human readers,
> sometimes we can be too explicit.
>
> I don't intend to defend "Do What I Mean" attribute inference. I don't
> believe that is practical or desirable in Python.
> But there's a middle ground: Michael's suggestion, where we can
> explicitly declare that certain names are attributes of self, and
> thereby decrease the visual noise in that method.
[...]

Thanks for that excellent post. I hope it'll cause some people to
reconsider. Although there seems to be a flood of objection mails, most
of them seem more instinctive than rational to me.

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             “Time flies like an arrow, fruit flies like a Banana.”

From mertz at gnosis.cx Sat Jul 11 21:25:44 2015
From: mertz at gnosis.cx (David Mertz)
Date: Sat, 11 Jul 2015 12:25:44 -0700
Subject: [Python-ideas] Mitigating 'self.' Method Pollution
In-Reply-To: <20150711172116.GI27268@ando.pearwood.info>
References: <20150711172116.GI27268@ando.pearwood.info>
Message-ID:

On Sat, Jul 11, 2015 at 10:21 AM, Steven D'Aprano wrote:

> On Sat, Jul 11, 2015 at 08:23:39AM -0700, David Mertz wrote:
> > You are confusing scoping with attribute access. 'self' is not a lexical
> > scope. In a way, if you squint just right, it resembles a dynamic scope (or
> > would under your proposal). But Python isn't elisp, and we don't want to
> > have dynamic scoping.
>
> I believe you are misunderstanding either dynamic scoping or Michael's
> proposal.

It's not *exactly* dynamic scope, hence "resembling."

> In dynamic scoping, the scope of variables depends on the call chain.

    from random import random  # import added to make the example runnable

    class Foo():
        def __init__(self, val):
            self.val = val

    if random() < .5:
        a, b = Foo(1), Foo(2)
    else:
        a, b = Foo(2), Foo(1)

What's the value of the attribute .val in the namespace 'a' after this
code runs? You're right that this is always true of instances, that
they are *dynamically* created at runtime. C.f. "resembling". I didn't
claim that the proposal changes the scoping *semantics* of Python, per
se. But right now, we have this convenient *syntactic* marker that .val
is going to live in the namespace whose place is held in the definition
by the name 'self'. Under his proposal, readability suffers because we
no longer have that marker on the variable itself.

> > contrast, actual Python is far simpler... all variables are lexically local
> > when defined, unless explicitly declared to have a different and specific
> > *lexical* scope.
>
> That's not correct. If it were, this would fail with NameError or
> UnboundLocalError:
>
> s = "hello world"
> def func():
>     print(s)
> func()

Take a look at what I wrote. 's' is *defined* in the enclosing scope of
the function definition (i.e. probably the module scope; although
perhaps your code is copied from some other nested scope, which would
make 's' nonlocal rather than global). Module scope is also "lexical
locality".
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mertz at gnosis.cx Sat Jul 11 21:41:46 2015
From: mertz at gnosis.cx (David Mertz)
Date: Sat, 11 Jul 2015 12:41:46 -0700
Subject: [Python-ideas] Mitigating 'self.'
Method Pollution In-Reply-To: <877fq67boo.fsf@vostro.rath.org> References: <85zj33w0ts.fsf@benfinney.id.au> <20150711150756.GE27268@ando.pearwood.info> <20150711161819.GG27268@ando.pearwood.info> <877fq67boo.fsf@vostro.rath.org> Message-ID: On Sat, Jul 11, 2015 at 12:10 PM, Nikolaus Rath wrote: > > class Vector: > > def abs(self): > > return sqrt(x**2 + y**2 + z**2) > > # versus > > return sqrt(self.x**2 + self.y**2 + self.z**2) > > > > > > I expect that most people, with any > > knowledge of vectors, would have understood that the x, y, z in the > > first return must refer to coordinates of the vector. > Thanks for that excellent post. I hope it'll cause some people to > reconsider. Although there seems to be a flood of objection mails, most > of them seem more instintive than rational to me. > This is a great example of why the proposal is a bad one. Yes, not using magic makes a one-line function slightly longer (and even slightly less readable). But I also don't want to have to guess about WHICH 'x' or 'y' or 'z' is the one being used in calculation of Cartesian distance. Sure, an obvious implementation of a Vector class probably has x, y, z attributes of self. The example was rigged to make that seem obvious. But even then, I don't really know. Maybe the z direction of the vector is stored as a class attribute. Maybe the class expects to find a global definition of the z direction. Sure, keeping the z direction somewhere other than in self is probably foolish if you assume "Vector" means "generic vector". What if the class was called "RestrictedVector"? Would you know without reading the full class and docstring exactly in what respect it is "restricted"? E.g. are you sure it's not "restricted in the z dimension"? As someone else points out, if you WANT local variables in a method, you are welcome to use them. E.g.: def times_matrix(self, matrix): x, y, z = self.x, self.y, self.z # Many more than one line of implementation # ... stuff with matrix indexing and referencing x # etc. return result_matrix No one stops you now from giving local names to values stored in the instance within a method body. But when you do so, you KNOW 30 lines later that they are local names, even after the "local-looking name is actually an attribute" declaration has scrolled away. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokoproject at gmail.com Sat Jul 11 22:17:28 2015 From: gokoproject at gmail.com (John Wong) Date: Sat, 11 Jul 2015 16:17:28 -0400 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: <85zj33w0ts.fsf@benfinney.id.au> <20150711150756.GE27268@ando.pearwood.info> <20150711161819.GG27268@ando.pearwood.info> <877fq67boo.fsf@vostro.rath.org> Message-ID: On Sat, Jul 11, 2015 at 12:18 PM, Steven D'Aprano wrote: > > *Understanding the code* is harder > than writing it in the first place, so having to write a few extra > selfs is a negligible cost with a big benefit. Attributes are special > because they are state, and you have lots of state. > > Actually I was taught to use "this" everywhere if I can. But look at Michael's example code. 
That's real code too, it's just not
> enterprisey, and there are many Python programmers who don't write
> enterprisey code. They're beginners, or sys admins hacking together a
> short script, or editing one that already exists. For them, or at least
> for some of them, they have only a few classes with a little bit of
> state. Their methods tend to be short. Although they don't have much
> state, they refer to it over and over again.
>
> For these people, I believe, all those explicit selfs do is add visual
> noise to the code, especially if the code is the least bit mathematical
> or if there are well-known conventions that are being followed:

I disagree that we really need to distinguish what constitutes
enterprisey code. In fact it is a bad idea to think there is even such
a notion, because people should express code in the cleanest and most
intuitive way possible. Even a beginner can and should benefit from
explicitness. I am sure everyone here has read enterprisey code and
knows that enterprisey code is not really enterprisey code. Complexity
is a different discussion, and does not justify why someone would want
to be less explicit or more explicit. It's the same argument as asking
why we can't make init a reserved class method -- why have __init__? I
can come up with a bunch of other arguments why the Python language
should do Y instead of X to make code less visually noisy. We cannot
appeal to everyone's needs when the visual noise is such a small cost
in performance and in learning a language. Sure, people can write
"self" like this:

    def regular_function(elf, self):
        # self is just some Python object...
        elf.name = 'foo'
        self.name = 'bar'

But this is still clear to a beginner: hey, this is just a regular
function, and self in this case is some kind of object.... Anyway, when
I was a student, scoping was a very popular question among my peers
when they first learned a new language.

On Sat, Jul 11, 2015 at 12:04 PM, Michael Hewitt wrote:

> I am officially dropping my original proposal in favor of Nick's. Using a
> single '.' prefix to reference class variables is brilliant. It has the
> added benefit of scoping auto-completion within IDEs, which is incredibly
> helpful for beginners and veterans alike.

I'm still -1 on the dot prefix. We might as well adopt $ in that case.
"dot" is a poor choice. It is one of the most easily missed characters
to type. And again I don't see the benefit of having self. and $
co-exist. We should choose one way, but obviously this cannot be the
case because we need backward compatibility.

On Sat, Jul 11, 2015 at 3:41 PM, David Mertz wrote:

> On Sat, Jul 11, 2015 at 12:10 PM, Nikolaus Rath wrote:
>
>> > class Vector:
>> >     def abs(self):
>> >         return sqrt(x**2 + y**2 + z**2)
>> >         # versus
>> >         return sqrt(self.x**2 + self.y**2 + self.z**2)
>> >
>> > I expect that most people, with any
>> > knowledge of vectors, would have understood that the x, y, z in the
>> > first return must refer to coordinates of the vector.
>
>> Thanks for that excellent post. I hope it'll cause some people to
>> reconsider. Although there seems to be a flood of objection mails, most
>> of them seem more instinctive than rational to me.
>
> This is a great example of why the proposal is a bad one. Yes, not using
> magic makes a one-line function slightly longer (and even slightly less
> readable).
>
> But I also don't want to have to guess about WHICH 'x' or 'y' or 'z' is
Sure, an obvious > implementation of a Vector class probably has x, y, z attributes of self. > The example was rigged to make that seem obvious. But even then, I don't > really know. Maybe the z direction of the vector is stored as a class > attribute. Maybe the class expects to find a global definition of the z > direction. > > Sure, keeping the z direction somewhere other than in self is probably > foolish if you assume "Vector" means "generic vector". What if the class > was called "RestrictedVector"? Would you know without reading the full > class and docstring exactly in what respect it is "restricted"? E.g. are > you sure it's not "restricted in the z dimension"? > > As someone else points out, if you WANT local variables in a method, you > are welcome to use them. E.g.: > > def times_matrix(self, matrix): > x, y, z = self.x, self.y, self.z > # Many more than one line of implementation > # ... stuff with matrix indexing and referencing x > # etc. > return result_matrix > > No one stops you now from giving local names to values stored in the > instance within a method body. But when you do so, you KNOW 30 lines later > that they are local names, even after the "local-looking name is actually > an attribute" declaration has scrolled away. > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sat Jul 11 22:56:31 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 11 Jul 2015 13:56:31 -0700 Subject: [Python-ideas] Concurrency Modules In-Reply-To: <87615qivsd.fsf@vostro.rath.org> References: <559EFB73.5050606@mail.de> <87615qivsd.fsf@vostro.rath.org> Message-ID: <7524CDDA-6E8D-4FE0-A541-11A5EEFA7BD1@yahoo.com> On Jul 11, 2015, at 08:00, Nikolaus Rath wrote: > >> On Jul 10 2015, Nick Coghlan wrote: >>> On 10 July 2015 at 12:09, Chris Angelico wrote: >>>> On Fri, Jul 10, 2015 at 8:53 AM, Sven R. Kunze wrote: >>>> After discussing the whole topic and reading it up further, it became clear >>>> to me what's actually missing in Python. That is a definitive guide of >>>> why/when a certain concurrency module is supposed to be used >>> >>> I'm not sure how easy the decisions will be in all cases, but >>> certainly some broad guidelines would be awesome. (The exact analysis >>> of "when should I use threads and when should I use processes" is a >>> big enough one that there've been a few million blog posts on the >>> subject, and I doubt that asyncio will shrink that.) A basic summary >>> would be hugely helpful. "Here's four similar modules, and why they >>> all exist in the standard library." >> >> Q: Why are there four different modules >> A: Because they solve different problems >> Q: What are those problems? >> A: How long have you got? >> >> Choosing an appropriate concurrency model for a problem is one of the >> hardest tasks in software architecture design. 
The only way to make it >> appear simple is to focus in on a specific class of problems where >> there *is* a single clearly superior answer for that problem domain :) > > But even just documenting this subset would already provide a lot of > improvement over the status quo. > > If for each module there were an example of a problem that's clearly > best solved with this module rather than any of the others, that's a > perfectly good anwser to the question why they all exist. Assuming coroutines/asyncio are not the answer for your problem, it's not really a choice between 3 modules; rather, there are 3 separate binary decisions to make, which lead to 6 different possibilities (not 8, because 2 of them are less useful and therefore Python doesn't have them): futures.ProcessPoolExecutor, futures.ThreadPoolExecutor, multiprocessing.Pool, multiprocessing.dummy.Pool (unfortunately, this is where thread pools lie...), multiprocessing.Process, or threading.Thread. Explaining pools vs. separate threads is pretty easy. If you're doing a whole bunch of similar things (download 1000 files, do this computation on every row of a giant matrix), you want pools; if you're doing distinctly different things (update the backup for this file, send that file to the printer, and download the updated version from the net), you don't. Explaining plain pools vs. executors is a little trickier, because for the simplest cases there's no obvious difference. Coming up with a case where you need to compose futures isn't that hard; coming up with a case where you need one of the lower-level pool features (like explicitly managing batching) without getting too artificial to be meaningful or too complicated to serve as an example is a bit harder. But still not that big of a problem. Explaining threads vs. processes is two questions in itself. First, if you're looking at concurrency to speed up your code, and your code is CPU-bound, then your answer to the other question doesn't matter; you need processes. (Unless you're using a C extension that release the GIL, or using Jython instead of CPython, or ...) So finally we get to the big problem: shared state. Even ignoring the Python- and CPython-specific issues (forking, what the GIL makes atomic, ...), just explaining the basic ideas of what shared state means, when you need it, why you're wrong, what races are, how to synchronize, why mutability matters... Is that really something that can be fit into a HOWTO? But if you punt on that and just say "until you know what you're doing, everything should be written in the message-passing-tasks style", you might as well skip the whole HOWTO and say "always use concurrent.futures.ProcessPoolExecutor". From ncoghlan at gmail.com Sun Jul 12 02:04:47 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 12 Jul 2015 10:04:47 +1000 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: <85zj33w0ts.fsf@benfinney.id.au> <20150711150756.GE27268@ando.pearwood.info> <20150711161819.GG27268@ando.pearwood.info> <877fq67boo.fsf@vostro.rath.org> Message-ID: On 12 July 2015 at 05:41, David Mertz wrote: > As someone else points out, if you WANT local variables in a method, you are > welcome to use them. E.g.: > > def times_matrix(self, matrix): > x, y, z = self.x, self.y, self.z > # Many more than one line of implementation > # ... stuff with matrix indexing and referencing x > # etc. > return result_matrix Perhaps that's an answer that could be pushed more heavily? 
Q: My method is drowning in self references, how can I make it more readable? A: Bind referenced attributes to local names at the beginning of the method, and write them back to instance attributes at the end of the method. For example, rather than writing: class ObjectInMotion: def keep_moving_gravity(self): self.y += self.gravity self.y = max(self.y, 0) self.y = min(self.y, height - 1) We can write: class ObjectInMotion: def keep_moving_gravity(self): y = self.y + self.gravity y = max(y, 0) y = min(y, height - 1) self.y = y This approach is then conveniently amenable to factoring out helper functions that are independent of the original class definition: def calculate_next_location(current, gravity, height): next_location = current + gravity next_location = max(next_location, 0) next_location = min(next_location, height - 1) return next_location class ObjectInMotion: def keep_moving_gravity(self): self.y = calculate_next_location(self.y, self.gravity, height) I like this angle, as it encourages thinking about mutating operations on instances as a "read, transform, write" cycle, rather than multiple in-place mutations (although the latter may of course still happen when the attributes being manipulated are themselves mutable objects). Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From greg.ewing at canterbury.ac.nz Sun Jul 12 02:43:07 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 12 Jul 2015 12:43:07 +1200 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: <55A0E7ED.8070209@egenix.com> References: <55A0E7ED.8070209@egenix.com> Message-ID: <55A1B81B.6070005@canterbury.ac.nz> M.-A. Lemburg wrote: > the global definition is usually written as first > statement in a function/method). The same thing applies to the proposed declaration. However, globals are usually given longish and descriptive names, making it easy to spot them and know what they refer to. Both locals and instance attributes, on the other hand, often have very short and cryptic names, so having them both appear with no prefix would be very confusing. -- Greg From tjreedy at udel.edu Sun Jul 12 03:21:57 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 11 Jul 2015 21:21:57 -0400 Subject: [Python-ideas] Mitigating 'self.' Method Pollution In-Reply-To: References: <85zj33w0ts.fsf@benfinney.id.au> <20150711150756.GE27268@ando.pearwood.info> <20150711161819.GG27268@ando.pearwood.info> <877fq67boo.fsf@vostro.rath.org> Message-ID: On 7/11/2015 3:41 PM, David Mertz wrote: > On Sat, Jul 11, 2015 at 12:10 PM, Nikolaus Rath > > wrote: > > > class Vector: > > def abs(self): > > return sqrt(x**2 + y**2 + z**2) > > # versus > > return sqrt(self.x**2 + self.y**2 + self.z**2) or def abs(v): return sqrt(v.x**2 + v.y**2 + v.z**2) > But I also don't want to have to guess about WHICH 'x' or 'y' or 'z' is > the one being used in calculation of Cartesian distance. Sure, an > obvious implementation of a Vector class probably has x, y, z attributes > of self. The example was rigged to make that seem obvious. But even > then, I don't really know. Maybe the z direction of the vector is > stored as a class attribute. Maybe the class expects to find a global > definition of the z direction. > > Sure, keeping the z direction somewhere other than in self is probably > foolish if you assume "Vector" means "generic vector". What if the > class was called "RestrictedVector"? Would you know without reading the > full class and docstring exactly in what respect it is "restricted"? 
> E.g. are you sure it's not "restricted in the z dimension"? > > As someone else points out, if you WANT local variables in a method, you > are welcome to use them. E.g.: > > def times_matrix(self, matrix): > x, y, z = self.x, self.y, self.z > # Many more than one line of implementation > # ... stuff with matrix indexing and referencing x > # etc. > return result_matrix Localization, which is not limited to self attributes, does two things: it removes the need to continually type 'object.x'; it makes the remaining code run faster. I would be interested to know how many faster local references are needed to make up for the overhead of localization. -- Terry Jan Reedy From ncoghlan at gmail.com Sun Jul 12 04:48:14 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 12 Jul 2015 12:48:14 +1000 Subject: [Python-ideas] Learning from the shell in supporting asyncio background calls In-Reply-To: References: Message-ID: On 11 July 2015 at 20:17, Nick Coghlan wrote: > I'll sleep on that, and if I still like that structure in the morning, > I'll look at revising my coroutine posts. I've revised both of my asyncio posts to use this three part helper API to work with coroutines and the event loop from the interactive prompt: * run_in_foreground * schedule_coroutine * call_in_background I think the revised TCP echo client and server post is the better of the two descriptions, since it uses actual network IO operations as its underlying example, rather than toy background timers: http://www.curiousefficiency.org/posts/2015/07/asyncio-tcp-echo-server.html As with most of the main asyncio API, "run" in this revised setup now refers specifically to running the event loop. ("run_in_executor" is still an anomaly, which I now believe might have been better named "call_in_executor" to align with the call_soon, call_soon_threadsafe and call_later callback management APIs, rather than the run_* event loop invocation APIs) The foreground/background split is now intended to refer primarily to "main thread in the main process" (e.g. the interactive prompt, the GUI thread in a desktop application, the main server process in a network application) vs "worker threads and processes" (whether managed by the default executor, or another executor passed in specifically to "call_in_background"). This is much closer in spirit to the shell meaning. The connection that "call_in_background" has to asyncio over using concurrent.futures directly is that, just like schedule_coroutine, it's designed to be used in tandem with run_in_foreground (either standalone, or in combination with asyncio.wait, or asyncio.wait_for) to determine if the results are available yet. Both schedule_coroutine and call_in_background are deliberately restricted in the kinds of objects they accept - unlike ensure_future, schedule_coroutine will complain if given an existing future, while call_in_background will complain immediately if given something that isn't some kind of callable. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From oscar.j.benjamin at gmail.com Sun Jul 12 14:56:39 2015 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Sun, 12 Jul 2015 13:56:39 +0100 Subject: [Python-ideas] Mitigating 'self.' 
Regards,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From oscar.j.benjamin at gmail.com Sun Jul 12 14:56:39 2015
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Sun, 12 Jul 2015 13:56:39 +0100
Subject: [Python-ideas] Mitigating 'self.' Method Pollution
In-Reply-To: 
References: <85zj33w0ts.fsf@benfinney.id.au>
 <20150711150756.GE27268@ando.pearwood.info>
 <20150711161819.GG27268@ando.pearwood.info>
 <877fq67boo.fsf@vostro.rath.org>
Message-ID: 

On 12 July 2015 at 01:04, Nick Coghlan wrote:
> On 12 July 2015 at 05:41, David Mertz wrote:
>> As someone else points out, if you WANT local variables in a method,
>> you are welcome to use them. E.g.:
>>
>> def times_matrix(self, matrix):
>>     x, y, z = self.x, self.y, self.z
>>     # Many more than one line of implementation
>>     # ... stuff with matrix indexing and referencing x
>>     # etc.
>>     return result_matrix
>
> I like this angle, as it encourages thinking about mutating operations
> on instances as a "read, transform, write" cycle, rather than multiple
> in-place mutations (although the latter may of course still happen
> when the attributes being manipulated are themselves mutable objects).

I prefer this particular style. Following object-oriented code means
keeping track of state. In any non-trivial method this approach makes
it easier to see what state serves as additional input to the method
and what state is changed as output of the method. This is part of the
implicit "signature" of a method, on top of its input arguments and
return values, so it's good to separate it out from any internal
calculations.

Also, modifying state in place is inappropriate for objects being used
in a "physics" situation, as the motivating example does:

    def keep_moving_gravity(self):
        self.y += self.gravity
        self.y = max(self.y, 0)
        self.y = min(self.y, height - 1)

The problem is that this method simultaneously calculates the new
position and overwrites the old one (in a non-invertible way). In a
program where there is only one variable that might be fine. However,
if we need to detect collisions with other objects then this won't
work. Also, if the velocity/acceleration of this object depends on
e.g. the position of some other object then we end up with
order-dependent physics: if I update ball1 first and then ball2, I get
different results than I would if I called the update methods in the
reverse order.

Similarly, if you write something like this:

    def update(self, dt):
        self.vy += self.gravity * dt
        self.y += self.vy * dt

then the result depends on the order of the two lines of code in the
method. It should really be something like:

    def update(self, dt):
        # read
        y, vy = self.y, self.vy
        # calculate new values from old
        new_vy = vy + self.gravity * dt
        new_y = y + vy * dt  # Using vy not new_vy on RHS
        # update state atomically
        self.y, self.vy = new_y, new_vy

A better general approach is that each object reports what velocity it
thinks it should have:

    def get_velocity(self, system_state):
        return (0, self.gravity)  # Assuming 2D

Then a controller can pass the current state of the system to each
object, ask what velocity it has, and then update all objects
simultaneously, checking for and resolving collisions etc. as it does
so.
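For instance, the controller loop could look something like this (a
rough sketch with made-up names, just to show the shape):

def step(objects, dt):
    # Query every object against the *same* snapshot of the system
    # state, so the result doesn't depend on update order.
    velocities = [obj.get_velocity(objects) for obj in objects]
    # Collision detection/resolution would inspect the proposed
    # updates here, before anything is written back.
    for obj, (vx, vy) in zip(objects, velocities):
        obj.x += vx * dt
        obj.y += vy * dt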
Building the update logic into every class means that every class
needs to be changed in order to redesign anything in the main physics
loop (to add collision detection, variable timestep etc).

--
Oscar

From 2015 at jmunch.dk Sun Jul 12 17:11:38 2015
From: 2015 at jmunch.dk (Anders J. Munch)
Date: Sun, 12 Jul 2015 17:11:38 +0200
Subject: [Python-ideas] Mitigating 'self.' Method Pollution
In-Reply-To: 
References: <85zj33w0ts.fsf@benfinney.id.au>
 <20150711150756.GE27268@ando.pearwood.info>
 <20150711161819.GG27268@ando.pearwood.info>
 <877fq67boo.fsf@vostro.rath.org>
Message-ID: <55A283AA.4070207@jmunch.dk>

Terry Reedy wrote:
> Localization, which is not limited to self attributes, does two
> things: it removes the need to continually type 'object.x'; it makes
> the remaining code run faster. I would be interested to know how
> many faster local references are needed to make up for the overhead
> of localization.

If you access the attribute just twice, that's enough for localisation
to yield a speedup. Local variable access is just that much faster
than attribute lookup.

$ python3 -V
Python 3.4.3
$ python3 -m timeit -s "class C: pass" -s "c = C();c.x=1" "c.x;c.x"
10000000 loops, best of 3: 0.106 usec per loop
$ python3 -m timeit -s "class C: pass" -s "c = C();c.x=1" "x=c.x;x;x"
10000000 loops, best of 3: 0.0702 usec per loop

$ python -V
Python 2.7.3
$ python -m timeit -s "class C(object): pass" -s "c = C();c.x=1" "c.x;c.x"
10000000 loops, best of 3: 0.105 usec per loop
$ python -m timeit -s "class C(object): pass" -s "c = C();c.x=1" "x=c.x;x;x"
10000000 loops, best of 3: 0.082 usec per loop
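The bytecode shows where the difference comes from: the localized
version pays for one attribute lookup, and every use after that is a
cheap local slot access. An illustration you can run yourself (the
exact disassembly varies between versions):

import dis

def attr_twice(c):
    return c.x + c.x   # two LOAD_ATTR lookups

def localized(c):
    x = c.x            # one LOAD_ATTR, bound to a local
    return x + x       # two cheap LOAD_FAST accesses

dis.dis(attr_twice)
dis.dis(localized)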
regards, Anders

From Nikolaus at rath.org Sun Jul 12 20:28:04 2015
From: Nikolaus at rath.org (Nikolaus Rath)
Date: Sun, 12 Jul 2015 11:28:04 -0700
Subject: [Python-ideas] Mitigating 'self.' Method Pollution
In-Reply-To: (David Mertz's message of "Sat, 11 Jul 2015 12:41:46 -0700")
References: <85zj33w0ts.fsf@benfinney.id.au>
 <20150711150756.GE27268@ando.pearwood.info>
 <20150711161819.GG27268@ando.pearwood.info>
 <877fq67boo.fsf@vostro.rath.org>
Message-ID: <87io9ps023.fsf@vostro.rath.org>

On Jul 11 2015, David Mertz wrote:
> On Sat, Jul 11, 2015 at 12:10 PM, Nikolaus Rath wrote:
>
>> > class Vector:
>> >     def abs(self):
>> >         return sqrt(x**2 + y**2 + z**2)
>> >         # versus
>> >         return sqrt(self.x**2 + self.y**2 + self.z**2)
>> >
>> > I expect that most people, with any
>> > knowledge of vectors, would have understood that the x, y, z in the
>> > first return must refer to coordinates of the vector.
>
> This is a great example of why the proposal is a bad one. Yes, not using
> magic makes a one-line function slightly longer (and even slightly less
> readable).
>
> But I also don't want to have to guess about WHICH 'x' or 'y' or 'z' is
> the one being used in calculation of Cartesian distance. Sure, an
> obvious implementation of a Vector class probably has x, y, z attributes
> of self. The example was rigged to make that seem obvious. But even
> then, I don't really know. Maybe the z direction of the vector is
> stored as a class attribute.

So how do you know if z is a class or an instance attribute if you
write "self.z" instead of "z"?

> Maybe the class expects to find a global definition of the z direction.

With the proposal, the above would actually read

    def abs(self):
        self x, y, z
        return sqrt(x**2 + y**2 + z**2)

so it is obvious that z is not a global but an attribute of self.

> No one stops you now from giving local names to values stored in the
> instance within a method body. But when you do so, you KNOW 30 lines
> later that they are local names, even after the "local-looking name is
> actually an attribute" declaration has scrolled away.

As Steven has so nicely written in the mail that I responded to:

,----
| If you are writing "enterprisey" heavily object oriented code, chances
| are you are dealing with lots of state, lots of methods, lots of
| classes, and you'll need all the help you can get to keep track of where
| that state lives. You will probably want explicit self for every
| attribute access, or a pseudo-self naming convention like m_ in C++, to
| help keep it straight in your head. *Understanding the code* is harder
| than writing it in the first place, so having to write a few extra
| selfs is a negligible cost with a big benefit. Attributes are special
| because they are state, and you have lots of state.
|
| But look at Michael's example code. That's real code too, it's just not
| enterprisey, and there are many Python programmers who don't write
| enterprisey code. They're beginners, or sys admins hacking together a
| short script, or editing one that already exists. For them, or at least
| for some of them, they have only a few classes with a little bit of
| state. Their methods tend to be short. Although they don't have much
| state, they refer to it over and over again.
| [...]
| If you're thinking about enterprisy 50- or 100-line methods, this
| solution probably sounds awful. Every time you see a variable name in
| the method, you have to scroll back to the top of the method to see
| whether it is declared as a self attribute or not. And there are too
| many potential attributes to keep track of them all in your head.
|
| And I agree. But that sort of code is not the code that would benefit
| from this. Instead, think of small methods, say, ten or fifteen lines at
| most, few enough that you can keep the whole method in view at once.
| Think of classes with only a few attributes, but where you refer to them
| repeatedly. Maybe *you* don't feel the need to remove those in-your-face
| selfs, but can you understand why some people do?
`----

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

"Time flies like an arrow, fruit flies like a Banana."

From mertz at gnosis.cx Sun Jul 12 21:10:09 2015
From: mertz at gnosis.cx (David Mertz)
Date: Sun, 12 Jul 2015 12:10:09 -0700
Subject: [Python-ideas] Mitigating 'self.' Method Pollution
In-Reply-To: <87io9ps023.fsf@vostro.rath.org>
References: <85zj33w0ts.fsf@benfinney.id.au>
 <20150711150756.GE27268@ando.pearwood.info>
 <20150711161819.GG27268@ando.pearwood.info>
 <877fq67boo.fsf@vostro.rath.org>
 <87io9ps023.fsf@vostro.rath.org>
Message-ID: 

> With the proposal, the above would actually read
>
>     def abs(self):
>         self x, y, z
>         return sqrt(x**2 + y**2 + z**2)
>
> so it is obvious that z is not a global but an attribute of self.

Sure, and this exists right now as an option:

    def abs(self):
        x, y, z = self.x, self.y, self.z
        return sqrt(x**2 + y**2 + z**2)

The small number of characters saved on one first line isn't worth the
extra conceptual complexity of learning and reading the next construct.

> > No one stops you now from giving local names to values stored in the
> > instance within a method body.
> > But when you do so, you KNOW 30 lines later that they are local
> > names, even after the "local-looking name is actually an attribute"
> > declaration has scrolled away.
>
> As Steven has so nicely written in the mail that I responded to:
> | If you are writing "enterprisey" heavily object oriented code, chances
> | are you are dealing with lots of state, lots of methods, lots of
> | classes,

I don't buy it. I don't want a language construct that is meant to
self-destruct as soon as people's methods grow past 30 lines (yes, I
know it wouldn't literally do so, but there is a pretty small threshold
beyond which it turns from making "code golf" easier to actively
harming readability).

From stephen at xemacs.org Mon Jul 13 04:49:15 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 13 Jul 2015 11:49:15 +0900
Subject: [Python-ideas] Mitigating 'self.' Method Pollution
In-Reply-To: 
References: 
Message-ID: <87y4ikg4b8.fsf@uwakimon.sk.tsukuba.ac.jp>

Chris Angelico writes:

> Alternate proposal:
>
> # You could in-line this if you want to.
> def limit(low, n, high):
>     """Limit a number to be within certain bounds"""
>     return max(low, min(n, high))
>
> def keep_moving_gravity(self):
>     self.y = limit(0, self.y + self.gravity, height - 1)
>
> There are precisely two references to self.y (one reading, one
> writing), and one reference to self.gravity.

That was my first reaction.

> Bad code is not itself an argument for a language change. Sometimes
> there's an even better alternative :)

If you're a language wonk, it's obviously better, true (well, I'm a
wannabe wonk so I say that and cross my fingers for luck :-). But if
you're a very new and possibly very young programmer (both meaning you
have little experience abstracting in this way) who's basically
scripting your way through the problem, defining an "internal"
function to be used once looks like throwing the garbage in the garage
through a window onto the lawn.

I think this is going to be one of those features that "scripters"[1]
and "programmers" evaluate very differently. Personally, I'm -1 but
that says more about my proclivities than about the proposal. :-)

Steve

Footnotes:
[1] Sorry if that sounds pejorative. It's not intended to. I tried to
find a better term but failed.

From stephen at xemacs.org Mon Jul 13 05:05:11 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 13 Jul 2015 12:05:11 +0900
Subject: [Python-ideas] Mitigating 'self.' Method Pollution
In-Reply-To: 
References: <85zj33w0ts.fsf@benfinney.id.au>
 <20150711150756.GE27268@ando.pearwood.info>
 <20150711161819.GG27268@ando.pearwood.info>
 <877fq67boo.fsf@vostro.rath.org>
Message-ID: <87wpy4g3ko.fsf@uwakimon.sk.tsukuba.ac.jp>

John Wong writes:

> I disagree that we really need to distinguish what constitutes
> enterprisey code. In fact it is a bad idea to think there is even
> such notion because people should express code in the cleanest and
> most intuitive way possible.

I don't think "unintuitive" is what is meant by "enterprisey code". I
suspect that what is meant is large objects that have slews of
attributes that need to be initialized, with a large variety of
conventions for constructing defaults. If your code needs to consist
mostly of long sequences of attribute assignments followed by
returning the result of a method call, I see nothing wrong with
writing it as a long sequence of assignments followed by a method
call.
From jacob.niehus at gmail.com Wed Jul 15 17:40:59 2015
From: jacob.niehus at gmail.com (Jacob Niehus)
Date: Wed, 15 Jul 2015 08:40:59 -0700
Subject: [Python-ideas] Should a single/double quote followed by a left
 parenthesis raise SyntaxError?
Message-ID: 

I recently forgot the '%' between a format string and its tuple of
values and got "TypeError: 'str' object is not callable." The error
makes sense, of course, because function calling has higher precedence
than anything else in the expression, so while:

>>> '%s' 'abc'

yields '%sabc',

>>> '%s' ('abc')

yields a TypeError.

My question is whether this should be caught as a syntax error instead
of a runtime error. I can't think of any possible case where a
single/double quote followed by '(' (optionally separated by
whitespace) would not raise an exception when the '(' is not in a
string or comment. I even wrote a script (below) to check every Python
file on my computer for occurrences of the regular expression
r'[''"]\s*\(' occurring outside of a string or comment. The sole
result of my search, a file in a wx library, is a definite runtime
error by inspection:

    raise IndexError, "Index out of range: %d > %d" (idx, len(self._items))

Can anyone think of a counter-example?

-Jake

--------------------------------------------------------------------------------

#!/usr/bin/env python2
import re
import sys
import tokenize
from py_compile import PyCompileError, compile

ttypes = {v: k for k, v in tokenize.__dict__.items()
          if isinstance(v, int) and k == k.upper()}

for filename in sys.argv[1:]:
    lines = file(filename, 'r').readlines()
    matches = []
    for lnum, line in enumerate(lines, 1):
        for match in re.finditer(r'[''"]\s*\(', line):
            matches.append((lnum, match.span(), match.group(0)))
    try:
        assert(matches)
        compile(filename, doraise=True)
    except (AssertionError, PyCompileError, IOError):
        continue
    matchdict = {k: [] for k in [m[0] for m in matches]}
    for match in matches:
        matchdict[match[0]].append(match)
    with open(filename, 'r') as f:
        gen = tokenize.generate_tokens(f.readline)
        for ttype, tstr, (srow, scol), (erow, ecol), line in gen:
            if srow == erow:
                for mrow, (mscol, mecol), mstr in matchdict.get(srow, []):
                    pcols = [mscol + i
                             for i, x in enumerate(mstr) if x == '(']
                    for p in pcols:
                        if (p in range(scol, ecol) and
                                ttypes[ttype] not in ['COMMENT', 'STRING']):
                            print filename
                            print srow
                            print ttypes[ttype]

From p.f.moore at gmail.com Wed Jul 15 17:48:33 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 15 Jul 2015 16:48:33 +0100
Subject: [Python-ideas] Should a single/double quote followed by a left
 parenthesis raise SyntaxError?
In-Reply-To: 
References: 
Message-ID: 

On 15 July 2015 at 16:40, Jacob Niehus wrote:
> I recently forgot the '%' between a format string and its tuple of
> values and got "TypeError: 'str' object is not callable."
[...]
> My question is whether this should be caught as a syntax error instead
> of a runtime error.

To be honest, I'm not sure I see why this would be an improvement,
given that syntax errors ("SyntaxError: invalid syntax") have less
informative error messages than the TypeError you quote? Being caught
at compile time doesn't seem like it's a huge benefit (and in any
case, compile-time type checking is not something Python offers, in
general...)

Paul

From skip.montanaro at gmail.com Wed Jul 15 17:54:59 2015
From: skip.montanaro at gmail.com (Skip Montanaro)
Date: Wed, 15 Jul 2015 10:54:59 -0500
Subject: [Python-ideas] Should a single/double quote followed by a left
 parenthesis raise SyntaxError?
In-Reply-To: 
References: 
Message-ID: 

On Wed, Jul 15, 2015 at 10:40 AM, Jacob Niehus wrote:
>>>> '%s' ('abc')
>
> yields a TypeError.
>
> My question is whether this should be caught as a syntax error instead
> of a runtime error.

It would only partially solve the problem (I think) you are trying to
solve. Consider this syntactically valid case:

>>> s = "%s"
>>> s ("abc")
TypeError: 'str' object is not callable

The TypeError tells you exactly what the problem is. A SyntaxError
would catch the case where you try to call a string literal, but not a
string object.

Skip

From ron3200 at gmail.com Wed Jul 15 19:27:10 2015
From: ron3200 at gmail.com (Ron Adam)
Date: Wed, 15 Jul 2015 13:27:10 -0400
Subject: [Python-ideas] Should a single/double quote followed by a left
 parenthesis raise SyntaxError?
In-Reply-To: 
References: 
Message-ID: 

On 07/15/2015 11:40 AM, Jacob Niehus wrote:
> I recently forgot the '%' between a format string and its tuple of
> values and got "TypeError: 'str' object is not callable." The error
> makes sense, of course, because function calling has higher precedence
> than anything else in the expression, so while:
>
>>>> '%s' 'abc'
>
> yields '%sabc',
>
>>>> '%s' ('abc')
>
> yields a TypeError.
>
> My question is whether this should be caught as a syntax error instead
> of a runtime error.

No, it's a runtime error. If it's changed there, it would need to be
changed in the following places as well.

>>> 123 (4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not callable

>>> [1, 2, 3] (4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'list' object is not callable

>>> (1, 2, 3) (4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object is not callable

But I can see why you would think syntax error would work.

>>> s = 'efg'
>>> 'abc' s
  File "<stdin>", line 1
    'abc' s
          ^
SyntaxError: invalid syntax

Making the other cases match that could break existing working code.
And only changing one of them introduces an inconsistency.

Cheers,
Ron

From tjreedy at udel.edu Thu Jul 16 01:03:50 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 15 Jul 2015 19:03:50 -0400
Subject: [Python-ideas] In call grammar, replace primary with
 possible_call? (was Re: ...quote followed by a left parenthesis...?)
In-Reply-To: 
References: 
Message-ID: 

On 7/15/2015 11:40 AM, Jacob Niehus wrote:
> I recently forgot the '%' between a format string and its tuple of
> values and got "TypeError: 'str' object is not callable." The error
> makes sense, of course, because function calling has higher precedence
> than anything else in the expression, so while:
>
>>>> '%s' 'abc'
>
> yields '%sabc',
>
>>>> '%s' ('abc')
>
> yields a TypeError.
>
> My question is whether this should be caught as a syntax error instead
> of a runtime error. I can't think of any possible case where a
> single/double quote followed by '(' (optionally separated by whitespace)
> would not raise an exception when the '(' is not in a string or comment.

The lexical (character) level is the wrong one for approaching this
issue. Python, like most formal language interpreters, first groups
characters into tokens. The above becomes 4 tokens: '%s', (, 'abc',
and ). These tokens are parsed into an ast.Expr with an ast.Call
object. The .func attribute is an ast.Str and the .args attribute has
the other string.
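For example (illustrative; the exact ast.dump output differs slightly
between Python versions):

>>> import ast
>>> ast.dump(ast.parse("'%s' ('abc')").body[0])
"Expr(value=Call(func=Str(s='%s'), args=[Str(s='abc')], keywords=[]))"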
The simplified version of the current grammar for a call is

    call ::= primary '(' args ')'

You are, in effect, pointing out that 'primary' seems too broad, and
asking whether it could sensibly be replaced by something narrower.
Let's see how we might define a new non-terminal 'possible_call(able)'
by pulling apart 'primary':

    primary ::= atom | attributeref | subscription | slicing | call

Each of these is a possible callable, though a callable slicing would
be unusual. The latter 4 are singular categories and must be included
in possible_call. But atom is itself a composite category:

    atom ::= identifier | literal | enclosure
    enclosure ::= parenth_form | list_display | dict_display
                | set_display | generator_expression | yield_atom

Identifiers are the most common form of callable. Number and string
literals never are. Parenth_forms include ( identifier ) and so
possibly are*. Displays and G-Es never are (as noted by Ron Adam). I
believe a yield_atom could be, with gen.send(callable).

* Parenth_forms include both tuples -- parentheses empty or including
a comma -- and parenthesized expressions -- everything else, which is
to say, non-empty parentheses without a comma. It seems that these
could be separate grammatical categories, in which case tuples would
be excluded and parenth_expr included.

So here is the proposal: in the call definition, change primary to

    possible_call ::= identifier | parenth_form | yield_atom
                    | attributeref | subscription | slicing | call

As for error messages, there is another thread suggesting that the
messages for SyntaxError might be vastly improved.

--
Terry Jan Reedy

From steve at pearwood.info Thu Jul 16 06:18:21 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 16 Jul 2015 14:18:21 +1000
Subject: [Python-ideas] In call grammar, replace primary with
 possible_call? (was Re: ...quote followed by a left parenthesis...?)
In-Reply-To: 
References: 
Message-ID: <20150716041821.GF21874@ando.pearwood.info>

On Wed, Jul 15, 2015 at 07:03:50PM -0400, Terry Reedy wrote:

> So here is the proposal: in the call definition, change primary to
>     possible_call ::= identifier | parenth_form | yield_atom
>                     | attributeref | subscription | slicing | call
>
> As for error messages, there is another thread suggesting that the
> messages for SyntaxError might be vastly improved.

This sounds like a good proposal. +1

--
Steve

From p.f.moore at gmail.com Thu Jul 16 11:00:28 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 16 Jul 2015 10:00:28 +0100
Subject: [Python-ideas] In call grammar, replace primary with
 possible_call? (was Re: ...quote followed by a left parenthesis...?)
In-Reply-To: <20150716041821.GF21874@ando.pearwood.info>
References: <20150716041821.GF21874@ando.pearwood.info>
Message-ID: 

On 16 July 2015 at 05:18, Steven D'Aprano wrote:
> On Wed, Jul 15, 2015 at 07:03:50PM -0400, Terry Reedy wrote:
>
>> So here is the proposal: in the call definition, change primary to
>>     possible_call ::= identifier | parenth_form | yield_atom
>>                     | attributeref | subscription | slicing | call
>>
>> As for error messages, there is another thread suggesting that the
>> messages for SyntaxError might be vastly improved.
>
> This sounds like a good proposal. +1

Note that the proposal should also include a change to "call":

    call ::= possible_call '(' args ')'

replacing

    call ::= primary '(' args ')'

(I was initially confused by the fact that possible_call included call
as an option, until I remembered how it fitted into the larger
picture).
This seems to me like more complexity than is warranted, particularly
as the error message quality drops dramatically (unless we improve the
syntax error message at the same time).

So I'm -0 on this.

Paul

From mistersheik at gmail.com Thu Jul 16 12:15:28 2015
From: mistersheik at gmail.com (Neil Girdhar)
Date: Thu, 16 Jul 2015 03:15:28 -0700 (PDT)
Subject: [Python-ideas] Disallow "00000" as a synonym for "0"
Message-ID: 

As per this question:
http://stackoverflow.com/questions/31447694/why-does-python-3-allow-00-as-a-literal-for-0-but-not-allow-01-as-a-literal

It seems like Python accepts "000000000" to mean "0". Whatever the
historical reason, should this be deprecated?

From rosuav at gmail.com Thu Jul 16 13:28:30 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 16 Jul 2015 21:28:30 +1000
Subject: [Python-ideas] In call grammar, replace primary with
 possible_call? (was Re: ...quote followed by a left parenthesis...?)
In-Reply-To: 
References: <20150716041821.GF21874@ando.pearwood.info>
Message-ID: 

On Thu, Jul 16, 2015 at 7:00 PM, Paul Moore wrote:
> On 16 July 2015 at 05:18, Steven D'Aprano wrote:
>> On Wed, Jul 15, 2015 at 07:03:50PM -0400, Terry Reedy wrote:
>>
>>> So here is the proposal: in the call definition, change primary to
>>>     possible_call ::= identifier | parenth_form | yield_atom
>>>                     | attributeref | subscription | slicing | call
>>>
>>> As for error messages, there is another thread suggesting that the
>>> messages for SyntaxError might be vastly improved.
>>
>> This sounds like a good proposal. +1
>
> Note that the proposal should also include a change to "call":
>
>     call ::= possible_call '(' args ')'
>
> replacing
>
>     call ::= primary '(' args ')'

I presume that was the intent :)

> (I was initially confused by the fact that possible_call included call
> as an option, until I remembered how it fitted into the larger
> picture).

Calling the result of a call is the easiest way to demonstrate nested
functions, closures, etc:

def adder(x):
    def add(y):
        return x + y
    return add

adder(5)(7)  # == 12

Even if this is never used in real-world code, permitting it is a
great way to show that a thing is a thing, no matter how you obtain it
- you can use "adder(5).__name__" for attribute lookup on function
return values, "lst[4][7]" to subscript a list item (very common!),
"(yield 5)()" to call whatever function someone send()s you... worth
keeping!

(Also, I have unpleasant memories of PHP functions that return arrays
- assigning the return value to a name and subscripting the name
works, but subscripting the function return directly didn't work at
the time. It has subsequently been fixed, but while I was doing that
particular work, the restriction was there.)

> This seems to me like more complexity than is warranted, particularly
> as the error message quality drops dramatically (unless we improve the
> syntax error message at the same time).
>
> So I'm -0 on this.

There's another thread around the place for improving the error
messages. But the message is unlikely to be accurate to the real
problem anyway, as most people do not consciously seek to call string
literals. It's like getting told "NoneType object has no attribute X"
- the problem isn't that None lacks attributes, the problem is "why is
my thing None instead of the object I thought it was".

And hey. If anyone *really* wants to try to call a string lit, they
can always write it thus:

# Call me!
response = ("0406 650 430")()

That said, though, there is a backward-compatibility problem. Code
which would have blown up at run-time will now fail at compile-time.
It's still broken code, but people will need to check before running
stuff in production.

+1 on the proposal. Catching errors earlier is a good thing if it
doesn't make things too complicated.

ChrisA

From abarnert at yahoo.com Thu Jul 16 13:32:44 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 16 Jul 2015 04:32:44 -0700
Subject: [Python-ideas] In call grammar, replace primary with
 possible_call? (was Re: ...quote followed by a left parenthesis...?)
In-Reply-To: 
References: 
Message-ID: 

On Jul 15, 2015, at 16:03, Terry Reedy wrote:
>
> So here is the proposal: in the call definition, change primary to
>     possible_call ::= identifier | parenth_form | yield_atom
>                     | attributeref | subscription | slicing | call

It would be fun to explain why [1,2]() is a syntax error but (1,2)() a
runtime type error, and likewise for ""() vs ("")()...

Anyway, I honestly still don't get the motivation here. What makes
this kind of type error so special that it should be caught at compile
time? We don't expect [1,2]['spam'] to fail to compile; what makes
this different? It's a lot simpler to say you can try to call any
primary, and you'll get a type error if the primary evaluates to
something that's not callable.

From rosuav at gmail.com Thu Jul 16 13:54:01 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 16 Jul 2015 21:54:01 +1000
Subject: [Python-ideas] In call grammar, replace primary with
 possible_call? (was Re: ...quote followed by a left parenthesis...?)
In-Reply-To: 
References: 
Message-ID: 

On Thu, Jul 16, 2015 at 9:32 PM, Andrew Barnert via Python-ideas wrote:
> On Jul 15, 2015, at 16:03, Terry Reedy wrote:
>>
>> So here is the proposal: in the call definition, change primary to
>>     possible_call ::= identifier | parenth_form | yield_atom
>>                     | attributeref | subscription | slicing | call
>
> It would be fun to explain why [1,2]() is a syntax error but (1,2)() a
> runtime type error, and likewise for ""() vs ("")()...
>
> Anyway, I honestly still don't get the motivation here. What makes
> this kind of type error so special that it should be caught at compile
> time? We don't expect [1,2]['spam'] to fail to compile; what makes
> this different? It's a lot simpler to say you can try to call any
> primary, and you'll get a type error if the primary evaluates to
> something that's not callable.

It makes good sense to subscript a literal list, though:

    month_abbr = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"][month]

Calling a literal list cannot possibly succeed unless you've hacked
something. The only reason that calling a tuple might survive until
runtime is because it's a bit harder to detect, but Terry's original
post included a footnote about further grammar tweaks that would make
that possible.

ChrisA

From p.f.moore at gmail.com Thu Jul 16 17:53:03 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 16 Jul 2015 16:53:03 +0100
Subject: [Python-ideas] In call grammar, replace primary with
 possible_call? (was Re: ...quote followed by a left parenthesis...?)
In-Reply-To: 
References: <20150716041821.GF21874@ando.pearwood.info>
Message-ID: 

On 16 July 2015 at 12:28, Chris Angelico wrote:
>> Note that the proposal should also include a change to "call":
>>
>>     call ::= possible_call '(' args ')'
>>
>> replacing
>>
>>     call ::= primary '(' args ')'
>
> I presume that was the intent :)

Yes, I understood that - I just wanted to make it explicit (because it
confused me briefly).

>> (I was initially confused by the fact that possible_call included call
>> as an option, until I remembered how it fitted into the larger
>> picture).
>
> Calling the result of a call is the easiest way to demonstrate nested
> functions, closures, etc:

Absolutely - my point wasn't that calling the result of a call is
wrong, but that I had misunderstood possible_call as a replacement for
call, and hence got in a muddle over the proposed grammar change -
adding the explicit explanation that call is defined in terms of
possible_call helped me, so I thought it might help others too. Maybe
it didn't :-)

Paul

From rosuav at gmail.com Thu Jul 16 18:46:07 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 17 Jul 2015 02:46:07 +1000
Subject: [Python-ideas] In call grammar, replace primary with
 possible_call? (was Re: ...quote followed by a left parenthesis...?)
In-Reply-To: 
References: <20150716041821.GF21874@ando.pearwood.info>
Message-ID: 

On Fri, Jul 17, 2015 at 1:53 AM, Paul Moore wrote:
> Absolutely - my point wasn't that calling the result of a call is
> wrong, but that I had misunderstood possible_call as a replacement for
> call, and hence got in a muddle over the proposed grammar change -
> adding the explicit explanation that call is defined in terms of
> possible_call helped me, so I thought it might help others too. Maybe
> it didn't :-)

Fair enough, nothing wrong with being more explicit in describing a
proposal!

ChrisA

From russell.j.kaplan at gmail.com Thu Jul 16 22:03:50 2015
From: russell.j.kaplan at gmail.com (Russell Kaplan)
Date: Thu, 16 Jul 2015 13:03:50 -0700
Subject: [Python-ideas] namedtuple fields with default values
Message-ID: 

I'm using a namedtuple to keep track of several fields, only some of
which ever need to be specified during instantiation. However, there
is no Pythonic way to create a namedtuple with fields that have
default values.

>>> Foo = namedtuple('Foo', ['bar', 'optional_baz'])
>>> f = Foo('barValue')  # Not passing an argument for every field will cause a TypeError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __new__() takes exactly 3 arguments (2 given)

If you do want default parameters for a namedtuple, the workaround
right now involves modifying Foo.__new__'s defaults:

>>> Foo = namedtuple('Foo', ['bar', 'optional_baz'])
>>> Foo.__new__.__defaults__ = (None, None)

Then you can call Foo's constructor without specifying each field:

>>> f = Foo('barValue')
>>> f
Foo(bar='barValue', optional_baz=None)

Having to assign to Foo.__new__.__defaults__ is a bit ugly. I think it
would be easier and more readable to support syntax like:

>>> Foo = namedtuple('Foo', ['optional_bar=None', 'optional_baz=None'])

This suggestion is fully backwards compatible and allows for cleaner
definitions of namedtuples with default-value fields.
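For illustration, the proposed behaviour can already be emulated with
a small wrapper around the current API (an untested sketch; the helper
name is made up for the example, and only literal defaults are
handled):

import ast
from collections import namedtuple as _namedtuple

def namedtuple_with_defaults(typename, field_specs):
    names, defaults = [], []
    for spec in field_specs:
        if '=' in spec:
            name, default = spec.split('=', 1)
            names.append(name)
            defaults.append(ast.literal_eval(default))
        elif defaults:
            raise ValueError('non-default field after default field')
        else:
            names.append(spec)
    cls = _namedtuple(typename, names)
    cls.__new__.__defaults__ = tuple(defaults)
    return cls

# Foo = namedtuple_with_defaults('Foo', ['bar', 'optional_baz=None'])
# Foo('barValue') -> Foo(bar='barValue', optional_baz=None)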
Thanks for considering.

Russell

From tjreedy at udel.edu Thu Jul 16 22:33:27 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 16 Jul 2015 16:33:27 -0400
Subject: [Python-ideas] In call grammar, replace primary with
 possible_call? (was Re: ...quote followed by a left parenthesis...?)
In-Reply-To: 
References: <20150716041821.GF21874@ando.pearwood.info>
Message-ID: 

On 7/16/2015 5:00 AM, Paul Moore wrote:
> On 16 July 2015 at 05:18, Steven D'Aprano wrote:
>> On Wed, Jul 15, 2015 at 07:03:50PM -0400, Terry Reedy wrote:
>>
>>> So here is the proposal: in the call definition, change primary to
>>>     possible_call ::= identifier | parenth_form | yield_atom
>>>                     | attributeref | subscription | slicing | call
>>>
>>> As for error messages, there is another thread suggesting that the
>>> messages for SyntaxError might be vastly improved.
>>
>> This sounds like a good proposal. +1
>
> Note that the proposal should also include a change to "call":
>
>     call ::= possible_call '(' args ')'
>
> replacing
>
>     call ::= primary '(' args ')'

This was the intended meaning of "in the call definition, change
primary to \n possible_call...". I should have added 'possible_call,
where' to make 2 clauses like so "...to possible_call, where \n
possible_call ::= ". Sorry for the confusing sentence.

--
Terry Jan Reedy

From tjreedy at udel.edu Fri Jul 17 00:27:34 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 16 Jul 2015 18:27:34 -0400
Subject: [Python-ideas] In call grammar, replace primary with
 possible_call? (was Re: ...quote followed by a left parenthesis...?)
In-Reply-To: 
References: 
Message-ID: 

On 7/16/2015 7:32 AM, Andrew Barnert via Python-ideas wrote:
> On Jul 15, 2015, at 16:03, Terry Reedy wrote:
>>
>> So here is the proposal: in the call definition, change primary to
>> possible_call, where
>>     possible_call ::= identifier | parenth_form | yield_atom
>>                     | attributeref | subscription | slicing | call
>
> It would be fun to explain why [1,2]() is a syntax error but (1,2)()
> a runtime type error, and likewise for ""() vs ("")()...

Because parenthesis forms hide their contents from surrounding code.
That is their purpose. In the examples above, where the content is a
single object, there is no effect on the resulting ast.

> Anyway, I honestly still don't get the motivation here.

Given that I switched from C to Python 18 years ago, I am obviously
comfortable with runtime typing. However, I consider the delay in
error detection to be a cost, not a feature. Unnoticed runtime errors
sometimes sit for days to years. Earlier *is* better. The
more-than-compensating benefit of dynamic typing is being able to
write generic code and have user class instances participate in syntax
on the same basis as with builtin classes.

> What makes this kind of type error so special that it should be
> caught at compile time?

'SyntaxError' is defined by the grammar + a few extra compile-time
checks. There are actually two versions of the grammar -- the one in
the docs and the one actually used for the parser. If the grammars are
tweaked to better define possibly_callable, then the invalid token
sequences will be redefined as SyntaxErrors.

The boundary between SyntaxError and runtime errors is somewhat
arbitrary. On the other hand, one might argue that once the boundary
is set, it should not change lest code be broken. Example of something
that runs now and would not with the change:

    print(1)
    if False: ''()

I am not sure that we have never moved the boundary, but it should be
rare.
Since the disallowed forms are literals and displays, I cannot think
of sensible running code that would be broken by this proposal. But
the above might be enough to scuttle the idea.

> We don't expect [1,2]['spam'] to fail to compile; what makes this
> different?

The compiler has a constant folder to optimize code. It currently does
tuple subscriptions.

>>> from dis import dis
>>> dis('(1,2)[0]')
  1           0 LOAD_CONST               4 (1)
              3 RETURN_VALUE
>>> dis('(1,2)[1]')
  1           0 LOAD_CONST               3 (2)
              3 RETURN_VALUE
>>> dis('(1,2)[2]')
  1           0 LOAD_CONST               2 ((1, 2))
              3 LOAD_CONST               1 (2)
              6 BINARY_SUBSCR
              7 RETURN_VALUE

It must be that the folder discovers the error, but rather than raise
immediately, it puts the 'rotten fish' back in the stream to raise a
stink when encountered. I started Python with -Wall and entered the
code itself and no SyntaxWarning was issued. Given that the error has
been detected, but might sit silently for awhile, this seems like a
shame.

'Constant' lists are not checked. They are too rare to bother checking
for them.

>>> dis('[1,2][0]')
  1           0 LOAD_CONST               0 (1)
              3 LOAD_CONST               1 (2)
              6 BUILD_LIST               2
              9 LOAD_CONST               2 (0)
             12 BINARY_SUBSCR
             13 RETURN_VALUE

> It's a lot simpler to say you can try to call any primary,

'primary' is vague and to me way overly broad. When expanded to basic
nonterminals, about half are never callable. Adding one production
rule is a minor change. Leaving error detection aside, a benefit for
readers would be to better define what is actually a possible
callable. Of course, this could be done with a new sentence in the doc
without changing the grammar. Maybe this should be the outcome of this
issue.

Another argument against the proposal is that code checkers should
also be able to detect such non-callable primaries.

--
Terry Jan Reedy

From abarnert at yahoo.com Fri Jul 17 03:44:26 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 16 Jul 2015 18:44:26 -0700
Subject: [Python-ideas] In call grammar, replace primary with
 possible_call? (was Re: ...quote followed by a left parenthesis...?)
In-Reply-To: 
References: 
Message-ID: 

On Jul 16, 2015, at 15:27, Terry Reedy wrote:
>
> 'primary' is vague and to me way overly broad.

To a normal end-user, the things you syntactically call are basically
all values (or, if you prefer, all expressions). Calling a value
that's not callable is the same error as adding a thing that's not
addable or indexing a thing that's not subscriptable, and that's a
TypeError. The existing rules make sense, and fit in with everything
else in Python. For example, we don't consider {1}[1] a syntax error
even though it's never valid; how is {1}(1) any different?

Even with your suggested change, you still need to teach people the
same rule, because s() where s is an int or (1)() are still going to
be valid syntax and still going to be guaranteed type errors. All
you're doing is catching the least common cause of this type error
early, and I don't see who's going to benefit from that.

If you're thinking there might be some benefit to teaching, or to
self-taught novices, I don't see one; in fact, I think it's slightly
worse. They still have to learn the type rule to understand why x(0)
raises at runtime--and that's far more common than 0(0). (And the
current type error is pretty nice.) And if a novice can understand why
x(0) is an error, they can understand why 0(0) is an error--and, in
fact, they'll understand more easily if they're the same error than if
they're different.

And, the more someone looks into the current error, the more they'll
understand.
Why is 0(0) a TypeError? Because a call expression looks up the
__call__ method on the object's type, and type(0) is int, and int
doesn't have a __call__ method. Well, that makes sense--what would it
mean to call an integer? But hey, does that mean I can create my own
classes that act like functions? Cool!
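For example (a toy sketch):

class Adder:
    """Instances act like functions because the type defines __call__."""
    def __init__(self, n):
        self.n = n
    def __call__(self, x):
        return x + self.n

add5 = Adder(5)
print(add5(7))  # 12 -- the call looks up type(add5).__call__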
If you're just looking to catch errors earlier, a static type checker
will catch 0(0) in exactly the same way that it will catch x(0) when x
has been declared or inferred to be an int.

Finally, the existing rule also has the very minor benefit that you
can ast.parse 0(0) if you have a reason to do so (whether you're
writing a DSL with an import hook, or writing a dev-helper tool of
some kind, or even hacking up Python to add an int.__call__ method).
I'm not sure why you'd want to do that (that's why it's a very minor
benefit...), but I don't see any reason to make the language less
consistent and stop you from doing that.

> When expanded to basic nonterminals, about half are never callable.
> Adding one production rule is a minor change. Leaving error detection
> aside, a benefit for readers would be to better define what is
> actually a possible callable.

But why does a reader need to know what lexical forms are possibly
callable? Again, the current rule is dead simple, and totally
consistent with subscripting and so on. What's a possible callable?
Any value is a possible callable, just as any value is a possible
subscriptable and a possible addable and so on.

> Of course, this could be done with a new sentence in the doc without
> changing the grammar. Maybe this should be the outcome of this issue.

What sentence are you envisioning here? Do we also need a sentence
explaining that subscripting an int literal is always illegal despite
not being a syntax error? (And that one, I can imagine a newcomer to
Python actually writing, because 0[p] is valid in C...) Or that
[0][0](0) is always illegal, or (0)(0)? If not, why do we need a
sentence explaining that 0(0) is always illegal?

From rosuav at gmail.com Fri Jul 17 03:54:19 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 17 Jul 2015 11:54:19 +1000
Subject: [Python-ideas] In call grammar, replace primary with
 possible_call? (was Re: ...quote followed by a left parenthesis...?)
In-Reply-To: 
References: 
Message-ID: 

On Fri, Jul 17, 2015 at 11:44 AM, Andrew Barnert via Python-ideas wrote:
> To a normal end-user, the things you syntactically call are basically
> all values (or, if you prefer, all expressions). Calling a value
> that's not callable is the same error as adding a thing that's not
> addable or indexing a thing that's not subscriptable, and that's a
> TypeError. The existing rules make sense, and fit in with everything
> else in Python. For example, we don't consider {1}[1] a syntax error
> even though it's never valid; how is {1}(1) any different?

There are other odd cases in the grammar, too, though. Try explaining
this one:

>>> x = 1
>>> x.to_bytes(4,"little")
b'\x01\x00\x00\x00'
>>> 1.to_bytes(4,"little")
  File "<stdin>", line 1
    1.to_bytes(4,"little")
             ^
SyntaxError: invalid syntax
>>> (1).to_bytes(4,"little")
b'\x01\x00\x00\x00'

If the grammar can catch an error, great! If it can't, it'll get dealt
with at run-time. Syntax errors don't have to be reserved for
situations that make it impossible to proceed at all.

ChrisA

From abarnert at yahoo.com Fri Jul 17 04:10:33 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 16 Jul 2015 19:10:33 -0700
Subject: [Python-ideas] In call grammar, replace primary with
 possible_call? (was Re: ...quote followed by a left parenthesis...?)
In-Reply-To: 
References: 
Message-ID: <0FF2D363-006D-4335-A70D-025125EBA86D@yahoo.com>

On Jul 16, 2015, at 18:54, Chris Angelico wrote:
>
> On Fri, Jul 17, 2015 at 11:44 AM, Andrew Barnert via Python-ideas
> wrote:
>> To a normal end-user, the things you syntactically call are basically
>> all values (or, if you prefer, all expressions). Calling a value
>> that's not callable is the same error as adding a thing that's not
>> addable or indexing a thing that's not subscriptable, and that's a
>> TypeError. The existing rules make sense, and fit in with everything
>> else in Python. For example, we don't consider {1}[1] a syntax error
>> even though it's never valid; how is {1}(1) any different?
>
> There are other odd cases in the grammar, too, though. Try explaining
> this one:
>>>> x = 1
>>>> x.to_bytes(4,"little")
> b'\x01\x00\x00\x00'
>>>> 1.to_bytes(4,"little")
>   File "<stdin>", line 1
>     1.to_bytes(4,"little")
>              ^
> SyntaxError: invalid syntax
>>>> (1).to_bytes(4,"little")
> b'\x01\x00\x00\x00'
>
> If the grammar can catch an error, great! If it can't, it'll get dealt
> with at run-time.

This is the exact opposite. It's not the grammar sometimes catching an
obvious type error earlier, it's the grammar catching something that's
perfectly sensible and preventing us from writing it in the obvious
way. It's something we're unfortunately forced to do because attribute
access syntax and float literal syntax are ambiguous. That's not
something we'd want to emulate or expand on.

If you really want to catch type errors at compile time, that's
exactly what static type checkers (whether embedded in the compiler or
not) are for; trying to hack up the grammar to do typing without doing
typing is only going to catch a handful of very simple cases that
nobody really cares about.

> Syntax errors don't have to be reserved for
> situations that make it impossible to proceed at all.

No, but they should be reserved for syntactic errors, and the syntax
should be as simple as possible. Making the rules more complicated and
less consistent has a cost (not so much for the implementation, as for
the person trying to understand the language and keep it in their
head).

From eric at trueblade.com Fri Jul 17 16:28:19 2015
From: eric at trueblade.com (Eric V. Smith)
Date: Fri, 17 Jul 2015 10:28:19 -0400
Subject: [Python-ideas] Disallow "00000" as a synonym for "0"
In-Reply-To: 
References: 
Message-ID: <55A91103.2080609@trueblade.com>

On 07/16/2015 06:15 AM, Neil Girdhar wrote:
> As per this question:
> http://stackoverflow.com/questions/31447694/why-does-python-3-allow-00-as-a-literal-for-0-but-not-allow-01-as-a-literal
>
> It seems like Python accepts "000000000" to mean "0". Whatever the
> historical reason, should this be deprecated?

No. It would needlessly break working code.

Eric.

From barry at python.org Fri Jul 17 16:44:15 2015
From: barry at python.org (Barry Warsaw)
Date: Fri, 17 Jul 2015 10:44:15 -0400
Subject: [Python-ideas] namedtuple fields with default values
References: 
Message-ID: <20150717104415.553a89d2@anarchist.wooz.org>

On Jul 16, 2015, at 01:03 PM, Russell Kaplan wrote:

>I'm using a namedtuple to keep track of several fields, only some of
>which ever need to be specified during instantiation. However, there
>is no Pythonic way to create a namedtuple with fields that have
>default values.
I don't know about "Pythonic" but there's a not too horrible way to do
it:

    _Record = namedtuple('Record', 'url destination checksum')('', '', '')

    def Record(url, destination, checksum=''):
        return _Record._replace(
            url=url, destination=destination, checksum=checksum)

Now you only need to provide 'url' and 'destination' when you create a
Record. Okay, sure, _Record is the actual namedtuple, but I usually
don't care.

>Having to assign to Foo.__new__.__defaults__ is a bit ugly. I think it
>would be easier and more readable to support syntax like:
>>>> Foo = namedtuple('Foo', ['optional_bar=None', 'optional_baz=None'])

That would mean you couldn't ever have an actual parameter called
'optional_bar'.

>This suggestion is fully backwards compatible and allows for cleaner
>definitions of namedtuples with default-value fields. Thanks for
>considering.

Not that I think anything really needs to be done, but a few other
approaches could be:

* Allow for arbitrary keyword arguments at the end of the signature to
  define default values.

    Record = namedtuple('Record', 'url destination checksum', checksum='')

* Extend the semantics of field_names to allow for a dictionary of
  attributes mapping to their default values, though you'd need a
  marker to be able to specify a required field:

    Record = namedtuple('Record', dict(url=Required,
                                       destination=Required,
                                       checksum=''))

I suppose I'd prefer the former, although that might cut off the
ability to add other controlling arguments to the namedtuple() API.

Cheers,
-Barry

From eric at trueblade.com Fri Jul 17 16:59:05 2015
From: eric at trueblade.com (Eric V. Smith)
Date: Fri, 17 Jul 2015 10:59:05 -0400
Subject: [Python-ideas] namedtuple fields with default values
In-Reply-To: <20150717104415.553a89d2@anarchist.wooz.org>
References: 
 <20150717104415.553a89d2@anarchist.wooz.org>
Message-ID: <55A91839.3050406@trueblade.com>

On 07/17/2015 10:44 AM, Barry Warsaw wrote:
> On Jul 16, 2015, at 01:03 PM, Russell Kaplan wrote:
>
>> I'm using a namedtuple to keep track of several fields, only
>> some of which ever need to be specified during instantiation.
>> However, there is no Pythonic way to create a namedtuple with
>> fields that have default values.
>
> I don't know about "Pythonic" but there's a not too horrible way
> to do it:
>
>     _Record = namedtuple('Record', 'url destination checksum')('', '', '')
>
>     def Record(url, destination, checksum=''):
>         return _Record._replace(
>             url=url, destination=destination, checksum=checksum)
>
> Now you only need to provide 'url' and 'destination' when you
> create a Record. Okay, sure, _Record is the actual namedtuple, but
> I usually don't care.
>
>> Having to assign to Foo.__new__.__defaults__ is a bit ugly. I
>> think it would be easier and more readable to support syntax
>> like:
>>>>> Foo = namedtuple('Foo', ['optional_bar=None',
>>>>> 'optional_baz=None'])
>
> That would mean you couldn't ever have an actual parameter called
> 'optional_bar'.
>
>> This suggestion is fully backwards compatible and allows for
>> cleaner definitions of namedtuples with default-value fields.
>> Thanks for considering.
>
> Not that I think anything really needs to be done, but a few other
> approaches could be:
>
> * Allow for arbitrary keyword arguments at the end of the
>   signature to define default values.
>     Record = namedtuple('Record', 'url destination checksum', checksum='')
>
> * Extend the semantics of field_names to allow for a dictionary of
>   attributes mapping to their default values, though you'd need a
>   marker to be able to specify a required field:
>
>     Record = namedtuple('Record', dict(url=Required,
>                                        destination=Required,
>                                        checksum=''))
>
> I suppose I'd prefer the former, although that might cut off the
> ability to add other controlling arguments to the namedtuple() API.

I've implemented default parameters to namedtuples (and namedlists, a
mutable version) in https://pypi.python.org/pypi/namedlist

The syntax is not great, but none of the options are awesome.

>>> Point = namedlist.namedtuple('Point', [('x', 0), ('y', 100)])
>>> p = Point()
assert p.x == 0
assert p.y == 100

From steve at pearwood.info Fri Jul 17 17:22:45 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 18 Jul 2015 01:22:45 +1000
Subject: [Python-ideas] Disallow "00000" as a synonym for "0"
In-Reply-To: <55A91103.2080609@trueblade.com>
References: <55A91103.2080609@trueblade.com>
Message-ID: <20150717152241.GG21874@ando.pearwood.info>

On Fri, Jul 17, 2015 at 10:28:19AM -0400, Eric V. Smith wrote:
> On 07/16/2015 06:15 AM, Neil Girdhar wrote:
> > As per this question:
> > http://stackoverflow.com/questions/31447694/why-does-python-3-allow-00-as-a-literal-for-0-but-not-allow-01-as-a-literal
> >
> > It seems like Python accepts "000000000" to mean "0". Whatever the
> > historical reason, should this be deprecated?
>
> No. It would needlessly break working code.

I wonder what working code uses 00 when 0 is wanted? Do you have any
examples? I believe that anyone writing 00 is more likely to have made
a typo than to actually intend to get 0.

In Python 2, 00 has an obvious and correct interpretation: it is zero
in octal. But in Python 3, octal is written with the prefix 0o, not 0.

py> 0o10
8
py> 010
  File "<stdin>", line 1
    010
      ^
SyntaxError: invalid token

(The 0o prefix also works in Python 2.7.)

In Python 3, 00 has no sensible meaning. It's not octal, binary or
hex, and it shouldn't be decimal. Decimal integers are explicitly
prohibited from beginning with a leading zero:

https://docs.python.org/3/reference/lexical_analysis.html#integers

so the mystery is why *zero* is a special case permitted to have
leading zeroes. The lexical definition of "decimal integer" is:

    decimalinteger ::= nonzerodigit digit* | "0"+

Why was it defined that way? The more obvious:

    decimalinteger ::= nonzerodigit digit* | "0"

was the definition in Python 2. As the Stackoverflow post above points
out, the definition of decimalinteger actually in use seems to violate
PEP 3127, and supporting "0"+ was added as a special case by Georg
Brandl.

Since leading 0 digits in decimal int literals are prohibited, we
cannot write 0001, 0023 etc. Why would we write 0000 to get zero?
Unless somebody can give a good explanation for why leading zeroes are
permitted for zero, I think it was a mistake to allow them, and is an
ugly wart on the language.
I think that should be deprecated and eventually removed. Since it
only affects int literals, any deprecation warning will occur at
compile-time, so it shouldn't have any impact on runtime performance.

--
Steve

From solipsis at pitrou.net Fri Jul 17 17:33:57 2015
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 17 Jul 2015 17:33:57 +0200
Subject: [Python-ideas] namedtuple fields with default values
References: 
 <20150717104415.553a89d2@anarchist.wooz.org>
Message-ID: <20150717173357.379427f4@fsol>

On Fri, 17 Jul 2015 10:44:15 -0400
Barry Warsaw wrote:
>
> I don't know about "Pythonic" but there's a not too horrible way to do it:
>
>     _Record = namedtuple('Record', 'url destination checksum')('', '', '')
>
>     def Record(url, destination, checksum=''):
>         return _Record._replace(
>             url=url, destination=destination, checksum=checksum)

My usual pattern is to subclass the namedtuple class and override the
constructor (especially if I want to add behaviour):

    class Record(namedtuple('_Record', ('url', 'dest', 'checksum'))):
        __slots__ = ()

        def __new__(...): #etc

Regards

Antoine.

From antony.lee at berkeley.edu Fri Jul 17 18:21:20 2015
From: antony.lee at berkeley.edu (Antony Lee)
Date: Fri, 17 Jul 2015 09:21:20 -0700
Subject: [Python-ideas] namedtuple fields with default values
In-Reply-To: <20150717173357.379427f4@fsol>
References: 
 <20150717104415.553a89d2@anarchist.wooz.org>
 <20150717173357.379427f4@fsol>
Message-ID: 

The problem of subclassing is the somewhat tedious repetition in the
argument list of __new__. You can abuse the fact that function
definition allows you to

1. create an object who knows its name and
2. list arguments, with or without default values

to write a decorator @Namedtuple, such that

    @Namedtuple
    def Record(foo, bar=default):
        pass

(I have an implementation but it's nothing really special, just
introspecting the signature object.)
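Roughly along these lines (an untested sketch, not the exact
implementation):

import collections
import inspect

def Namedtuple(func):
    """Build a namedtuple class from a function's signature."""
    sig = inspect.signature(func)
    cls = collections.namedtuple(func.__name__, list(sig.parameters))
    cls.__new__.__defaults__ = tuple(
        p.default for p in sig.parameters.values()
        if p.default is not inspect.Parameter.empty)
    return cls

@Namedtuple
def Record(url, destination, checksum=''):
    pass

# Record('http://example.com', 'dest')
# -> Record(url='http://example.com', destination='dest', checksum='')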
On Fri, Jul 17, 2015 at 4:59 PM, Eric V. Smith wrote:

> I've implemented default parameters to namedtuples (and namedlist, a
> mutable version) in https://pypi.python.org/pypi/namedlist
>
> The syntax is not great, but none of the options are awesome.
>
> >>> Point = namedlist.namedtuple('Point', [('x', 0), ('y', 100)])
> >>> p = Point()
> >>> assert p.x == 0
> >>> assert p.y == 100

[... rest of the quoted exchange snipped; see the messages above ...]

--
--Guido van Rossum (python.org/~guido)
From g.brandl at gmx.net  Fri Jul 17 20:33:55 2015
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 17 Jul 2015 20:33:55 +0200
Subject: [Python-ideas] Disallow "00000" as a synonym for "0"
In-Reply-To: <20150717152241.GG21874@ando.pearwood.info>
References: <55A91103.2080609@trueblade.com>
 <20150717152241.GG21874@ando.pearwood.info>
Message-ID:

On 07/17/2015 05:22 PM, Steven D'Aprano wrote:

> Why was it defined that way? The more obvious:
>
> decimalinteger ::= nonzerodigit digit* | "0"
>
> was the definition in Python 2. As the Stackoverflow post above points
> out, the definition of decimalinteger actually in use seems to violate
> PEP 3127, and supporting "0"+ was added as a special case by Georg
> Brandl.
>
> Since leading 0 digits in decimal int literals are prohibited, we cannot
> write 0001, 0023 etc. Why would we write 0000 to get zero?

I could tell you, but then I'd have to kill you.

Georg

From random832 at fastmail.us  Sat Jul 18 19:35:49 2015
From: random832 at fastmail.us (random832 at fastmail.us)
Date: Sat, 18 Jul 2015 13:35:49 -0400
Subject: [Python-ideas] Disallow "00000" as a synonym for "0"
In-Reply-To: <55A91103.2080609@trueblade.com>
References: <55A91103.2080609@trueblade.com>
Message-ID: <1437240949.1225730.327053993.390D6337@webmail.messagingengine.com>

On Fri, Jul 17, 2015, at 10:28, Eric V. Smith wrote:
> On 07/16/2015 06:15 AM, Neil Girdhar wrote:
> > As per this
> > question: http://stackoverflow.com/questions/31447694/why-does-python-3-allow-00-as-a-literal-for-0-but-not-allow-01-as-a-literal
> >
> > It seems like Python accepts "000000000" to mean "0". Whatever the
> > historical reason, should this be deprecated?
>
> No. It would needlessly break working code.

Counter-proposal - allow 00001 through 00009 since they are equally
unambiguous.

From ron3200 at gmail.com  Sat Jul 18 21:40:28 2015
From: ron3200 at gmail.com (Ron Adam)
Date: Sat, 18 Jul 2015 15:40:28 -0400
Subject: [Python-ideas] Disallow "00000" as a synonym for "0"
In-Reply-To: <20150717152241.GG21874@ando.pearwood.info>
References: <55A91103.2080609@trueblade.com>
 <20150717152241.GG21874@ando.pearwood.info>
Message-ID:

On 07/17/2015 11:22 AM, Steven D'Aprano wrote:
> On Fri, Jul 17, 2015 at 10:28:19AM -0400, Eric V. Smith wrote:
>> On 07/16/2015 06:15 AM, Neil Girdhar wrote:
>>> As per this
>>> question: http://stackoverflow.com/questions/31447694/why-does-python-3-allow-00-as-a-literal-for-0-but-not-allow-01-as-a-literal
>>>
>>> It seems like Python accepts "000000000" to mean "0". Whatever the
>>> historical reason, should this be deprecated?
>>
>> No. It would needlessly break working code.
> I wonder what working code uses 00 when 0 is wanted? Do you have any
> examples? I believe that anyone writing 00 is more likely to have made
> a typo than to actually intend to get 0.
>
> In Python 2, 00 has an obvious and correct interpretation: it is zero in
> octal. But in Python 3, octal is written with the prefix 0o not 0.

And then there is this...

>>> 000.0
0.0
>>> 000.1
0.1

Cheers,
   Ron

From steve at pearwood.info  Sun Jul 19 05:17:57 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 19 Jul 2015 13:17:57 +1000
Subject: [Python-ideas] Disallow "00000" as a synonym for "0"
In-Reply-To:
References: <55A91103.2080609@trueblade.com>
 <20150717152241.GG21874@ando.pearwood.info>
Message-ID: <20150719031756.GI21874@ando.pearwood.info>

On Sat, Jul 18, 2015 at 03:40:28PM -0400, Ron Adam wrote:
> And then there is this...
>
> >>> 000.0
> 0.0
> >>> 000.1
> 0.1

The parsing rules for floats are not the same as int, and since floats
always use decimal, never octal, there's no ambiguity or confusion from
writing "0000.0000". I don't propose changing floats.

--
Steve

From steve at pearwood.info  Sun Jul 19 13:03:22 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 19 Jul 2015 21:03:22 +1000
Subject: [Python-ideas] Disallow "00000" as a synonym for "0"
In-Reply-To: <1437240949.1225730.327053993.390D6337@webmail.messagingengine.com>
References: <55A91103.2080609@trueblade.com>
 <1437240949.1225730.327053993.390D6337@webmail.messagingengine.com>
Message-ID: <20150719110321.GA25179@ando.pearwood.info>

On Sat, Jul 18, 2015 at 01:35:49PM -0400, random832 at fastmail.us wrote:
> On Fri, Jul 17, 2015, at 10:28, Eric V. Smith wrote:
> > On 07/16/2015 06:15 AM, Neil Girdhar wrote:
> > > As per this
> > > question: http://stackoverflow.com/questions/31447694/why-does-python-3-allow-00-as-a-literal-for-0-but-not-allow-01-as-a-literal
> > >
> > > It seems like Python accepts "000000000" to mean "0". Whatever the
> > > historical reason, should this be deprecated?
> >
> > No. It would needlessly break working code.
>
> Counter-proposal - allow 00001 through 00009 since they are equally
> unambiguous.

Do you mean up to 007, since they are the same in oct and dec?

In either case, that introduces even more special cases. Whether it is
one special case, 000, or eight, or ten, 000 through 009, the questions
remain:

- why does (let's say) `n = 02` work, but `n = 012` fail?

- why would you intentionally write `n = 00` when you could simply
  write `n = 0`?

Backwards compatibility aside, I don't think there's any reason to keep
this feature. I suspect that it can only mask typos, not be used for any
sensible reason. Back when 000 meant octal zero, it might have made
sense to write it that way to align with a bunch of other octal numbers,
but now you would surely write 0o00 to align with 0o10.

http://bugs.python.org/issue24668

--
Steve

From ron3200 at gmail.com  Sun Jul 19 18:22:49 2015
From: ron3200 at gmail.com (Ron Adam)
Date: Sun, 19 Jul 2015 12:22:49 -0400
Subject: [Python-ideas] Disallow "00000" as a synonym for "0"
In-Reply-To: <20150719031756.GI21874@ando.pearwood.info>
References: <55A91103.2080609@trueblade.com>
 <20150717152241.GG21874@ando.pearwood.info>
 <20150719031756.GI21874@ando.pearwood.info>
Message-ID:

On 07/18/2015 11:17 PM, Steven D'Aprano wrote:
> On Sat, Jul 18, 2015 at 03:40:28PM -0400, Ron Adam wrote:
>> And then there is this...
>>
>> >>> 000.0
>> 0.0
>> >>> 000.1
>> 0.1

> The parsing rules for floats are not the same as int,

Umm.. yes.  But are they intentionally different?

> and since floats
> always use decimal, never octal, there's no ambiguity or confusion from
> writing "0000.0000". I don't propose changing floats.

Me either.  It seems to me ints should be relaxed to allow for leading
zeros instead.

The most common use of leading zeros is when numbers in strings are used
and those numbers are sorted.  Without the leading zeros the sort order
may not be in numerical order.

The int class supports using leading zeros in strings, but not in
literal form.  (Although it may use a C string to int function when
parsing it.)

>>> int("0000")
0
>>> int("007")
7

>>> int(0000)
0

>>> int(007)
  File "<stdin>", line 1
    int(007)
          ^
SyntaxError: invalid token

>>> int(007.0)
7

>>> float("0001")
1.0

It's common to cut and paste numerical data from text data.
If you are cutting numerical data from a table, then you may also need
to remove all the leading zeros.  A macro can do that, but it complicates
a simple cut and paste operation.

Note that this does not affect the internal representation of ints, only
how python interprets the string literal during the compiling process.

A social reason for this limitation is that a number of other languages
do use a leading digit 0 to define octal numbers.  And this helps catch
silent errors due to copying numbers directly in that case.  (Python uses
"0o" and not just "0".)

In python, it just catches a possible silent error, when a 0onnn is
mistyped as 00nnn.

So this looks like it's a quick BDFL judgement call to me.  (or other
core developer number specialist call.)  I think the status quo wins by
default otherwise.

Cheers,
   Ron

From rosuav at gmail.com  Sun Jul 19 18:37:44 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 20 Jul 2015 02:37:44 +1000
Subject: [Python-ideas] Disallow "00000" as a synonym for "0"
In-Reply-To:
References: <55A91103.2080609@trueblade.com>
 <20150717152241.GG21874@ando.pearwood.info>
 <20150719031756.GI21874@ando.pearwood.info>
Message-ID:

On Mon, Jul 20, 2015 at 2:22 AM, Ron Adam wrote:
> The most common use of leading zeros is when numbers in strings are used
> and those numbers are sorted. Without the leading zeros the sort order may
> not be in numerical order.
>
> The int class supports using leading zeros in strings, but not in literal
> form. (Although it may use a C string to int function when parsing it.)
>
>>>> int("0000")
> 0
>>>> int("007")
> 7
>
>>>> int(0000)
> 0
>
>>>> int(007)
>   File "<stdin>", line 1
>     int(007)
>           ^
> SyntaxError: invalid token
>
>>>> int(007.0)
> 7
>
>>>> float("0001")
> 1.0

Yes, because int() can take an extra parameter to specify the base.
Source code can't.

> A social reason for this limitation is that a number of other languages do
> use a leading digit 0 to define octal numbers. And this helps catch silent
> errors due to copying numbers directly in that case. (Python uses "0o" and
> not just "0".)

Python 2 also supported C-style "0777" to mean 511, and it's all
through Unix software (eg "0777" is a common notation for a
world-writable directory's permission bits; since there are three bits
(r/w/x) for each of the three categories (u/g/o), it makes sense to
use octal). Supporting "0777" in source code and having it mean 777
decimal would be extremely confusing, which is why it's a straight-up
error in Python 3.

ChrisA

From python-ideas at mgmiller.net  Mon Jul 20 01:12:31 2015
From: python-ideas at mgmiller.net (Mike Miller)
Date: Sun, 19 Jul 2015 16:12:31 -0700
Subject: [Python-ideas] Briefer string format
In-Reply-To:
References:
Message-ID: <55AC2EDF.7040205@mgmiller.net>

Have long wished python could format strings easily like bash or perl do, ...
and then it hit me:

csstext += f'{nl}{selector}{space}{{{nl}'

(This script included whitespace vars to provide a minification option.)

I've seen others make similar suggestions, but to my knowledge they didn't
include this pleasing brevity aspect.

-Mike

From eric at trueblade.com  Mon Jul 20 01:27:42 2015
From: eric at trueblade.com (Eric V. Smith)
Date: Sun, 19 Jul 2015 19:27:42 -0400
Subject: [Python-ideas] Briefer string format
In-Reply-To: <55AC2EDF.7040205@mgmiller.net>
References: <55AC2EDF.7040205@mgmiller.net>
Message-ID:

On Jul 19, 2015, at 7:12 PM, Mike Miller wrote:
>
> Have long wished python could format strings easily like bash or perl do, ...
> and then it hit me:
>
> csstext += f'{nl}{selector}{space}{{{nl}'
>
> (This script included whitespace vars to provide a minification option.)
>
> I've seen others make similar suggestions, but to my knowledge they didn't
> include this pleasing brevity aspect.

What would this do? It's not clear from your description.

Eric.

From python-ideas at mgmiller.net  Mon Jul 20 01:35:01 2015
From: python-ideas at mgmiller.net (Mike Miller)
Date: Sun, 19 Jul 2015 16:35:01 -0700
Subject: [Python-ideas] Briefer string format
In-Reply-To:
References: <55AC2EDF.7040205@mgmiller.net>
Message-ID: <55AC3425.5010509@mgmiller.net>

Hi,

Ok, I kept the message brief because I thought this subject had previously
been discussed often.  I've expanded it to explain better for those that
are interested.

---

Needed to whip-up some css strings, took a look at the formatting I had done
and thought it was pretty ugly.  I started with the printf style, and had
pulled out the whitespace as vars in order to have a minification option:

csstext += '%s%s%s{%s' % (nl, key, space, nl)

Decent but not great, a bit hard on the eyes.  So I decided to try .format():

csstext += '{nl}{key}{space}{{{nl}'.format(**locals())

This looks a bit better if you ignore the right half, but it is longer and not
as simple as one might hope.  It is much longer still if you type out the
variables needed as keyword params!  The '{}' option is not much improvement
either.

csstext += '{nl}{key}{space}{{{nl}'.format(nl=nl, key=key, ...  # uggh
csstext += '{}{}{}{{{}'.format(nl, key, space, nl)

I've long wished python could format strings easily like bash or perl do, ...
and then it hit me:

csstext += f'{nl}{key}{space}{{{nl}'

An "f-formatted" string could automatically format with the locals dict.  Not
yet sure about globals, and unicode only suggested for now.  Perhaps could be
done directly to avoid the .format() function call, which adds some overhead
and tends to double the length of the line?

I remember a GvR talk a few years ago giving a 'meh' on .format() and have
agreed, using it only when I have a very large or complicated string-building
need, at the point where it begins to overlap Jinja territory.  Perhaps this
is one way to make it more comfortable for everyday usage.

I've seen others make similar suggestions, but to my knowledge they didn't
include this pleasing brevity aspect.

-Mike

On 07/19/2015 04:27 PM, Eric V. Smith wrote:
> On Jul 19, 2015, at 7:12 PM, Mike Miller wrote:
>>
>> Have long wished python could format strings easily like bash or perl do, ...
>> and then it hit me:
>>
>> csstext += f'{nl}{selector}{space}{{{nl}'
>>
>> (This script included whitespace vars to provide a minification option.)
>>
>> I've seen others make similar suggestions, but to my knowledge they didn't
>> include this pleasing brevity aspect.
>
> What would this do? It's not clear from your description.
>
> Eric.

From rosuav at gmail.com  Mon Jul 20 01:44:09 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 20 Jul 2015 09:44:09 +1000
Subject: [Python-ideas] Briefer string format
In-Reply-To: <55AC3425.5010509@mgmiller.net>
References: <55AC2EDF.7040205@mgmiller.net>
 <55AC3425.5010509@mgmiller.net>
Message-ID:

On Mon, Jul 20, 2015 at 9:35 AM, Mike Miller wrote:
> I've long wished python could format strings easily like bash or perl do, ...
> and then it hit me:
>
> csstext += f'{nl}{key}{space}{{{nl}'
>
> An "f-formatted" string could automatically format with the locals dict.
> Not yet sure about globals, and unicode only suggested for now.
> Perhaps could be done directly to avoid the .format() function call,
> which adds some overhead and tends to double the length of the line?

Point to note: Currently, all the string prefixes are compile-time
directives only. A b"bytes" or u"unicode" prefix affects what kind of
object is produced, and all the others are just syntactic differences.
In all cases, a string literal is a single immutable object which can
be stashed away as a constant. What you're suggesting here is a thing
that looks like a literal, but is actually a run-time operation. As
such, I'm pretty dubious; coupled with the magic of dragging values out
of the enclosing namespace, it's going to be problematic as regards
code refactoring. Also, you're going to have heaps of people arguing
that this should be a shorthand for str.format(**locals()), and about
as many arguing that it should follow the normal name lookups (locals,
nonlocals, globals, builtins).

I'm -1 on the specific idea, though definitely sympathetic to the
broader concept of simplified formatting of strings. Python's
printf-style formatting has its own warts (mainly because of the cute
use of an operator, rather than doing it as a function call), and still
has the problem of having percent markers with no indication of what
they'll be interpolating in. Anything that's explicit is excessively
verbose, anything that isn't is cryptic. There's no easy fix.

ChrisA

From bruce at leban.us  Mon Jul 20 01:46:06 2015
From: bruce at leban.us (Bruce Leban)
Date: Sun, 19 Jul 2015 16:46:06 -0700
Subject: [Python-ideas] Briefer string format
In-Reply-To: <55AC3425.5010509@mgmiller.net>
References: <55AC2EDF.7040205@mgmiller.net>
 <55AC3425.5010509@mgmiller.net>
Message-ID:

Automatically injecting from the locals or globals is a nice source of
bugs. Explicit is better than implicit, especially in cases where it can
lead to security bugs.

-1

--- Bruce
Check out my new puzzle book: http://J.mp/ingToConclusions
Get it free here: http://J.mp/ingToConclusionsFree (available on iOS)

On Sun, Jul 19, 2015 at 4:35 PM, Mike Miller wrote:

> Hi,
>
> Ok, I kept the message brief because I thought this subject had previously
> been discussed often.  I've expanded it to explain better for those that
> are interested.
>
> ---
>
> Needed to whip-up some css strings, took a look at the formatting I had
> done
> and thought it was pretty ugly.  I started with the printf style, and had
> pulled out the whitespace as vars in order to have a minification option:
>
> csstext += '%s%s%s{%s' % (nl, key, space, nl)
>
> Decent but not great, a bit hard on the eyes.  So I decided to try
> .format():
>
> csstext += '{nl}{key}{space}{{{nl}'.format(**locals())
>
> This looks a bit better if you ignore the right half, but it is longer and
> not
> as simple as one might hope.  It is much longer still if you type out the
> variables needed as keyword params!  The '{}' option is not much
> improvement
> either.
>
> csstext += '{nl}{key}{space}{{{nl}'.format(nl=nl, key=key, ...  # uggh
> csstext += '{}{}{}{{{}'.format(nl, key, space, nl)
>
> I've long wished python could format strings easily like bash or perl do,
> ...
> and then it hit me:
>
> csstext += f'{nl}{key}{space}{{{nl}'
>
> An "f-formatted" string could automatically format with the locals dict.
> Not
> yet sure about globals, and unicode only suggested for now.  Perhaps could
> be
> done directly to avoid the .format() function call, which adds some
> overhead
> and tends to double the length of the line?
>
> I remember a GvR talk a few years ago giving a 'meh' on .format() and have
> agreed, using it only when I have a very large or complicated
> string-building
> need, at the point where it begins to overlap Jinja territory.  Perhaps
> this is
> one way to make it more comfortable for everyday usage.
>
> I've seen others make similar suggestions, but to my knowledge they didn't
> include this pleasing brevity aspect.
>
> -Mike
>
> On 07/19/2015 04:27 PM, Eric V. Smith wrote:
>> On Jul 19, 2015, at 7:12 PM, Mike Miller wrote:
>>>
>>> Have long wished python could format strings easily like bash or perl
>>> do, ...
>>> and then it hit me:
>>>
>>> csstext += f'{nl}{selector}{space}{{{nl}'
>>>
>>> (This script included whitespace vars to provide a minification option.)
>>>
>>> I've seen others make similar suggestions, but to my knowledge they
>>> didn't
>>> include this pleasing brevity aspect.
>>
>> What would this do? It's not clear from your description.
>>
>> Eric.

From tjreedy at udel.edu  Mon Jul 20 01:50:52 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 19 Jul 2015 19:50:52 -0400
Subject: [Python-ideas] Briefer string format
In-Reply-To: <55AC2EDF.7040205@mgmiller.net>
References: <55AC2EDF.7040205@mgmiller.net>
Message-ID:

On 7/19/2015 7:12 PM, Mike Miller wrote:
> Have long wished python could format strings easily like bash or perl
> do, ...
> and then it hit me:
>
> csstext += f'{nl}{selector}{space}{{{nl}'

Are the unbalanced braces here and in the followup intentional?

--
Terry Jan Reedy

From pmiscml at gmail.com  Mon Jul 20 01:59:19 2015
From: pmiscml at gmail.com (Paul Sokolovsky)
Date: Mon, 20 Jul 2015 02:59:19 +0300
Subject: [Python-ideas] Briefer string format
In-Reply-To: <55AC3425.5010509@mgmiller.net>
References: <55AC2EDF.7040205@mgmiller.net>
 <55AC3425.5010509@mgmiller.net>
Message-ID: <20150720025919.3652fb80@x230>

Hello,

On Sun, 19 Jul 2015 16:35:01 -0700
Mike Miller wrote:

[]

> csstext += f'{nl}{key}{space}{{{nl}'
>
> An "f-formatted" string could automatically format with the locals
> dict.  Not yet sure about globals, and unicode only suggested for
> now.

"Not sure" sounds convincing. Deal - let's keep being explicit rather
than implicit. Brevity?

def _(fmt, dict):
    return fmt.format(**dict)
__ = globals
___ = locals

foo = 42

_("{foo}", __())

If that's not terse enough, you can take Python3, and go thru Unicode
planes looking for funky-looking letters, then you hopefully can reduce
to

.("{foo}", .())

Where dots aren't dots, but funky-looking letters.

--
Best regards,
 Paul                          mailto:pmiscml at gmail.com

From omalsa04 at gmail.com  Mon Jul 20 02:00:41 2015
From: omalsa04 at gmail.com (Sam O'Malley)
Date: Mon, 20 Jul 2015 00:00:41 +0000
Subject: [Python-ideas] Briefer string format
In-Reply-To:
References: <55AC2EDF.7040205@mgmiller.net>
Message-ID:

What about:

csstext += '{nl}{key}{space}{nl}'.format()

Format called with no arguments could default to reading from the locals
dict.

On Mon, 20 Jul 2015 at 9:22 am Terry Reedy wrote:

> On 7/19/2015 7:12 PM, Mike Miller wrote:
> > Have long wished python could format strings easily like bash or perl
> > do, ...
> > and then it hit me:
> >
> > csstext += f'{nl}{selector}{space}{{{nl}'
>
> Are the unbalanced braces here and in the followup intentional?
>
> --
> Terry Jan Reedy

From joejev at gmail.com  Mon Jul 20 02:02:40 2015
From: joejev at gmail.com (Joseph Jevnik)
Date: Sun, 19 Jul 2015 20:02:40 -0400
Subject: [Python-ideas] Briefer string format
In-Reply-To:
References: <55AC2EDF.7040205@mgmiller.net>
Message-ID:

That is no better than: str.format(**locals(), **globals())

On Sun, Jul 19, 2015 at 8:00 PM, Sam O'Malley wrote:

> What about:
>
> csstext += '{nl}{key}{space}{nl}'.format()
>
> Format called with no arguments could default to reading from the locals
> dict.
> On Mon, 20 Jul 2015 at 9:22 am Terry Reedy wrote:
>
>> On 7/19/2015 7:12 PM, Mike Miller wrote:
>> > Have long wished python could format strings easily like bash or perl
>> > do, ...
>> > and then it hit me:
>> >
>> > csstext += f'{nl}{selector}{space}{{{nl}'
>>
>> Are the unbalanced braces here and in the followup intentional?
>>
>> --
>> Terry Jan Reedy

From rosuav at gmail.com  Mon Jul 20 02:04:13 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 20 Jul 2015 10:04:13 +1000
Subject: [Python-ideas] Briefer string format
In-Reply-To:
References: <55AC2EDF.7040205@mgmiller.net>
Message-ID:

On Mon, Jul 20, 2015 at 9:50 AM, Terry Reedy wrote:
> On 7/19/2015 7:12 PM, Mike Miller wrote:
>>
>> Have long wished python could format strings easily like bash or perl
>> do, ...
>> and then it hit me:
>>
>> csstext += f'{nl}{selector}{space}{{{nl}'
>
> Are the unbalanced braces here and in the followup intentional?

I expect they are - compare the percent formatting example:

> csstext += '%s%s%s{%s' % (nl, key, space, nl)

The double open brace makes for a literal open brace in the end result.
It's the same ugliness as trying to craft a regular expression to match
Windows path names without raw string literals, so I completely
sympathize with the desire for something better. But I don't think
f"fmt" is it :)

ChrisA

From python-ideas at mgmiller.net  Mon Jul 20 02:23:10 2015
From: python-ideas at mgmiller.net (Mike Miller)
Date: Sun, 19 Jul 2015 17:23:10 -0700
Subject: [Python-ideas] Briefer string format
In-Reply-To:
References: <55AC2EDF.7040205@mgmiller.net>
 <55AC3425.5010509@mgmiller.net>
Message-ID: <55AC3F6E.1030301@mgmiller.net>

There's a trade-off, but you will need to make sure your formatting
strings/vars match regardless of using this idea or not.

-Mike

On 07/19/2015 04:46 PM, Bruce Leban wrote:
> Automatically injecting from the locals or globals is a nice source of bugs.
> Explicit is better than implicit, especially in cases where it can lead to
> security bugs.
>
> -1
>

From ncoghlan at gmail.com  Mon Jul 20 02:38:02 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 20 Jul 2015 10:38:02 +1000
Subject: [Python-ideas] Disallow "00000" as a synonym for "0"
In-Reply-To:
References: <55A91103.2080609@trueblade.com>
 <20150717152241.GG21874@ando.pearwood.info>
 <20150719031756.GI21874@ando.pearwood.info>
Message-ID:

On 20 Jul 2015 2:38 am, "Chris Angelico" wrote:
>
> On Mon, Jul 20, 2015 at 2:22 AM, Ron Adam wrote:
>
> > A social reason for this limitation is that a number of other languages do
> > use a leading digit 0 to define octal numbers. And this helps catch silent
> > errors due to copying numbers directly in that case. (Python uses "0o" and
> > not just "0".)
>
> Python 2 also supported C-style "0777" to mean 511, and it's all
> through Unix software (eg "0777" is a common notation for a
> world-writable directory's permission bits; since there are three bits
> (r/w/x) for each of the three categories (u/g/o), it makes sense to
> use octal). Supporting "0777" in source code and having it mean 777
> decimal would be extremely confusing, which is why it's a straight-up
> error in Python 3.

Exactly - the special case here is *dis*allowing leading zeroes for
non-zero integer literals, since we can't know if they're supposed to be
octal values or not, or if the "b" or "x" was left out of a binary or hex
literal that has been padded out to a fixed number of bits, or if the
decimal point was left out of a floating point literal.

The one integer literal case that *could* be reliably kept consistent
with the general mathematical notation of "leading zeroes are permitted,
but have no significance" was zero itself, which is what was done.

Cheers,
Nick.

From Steve.Dower at microsoft.com  Mon Jul 20 02:43:29 2015
From: Steve.Dower at microsoft.com (Steve Dower)
Date: Mon, 20 Jul 2015 00:43:29 +0000
Subject: [Python-ideas] Briefer string format
In-Reply-To:
References: <55AC2EDF.7040205@mgmiller.net>
 <55AC3425.5010509@mgmiller.net>
Message-ID:

"Point to note: Currently, all the string prefixes are compile-time
directives only. A b"bytes" or u"unicode" prefix affects what kind of
object is produced, and all the others are just syntactic differences.
In all cases, a string literal is a single immutable object which can
be stashed away as a constant. What you're suggesting here is a thing
that looks like a literal, but is actually a run-time operation."

Why wouldn't this be a compile time transform from f"string with braces"
into "string with braces".format(x=x, y=y, ...) where x, y, etc are the
names in each pair of braces (with an error if it can't get a valid
identifier out of each format code)? It's syntactic sugar for a simple
function call with perfectly well defined semantics - you don't even
have to modify the string literal.

Defined as a compile time transform like this, I'm +1. As soon as any
suggestion mentions "locals()" or "globals()" I'm -1.

Cheers,
Steve

Top-posted from my Windows Phone

________________________________
From: Chris Angelico
Sent: 7/19/2015 16:44
Cc: python-ideas at python.org
Subject: Re: [Python-ideas] Briefer string format

[... Chris's message of 01:44, quoted in full below the top-post,
snipped; see above ...]
From rosuav at gmail.com  Mon Jul 20 02:51:28 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 20 Jul 2015 10:51:28 +1000
Subject: [Python-ideas] Briefer string format
In-Reply-To:
References: <55AC2EDF.7040205@mgmiller.net>
 <55AC3425.5010509@mgmiller.net>
Message-ID:

On Mon, Jul 20, 2015 at 10:43 AM, Steve Dower wrote:
> "Point to note: Currently, all the string prefixes are compile-time
> directives only. A b"bytes" or u"unicode" prefix affects what kind of
> object is produced, and all the others are just syntactic differences.
> In all cases, a string literal is a single immutable object which can
> be stashed away as a constant. What you're suggesting here is a thing
> that looks like a literal, but is actually a run-time operation."
>
> Why wouldn't this be a compile time transform from f"string with braces"
> into "string with braces".format(x=x, y=y, ...) where x, y, etc are the
> names in each pair of braces (with an error if it can't get a valid
> identifier out of each format code)? It's syntactic sugar for a simple
> function call with perfectly well defined semantics - you don't even have to
> modify the string literal.
>
> Defined as a compile time transform like this, I'm +1. As soon as any
> suggestion mentions "locals()" or "globals()" I'm -1.

It'd obviously have to be a compile-time transformation. My point is
that it would, unlike all other forms of literal, translate into a
function call.

How is your "x=x, y=y" version materially different from explicitly
mentioning locals() or globals()? The only significant difference is
that your version follows the scope order outward, where locals() and
globals() call up a specific scope each.

Will an f"..." format string be mergeable with other strings?
All the other types of literal can be (apart, of course, from mixing
bytes and unicode), but this would have to be something somehow
different. In every way that I can think of, this is not a literal - it
is a construct that results in a run-time operation. A context-dependent
operation, at that. That's why I'm -1 on this looking like a literal.

ChrisA

From python-ideas at mgmiller.net  Mon Jul 20 02:53:12 2015
From: python-ideas at mgmiller.net (Mike Miller)
Date: Sun, 19 Jul 2015 17:53:12 -0700
Subject: [Python-ideas] Briefer string format
In-Reply-To: <20150720025919.3652fb80@x230>
References: <55AC2EDF.7040205@mgmiller.net>
 <55AC3425.5010509@mgmiller.net>
 <20150720025919.3652fb80@x230>
Message-ID: <55AC4678.4040302@mgmiller.net>

Hmm, I prefer this recipe sent to me directly by joejev:

>>> import sys
>>> from collections import ChainMap
>>> def f(cs):
...     """Format a string with the local scope of the caller
...
...     Parameters
...     ----------
...     cs : str
...         The string to format.
...
...     Returns
...     -------
...     formatted : str
...         The string with the format units filled in.
...     """
...     frame = sys._getframe(1)
...     return cs.format(**dict(ChainMap(frame.f_locals, frame.f_globals)))
...
>>> a = 1
>>> f('{a}')
'1'

For yours I'd use the "pile of poo" character:  ;)

💩("{foo}", _())

Both of these might be slower and a bit more awkward than the f'' idea,
though I like them.

As to the original post, a pyflakes-type script might be able to look for
name errors to assuage concerns, but as I mentioned before I believe the
task of matching string/vars is still necessary.

-Mike

On 07/19/2015 04:59 PM, Paul Sokolovsky wrote:
> Hello,
>
> On Sun, 19 Jul 2015 16:35:01 -0700
> Mike Miller wrote:
>
> []
>
>> csstext += f'{nl}{key}{space}{{{nl}'
>>
>> An "f-formatted" string could automatically format with the locals
>> dict.  Not yet sure about globals, and unicode only suggested for
>> now.
>
> "Not sure" sounds convincing. Deal - let's keep being explicit rather
> than implicit. Brevity?
>
> def _(fmt, dict):
>     return fmt.format(**dict)
> __ = globals
> ___ = locals
>
> foo = 42
>
> _("{foo}", __())
>
> If that's not terse enough, you can take Python3, and go thru Unicode
> planes looking for funky-looking letters, then you hopefully can reduce
> to
>
> .("{foo}", .())
>
> Where dots aren't dots, but funky-looking letters.

From breamoreboy at yahoo.co.uk  Mon Jul 20 03:31:31 2015
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Mon, 20 Jul 2015 02:31:31 +0100
Subject: [Python-ideas] Briefer string format
In-Reply-To: <20150720025919.3652fb80@x230>
References: <55AC2EDF.7040205@mgmiller.net>
 <55AC3425.5010509@mgmiller.net>
 <20150720025919.3652fb80@x230>
Message-ID:

On 20/07/2015 00:59, Paul Sokolovsky wrote:
> Hello,
>
> On Sun, 19 Jul 2015 16:35:01 -0700
> Mike Miller wrote:
>
> []
>
>> csstext += f'{nl}{key}{space}{{{nl}'
>>
>> An "f-formatted" string could automatically format with the locals
>> dict.  Not yet sure about globals, and unicode only suggested for
>> now.
>
> "Not sure" sounds convincing. Deal - let's keep being explicit rather
> than implicit. Brevity?
>
> def _(fmt, dict):
>     return fmt.format(**dict)
> __ = globals
> ___ = locals
>
> foo = 42
>
> _("{foo}", __())
>
> If that's not terse enough, you can take Python3, and go thru Unicode
> planes looking for funky-looking letters, then you hopefully can reduce
> to
>
> .("{foo}", .())
>
> Where dots aren't dots, but funky-looking letters.

Good grief, April 1st already?
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

From Steve.Dower at microsoft.com  Mon Jul 20 03:33:48 2015
From: Steve.Dower at microsoft.com (Steve Dower)
Date: Mon, 20 Jul 2015 01:33:48 +0000
Subject: [Python-ideas] Briefer string format
In-Reply-To:
References: <55AC2EDF.7040205@mgmiller.net>
 <55AC3425.5010509@mgmiller.net>
Message-ID:

Chris Angelico wrote:
> On Mon, Jul 20, 2015 at 10:43 AM, Steve Dower wrote:
> It'd obviously have to be a compile-time transformation. My point is
> that it would, unlike all other forms of literal, translate into a
> function call.

Excluding dictionary literals, of course. And class definitions.
Decorators too, and arguably the descriptor protocol and
__getattribute__ make things that look like attribute lookups into
function calls. Python is littered with these, so I'm not sure that
your point has any historical support.

> How is your "x=x, y=y" version materially different from explicitly
> mentioning locals() or globals()? The only significant difference is
> that your version follows the scope order outward, where locals() and
> globals() call up a specific scope each.

Yes, it follows normal scoping rules and doesn't invent/define/describe
new ones for this particular case. There is literally no difference
between the function call version and the prefix version wrt scoping.

As an example of why "normal rules" are better than "locals()/globals()",
how would you implement this using just locals() and globals()?

>>> def f():
...     x = 123
...     return [f'{x}' for _ in range(1)]
...
>>> f()
['123']

Given that this is the current behaviour:

>>> def f():
...     return [locals()[x] for _ in range(1)]
...
>>> f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in f
  File "<stdin>", line 1, in <listcomp>
KeyError: 'x'

> Will an f"..." format string be mergeable with other strings? All the
> other types of literal can be (apart, of course, from mixing bytes and
> unicode), but this would have to be something somehow different.

I don't mind saying "no" here, especially since the merging is done
while compiling, but it would be possible to generate a runtime
concatenation here. Again, you only "know" that code (currently) has no
runtime effect because, well, because you know it. It's a change, but
it isn't world ending.

> In every way that I can think of, this is not a literal - it is a
> construct that results in a run-time operation.

Most new Python developers (with backgrounds in other languages) are
surprised that "class" is a construct that results in a run-time
operation, and would be surprised that writing a dictionary literal
also results in a run-time operation if they ever had reason to notice.
I believe the same would apply here.

> A context-dependent operation, at that.

You'll need to explain this one for me - how is it "context-dependent"
when you are required to provide a string prefix?

> That's why I'm -1 on this looking like a literal.

I hope you'll reconsider, because I think you're working off some
incorrect or over-simplified beliefs. (Though this reply isn't just
intended for Chris, but for everyone following the discussion, so I
hope *everyone* considers both sides.)
Cheers,
Steve

From steve at pearwood.info  Mon Jul 20 04:01:55 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 20 Jul 2015 12:01:55 +1000
Subject: [Python-ideas] Disallow "00000" as a synonym for "0"
In-Reply-To:
References: <55A91103.2080609@trueblade.com>
 <20150717152241.GG21874@ando.pearwood.info>
 <20150719031756.GI21874@ando.pearwood.info>
Message-ID: <20150720020155.GD25179@ando.pearwood.info>

On Mon, Jul 20, 2015 at 10:38:02AM +1000, Nick Coghlan wrote:

> Exactly - the special case here is *dis*allowing leading zeroes for
> non-zero integer literals,

That's a ... different ... definition of "special case". You're saying
that, out of the literally infinite number of possible ints one might
attempt to write, *all of them except zero* are the special case, while
zero itself is the non-special case.

O-kay.

I get the argument that allowing people to write 000000000 when they
want an int zero is harmless, and the code churn required to prevent
that outweighs any benefit gained. The task I created on the tracker has
been closed as a "won't fix" or "rejected" (I forget which one), and I'm
not going to argue with that. But I did find your perspective above
funny enough that I had to reply. But for the record, and this will be
my last word on the subject:

> since we can't know if they're supposed to be
> octal values or not, or if the "b" or "x" was left out of a binary or hex
> literal that has been padded out to a fixed number of bits, or if the
> decimal point was left out of a floating point literal.

Right. So if you see somebody has written 00 as a literal in Python 3,
it's *more likely to be an error than deliberate*. E.g. we can align
octal or hex numbers using a fixed width string with leading zeroes
where needed:

nums = [0x00001234,
        0x000000A3,
        0x00000005,
        0x000027F3,
        0x00000000,  # this is okay
        00000000,    # breaks the alignment
        0000000000,  # aligned but where's the X?
        0x0000C21D,
        ]

but if you see a plain, unprefixed 0 mixed in there, that smells of a
possible error. Even if it isn't an error, to me it hints of an error
enough that I'd want to question the code author's intention. "Did you
really mean zero, or is that a typo?"

Regardless of everything else, to me this is an aesthetic question.
Allowing 00 when 01, 02, 03, ... are (rightly!) forbidden feels ugly and
a wart. But until and unless somebody actually gets bitten by this in
real code, and a typo hides in plain view disguised as zero, I can't
honestly say it is outrightly harmful.

--
Steve

From Steve.Dower at microsoft.com  Mon Jul 20 03:53:47 2015
From: Steve.Dower at microsoft.com (Steve Dower)
Date: Mon, 20 Jul 2015 01:53:47 +0000
Subject: [Python-ideas] Briefer string format
In-Reply-To:
References: <55AC2EDF.7040205@mgmiller.net>
 <55AC3425.5010509@mgmiller.net>
Message-ID:

"return [locals()[x] for _ in range(1)]"

I lost some quotes here around the x, but it doesn't affect the
behavior - you still can't get outside the comprehension scope here.

Cheers,
Steve

Top-posted from my Windows Phone

________________________________
From: Steve Dower
Sent: 7/19/2015 18:49
To: Chris Angelico
Cc: python-ideas at python.org
Subject: Re: [Python-ideas] Briefer string format

[... Steve's own message of 03:33, quoted in full below the top-post,
snipped; see above ...]
From alexander.belopolsky at gmail.com  Mon Jul 20 04:14:45 2015
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 19 Jul 2015 22:14:45 -0400
Subject: [Python-ideas] Disallow "00000" as a synonym for "0"
In-Reply-To: <20150720020155.GD25179@ando.pearwood.info>
References: <55A91103.2080609@trueblade.com>
 <20150717152241.GG21874@ando.pearwood.info>
 <20150719031756.GI21874@ando.pearwood.info>
 <20150720020155.GD25179@ando.pearwood.info>
Message-ID:

On Sun, Jul 19, 2015 at 10:01 PM, Steven D'Aprano wrote:
> Allowing 00 when 01, 02, 03, ... are (rightly!) forbidden feels ugly and
> a wart.

I agree in general, but there is one case where I am on the fence:

dates = [date(2005, 07, 01),
         date(2005, 11, 15),
         ..]

looks marginally better than the valid alternative. I often see this
form written by 2.7 users, and it requires a medium-size lecture to
explain why they should not write code like this even if it seemingly
works.

From rosuav at gmail.com  Mon Jul 20 04:42:04 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 20 Jul 2015 12:42:04 +1000
Subject: [Python-ideas] Briefer string format
In-Reply-To:
References: <55AC2EDF.7040205@mgmiller.net>
 <55AC3425.5010509@mgmiller.net>
Message-ID:

On Mon, Jul 20, 2015 at 11:33 AM, Steve Dower wrote:
> Chris Angelico wrote:
>> On Mon, Jul 20, 2015 at 10:43 AM, Steve Dower wrote:
>> It'd obviously have to be a compile-time transformation. My point is
>> that it would, unlike all other forms of literal, translate into a
>> function call.
>
> Excluding dictionary literals, of course. And class definitions.
> Decorators too, and arguably the descriptor protocol and
> __getattribute__ make things that look like attribute lookups into
> function calls. Python is littered with these, so I'm not sure that
> your point has any historical support.
>
> > As an example of why "normal rules" are better than "locals()/globals()", how would you implement this using just locals() and globals()? > >>>> def f(): > ... x = 123 > ... return [f'{x}' for _ in range(1)] > ... >>>> f() > ['123'] > > Given that this is the current behaviour: > >>>> def f(): > ... return [locals()[x] for _ in range(1)] > ... >>>> f() > Traceback (most recent call last): > File "", line 1, in > File "", line 1, in f > File "", line 1, in > KeyError: 'x' Sure, that's where following the scoping rules is better than explicitly calling up locals(). On the flip side, going for locals() alone means you can easily and simply protect your code against grabbing the "wrong" things, by simply putting it inside a nested function (which is what your list comp there is doing), and it's also easier to explain what this construct does in terms of locals() than otherwise (what if there are attribute lookups or subscripts?). >> Will an f"..." format string be mergeable with other strings? All the >> other types of literal can be (apart, of course, from mixing bytes and >> unicode), but this would have to be something somehow different. > > I don't mind saying "no" here, especially since the merging is done while compiling, but it would be possible to generate a runtime concatentation here. Again, you only "know" that code (currently) has no runtime effect because, well, because you know it. It's a change, but it isn't world ending. > Fair enough. I wouldn't mind saying "no" here too - in the same way that it's a SyntaxError to write u"hello" b"world", it would be a SyntaxError to mix either with f"format string". >> In every way that I can think of, this is not a literal - it is a >> construct that results in a run-time operation. > > Most new Python developers (with backgrounds in other languages) are surprised that "class" is a construct that results in a run-time operation, and would be surprised that writing a dictionary literal also results in a run-time operation if they ever had reason to notice. I believe the same would apply here. > That's part of learning the language (which things are literals and which aren't). Expanding the scope of potential confusion is a definite cost; I'm open to the argument that the benefit justifies that cost, but it is undoubtedly a cost. >> A context-dependent operation, at that. > > You'll need to explain this one for me - how is it "context-dependent" when you are required to provide a string prefix? def func1(): x = "world" return f"Hello, {x}!" def func2(): return f"Hello, {x}!" They both return what looks like a simple string, but in one, it grabs a local x, and in the other, it goes looking for a global. This is one of the potential risks of such things as decimal.Decimal literals, because literals normally aren't context-dependent, but the Decimal constructor can be affected by precision controls and such. Again, not a killer, but another cost. >> That's why I'm -1 on this looking like a literal. > > I hope you'll reconsider, because I think you're working off some incorrect or over-simplified beliefs. (Though this reply isn't just intended for Chris, but for everyone following the discussion, so I hope *everyone* considers both sides.) > Having read your above responses, I'm now -0.5 on this proposal. There is definite merit to it, but I'm a bit dubious that it'll end up with one of the problems of PHP code: the uncertainty of whether something is a string or a piece of code. 
Some editors syntax-highlight all strings as straight-forward strings, same color all the way, while others will change color inside there to indicate interpolation sites. Which is correct? At least here, the prefix on the string makes it clear that this is a piece of code; but it'll take editors a good while to catch up and start doing the right thing - for whatever definition of "right thing" the authors choose. Maybe my beliefs are over-simplified, in which case I'll be happy to be proven wrong by some immensely readable and clear real-world code examples. :) ChrisA From rosuav at gmail.com Mon Jul 20 04:46:31 2015 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 20 Jul 2015 12:46:31 +1000 Subject: [Python-ideas] Disallow "00000" as a synonym for "0" In-Reply-To: References: <55A91103.2080609@trueblade.com> <20150717152241.GG21874@ando.pearwood.info> <20150719031756.GI21874@ando.pearwood.info> <20150720020155.GD25179@ando.pearwood.info> Message-ID: On Mon, Jul 20, 2015 at 12:14 PM, Alexander Belopolsky wrote: > On Sun, Jul 19, 2015 at 10:01 PM, Steven D'Aprano > wrote: >> >> Allowing 00 when 01, 02, 03, ... are (rightly!) forbidden feels ugly and >> a wart. > > > I agree in general, but there is one case where I am on the fence: > > dates = [ date(2005, 07, 01), > date(2005, 11, 15), > ..] > > looks marginally better than the valid alternative. I often see this form > written by 2.7 users, > and it requires a medium size lecture to explain why they should not write > code like this even > if it seemingly works. > The lecture should be fairly simple. Just put 08 in there and you'll see why you shouldn't do it. :) If they then ask "Why is 07 allowed but 08 not?", then you can go into detail about octal, why C uses 0755 to mean 493 (and why it makes a LOT more sense to key some things in using octal - nobody would understand what permissions 493 would mean), and then it becomes pretty clear that there are two viable interpretations for "0755". ChrisA From stephen at xemacs.org Mon Jul 20 06:23:00 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 20 Jul 2015 13:23:00 +0900 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> Message-ID: <21932.30628.340500.24858@uwakimon.sk.tsukuba.ac.jp> Chris Angelico writes: > I'm -1 on the specific idea, though definitely sympathetic to the > broader concept of simplified formatting of strings. So does everybody. But we've seen many iterations: Perl/shell-style implicit interpolation apparently was right out from the very beginning of Python. The magic print statement was then deprecated in favor of a function. So I suppose it will be very hard to convince the BDFL (and anything implicit would surely need his approval) of anything but a function or an operator. We have the % operator taking a printf-style format string and a tuple of values to interpolate. It's compact and easy to use with position indexes into the tuple for short formats and few values, but is nearly unreadable and not easy to write for long formats with many interpolations, especially if they are repeated. > Python's printf-style formatting has its own warts (mainly because > of the cute use of an operator, rather than doing it as a function > call), I think the operator is actually a useful feature, not merely "cute". It directs the focus to the format string, rather than the function call. 
> and still has the problem of having percent markers with no > indication of what they'll be interpolating in. Not so. We have the more modern (?) % operator that takes a format string with named format sequences and a dictionary. This seems to be close to what the OP wants: val = "readable simple formatting method" print("This is a %(val)s." % locals()) (which actually works at module level as well as within a function). I suppose the OP will claim that an explicit call to locals() is verbose and redundant, but if that really is a problem: def format_with_locals(fmtstr): return fmtstr % locals() (of course with a nice short name, mnemonic to the author). Or for format strings to be used repeatedly with different (global -- the "locals" you want are actually nonlocal relative to a method, so there's no way to get at them AFAICS) values, there's this horrible hack: >>> class autoformat_with_globals(str): ... def __pos__(self): ... return self % globals() ... >>> a = autoformat_with_globals("This is a %(description)s.") >>> description = "autoformatted string" >>> +a 'This is a autoformatted string.' with __neg__ and __invert__ as alternative horrible hacks. We have str.format. I've gotten used to str.format but for most of my uses mapped %-formatting would work fine. We have an older proposal for a more flexible form of templating using the Perl/shell-ish $ operator in format strings. And we have a large number of templating languages from web frameworks (Django, Jinja, etc). None of these seem universally applicable. It's ugly in one sense (TOOWTDI violation), but ISTM that positional % for short interactive use, mapped % for templating where the conventional format operators suffice, and str.format for maximum explicit flexibility in programs, with context-sensitive formatting of new types, is an excellent combination. From ncoghlan at gmail.com Mon Jul 20 06:34:40 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 20 Jul 2015 14:34:40 +1000 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> Message-ID: On 20 July 2015 at 10:43, Steve Dower wrote: > "Point to note: Currently, all the string prefixes are compile-time > directives only. A b"bytes" or u"unicode" prefix affects what kind of > object is produced, and all the others are just syntactic differences. > In all cases, a string literal is a single immutable object which can > be stashed away as a constant. What you're suggesting here is a thing > that looks like a literal, but is actually a run-time operation." > > Why wouldn't this be a compile time transform from f"string with braces" > into "string with braces".format(x=x, y=y, ...) where x, y, etc are the > names in each pair of braces (with an error if it can't get a valid > identifier out of each format code)? It's syntactic sugar for a simple > function call with perfectly well defined semantics - you don't even have to > modify the string literal. > > Defined as a compile time transform like this, I'm +1. As soon as any > suggestion mentions "locals()" or "globals()" I'm -1. 
I'm opposed to a special case compile time transformation for string formatting in particular, but in favour of clearly-distinct-from-anything-else syntax for such utilities: https://mail.python.org/pipermail/python-ideas/2015-June/033920.html It would also need a "compile time import" feature for namespacing purposes, so you might be able to write something like: from !string import format # Compile time import # Compile time transformation that emits a syntax error for a malformed format string formatted = !format("string with braces for {name} {lookups} transformed to a runtime .format(name=name, lookups=lookups) call") # Equivalent explicit code (but without any compile time checking of format string validity) formatted = "string with braces for {name} {lookups} transformed to a runtime .format(name=name, lookups=lookups) call".format(name=name, lookups=lookups) The key for me is that any such operation *must* be transparent to the compiler, so it knows exactly what names you're looking up and can generate the appropriate references for them (including capturing closure variables if necessary). If it's opaque to the compiler, then it's no better than just using a string, which we can already do today. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From python-ideas at mgmiller.net Mon Jul 20 06:42:11 2015 From: python-ideas at mgmiller.net (Mike Miller) Date: Sun, 19 Jul 2015 21:42:11 -0700 Subject: [Python-ideas] Briefer string format In-Reply-To: References: Message-ID: <55AC7C23.8040803@mgmiller.net> Thanks for thinking about it, Chris. After several mentions of u'' and b'' I was waiting patiently for '' and r''. I guess r'' does nothing but what does '' or "" do at runtime? "\tcommand line --params\n" --> " command line --params " It looks like there is some runtime conversion happening there, though I don't know the implementation details. On Mon, 20 Jul 2015 10:51:28 +1000, Chris Angelico wrote: > Will an f"..." format string be mergeable with other strings? All the > other types of literal can be (apart, of course, from mixing bytes and > unicode), but this would have to be something somehow different. In > every way that I can think of, this is not a literal - it is a > construct that results in a run-time operation. A context-dependent > operation, at that. That's why I'm -1 on this looking like a literal. > ChrisA Also, Re: below, this a subtle but important improvement Steve, thanks! Yes, a transformation from: f"{x} {y}" to "string with braces".format(x=x, y=y) This is very much what I was thinking. If the second form is ok, the first should be too, syntax aside, if it does the same thing. The drawback being it might take a tool like an upgraded pyflakes/pycharm to notice y hadn't been defined yet. (Might a further optimization in C be possible to skip the function call, but basically work like this? No locals/globals()?) -Mike On Mon, Jul 20, 2015 at 10:43 AM, Steve Dower wrote: > Why wouldn't this be a compile time transform from f"string with braces" > into "string with braces".format(x=x, y=y, ...) where x, y, etc are... From Steve.Dower at microsoft.com Mon Jul 20 06:46:32 2015 From: Steve.Dower at microsoft.com (Steve Dower) Date: Mon, 20 Jul 2015 04:46:32 +0000 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> , Message-ID: So, macros basically? The real ones, not #define. 
What's wrong with special casing text strings (a little bit more than they already have been)? Top-posted from my Windows Phone ________________________________ From: Nick Coghlan Sent: ?7/?19/?2015 21:34 To: Steve Dower Cc: Chris Angelico; python-ideas at python.org Subject: Re: [Python-ideas] Briefer string format On 20 July 2015 at 10:43, Steve Dower wrote: > "Point to note: Currently, all the string prefixes are compile-time > directives only. A b"bytes" or u"unicode" prefix affects what kind of > object is produced, and all the others are just syntactic differences. > In all cases, a string literal is a single immutable object which can > be stashed away as a constant. What you're suggesting here is a thing > that looks like a literal, but is actually a run-time operation." > > Why wouldn't this be a compile time transform from f"string with braces" > into "string with braces".format(x=x, y=y, ...) where x, y, etc are the > names in each pair of braces (with an error if it can't get a valid > identifier out of each format code)? It's syntactic sugar for a simple > function call with perfectly well defined semantics - you don't even have to > modify the string literal. > > Defined as a compile time transform like this, I'm +1. As soon as any > suggestion mentions "locals()" or "globals()" I'm -1. I'm opposed to a special case compile time transformation for string formatting in particular, but in favour of clearly-distinct-from-anything-else syntax for such utilities: https://mail.python.org/pipermail/python-ideas/2015-June/033920.html It would also need a "compile time import" feature for namespacing purposes, so you might be able to write something like: from !string import format # Compile time import # Compile time transformation that emits a syntax error for a malformed format string formatted = !format("string with braces for {name} {lookups} transformed to a runtime .format(name=name, lookups=lookups) call") # Equivalent explicit code (but without any compile time checking of format string validity) formatted = "string with braces for {name} {lookups} transformed to a runtime .format(name=name, lookups=lookups) call".format(name=name, lookups=lookups) The key for me is that any such operation *must* be transparent to the compiler, so it knows exactly what names you're looking up and can generate the appropriate references for them (including capturing closure variables if necessary). If it's opaque to the compiler, then it's no better than just using a string, which we can already do today. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Jul 20 06:48:02 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 20 Jul 2015 14:48:02 +1000 Subject: [Python-ideas] Disallow "00000" as a synonym for "0" In-Reply-To: <20150720020155.GD25179@ando.pearwood.info> References: <55A91103.2080609@trueblade.com> <20150717152241.GG21874@ando.pearwood.info> <20150719031756.GI21874@ando.pearwood.info> <20150720020155.GD25179@ando.pearwood.info> Message-ID: On 20 July 2015 at 12:01, Steven D'Aprano wrote: > On Mon, Jul 20, 2015 at 10:38:02AM +1000, Nick Coghlan wrote: > >> Exactly - the special case here is *dis*allowing leading zeroes for >> non-zero integer literals, > > That's a ... different ... definition of "special case". 
You're saying > that, out of the literally infinite number of possible ints one might > attempt to write, *all of them except zero* are the special case, while > zero itself is the non-special case. O-kay. The integer literal zero only looks like the special case if you first exclude all the *other* ways of entering numeric values into Python applications and only consider "integer literal zero" and "non-zero integer literals". The fact that "0", "0.0" and "0j" produce values of different types is an implementation detail of Python's representation of abstract mathematical concepts, so the general case is actually "you can write numbers in Python the same way you write them in mathematical notation, including with leading zeroes if you want" (with the caveat that we use the electrical engineering "j" for imaginary numbers, rather than the more confusable mathematical "i") The Python 2 special case is then "unlike mathematical notation, using leading zeroes on an integer literal will result in it being interpreted in base 8 rather than base 10". The corresponding Python 3 special case is "unlike mathematical notation, you can't use leading zeroes on non-zero integers in Python 3, because of historical reasons relating to the syntax previously used for octal numbers in Python 2, and still used in C/C++, and various other languages". Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Jul 20 06:58:22 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 20 Jul 2015 14:58:22 +1000 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> Message-ID: On 20 July 2015 at 14:46, Steve Dower wrote: > So, macros basically? The real ones, not #define. > > What's wrong with special casing text strings (a little bit more than they > already have been)? I've wished for a cleaner shell command invocation syntax many more times than I've wished for easier string formatting, but I *have* wished for both. Talking to the scientific Python folks, they've often wished for a cleaner syntax to create deferred expressions with the full power of Python's statement level syntax. Explicitly named macros could deliver all three of those, without the downsides of implicit globally installed macros that are indistinguishable from regular syntax. By contrast, the string prefix system is inherently cryptic (being limited to single letters only) and not open to extension and experimentation outside the reference interpreter. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rosuav at gmail.com Mon Jul 20 06:59:52 2015 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 20 Jul 2015 14:59:52 +1000 Subject: [Python-ideas] Briefer string format In-Reply-To: <55AC7C23.8040803@mgmiller.net> References: <55AC7C23.8040803@mgmiller.net> Message-ID: On Mon, Jul 20, 2015 at 2:42 PM, Mike Miller wrote: > Thanks for thinking about it, Chris. After several mentions of u'' and b'' > I was waiting patiently for '' and r''. I guess r'' does nothing but what > does '' or "" do at runtime? > > "\tcommand line --params\n" --> " command line --params > " > > It looks like there is some runtime conversion happening there, though I > don't know the implementation details. > No, it's not runtime conversion; it's an alternative syntax. 
Aside from unicode vs bytes, all types of string literals are just different ways of embedding a string in your source code; this is why it's sometimes important to talk about a "raw string literal" rather than abbreviating it to "raw string", as if the string itself were different. You can play around with it in the interactive interpreter. All strings show the same form in repr, namely the unprefixed form (again, apart from a b"..." prefix on bytes); once the compiler has turned source code notation into constants, they're just string objects.

ChrisA

From abarnert at yahoo.com Mon Jul 20 07:18:47 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 19 Jul 2015 22:18:47 -0700
Subject: [Python-ideas] Briefer string format
In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net>
Message-ID:

On Jul 19, 2015, at 21:58, Nick Coghlan wrote:
>
>> On 20 July 2015 at 14:46, Steve Dower wrote:
>> So, macros basically? The real ones, not #define.
>>
>> What's wrong with special casing text strings (a little bit more than they
>> already have been)?
>
> I've wished for a cleaner shell command invocation syntax many more
> times than I've wished for easier string formatting, but I *have*
> wished for both. Talking to the scientific Python folks, they've often
> wished for a cleaner syntax to create deferred expressions with the
> full power of Python's statement level syntax.
>
> Explicitly named macros could deliver all three of those, without the
> downsides of implicit globally installed macros that are
> indistinguishable from regular syntax.

MacroPy already gives you macros that are explicitly imported, and explicitly marked on use, and nicely readable. And it already works, with no changes to Python, and it includes a ton of little features that you'd never want to add to core Python.

There are definitely changes to Python that could make it easier to improve MacroPy or start a competing project, but I think it would be more useful to identify and implement those changes than to try to build a macro system into Python itself. (Making it possible to work on the token level, or to associate bytes/text/tokens/trees/code with each other more easily, or to hook syntax errors and reprocess the bytes/text/tokens, etc. are some such ideas. But I think the most important stuff wouldn't be new features, but removing the annoyances that get in the way of trying to build the simplest possible new macro system from scratch for 3.5, and we probably can't know what those are until someone attempts to build such a thing.)

From abarnert at yahoo.com Mon Jul 20 07:23:02 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 19 Jul 2015 22:23:02 -0700
Subject: [Python-ideas] Briefer string format
In-Reply-To: <55AC7C23.8040803@mgmiller.net>
References: <55AC7C23.8040803@mgmiller.net>
Message-ID:

On Jul 19, 2015, at 21:42, Mike Miller wrote:
>
> (Might a further optimization in C be possible to skip the function call, but
> basically work like this? No locals/globals()?)

Why do you keep bringing up optimization? Do you actually have code where the call to str.format or str.__mod__ or locals is actually a relevant cost? It seems to me that whenever that call is too slow to bear, calling __str__ or __format__ on all of the substituted variables must be far more of a problem, so your optimized version would still be useless.
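To put rough numbers on that claim, here is a minimal timing sketch -- the helper names are invented for illustration, and only the relative numbers mean anything:

    import timeit

    def fmt_percent():
        x, y = 42, 'spam'
        return '%s-%s' % (x, y)

    def fmt_format():
        x, y = 42, 'spam'
        return '{}-{}'.format(x, y)

    def fmt_locals():
        # pays for an extra dict build plus keyword unpacking
        x, y = 42, 'spam'
        return '{x}-{y}'.format(**locals())

    for fn in (fmt_percent, fmt_format, fmt_locals):
        print(fn.__name__, timeit.timeit(fn, number=100000))

Whatever the machine, the gap between these disappears into the noise as soon as the interpolated values have expensive __format__ or __str__ methods, which is the point being made above.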
From ncoghlan at gmail.com Mon Jul 20 07:43:31 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 20 Jul 2015 15:43:31 +1000 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> Message-ID: On 20 July 2015 at 15:18, Andrew Barnert wrote: > On Jul 19, 2015, at 21:58, Nick Coghlan wrote: >> >>> On 20 July 2015 at 14:46, Steve Dower wrote: >>> So, macros basically? The real ones, not #define. >>> >>> What's wrong with special casing text strings (a little bit more than they >>> already have been)? >> >> I've wished for a cleaner shell command invocation syntax many more >> times than I've wished for easier string formatting, but I *have* >> wished for both. Talking to the scientific Python folks, they've often >> wished for a cleaner syntax to create deferred expressions with the >> full power of Python's statement level syntax. >> >> Explicitly named macros could deliver all three of those, without the >> downsides of implicit globally installed macros that are >> indistinguishable from regular syntax. > > MacroPy already gives you macros that are explicitly imported, and explicitly marked on use, and nicely readable. And it already works, with no changes to Python, and it includes a ton of little features that you'd never want to add to core Python. I see nothing explicit about https://pypi.python.org/pypi/MacroPy or the examples at https://github.com/lihaoyi/macropy#macropy, as it looks just like normal Python code to me, with no indication that compile time modifications are taking place. That's not MacroPy's fault - it *can't* readily be explicit the way I would want it to be if it's going to reuse the existing AST compiler to do the heavy lifting. However, I agree the MacroPy approach to tree transformations could be a good backend concept. I'd previously wondered how you'd go about embedding third party syntax like shell expressions or format strings, but eventually realised that combining an AST transformation syntax with string quoting works just fine there. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From phd at phdru.name Mon Jul 20 08:30:12 2015 From: phd at phdru.name (Oleg Broytman) Date: Mon, 20 Jul 2015 08:30:12 +0200 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> Message-ID: <20150720063012.GA6269@phdru.name> Hi! On Sun, Jul 19, 2015 at 07:50:52PM -0400, Terry Reedy wrote: > On 7/19/2015 7:12 PM, Mike Miller wrote: > >Have long wished python could format strings easily like bash or perl > >do, ... > >and then it hit me: > > > > csstext += f'{nl}{selector}{space}{{{nl}' > > Are the unbalanced braces here and in the followup intentional? I'm sure they are. The code is supposed to generate something equivalent to csstext += ''' p.italic { ''' to be extended later with something like csstext += ''' font-style: italic; } ''' > -- > Terry Jan Reedy Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. 
From abarnert at yahoo.com Mon Jul 20 10:01:04 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 20 Jul 2015 01:01:04 -0700
Subject: [Python-ideas] Briefer string format
In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net>
Message-ID: <1955CB27-6315-4A82-8654-4620B7440C22@yahoo.com>

On Jul 19, 2015, at 22:43, Nick Coghlan wrote:
>
>> On 20 July 2015 at 15:18, Andrew Barnert wrote:
>>> On Jul 19, 2015, at 21:58, Nick Coghlan wrote:
>>>
>>>> On 20 July 2015 at 14:46, Steve Dower wrote:
>>>> So, macros basically? The real ones, not #define.
>>>>
>>>> What's wrong with special casing text strings (a little bit more than they
>>>> already have been)?
>>>
>>> I've wished for a cleaner shell command invocation syntax many more
>>> times than I've wished for easier string formatting, but I *have*
>>> wished for both. Talking to the scientific Python folks, they've often
>>> wished for a cleaner syntax to create deferred expressions with the
>>> full power of Python's statement level syntax.
>>>
>>> Explicitly named macros could deliver all three of those, without the
>>> downsides of implicit globally installed macros that are
>>> indistinguishable from regular syntax.
>>
>> MacroPy already gives you macros that are explicitly imported, and explicitly marked on use, and nicely readable. And it already works, with no changes to Python, and it includes a ton of little features that you'd never want to add to core Python.
>
> I see nothing explicit about https://pypi.python.org/pypi/MacroPy or
> the examples at https://github.com/lihaoyi/macropy#macropy, as it
> looks just like normal Python code to me, with no indication that
> compile time modifications are taking place.

I suppose what I meant by explicit is things like using [] instead of () for macro calls and quick lambda definitions, s[""] for string interpolation, etc. Once you get used to it, it's usually obvious at a glance where code is using MacroPy.

> That's not MacroPy's fault - it *can't* readily be explicit the way I
> would want it to be if it's going to reuse the existing AST compiler
> to do the heavy lifting.
>
> However, I agree the MacroPy approach to tree transformations could be
> a good backend concept. I'd previously wondered how you'd go about
> embedding third party syntax like shell expressions or format strings,
> but eventually realised that combining an AST transformation syntax
> with string quoting works just fine there.

I think you want to be able to hook the tokenizer here as well. If you want f"..." or !f"...", that's hard to do at the tree level or the text level; you'd have to do something like f("...") or "f..." instead. But at the token level, it should be trivial. (Well, the second one may not be _quite_ trivial, because I believe there are some cases where you get a !f error instead of a ! error and an f name; I'd have to check.) I'm pretty sure I could turn my user literal suffix hack into an f-prefix hack in 15 minutes or so. (Obviously it would be nicer if it were doable in a more robust way, and using a framework rather than rewriting all the boilerplate. But my point is that, even without any support at all, it's still not that hard.)

I think you could also use token transformations to do a lot of useful shell-type expressions without quoting, although not full shell syntax; you'd have to play around with the limitations to see if they're worth the benefit of not needing to quote the whole thing.
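As a concrete illustration of that token-level approach, here is a sketch -- not Andrew's actual hack. It uses fmt rather than f as the marker name so it still runs on interpreters where f'...' already lexes as a single string token; on the pre-3.6 tokenizer a bare f splits into a name plus a string in exactly the same way. It only handles bare names inside braces:

    import io
    import re
    import tokenize

    def expand_fmt_prefix(source):
        # Rewrite fmt'...{name}...' into '...{name}...'.format(name=name)
        # by scanning for a NAME token immediately followed by a STRING.
        out = []
        toks = list(tokenize.generate_tokens(io.StringIO(source).readline))
        i = 0
        while i < len(toks):
            tok = toks[i]
            if (tok.type == tokenize.NAME and tok.string == 'fmt'
                    and i + 1 < len(toks)
                    and toks[i + 1].type == tokenize.STRING):
                lit = toks[i + 1].string
                names = sorted(set(re.findall(r'{(\w+)}', lit)))
                args = ', '.join('%s=%s' % (n, n) for n in names)
                out.append((tokenize.STRING, '%s.format(%s)' % (lit, args)))
                i += 2
                continue
            out.append((tok.type, tok.string))
            i += 1
        return tokenize.untokenize(out)

    print(expand_fmt_prefix("greeting = fmt'hello {name}'\n"))
    # prints, modulo whitespace: greeting = 'hello {name}'.format(name=name)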
But as I said before, the real test would be trying to build the framework mentioned parenthetically above and see where it gets annoying and what could change between 3.5 and 3.6 to unannoyingize the code.

From pmiscml at gmail.com Mon Jul 20 10:15:41 2015
From: pmiscml at gmail.com (Paul Sokolovsky)
Date: Mon, 20 Jul 2015 11:15:41 +0300
Subject: [Python-ideas] Briefer string format
In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <20150720025919.3652fb80@x230>
Message-ID: <20150720111541.04153eb6@x230>

Hello,

On Mon, 20 Jul 2015 02:31:31 +0100
Mark Lawrence wrote:

> On 20/07/2015 00:59, Paul Sokolovsky wrote:
> > Hello,
> >
> > On Sun, 19 Jul 2015 16:35:01 -0700
> > Mike Miller wrote:
> >
> > []
> >
> >> csstext += f'{nl}{key}{space}{{{nl}'
> >>
> >> An "f-formatted" string could automatically format with the locals
> >> dict. Not yet sure about globals, and unicode only suggested for
> >> now.
> >
> > "Not sure" sounds convincing. Deal - let's keep being explicit
> > rather than implicit. Brevity?
> >
> > def _(fmt, dict):
> >     return fmt.format(**dict)
> > __ = globals()
> > ___ = locals()
> >
> > foo = 42
> >
> > _("{foo}", __())
> >
> >
> > If that's not terse enough, you can take Python3, and go thru
> > Unicode planes looking for funky-looking letters, then you
> > hopefully can reduce to
> >
> > .("{foo}", .())
> >
> > Where dots aren't dots, but funky-looking letters.
> >
>
> Good grief, April 1st already?

Why, the original poster referred to bash and perl as inspiration, so certainly he was asking for something which would confuse people?

--
Best regards,
 Paul                          mailto:pmiscml at gmail.com

From ncoghlan at gmail.com Mon Jul 20 10:16:24 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 20 Jul 2015 18:16:24 +1000
Subject: [Python-ideas] Briefer string format
In-Reply-To: <1955CB27-6315-4A82-8654-4620B7440C22@yahoo.com>
References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <1955CB27-6315-4A82-8654-4620B7440C22@yahoo.com>
Message-ID:

On 20 July 2015 at 18:01, Andrew Barnert wrote:
> On Jul 19, 2015, at 22:43, Nick Coghlan wrote:
>> I see nothing explicit about https://pypi.python.org/pypi/MacroPy or
>> the examples at https://github.com/lihaoyi/macropy#macropy, as it
>> looks just like normal Python code to me, with no indication that
>> compile time modifications are taking place.
>
> I suppose what I meant by explicit is things like using [] instead of () for macro calls and quick lambda definitions, s[""] for string interpolation, etc. Once you get used to it, it's usually obvious at a glance where code is using MacroPy.

Is this code using MacroPy for compile time transformations?

    data = source[lookup]

You have no idea, and neither do I. Instead, we'd be relying on our rote memory to recognise certain *names* as being typical MacroPy operations - if someone defines a new transformation, our pattern recognition isn't going to trigger properly.

It doesn't help that my rote memory is awful, so I *detest* APIs that expect me to have a good one and hence "just know" when certain operations are special and don't work the same way as other operations.

My suggested "!(expr)" notation is based on the idea of providing an inline syntactic marker to say "magic happening here" (with the default anonymous transformation being to return the AST object itself).
>> That's not MacroPy's fault - it *can't* readily be explicit the way I >> would want it to be if it's going to reuse the existing AST compiler >> to do the heavy lifting. >> >> However, I agree the MacroPy approach to tree transformations could be >> a good backend concept. I'd previously wondered how you'd go about >> embedding third party syntax like shell expressions or format strings, >> but eventually realised that combining an AST transformation syntax >> with string quoting works just fine there. > > I think you want to be able to hook the tokenizer here as well. If you want f"..." of !f"...", that's hard to do at the tree level or the text level; you'd have to do something like f("...") or "f..." instead. I figured out that AST->AST is fine, as anything else can be handled as quoted string transformations, which then gets you all the nice benefits of strings literals (choice of single or double quotes, triple-quoting for multi-line strings, escape sequences with the option of raw strings, etc), plus a clear inline marker alerting the reader to the fact that you've dropped out of Python's normal syntactic restrictions. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From eric at trueblade.com Mon Jul 20 15:42:46 2015 From: eric at trueblade.com (Eric V. Smith) Date: Mon, 20 Jul 2015 09:42:46 -0400 Subject: [Python-ideas] Briefer string format In-Reply-To: <55AC3425.5010509@mgmiller.net> References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> Message-ID: <55ACFAD6.50201@trueblade.com> On 07/19/2015 07:35 PM, Mike Miller wrote: > Decent but not great, a bit hard on the eyes. So I decided to try > .format(): > > csstext += '{nl}{key}{space}{{{nl}'.format(**locals()) > > This looks a bit better if you ignore the right half, but it is longer > and not > as simple as one might hope. Better would be: csstext += '{nl}{key}{space}{{{nl}'.format_map(locals()) Eric. From rymg19 at gmail.com Mon Jul 20 15:53:44 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Mon, 20 Jul 2015 08:53:44 -0500 Subject: [Python-ideas] Briefer string format In-Reply-To: <21932.30628.340500.24858@uwakimon.sk.tsukuba.ac.jp> References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <21932.30628.340500.24858@uwakimon.sk.tsukuba.ac.jp> Message-ID: <069443CB-ACD5-44B6-BE84-0F53786A4F25@gmail.com> On July 19, 2015 11:23:00 PM CDT, "Stephen J. Turnbull" wrote: >Chris Angelico writes: > > > I'm -1 on the specific idea, though definitely sympathetic to the > > broader concept of simplified formatting of strings. > >So does everybody. But we've seen many iterations: Perl/shell-style >implicit interpolation apparently was right out from the very >beginning of Python. The magic print statement was then deprecated in >favor of a function. So I suppose it will be very hard to convince >the BDFL (and anything implicit would surely need his approval) of >anything but a function or an operator. > >We have the % operator taking a printf-style format string and a tuple >of values to interpolate. It's compact and easy to use with position >indexes into the tuple for short formats and few values, but is nearly >unreadable and not easy to write for long formats with many >interpolations, especially if they are repeated. > > > Python's printf-style formatting has its own warts (mainly because > > of the cute use of an operator, rather than doing it as a function > > call), > >I think the operator is actually a useful feature, not merely "cute". 
>It directs the focus to the format string, rather than the function >call. > > > and still has the problem of having percent markers with no > > indication of what they'll be interpolating in. > >Not so. We have the more modern (?) % operator that takes a format >string with named format sequences and a dictionary. This seems to be >close to what the OP wants: > > val = "readable simple formatting method" > print("This is a %(val)s." % locals()) > >(which actually works at module level as well as within a function). >I suppose the OP will claim that an explicit call to locals() is >verbose and redundant, but if that really is a problem: > > def format_with_locals(fmtstr): > return fmtstr % locals() > Won't this use the locals of the function format_with_locals over its caller? >(of course with a nice short name, mnemonic to the author). Or for >format strings to be used repeatedly with different (global -- the >"locals" you want are actually nonlocal relative to a method, so >there's no way to get at them AFAICS) values, there's this horrible >hack: > > >>> class autoformat_with_globals(str): > ... def __pos__(self): > ... return self % globals() > ... > >>> a = autoformat_with_globals("This is a %(description)s.") > >>> description = "autoformatted string" > >>> +a > 'This is a autoformatted string.' > >with __neg__ and __invert__ as alternative horrible hacks. > >We have str.format. I've gotten used to str.format but for most of my >uses mapped %-formatting would work fine. > >We have an older proposal for a more flexible form of templating using >the Perl/shell-ish $ operator in format strings. And we have a large >number of templating languages from web frameworks (Django, Jinja, >etc). > >None of these seem universally applicable. It's ugly in one sense >(TOOWTDI violation), but ISTM that positional % for short interactive >use, mapped % for templating where the conventional format operators >suffice, and str.format for maximum explicit flexibility in programs, >with context-sensitive formatting of new types, is an excellent >combination. > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >https://mail.python.org/mailman/listinfo/python-ideas >Code of Conduct: http://python.org/psf/codeofconduct/ -- Sent from my Android device with K-9 Mail. Please excuse my brevity. From eric at trueblade.com Mon Jul 20 15:56:54 2015 From: eric at trueblade.com (Eric V. Smith) Date: Mon, 20 Jul 2015 09:56:54 -0400 Subject: [Python-ideas] Briefer string format In-Reply-To: <55AC3425.5010509@mgmiller.net> References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> Message-ID: <55ACFE26.4000303@trueblade.com> On 07/19/2015 07:35 PM, Mike Miller wrote: > Hi, > > Ok, I kept the message brief because I thought this subject had > previously been discussed often. I've expanded it to explain better for > those that are interested. > > --- > > Needed to whip-up some css strings, took a look at the formatting I had > done > and thought it was pretty ugly. I started with the printf style, and had > pulled out the whitespace as vars in order to have a minification option: > > csstext += '%s%s%s{%s' % (nl, key, space, nl) > > Decent but not great, a bit hard on the eyes. So I decided to try > .format(): > > csstext += '{nl}{key}{space}{{{nl}'.format(**locals()) > > This looks a bit better if you ignore the right half, but it is longer > and not > as simple as one might hope. 
It is much longer still if you type out the
> variables needed as keyword params! The '{}' option is not much
> improvement
> either.
>
> csstext += '{nl}{key}{space}{{{nl}'.format(nl=nl, key=key, ... # uggh
> csstext += '{}{}{}{{{}'.format(nl, key, space, nl)

Disclaimer: not well tested code.

This code basically does what you want. It eval's the variables in the caller's frame. Of course you have to be able to stomach the use of sys._getframe() and eval():

#######################################
import sys
import string

class Formatter(string.Formatter):
    def __init__(self, globals, locals):
        self.globals = globals
        self.locals = locals

    def get_value(self, key, args, kwargs):
        return eval(key, self.globals, self.locals)


# default to looking at the parent's frame
def f(str, level=1):
    frame = sys._getframe(level)
    formatter = Formatter(frame.f_globals, frame.f_locals)
    return formatter.format(str)
#######################################

Usage:
foo = 42
print(f('{foo}'))

def get_closure(foo):
    def _():
        foo  # hack: else we see the global 'foo' when calling f()
        return f('{foo}:{sys}')
    return _

print(get_closure('c')())

def test(value):
    print(f('value:{value:^20}, open:{open}'))

value = 7
open = 3
test(4+3j)
del(open)
test(4+5j)

Produces:
42
c:<module 'sys' (built-in)>
value:       (4+3j)       , open:3
value:       (4+5j)       , open:<built-in function open>

Eric.

From rymg19 at gmail.com Mon Jul 20 16:08:54 2015
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Mon, 20 Jul 2015 09:08:54 -0500
Subject: [Python-ideas] Briefer string format
In-Reply-To: <55ACFE26.4000303@trueblade.com>
References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com>
Message-ID: <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com>

I would prefer something more like:

def f(s):
    caller = inspect.stack()[1][0]
    return s.format(dict(caller.f_globals, **caller.f_locals))

On July 20, 2015 8:56:54 AM CDT, "Eric V. Smith" wrote:
>On 07/19/2015 07:35 PM, Mike Miller wrote:
>> Hi,
>>
>> Ok, I kept the message brief because I thought this subject had
>> previously been discussed often. I've expanded it to explain better
>for
>> those that are interested.
>>
>> ---
>>
>> Needed to whip-up some css strings, took a look at the formatting I
>had
>> done
>> and thought it was pretty ugly. I started with the printf style, and
>had
>> pulled out the whitespace as vars in order to have a minification
>option:
>>
>> csstext += '%s%s%s{%s' % (nl, key, space, nl)
>>
>> Decent but not great, a bit hard on the eyes. So I decided to try
>> .format():
>>
>> csstext += '{nl}{key}{space}{{{nl}'.format(**locals())
>>
>> This looks a bit better if you ignore the right half, but it is
>longer
>> and not
>> as simple as one might hope. It is much longer still if you type out
>the
>> variables needed as keyword params! The '{}' option is not much
>> improvement
>> either.
>>
>> csstext += '{nl}{key}{space}{{{nl}'.format(nl=nl, key=key, ... #
>uggh
>> csstext += '{}{}{}{{{}'.format(nl, key, space, nl)
>
>Disclaimer: not well tested code.
>
>This code basically does what you want. It eval's the variables in the
>caller's frame. Of course you have to be able to stomach the use of
>sys._getframe() and eval():
>
>#######################################
>import sys
>import string
>
>class Formatter(string.Formatter):
>    def __init__(self, globals, locals):
>        self.globals = globals
>        self.locals = locals
>
>    def get_value(self, key, args, kwargs):
>        return eval(key, self.globals, self.locals)
>
>
># default to looking at the parent's frame
>def f(str, level=1):
>    frame = sys._getframe(level)
>    formatter = Formatter(frame.f_globals, frame.f_locals)
>    return formatter.format(str)
>#######################################
>
>Usage:
>foo = 42
>print(f('{foo}'))
>
>def get_closure(foo):
>    def _():
>        foo  # hack: else we see the global 'foo' when calling f()
>        return f('{foo}:{sys}')
>    return _
>
>print(get_closure('c')())
>
>def test(value):
>    print(f('value:{value:^20}, open:{open}'))
>
>value = 7
>open = 3
>test(4+3j)
>del(open)
>test(4+5j)
>
>Produces:
>42
>c:<module 'sys' (built-in)>
>value:       (4+3j)       , open:3
>value:       (4+5j)       , open:<built-in function open>
>
>Eric.
>
>
>_______________________________________________
>Python-ideas mailing list
>Python-ideas at python.org
>https://mail.python.org/mailman/listinfo/python-ideas
>Code of Conduct: http://python.org/psf/codeofconduct/

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

From eric at trueblade.com Mon Jul 20 16:19:34 2015
From: eric at trueblade.com (Eric V. Smith)
Date: Mon, 20 Jul 2015 10:19:34 -0400
Subject: [Python-ideas] Briefer string format
In-Reply-To: <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com>
References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com>
Message-ID: <55AD0376.7020000@trueblade.com>

On 07/20/2015 10:08 AM, Ryan Gonzalez wrote:
> I would prefer something more like:
>
> def f(s):
>     caller = inspect.stack()[1][0]
>     return s.format(dict(caller.f_globals, **caller.f_locals))

You need to use format_map (or **dict(...)). And ChainMap might be a better choice, it would take some benchmarking to know. Also, you don't get builtins using this approach. I'm using eval to exactly match what evaluating the variable in the parent context would give you. That might not matter depending on the actual requirements.

But I agree there are multiple ways to do this, and several of them could be made to work. Mine might have fatal flaws that more testing would show.

Eric.

>
>
> On July 20, 2015 8:56:54 AM CDT, "Eric V. Smith" wrote:
>
>     On 07/19/2015 07:35 PM, Mike Miller wrote:
>
>         Hi,
>
>         Ok, I kept the message brief because I thought this subject had
>         previously been discussed often. I've expanded it to explain
>         better for
>         those that are interested.
>
>         ---
>
>         Needed to whip-up some css strings, took a look at the
>         formatting I had
>         done
>         and thought it was pretty ugly. I started with the printf style,
>         and had
>         pulled out the whitespace as vars in order to have a
>         minification option:
>
>         csstext += '%s%s%s{%s' % (nl, key, space, nl)
>
>         Decent but not great, a bit hard on the eyes. So I decided to try
>         .format():
>
>         csstext += '{nl}{key}{space}{{{nl}'.format(**locals())
>
>         This looks a bit better if you ignore the right half, but it is
>         longer
>         and not
>         as simple as one might hope. It is much longer still if you type
>         out the
>         variables needed as keyword params! The '{}' option is not much
>         improvement
>         either.
>
>         csstext += '{nl}{key}{space}{{{nl}'.format(nl=nl, key=key, ...
# > uggh > csstext += '{}{}{}{{{}'.format(nl, key, space, nl) > > > Disclaimer: not well tested code. > > This code basically does what you want. It eval's the variables in the > caller's frame. Of course you have to be able to stomach the use of > sys._getframe() and eval(): > > ####################################### > import sys > import string > > class Formatter(string.Formatter): > def __init__(self, globals, locals): > self.globals = globals > self.locals = locals > > def get_value(self, key, args, kwargs): > return eval(key, self.globals, self.locals) > > > # default to looking at the parent's frame > def f(str, > level=1): > frame = sys._getframe(level) > formatter = Formatter(frame.f_globals, frame.f_locals) > return formatter.format(str) > ####################################### > > Usage: > foo = 42 > print(f('{foo}')) > > def get_closure(foo): > def _(): > foo # hack: else we see the global 'foo' when calling f() > return f('{foo}:{sys}') > return _ > > print(get_closure('c')()) > > def test(value): > print(f('value:{value:^20}, open:{open}')) > > value = 7 > open = 3 > test(4+3j) > del(open) > test(4+5j) > > Produces: > 42 > c: > value: (4+3j) , open:3 > value: (4+5j) , open: > > Eric. > > > ------------------------------------------------------------------------ > > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > -- > Sent from my Android device with K-9 Mail. Please excuse my brevity. From eric at trueblade.com Mon Jul 20 19:08:37 2015 From: eric at trueblade.com (Eric V. Smith) Date: Mon, 20 Jul 2015 13:08:37 -0400 Subject: [Python-ideas] Briefer string format In-Reply-To: <55AD0376.7020000@trueblade.com> References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> Message-ID: <55AD2B15.1080909@trueblade.com> On 07/20/2015 10:19 AM, Eric V. Smith wrote: > On 07/20/2015 10:08 AM, Ryan Gonzalez wrote: >> I would prefer something more like: >> >> def f(s): >> caller = inspect.stack()[1][0] >> return s.format(dict(caller.f_globals, **caller.f_locals)) > > You need to use format_map (or **dict(...)). And ChainMap might be a > better choice, it would take some benchmarking to know. > > Also, you don't get builtins using this approach. I'm using eval to > exactly match what evaluating the variable in the parent context would > give you. That might not matter depending on the actual requirements. > > But I agree there are multiple ways to do this, and several of them > could be made to work. Mine might have fatal flaws that more testing > would show. My quick testing comes up with this, largely based on the code by joejev: import sys import collections def f(str): frame = sys._getframe(1) return str.format_map(collections.ChainMap( frame.f_locals, frame.f_globals, frame.f_globals['__builtins__'].__dict__)) I'm not sure about the builtins, but this seems to work. Also, you might want to be able to pass in the frame depth to allow this to be callable more than 1 level deep. So, given that this is all basically possible to implement today (at the cost of using sys._getframe()), I'm -1 on adding any compiler tricks to support this via syntax. From what I know of PyPy, this should be supported there, albeit at a large performance cost. Eric. 
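A short usage sketch of that helper, restated so it runs standalone. The argument is renamed so it doesn't shadow the built-in str, and the __builtins__ lookup only works this way when the caller is the main script, where __builtins__ is the module rather than a plain dict -- part of the "not sure about the builtins" caveat above:

    import sys
    import collections

    def f(fmt):
        frame = sys._getframe(1)
        return fmt.format_map(collections.ChainMap(
            frame.f_locals, frame.f_globals,
            frame.f_globals['__builtins__'].__dict__))

    x = 123  # a global, to show the fallback order

    def demo(y):
        # locals shadow globals, which shadow builtins
        return f('local {y}, global {x}, builtin {len}')

    print(demo('abc'))
    # -> local abc, global 123, builtin <built-in function len>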
From guido at python.org Mon Jul 20 19:25:51 2015
From: guido at python.org (Guido van Rossum)
Date: Mon, 20 Jul 2015 19:25:51 +0200
Subject: [Python-ideas] Briefer string format
In-Reply-To: <55AD2B15.1080909@trueblade.com>
References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com>
Message-ID:

Perhaps surprisingly, I find myself leaning in favor of the f'...{var}...' form. It is explicit in the variable name.

Historically, the `x` notation as an alias for repr(x) was meant to play this role -- you'd write '...' + `var` + '...', but it wasn't brief enough, and the `` are hard to see. f'...' is more explicit, and can be combined with r'...' and b'...' (or both) as needed.

--
--Guido van Rossum (python.org/~guido)

From eric at trueblade.com Mon Jul 20 19:57:33 2015
From: eric at trueblade.com (Eric V. Smith)
Date: Mon, 20 Jul 2015 13:57:33 -0400
Subject: [Python-ideas] Briefer string format
In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com>
Message-ID: <55AD368D.7020108@trueblade.com>

On 07/20/2015 01:25 PM, Guido van Rossum wrote:
> Perhaps surprisingly, I find myself leaning in favor of the
> f'...{var}...' form. It is explicit in the variable name.
>
> Historically, the `x` notation as an alias for repr(x) was meant to play
> this role -- you'd write '...' + `var` + '...', but it wasn't brief
> enough, and the `` are hard to see. f'...' is more explicit, and can be
> combined with r'...' and b'...' (or both) as needed.

We didn't implement b''.format(), for a variety of reasons. Mostly to do with user-defined types returning unicode from __format__, if I recall correctly.

So the idea is that
    f'x:{a.x} y:{y}'
would translate to bytecode that does:
    'x:{a.x} y:{y}'.format(a=a, y=y)

Correct?

I think I could leverage _string.formatter_parser() to do this, although it's been a while since I wrote that. And I'm not sure what's available at compile time. But I can look into it.

I guess the other option is to have it generate:
    'x:{a.x} y:{y}'.format_map(collections.ChainMap(globals(), locals(), __builtins__))

That way, I wouldn't have to parse the string to pick out what variables are referenced in it, then have .format() parse it again.

Eric.

From guido at python.org Mon Jul 20 20:56:25 2015
From: guido at python.org (Guido van Rossum)
Date: Mon, 20 Jul 2015 20:56:25 +0200
Subject: [Python-ideas] Briefer string format
In-Reply-To: <55AD368D.7020108@trueblade.com>
References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com>
Message-ID:

On Mon, Jul 20, 2015 at 7:57 PM, Eric V. Smith wrote:
> On 07/20/2015 01:25 PM, Guido van Rossum wrote:
> > Perhaps surprisingly, I find myself leaning in favor of the
> > f'...{var}...' form. It is explicit in the variable name.
> >
> > Historically, the `x` notation as an alias for repr(x) was meant to play
> > this role -- you'd write '...' + `var` + '...', but it wasn't brief
> > enough, and the `` are hard to see. f'...' is more explicit, and can be
> > combined with r'...' and b'...' (or both) as needed.
>
> We didn't implement b''.format(), for a variety of reasons. Mostly to do
> with user-defined types returning unicode from __format__, if I recall
> correctly.
>

Oh, I forgot that.

> So the idea is that
> f'x:{a.x} y:{y}'
> would translate to bytecode that does:
> 'x:{a.x} y:{y}'.format(a=a, y=y)
>
> Correct?
>

I was more thinking of translating that specific example to

    'x:{} y:{}'.format(a.x, y)

which avoids some of the issues your example is trying to clarify.

It would still probably be best to limit the syntax inside {} to exactly what regular .format() supports, to avoid confusing users. Though the consistency argument can be played both ways -- supporting absolutely anything that is a valid expression would be more consistent with other places where expressions occur. E.g. in principle we could support operators and function calls here.

> I think I could leverage _string.formatter_parser() to do this, although
> it's been a while since I wrote that. And I'm not sure what's available
> at compile time. But I can look into it.
>

I guess that would mean the former restriction. I think it's fine.

> I guess the other option is to have it generate:
> 'x:{a.x} y:{y}'.format_map(collections.ChainMap(globals(), locals(),
> __builtins__))
>
> That way, I wouldn't have to parse the string to pick out what variables
> are referenced in it, then have .format() parse it again.
>

No; I really want to avoid having to use globals() or locals() here.

--
--Guido van Rossum (python.org/~guido)

From Steve.Dower at microsoft.com Mon Jul 20 20:41:14 2015
From: Steve.Dower at microsoft.com (Steve Dower)
Date: Mon, 20 Jul 2015 18:41:14 +0000
Subject: [Python-ideas] Briefer string format
In-Reply-To: <55AD368D.7020108@trueblade.com>
References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com>
Message-ID:

Eric V. Smith wrote:
> On 07/20/2015 01:25 PM, Guido van Rossum wrote:
>> Perhaps surprisingly, I find myself leaning in favor of the
>> f'...{var}...' form. It is explicit in the variable name.
>>
>> Historically, the `x` notation as an alias for repr(x) was meant to
>> play this role -- you'd write '...' + `var` + '...', but it wasn't
>> brief enough, and the `` are hard to see. f'...' is more explicit, and
>> can be combined with r'...' and b'...' (or both) as needed.
>
> We didn't implement b''.format(), for a variety of reasons. Mostly to do with
> user-defined types returning unicode from __format__, if I recall correctly.
>
> So the idea is that
> f'x:{a.x} y:{y}'
> would translate to bytecode that does:
> 'x:{a.x} y:{y}'.format(a=a, y=y)
>
> Correct?

That's exactly what I had in mind, at least. Indexing is supported in format strings too, so f'{a[1]}' also becomes '{a[1]}'.format(a=a), but I don't think there are any other strange cases here. I would vote for f'{}' or f'{0}' to just be a SyntaxError.

I briefly looked into how this would be implemented and while it's not quite trivial/localized, it should be relatively straightforward if we don't allow implicit merging of f'' strings. If we wanted to allow implicit merging then we'd need to touch more code, but I don't see any benefit from allowing it at all, let alone enough to justify seriously messing with this part of the parser.

> I think I could leverage _string.formatter_parser() to do this, although it's
> been a while since I wrote that. And I'm not sure what's available at compile
> time. But I can look into it.
>
> I guess the other option is to have it generate:
> 'x:{a.x} y:{y}'.format_map(collections.ChainMap(globals(), locals(),
> __builtins__))
>
> That way, I wouldn't have to parse the string to pick out what variables are
> referenced in it, then have .format() parse it again.

If you really want to go with the second approach, ChainMap isn't going to be sufficient, for example:

>>> def f():
...     x = 123
...     return ['{x}'.format_map(collections.ChainMap(globals(), locals(), __builtins__)) for _ in range(1)]
...
>>> f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in f
  File "<stdin>", line 3, in <listcomp>
KeyError: 'x'

If the change also came with a dict-like object that will properly resolve variables from the current scope, that would be fine, but I don't think it can be constructed in terms of existing names. (Also bear in mind that other Python implementations do not necessarily provide sys._getframe(), so defining the lookup in terms of that would not be helpful either.)

Cheers,
Steve

> Eric.

From guido at python.org Mon Jul 20 21:22:55 2015
From: guido at python.org (Guido van Rossum)
Date: Mon, 20 Jul 2015 21:22:55 +0200
Subject: [Python-ideas] Briefer string format
In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com>
Message-ID:

(Our posts crossed, to some extent.)

On Mon, Jul 20, 2015 at 8:41 PM, Steve Dower wrote:
> Eric V. Smith wrote:
> > On 07/20/2015 01:25 PM, Guido van Rossum wrote:
> >> Perhaps surprisingly, I find myself leaning in favor of the
> >> f'...{var}...' form. It is explicit in the variable name.
>
> [...]
>
> > So the idea is that
> > f'x:{a.x} y:{y}'
> > would translate to bytecode that does:
> > 'x:{a.x} y:{y}'.format(a=a, y=y)
> >
> > Correct?
>
> That's exactly what I had in mind, at least. Indexing is supported in
> format strings too, so f'{a[1]}' also becomes '{a[1]}'.format(a=a), but I
> don't think there are any other strange cases here. I would vote for f'{}'
> or f'{0}' to just be a SyntaxError.
>

+1 on that last sentence. But I prefer a slightly different way of implementing (see my reply to Eric).

> I briefly looked into how this would be implemented and while it's not
> quite trivial/localized, it should be relatively straightforward if we
> don't allow implicit merging of f'' strings. If we wanted to allow implicit
> merging then we'd need to touch more code, but I don't see any benefit from
> allowing it at all, let alone enough to justify seriously messing with this
> part of the parser.

Not sure what you mean by "implicit merging" -- if you mean literal concatenation (e.g. 'foo' "bar" == 'foobar') then I think it should be allowed, just like we support mixing quotes and r''.

--
--Guido van Rossum (python.org/~guido)

From eric at trueblade.com Mon Jul 20 21:52:12 2015 From: eric at trueblade.com (Eric V.
Smith) Date: Mon, 20 Jul 2015 15:52:12 -0400 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> Message-ID: <55AD516C.3050603@trueblade.com> On 07/20/2015 03:22 PM, Guido van Rossum wrote: > > So the idea is that > > f'x:{a.x} y:{y}' > > would translate to bytecode that does: > > 'x:{a.x} y:{y}'.format(a=a, y=y) > > > > Correct? > > That's exactly what I had in mind, at least. Indexing is supported > in format strings too, so f'{a[1]}' also becomes > '{a[1]}'.format(a=a), but I don't think there are any other strange > cases here. I would vote for f'{}' or f'{0}' to just be a SyntaxError. > > > +1 on that last sentence. But I prefer a slightly different way of > implementing (see my reply to Eric). Right. And following up here to that email: > I was more thinking of translating that specific example to > > 'x:{} y:{}'.format(a.x, y) > > which avoids some of the issues your example is trying to clarify. That is better. The trick is converting the string "a.x" to the expression a.x, which should be easy enough at compile time. > It would still probably be best to limit the syntax inside {} to exactly > what regular .format() supports, to avoid confusing users. The expressions supported by .format() are limited to attribute access and "indexing". We just need to enforce that same restriction here. > Though the consistency argument can be played both ways -- supporting > absolutely anything that is a valid expression would be more consistent > with other places where expressions occur. E.g. in principle we could > support operators and function calls here. It would be easiest to not restrict the expressions, but then we'd have to maintain that restriction in two places. And now that I think about it, it's somewhat more complex than just expanding the expression. In .format(), this: '{a[0]}{b[c]}' is evaluated roughly as format(a[0]) + format(b['c']) So to be consistent with .format(), we have to fully parse at least the indexing out to see if it looks like a constant integer or a string. So given that, I think we should just support what .format() allows, since it's really not quite as simple as "evaluate the expression inside the braces". > Not sure what you mean by "implicit merging" -- if you mean literal > concatenation (e.g. 'foo' "bar" == 'foobar') then I think it should be > allowed, just like we support mixing quotes and r''. If I understand it, I think the concern is: f'{a}{b}' 'foo{}' f'{c}{d}' would need to become: f'{a}{b}foo{{}}{c}{d}' So you have to escape the braces in non-f-strings when merging strings and any of them are f-strings, and make the result an f-string. But I think that's the only complication. From eric at trueblade.com Mon Jul 20 22:20:03 2015 From: eric at trueblade.com (Eric V. Smith) Date: Mon, 20 Jul 2015 16:20:03 -0400 Subject: [Python-ideas] Briefer string format In-Reply-To: <55AD516C.3050603@trueblade.com> References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> Message-ID: <55AD57F3.3030205@trueblade.com> On 07/20/2015 03:52 PM, Eric V. 
Smith wrote: > On 07/20/2015 03:22 PM, Guido van Rossum wrote: > >> > So the idea is that >> > f'x:{a.x} y:{y}' >> > would translate to bytecode that does: >> > 'x:{a.x} y:{y}'.format(a=a, y=y) >> > >> > Correct? >> >> That's exactly what I had in mind, at least. Indexing is supported >> in format strings too, so f'{a[1]}' also becomes >> '{a[1]}'.format(a=a), but I don't think there are any other strange >> cases here. I would vote for f'{}' or f'{0}' to just be a SyntaxError. >> >> >> +1 on that last sentence. But I prefer a slightly different way of >> implementing (see my reply to Eric). > > Right. And following up here to that email: > >> I was more thinking of translating that specific example to >> >> 'x:{} y:{}'.format(a.x, y) >> >> which avoids some of the issues your example is trying to clarify. > > That is better. The trick is converting the string "a.x" to the > expression a.x, which should be easy enough at compile time. > >> It would still probably be best to limit the syntax inside {} to exactly >> what regular .format() supports, to avoid confusing users. > > The expressions supported by .format() are limited to attribute access > and "indexing". We just need to enforce that same restriction here. > >> Though the consistency argument can be played both ways -- supporting >> absolutely anything that is a valid expression would be more consistent >> with other places where expressions occur. E.g. in principle we could >> support operators and function calls here. > > It would be easiest to not restrict the expressions, but then we'd have > to maintain that restriction in two places. > > And now that I think about it, it's somewhat more complex than just > expanding the expression. In .format(), this: > '{a[0]}{b[c]}' > is evaluated roughly as > format(a[0]) + format(b['c']) > > So to be consistent with .format(), we have to fully parse at least the > indexing out to see if it looks like a constant integer or a string. > > So given that, I think we should just support what .format() allows, > since it's really not quite as simple as "evaluate the expression inside > the braces". And thinking about it yet some more, I think the easiest and most consistent thing to do would be to translate it like: f'{a[0]}{b[c]}' == '{[0]}{[c]}'.format(a, b) So: f'api:{sys.api_version} {a} size{sys.maxsize}' would become either: f'api:{.api_version} {} size{.maxsize}'.format(sys, a, sys) or f'api:{0.api_version} {1} size{0.maxsize}'.format(sys, a) The first one seems simpler. The second probably isn't worth the micro-optimization, and it may even be a pessimization. Eric. From bruce at leban.us Mon Jul 20 23:29:01 2015 From: bruce at leban.us (Bruce Leban) Date: Mon, 20 Jul 2015 14:29:01 -0700 Subject: [Python-ideas] Briefer string format In-Reply-To: <55AD57F3.3030205@trueblade.com> References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AD57F3.3030205@trueblade.com> Message-ID: On Mon, Jul 20, 2015 at 11:41 AM, Steve Dower wrote: > Indexing is supported in format strings too, so f'{a[1]}' also becomes > '{a[1]}'.format(a=a), but I don't think there are any other strange cases > here. I would vote for f'{}' or f'{0}' to just be a SyntaxError. > Maybe I'm missing something, but it seems this could just as reasonably be '{}'.format(a[1])? 
Is there a reason to prefer the other form over this? On Mon, Jul 20, 2015 at 1:20 PM, Eric V. Smith wrote: > So: > f'api:{sys.api_version} {a} size{sys.maxsize}' > > would become either: > f'api:{.api_version} {} size{.maxsize}'.format(sys, a, sys) > or > f'api:{0.api_version} {1} size{0.maxsize}'.format(sys, a) > Or: f'api:{} {} size{}'.format(sys.api_version, a, sys.maxsize) Note that format strings don't allow variables in subscripts, so f'{a[n]}' ==> '{}'.format(a['n']) Also, the discussion has assumed that if this feature were added it necessarily must be a single character prefix. Looking at the grammar, I don't see that as a requirement as it explicitly defines multiple character sequences. A syntax like: format'a{b}c' formatted"""a{b} c""" might be more readable. There's no namespace conflict just as there is no conflict between raw string literals and a variable named r. --- Bruce Check out my new puzzle book: http://J.mp/ingToConclusions Get it free here: http://J.mp/ingToConclusionsFree (available on iOS) -------------- next part -------------- An HTML attachment was scrubbed... URL: From Steve.Dower at microsoft.com Mon Jul 20 23:46:45 2015 From: Steve.Dower at microsoft.com (Steve Dower) Date: Mon, 20 Jul 2015 21:46:45 +0000 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AD57F3.3030205@trueblade.com> Message-ID: Bruce Leban wrote: > [SNIP] > > Also, the discussion has assumed that if this feature were added it necessarily > must be a single character prefix. Looking at the grammar, I don't see that as a > requirement as it explicitly defines multiple character sequences. A syntax > like: > > format'a{b}c' > formatted"""a{b} > c""" > > might be more readable. There's no namespace conflict just as there is no > conflict between raw string literals and a variable named r. I'd really like to be able to write fr"C:\{dir}\{filename}" at times (those rare times when I'm not using pathlib, admittedly), though there's no reason to need to combine f with u (a no-op) or b (no bytes.format). Cheers, Steve > --- Bruce > Check out my new puzzle book: http://J.mp/ingToConclusions > Get it free here: http://J.mp/ingToConclusionsFree (available on iOS) > > From eric at trueblade.com Tue Jul 21 00:05:21 2015 From: eric at trueblade.com (Eric V. Smith) Date: Mon, 20 Jul 2015 18:05:21 -0400 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AD57F3.3030205@trueblade.com> Message-ID: <55AD70A1.7030302@trueblade.com> On 7/20/2015 5:29 PM, Bruce Leban wrote: > On Mon, Jul 20, 2015 at 1:20 PM, Eric V. Smith > wrote: > > So: > f'api:{sys.api_version} {a} size{sys.maxsize}' > > would become either: > f'api:{.api_version} {} size{.maxsize}'.format(sys, a, sys) > or > f'api:{0.api_version} {1} size{0.maxsize}'.format(sys, a) > > > Or: f'api:{} {} size{}'.format(sys.api_version, a, sys.maxsize) > > Note that format strings don't allow variables in subscripts, so > > f'{a[n]}' ==> '{}'.format(a['n']) Right. 
But why re-implement that, instead of making it: '{[n]}'.format(a)? I've convinced myself (and maybe no one else) that since you want this: a=[1,2] b={'c':42} f'{a[0]} {b[c]}' being the same as: '{} {}'.format(a[0], b['c']) that it would be easier to make it: '{[0]} {[c]}'.format(a, b) instead of trying to figure out that the numeric-looking '0' gets converted to an integer, and the non-numeric-looking 'c' gets left as a string. That logic already exists in str.format(), so let's just leverage it from there. It also means that you automatically will support the subset of expressions that str.format() already supports, with all of its limitations and quirks. But I now think that's a feature, since str.format() doesn't really support the same expressions as normal Python does (due to the [0] vs. ['c'] issue). And it's way easier to explain if f-strings support the identical syntax as str.format(). The only restriction is that all parameters must be named, and not numbered or auto-numbered. Eric. From alexander.belopolsky at gmail.com Tue Jul 21 03:39:46 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 20 Jul 2015 21:39:46 -0400 Subject: [Python-ideas] Briefer string format In-Reply-To: <55AD57F3.3030205@trueblade.com> References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AD57F3.3030205@trueblade.com> Message-ID: On Mon, Jul 20, 2015 at 4:20 PM, Eric V. Smith wrote: > And thinking about it yet some more, I think the easiest and most > consistent thing to do would be to translate it like: > > f'{a[0]}{b[c]}' == '{[0]}{[c]}'.format(a, b) > I think Python can do more at compile time and translate f"Result1={expr1:fmt1};Result2={expr2:fmt2}" to bytecode equivalent of "Result1=%s;Result2=%s" % ((expr1).__format__(fmt1), (expr2).__format__(fmt2)) -------------- next part -------------- An HTML attachment was scrubbed... URL: From Steve.Dower at microsoft.com Tue Jul 21 04:10:31 2015 From: Steve.Dower at microsoft.com (Steve Dower) Date: Tue, 21 Jul 2015 02:10:31 +0000 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AD57F3.3030205@trueblade.com>, Message-ID: I'd rather keep the transform as simple as possible. If text formatting is your bottleneck, congratulations on fixing your network, disk, RAM and probably your users. Those who need to micro-optimize this code can do what you suggested by hand - there's no need for us to make our lives more complicated for the straw man who has a string formatting bottleneck and doesn't know enough to research another approach. Cheers, Steve Top-posted from my Windows Phone ________________________________ From: Alexander Belopolsky Sent: ?7/?20/?2015 18:40 To: Eric V. Smith Cc: python-ideas Subject: Re: [Python-ideas] Briefer string format On Mon, Jul 20, 2015 at 4:20 PM, Eric V. 
Smith > wrote: And thinking about it yet some more, I think the easiest and most consistent thing to do would be to translate it like: f'{a[0]}{b[c]}' == '{[0]}{[c]}'.format(a, b) I think Python can do more at compile time and translate f"Result1={expr1:fmt1};Result2={expr2:fmt2}" to bytecode equivalent of "Result1=%s;Result2=%s" % ((expr1).__format__(fmt1), (expr2).__format__(fmt2)) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericfahlgren at gmail.com Tue Jul 21 04:43:29 2015 From: ericfahlgren at gmail.com (Eric Fahlgren) Date: Mon, 20 Jul 2015 19:43:29 -0700 Subject: [Python-ideas] Briefer string format In-Reply-To: <55AD70A1.7030302@trueblade.com> References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AD57F3.3030205@trueblade.com> <55AD70A1.7030302@trueblade.com> Message-ID: <004301d0c35f$07e9b2d0$17bd1870$@gmail.com> Eric V. Smith wrote: > Right. But why re-implement that, instead of making it: > '{[n]}'.format(a)? Consider also the case of custom formatters. I've got one that overloads format_field, adds a units specifier in the format, which then uses our model units conversion and writes values in the current user-units of the system: x = body.x_coord # A "Double()" object with units of length. print(f'{x:length:.3f}') # Uses the "length" string to perform a units conversion much as "!r" would invoke "repr()". I think your proposal above handles my use case the most cleanly. Another Eric From alexander.belopolsky at gmail.com Tue Jul 21 04:44:25 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 20 Jul 2015 22:44:25 -0400 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AD57F3.3030205@trueblade.com> Message-ID: On Mon, Jul 20, 2015 at 10:10 PM, Steve Dower wrote: > If text formatting is your bottleneck, congratulations on fixing your > network, disk, RAM and probably your users. Thank you, but one of my servers just spent 18 hours loading 10GB of XML data into a database. Given that CPU was loaded 100% all this time, I suspect neither network nor disk and not even RAM was the bottleneck. Since XML parsing was done by C code and only formatting of database INSERT instructions was done in Python, I strongly suspect string formatting had a sizable carbon footprint in this case. Not all string formatting is done for human consumption. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Steve.Dower at microsoft.com Mon Jul 20 22:17:25 2015 From: Steve.Dower at microsoft.com (Steve Dower) Date: Mon, 20 Jul 2015 20:17:25 +0000 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> Message-ID: Guido van Rossum wrote: > (Our posts crossed, to some extent.) 
Yeah, mine delayed about 20 minutes after I sent it before I saw the list send it out. Not sure what's going on there... > On Mon, Jul 20, 2015 at 8:41 PM, Steve Dower wrote: > Eric V. Smith wrote: >> On 07/20/2015 01:25 PM, Guido van Rossum wrote: >>> Perhaps surprisingly, I find myself leaning in favor of the >>> f'...{var}...' form. It is explicit in the variable name. >> >> [...] >> >>> So the idea is that >>> f'x:{a.x} y:{y}' >>> would translate to bytecode that does: >>> 'x:{a.x} y:{y}'.format(a=a, y=y) >>> >>> Correct? >> >> That's exactly what I had in mind, at least. Indexing is supported in format >> strings too, so f'{a[1]}' also becomes '{a[1]}'.format(a=a), but I don't think >> there are any other strange cases here. I would vote for f'{}' or f'{0}' to just >> be a SyntaxError. > > +1 on that last sentence. But I prefer a slightly different way of implementing > (see my reply to Eric). Yep, saw that and after giving it some thought I agree. Initially I liked the cleanliness of not modifying the original string, but the transform does seem easier to explain as "lifting" each expression out of the string (at least compared to "lifting the first part of each expression and combining duplicates and assuming they are always the same value"). One catch here is that '{a[b]}' has to transform to '{}'.format(a['b']) and not .format(a[b]), which is fine but an extra step. IIRC you can only use ints and strs as keys in a format string. >> I briefly looked into how this would be implemented and while it's not quite >> trivial/localized, it should be relatively straightforward if we don't allow >> implicit merging of f'' strings. If we wanted to allow implicit merging then >> we'd need to touch more code, but I don't see any benefit from allowing it at >> all, let alone enough to justify seriously messing with this part of the parser. > > Not sure what you mean by "implicit merging" -- if you mean literal > concatenation (e.g. 'foo' "bar" == 'foobar') then I think it should be allowed, > just like we support mixing quotes and r''. Except we don't really have a literal now - it's an expression. Does f"{a}" "{b}" become "{}{}".format(a, b), "{}".format(a) + "{b}" or "{}{{b}}".format(a)? What about f"{" f"{a}" f"}"? Should something that looks like literal concatenation silently become runtime concatenation? Yes, it's possible to answer and define all of these, but I don't see how it adds value (though I am one of those people who never use literal concatenation and advise others not to use it either), and I see plenty of ways it would unnecessarily extend discussion and prevent actually getting something done. Cheers, Steve > -- > --Guido van Rossum (python.org/~guido) From rosuav at gmail.com Tue Jul 21 04:53:28 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 21 Jul 2015 12:53:28 +1000 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AD57F3.3030205@trueblade.com> Message-ID: On Tue, Jul 21, 2015 at 12:44 PM, Alexander Belopolsky wrote: > On Mon, Jul 20, 2015 at 10:10 PM, Steve Dower > wrote: >> >> If text formatting is your bottleneck, congratulations on fixing your >> network, disk, RAM and probably your users. 
> > > Thank you, but one of my servers just spent 18 hours loading 10GB of XML > data into a database. Given that CPU was loaded 100% all this time, I > suspect neither network nor disk and not even RAM was the bottleneck. Since > XML parsing was done by C code and only formatting of database INSERT > instructions was done in Python, I strongly suspect string formatting had a > sizable carbon footprint in this case. > > Not all string formatting is done for human consumption. Well-known rule of optimization: Measure, don't assume. There could be something completely different that's affecting your performance. I'd be impressed and extremely surprised if the formatting of INSERT queries took longer than the execution of those same queries, but even if that is the case, it could be the XML parsing (just because it's in C doesn't mean it's inherently faster than any Python code), or the database itself, or suboptimal paging of virtual memory. Before pointing fingers anywhere, measure. Measure. Measure! ChrisA From alexander.belopolsky at gmail.com Tue Jul 21 05:11:47 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 20 Jul 2015 23:11:47 -0400 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AD57F3.3030205@trueblade.com> Message-ID: On Mon, Jul 20, 2015 at 10:53 PM, Chris Angelico wrote: > I'd > be impressed and extremely surprised if the formatting of INSERT > queries took longer than the execution of those same queries, > This is getting off-topic for this list, but you may indeed be surprised by the performance that kdb+ (kx.com) with PyQ (pyq.enlnt.com) can deliver. [Full disclosure: I am the author of PyQ, so sorry for a shameless plug.] -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Tue Jul 21 05:28:44 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 20 Jul 2015 23:28:44 -0400 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AD57F3.3030205@trueblade.com> Message-ID: On Mon, Jul 20, 2015 at 11:16 PM, Steve Dower wrote: > Making f"" strings subtly faster isn't going to solve your performance > issue, and while I'm not advocating wastefulness, this looks like a > premature optimization, especially when put alongside the guaranteed heap > allocations and very likely IO that are also going to occur. One thing I know for a fact is that the use of % formatting instead of .format makes a significant difference in my applications. This is not surprising given these timings: $ python3 -mtimeit "'%d' % 2" 100000000 loops, best of 3: 0.00966 usec per loop $ python3 -mtimeit "'{}'.format(2)" 1000000 loops, best of 3: 0.216 usec per loop As a result, my rule of thumb is to avoid the use of .format in anything remotely performance critical. If f"" syntax is implemented as a sugar for .format - it will be equally useless for most of my needs. 
However, I think it can be implemented in a way that will make me consider switching away from % formatting. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Steve.Dower at microsoft.com Tue Jul 21 05:16:05 2015 From: Steve.Dower at microsoft.com (Steve Dower) Date: Tue, 21 Jul 2015 03:16:05 +0000 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AD57F3.3030205@trueblade.com> , Message-ID: Sounds like you deserve the congratulations then :) But when you've confirmed that string formatting is something that can be changed to improve performance (specifically parsing the format string in this case), you have options regardless of the default optimization. For instance, you probably want to preallocate a list, format and set each non-string item, then use .join (or if possible, write directly from the list without the intermediate step of producing a single string). Making f"" strings subtly faster isn't going to solve your performance issue, and while I'm not advocating wastefulness, this looks like a premature optimization, especially when put alongside the guaranteed heap allocations and very likely IO that are also going to occur. Cheers, Steve Top-posted from my Windows Phone ________________________________ From: Alexander Belopolsky Sent: ?7/?20/?2015 19:44 To: Steve Dower Cc: Eric V. Smith; python-ideas Subject: Re: [Python-ideas] Briefer string format On Mon, Jul 20, 2015 at 10:10 PM, Steve Dower > wrote: If text formatting is your bottleneck, congratulations on fixing your network, disk, RAM and probably your users. Thank you, but one of my servers just spent 18 hours loading 10GB of XML data into a database. Given that CPU was loaded 100% all this time, I suspect neither network nor disk and not even RAM was the bottleneck. Since XML parsing was done by C code and only formatting of database INSERT instructions was done in Python, I strongly suspect string formatting had a sizable carbon footprint in this case. Not all string formatting is done for human consumption. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Tue Jul 21 05:35:06 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 20 Jul 2015 22:35:06 -0500 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AD57F3.3030205@trueblade.com> Message-ID: [Alexander Belopolsky] > One thing I know for a fact is that the use of % formatting instead of > .format makes a significant difference in my applications. This is not > surprising given these timings: > > $ python3 -mtimeit "'%d' % 2" > 100000000 loops, best of 3: 0.00966 usec per loop > $ python3 -mtimeit "'{}'.format(2)" > 1000000 loops, best of 3: 0.216 usec per loop Well, be sure to check what you're actually timing. 
Here under Python 3.4.3: >>> from dis import dis >>> def f(): return "%d" % 2 >>> dis(f) 2 0 LOAD_CONST 3 ('2') 3 RETURN_VALUE That is, the peephole optimizer got rid of "%d" % 2 entirely, replacing it with the string constant "2". So, in all, it's more surprising that it takes so long to load a constant ;-) From rosuav at gmail.com Tue Jul 21 05:47:23 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 21 Jul 2015 13:47:23 +1000 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AD57F3.3030205@trueblade.com> Message-ID: On Tue, Jul 21, 2015 at 1:35 PM, Tim Peters wrote: > [Alexander Belopolsky] >> One thing I know for a fact is that the use of % formatting instead of >> .format makes a significant difference in my applications. This is not >> surprising given these timings: >> >> $ python3 -mtimeit "'%d' % 2" >> 100000000 loops, best of 3: 0.00966 usec per loop >> $ python3 -mtimeit "'{}'.format(2)" >> 1000000 loops, best of 3: 0.216 usec per loop > > Well, be sure to check what you're actually timing. Here under Python 3.4.3: > >>>> from dis import dis >>>> def f(): > return "%d" % 2 >>>> dis(f) > 2 0 LOAD_CONST 3 ('2') > 3 RETURN_VALUE > > That is, the peephole optimizer got rid of "%d" % 2 entirely, > replacing it with the string constant "2". So, in all, it's more > surprising that it takes so long to load a constant ;-) Interesting that the same optimization can't be done on the .format() version - it's not as if anyone can monkey-patch str so it does something different, is it? To defeat the optimization, I tried this: rosuav at sikorsky:~$ python3 -mtimeit -s "x=2" "'%d' % 2" 100000000 loops, best of 3: 0.0156 usec per loop rosuav at sikorsky:~$ python3 -mtimeit -s "x=2" "'%d' % x" 10000000 loops, best of 3: 0.162 usec per loop rosuav at sikorsky:~$ python3 -mtimeit -s "x=2" "'{}'.format(2)" 1000000 loops, best of 3: 0.225 usec per loop rosuav at sikorsky:~$ python3 -mtimeit -s "x=2" "'{}'.format(x)" 1000000 loops, best of 3: 0.29 usec per loop The difference is still there, but it's become a lot less dramatic - about two to one. I think that's the honest difference between them, and that's not usually going to be enough to make any sort of significant difference. ChrisA From alexander.belopolsky at gmail.com Tue Jul 21 06:02:28 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 21 Jul 2015 00:02:28 -0400 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AD57F3.3030205@trueblade.com> Message-ID: On Mon, Jul 20, 2015 at 11:35 PM, Tim Peters wrote: > >>> dis(f) > 2 0 LOAD_CONST 3 ('2') > 3 RETURN_VALUE > > That is, the peephole optimizer got rid of "%d" % 2 entirely, > replacing it with the string constant "2". So, in all, it's more > surprising that it takes so long to load a constant ;-) > Hmm. 
I stand corrected: $ python3 -mtimeit -s "a=2" "'%s' % a" 10000000 loops, best of 3: 0.124 usec per loop $ python3 -mtimeit -s "a=2" "'{}'.format(a)" 1000000 loops, best of 3: 0.215 usec per loop it is 2x rather than 20x speed difference. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Tue Jul 21 06:22:18 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 21 Jul 2015 13:22:18 +0900 Subject: [Python-ideas] Briefer string format In-Reply-To: <069443CB-ACD5-44B6-BE84-0F53786A4F25@gmail.com> References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <21932.30628.340500.24858@uwakimon.sk.tsukuba.ac.jp> <069443CB-ACD5-44B6-BE84-0F53786A4F25@gmail.com> Message-ID: <21933.51450.858742.636685@uwakimon.sk.tsukuba.ac.jp> Ryan Gonzalez writes: > >I suppose the OP will claim that an explicit call to locals() is > >verbose and redundant, but if that really is a problem: > > > > def format_with_locals(fmtstr): > > return fmtstr % locals() > > > > Won't this use the locals of the function format_with_locals over its caller? Yes, it will. I apologize for posting untested code. From eric at trueblade.com Tue Jul 21 06:25:02 2015 From: eric at trueblade.com (Eric V. Smith) Date: Tue, 21 Jul 2015 00:25:02 -0400 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AD57F3.3030205@trueblade.com> Message-ID: <55ADC99E.302@trueblade.com> On 7/21/2015 12:02 AM, Alexander Belopolsky wrote: > > On Mon, Jul 20, 2015 at 11:35 PM, Tim Peters > wrote: > > >>> dis(f) > 2 0 LOAD_CONST 3 ('2') > 3 RETURN_VALUE > > That is, the peephole optimizer got rid of "%d" % 2 entirely, > replacing it with the string constant "2". So, in all, it's more > surprising that it takes so long to load a constant ;-) > > > Hmm. I stand corrected: > > $ python3 -mtimeit -s "a=2" "'%s' % a" > 10000000 loops, best of 3: 0.124 usec per loop > $ python3 -mtimeit -s "a=2" "'{}'.format(a)" > 1000000 loops, best of 3: 0.215 usec per loop > > it is 2x rather than 20x speed difference. The last time I looked at this, the performance difference was the lookup of "format" on a string object. Although maybe that's not true, and the problem is really function call overhead: $ python3 -mtimeit -s 'a=2' 'f="{}".format' 'f(a)' 1000000 loops, best of 3: 0.227 usec per loop $ python3 -mtimeit -s "a=2" "'%s' % a" 10000000 loops, best of 3: 0.126 usec per loop There is (or was) a special case for formatting str, int, and float to bypass the .__format__ lookup. I haven't looked at it since the PEP 393 work. Eric. From stephen at xemacs.org Tue Jul 21 07:16:24 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 21 Jul 2015 14:16:24 +0900 Subject: [Python-ideas] Briefer string format In-Reply-To: <55AD70A1.7030302@trueblade.com> References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AD57F3.3030205@trueblade.com> <55AD70A1.7030302@trueblade.com> Message-ID: <21933.54696.967426.117238@uwakimon.sk.tsukuba.ac.jp> Eric V. Smith writes: > instead of trying to figure out that the numeric-looking '0' gets > converted to an integer, and the non-numeric-looking 'c' gets left as a > string. 
That logic already exists in str.format(), so let's just > leverage it from there. Yes, please! Guido's point that he wants no explicit use of locals(), etc, in the implementation took me a bit of thought to understand, but then I realized that it means a "macro" transformation with the resulting expression evaluated in the same environment as an explicit .format() would be. And that indeed makes the whole thing as explicit as invoking str.format would be. I don't *really* care what transformations are used to get that result, but DRYing this out and letting the __format__ method of the indexed object figure out the meaning of the format string makes me feel better about my ability to *think* about the meaning of an f"..." string. In particular, isn't it possible that a user class's __format__ might decide that *all* keys are strings? I don't see how the transformation Steve Dower proposed can possibly deal with that ambiguity. Another conundrum is that it's not obvious whether f"{a[01]}" is a SyntaxError (as it is with str.format) or equivalent to "{}".format(a['01']) (as my hypothetical user's class would expect). From eric at trueblade.com Tue Jul 21 07:22:40 2015 From: eric at trueblade.com (Eric V. Smith) Date: Tue, 21 Jul 2015 01:22:40 -0400 Subject: [Python-ideas] Briefer string format In-Reply-To: <55ADC99E.302@trueblade.com> References: <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AD57F3.3030205@trueblade.com> <55ADC99E.302@trueblade.com> Message-ID: <55ADD720.70201@trueblade.com> On 7/21/2015 12:25 AM, Eric V. Smith wrote: > On 7/21/2015 12:02 AM, Alexander Belopolsky wrote: >> >> On Mon, Jul 20, 2015 at 11:35 PM, Tim Peters > > wrote: >> >> >>> dis(f) >> 2 0 LOAD_CONST 3 ('2') >> 3 RETURN_VALUE >> >> That is, the peephole optimizer got rid of "%d" % 2 entirely, >> replacing it with the string constant "2". So, in all, it's more >> surprising that it takes so long to load a constant ;-) >> >> >> Hmm. I stand corrected: >> >> $ python3 -mtimeit -s "a=2" "'%s' % a" >> 10000000 loops, best of 3: 0.124 usec per loop >> $ python3 -mtimeit -s "a=2" "'{}'.format(a)" >> 1000000 loops, best of 3: 0.215 usec per loop >> >> it is 2x rather than 20x speed difference. > > The last time I looked at this, the performance difference was the > lookup of "format" on a string object. Although maybe that's not true, > and the problem is really function call overhead: > > $ python3 -mtimeit -s 'a=2' 'f="{}".format' 'f(a)' > 1000000 loops, best of 3: 0.227 usec per loop > $ python3 -mtimeit -s "a=2" "'%s' % a" > 10000000 loops, best of 3: 0.126 usec per loop Oops, that should have been: $ python3 -mtimeit -s 'a=2; f="{}".format' 'f(a)' 1000000 loops, best of 3: 0.19 usec per loop $ python3 -mtimeit -s "a=2" "'%s' % a" 10000000 loops, best of 3: 0.138 usec per loop So, about 40% slower if you can get rid of the .format lookup, which we can do with f-strings. Because it's more flexible, .format is just never going to be as fast as %-formatting. But there's no doubt room for improvement. The recursive nature of: f'{o.name:{o.len}}' will complicate some of the optimizations I've been thinking of. > There is (or was) a special case for formatting str, int, and float to > bypass the .__format__ lookup. I haven't looked at it since the PEP 393 > work. Looks like it's still there. Also for complex! Eric. From eric at trueblade.com Tue Jul 21 07:43:12 2015 From: eric at trueblade.com (Eric V. 
Smith) Date: Tue, 21 Jul 2015 01:43:12 -0400 Subject: [Python-ideas] Briefer string format In-Reply-To: <21933.54696.967426.117238@uwakimon.sk.tsukuba.ac.jp> References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AD57F3.3030205@trueblade.com> <55AD70A1.7030302@trueblade.com> <21933.54696.967426.117238@uwakimon.sk.tsukuba.ac.jp> Message-ID: <55ADDBF0.4040808@trueblade.com> On 7/21/2015 1:16 AM, Stephen J. Turnbull wrote: > Eric V. Smith writes: > > > instead of trying to figure out that the numeric-looking '0' gets > > converted to an integer, and the non-numeric-looking 'c' gets left as a > > string. That logic already exists in str.format(), so let's just > > leverage it from there. > > Yes, please! Guido's point that he wants no explicit use of locals(), > etc, in the implementation took me a bit of thought to understand, but > then I realized that it means a "macro" transformation with the > resulting expression evaluated in the same environment as an explicit > .format() would be. And that indeed makes the whole thing as explicit > as invoking str.format would be. Right. That is indeed the beauty of the thing. I now think locals(), etc. is a non-starter. > I don't *really* care what transformations are used to get that > result, but DRYing this out and letting the __format__ method of the > indexed object figure out the meaning of the format string makes me > feel better about my ability to *think* about the meaning of an f"..." > string. > > In particular, isn't it possible that a user class's __format__ might > decide that *all* keys are strings? I don't see how the > transformation Steve Dower proposed can possibly deal with that > ambiguity. In today's world: '{a[0]:4d}'.format(a=a) the object who's __format__() method is being called is a[0], not a. So it's not up to the object to decide what the keys mean. That decision is being made by the ''.format() implementation. And that's also the way I'm envisioning it with f-strings. > Another conundrum is that it's not obvious whether f"{a[01]}" is a > SyntaxError (as it is with str.format) or equivalent to > "{}".format(a['01']) (as my hypothetical user's class would expect). It would still be a syntax error, in my imagined implementation, because it's really calling ''.format() to do the expansion. So here's what I'm thinking f'some-string' would expand to. As you note above, it's happening in the caller's context: new_fmt = remove_all_object_names_from_string(s) objs = find_all_objects_referenced_in_string(s) result = new_fmt.format(*objs) So given: X = namedtuple('X', 'name width') a = X('Eric', 10) value = 'some value' then: f'{a.name:*^{a.width}}:{value}' would become this transformed code: '{.name:*^{.width}}:{}'.format(*[a, a, value]) which would evaluate to: '***Eric***:some value' The transformation of the f-string to new_fmt and the computation of objs is the only new part. The transformed code above works today. Eric. 
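[A minimal runnable check of the expansion described above, using only the names from Eric's example; everything here is valid on current Python, as the message says:

from collections import namedtuple

X = namedtuple('X', 'name width')
a = X('Eric', 10)
value = 'some value'

# Hand-expanded form of the proposed f-string. Auto-numbering fills the
# nested {.width} field from the second positional argument, which is
# why 'a' is passed twice.
result = '{.name:*^{.width}}:{}'.format(*[a, a, value])
print(result)  # ***Eric***:some value
]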
From guido at python.org Tue Jul 21 08:05:17 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 21 Jul 2015 08:05:17 +0200 Subject: [Python-ideas] Briefer string format In-Reply-To: <55AD516C.3050603@trueblade.com> References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> Message-ID: On Mon, Jul 20, 2015 at 9:52 PM, Eric V. Smith wrote: > On 07/20/2015 03:22 PM, Guido van Rossum wrote: > > > > So the idea is that > > > f'x:{a.x} y:{y}' > > > would translate to bytecode that does: > > > 'x:{a.x} y:{y}'.format(a=a, y=y) > > > > > > Correct? > > > > That's exactly what I had in mind, at least. Indexing is supported > > in format strings too, so f'{a[1]}' also becomes > > '{a[1]}'.format(a=a), but I don't think there are any other strange > > cases here. I would vote for f'{}' or f'{0}' to just be a > SyntaxError. > > > > > > +1 on that last sentence. But I prefer a slightly different way of > > implementing (see my reply to Eric). > > Right. And following up here to that email: > > > I was more thinking of translating that specific example to > > > > 'x:{} y:{}'.format(a.x, y) > > > > which avoids some of the issues your example is trying to clarify. > > That is better. The trick is converting the string "a.x" to the > expression a.x, which should be easy enough at compile time. > I wonder if we could let the parser do this? consider f'x:{ as one token and so on? > > It would still probably be best to limit the syntax inside {} to exactly > > what regular .format() supports, to avoid confusing users. > > The expressions supported by .format() are limited to attribute access > and "indexing". We just need to enforce that same restriction here. > > > Though the consistency argument can be played both ways -- supporting > > absolutely anything that is a valid expression would be more consistent > > with other places where expressions occur. E.g. in principle we could > > support operators and function calls here. > > It would be easiest to not restrict the expressions, but then we'd have > to maintain that restriction in two places. > > And now that I think about it, it's somewhat more complex than just > expanding the expression. In .format(), this: > '{a[0]}{b[c]}' > is evaluated roughly as > format(a[0]) + format(b['c']) > Oooh, this is very unfortunate. I cannot support this. Treating b[c] as b['c'] in a "real" format string is one way, but treating it that way in an expression is just too weird. > So to be consistent with .format(), we have to fully parse at least the > indexing out to see if it looks like a constant integer or a string. > > So given that, I think we should just support what .format() allows, > since it's really not quite as simple as "evaluate the expression inside > the braces". > Alas. And this is probably why we don't already have this feature. > > Not sure what you mean by "implicit merging" -- if you mean literal > > concatenation (e.g. 'foo' "bar" == 'foobar') then I think it should be > > allowed, just like we support mixing quotes and r''. > > If I understand it, I think the concern is: > > f'{a}{b}' 'foo{}' f'{c}{d}' > > would need to become: > f'{a}{b}foo{{}}{c}{d}' > > So you have to escape the braces in non-f-strings when merging strings > and any of them are f-strings, and make the result an f-string. 
But I > think that's the only complication. > That's possible; another possibility would be to just have multiple .format() calls (one per f'...') and use the + operator to concatenate the pieces. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From Steve.Dower at microsoft.com Tue Jul 21 05:39:33 2015 From: Steve.Dower at microsoft.com (Steve Dower) Date: Tue, 21 Jul 2015 03:39:33 +0000 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AD57F3.3030205@trueblade.com> , Message-ID: That's almost certainly something that can be improved though, and maybe it's worth some investment from you. Remember, Python doesn't get better by magic - it gets better because someone gets annoyed enough about it that they volunteer to fix it (at least, that's how I ended up getting so involved :) ). My wild guess is that calling int.__format__ is the slow part, though I'd have hoped that it wouldn't be any slower for default formatting... guess not. We've got sprints coming up at PyData next week, so maybe I'll try and encourage someone to take a look and see what can be improved here. Cheers, Steve Top-posted from my Windows Phone ________________________________ From: Alexander Belopolsky Sent: ?7/?20/?2015 20:28 To: Steve Dower Cc: Eric V. Smith; python-ideas Subject: Re: [Python-ideas] Briefer string format On Mon, Jul 20, 2015 at 11:16 PM, Steve Dower > wrote: Making f"" strings subtly faster isn't going to solve your performance issue, and while I'm not advocating wastefulness, this looks like a premature optimization, especially when put alongside the guaranteed heap allocations and very likely IO that are also going to occur. One thing I know for a fact is that the use of % formatting instead of .format makes a significant difference in my applications. This is not surprising given these timings: $ python3 -mtimeit "'%d' % 2" 100000000 loops, best of 3: 0.00966 usec per loop $ python3 -mtimeit "'{}'.format(2)" 1000000 loops, best of 3: 0.216 usec per loop As a result, my rule of thumb is to avoid the use of .format in anything remotely performance critical. If f"" syntax is implemented as a sugar for .format - it will be equally useless for most of my needs. However, I think it can be implemented in a way that will make me consider switching away from % formatting. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Tue Jul 21 13:58:08 2015 From: eric at trueblade.com (Eric V. Smith) Date: Tue, 21 Jul 2015 07:58:08 -0400 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> Message-ID: <55AE33D0.6020701@trueblade.com> On 7/21/2015 2:05 AM, Guido van Rossum wrote: > And now that I think about it, it's somewhat more complex than just > expanding the expression. In .format(), this: > '{a[0]}{b[c]}' > is evaluated roughly as > format(a[0]) + format(b['c']) > > > Oooh, this is very unfortunate. 
I cannot support this. Treating b[c] as > b['c'] in a "real" format string is one way, but treating it that way in > an expression is just too weird. I think you're right here, and my other emails were trying too much to simplify the implementation and keep the parallels with str.format(). The difference between str.format() and f-strings is that in str.format() you can have an arbitrarily complex expression as the passed in argument to .format(). With f-strings, you'd be limited to just what can be extracted from the string itself: there are no arguments to be passed in. So maybe we do want to allow arbitrary expressions inside the f-string. For example: '{a.foo}'.format(a=b[c]) If we limit f-strings to just what str.format() string expressions can represent, it would be impossible to represent this with an f-string, without an intermediate assignment. But if we allowed arbitrary expressions inside an f-string, then we'd have: f'{b[c].foo}' and similarly: '{a.foo}'.format(a=b['c']) would become: f'{b["c"].foo}' But now we'd be breaking compatibility with str.format(). Maybe it's worth it, though. I can see 80% of the uses of str.format() being replaced by f-strings. The remainder would be cases where format strings are passed in to other functions. I do this a lot with custom logging [1]. The implementation complexity goes up by allowing arbitrary expressions. Not that that is necessarily a reason to drive a design decision. For example: f'{a[2:3]:20d}' We need to extract the expression "a[2:3]" and the format spec "20d". I can't just scan for a colon any more, I've got to actually parse the expression until I find a "}", ":", or "!" that's not part of the expression so that I know where it ends. But since it's happening at compile time, I surely have all of the tools at my disposal. I'll have to look through the grammar to see what the complexities here are and where this would fit in. > So given that, I think we should just support what .format() allows, > since it's really not quite as simple as "evaluate the expression inside > the braces". > > Alas. And this is probably why we don't already have this feature. Agreed. So I think it's either "don't be compatible with str.format expressions" or "abandon the proposed f-strings". > > Not sure what you mean by "implicit merging" -- if you mean literal > > concatenation (e.g. 'foo' "bar" == 'foobar') then I think it should be > > allowed, just like we support mixing quotes and r''. > > If I understand it, I think the concern is: > > f'{a}{b}' 'foo{}' f'{c}{d}' > > would need to become: > f'{a}{b}foo{{}}{c}{d}' > > So you have to escape the braces in non-f-strings when merging strings > and any of them are f-strings, and make the result an f-string. But I > think that's the only complication. > > > That's possible; another possibility would be to just have multiple > .format() calls (one per f'...') and use the + operator to concatenate > the pieces. Right. I think the application would actually use _PyUnicodeWriter to build the string up, but it would logically be equivalent to: 'foo ' f'b:{b["c"].foo:20d} is {on_off}' ' bar' becoming: 'foo' + 'b:' + format(b["c"].foo, '20d') + ' is ' + format(on_off) + ' bar' At this point, the implementation wouldn't call str.format() because it's not being used to evaluate the expression. It would just call format() directly. And since it's doing that without having to look up .format on the string, we'd get some performance back that str.format() currently suffers from. 
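[For concreteness, that logical expansion can be run as-is today. A small sketch, with hypothetical stand-in values for b and on_off; only the shape of the expansion comes from the example above:

# Hypothetical stand-ins; any object with a 'foo' attribute would do.
class Rec:
    foo = 1234

b = {'c': Rec()}
on_off = True

# Logical equivalent of 'foo ' f'b:{b["c"].foo:20d} is {on_off}' ' bar',
# built from plain format() calls with no .format lookup on a string.
expanded = ('foo ' + 'b:' + format(b['c'].foo, '20d') +
            ' is ' + format(on_off) + ' bar')
print(expanded)  # 'foo b:', then 1234 right-aligned in 20 columns, then ' is True bar'
]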
Nothing is really lost by not merging the adjacent strings, since the f-strings by definition are replaced by function calls. Maybe the optimizer could figure out that 'foo ' + 'b:' could be merged in to 'foo b:'. Or maybe the user should refactor the strings if it's that important. I'm out of the office all day and won't be able to respond to any follow ups until later. But that's good, since I'll be forced to think before typing! Eric. [1] Which makes me think of the crazy idea of passing in unevaluated f-strings in to another function to be evaluated in their context. But the code injection opportunities with doing this with arbitrary user-specified strings are just too scary to think about. At least with str.format() you're limited in to what the expressions can do. Basically indexing and attribute access. No function calls: '{.exit()}'.format(sys) ! From guido at python.org Tue Jul 21 15:03:43 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 21 Jul 2015 15:03:43 +0200 Subject: [Python-ideas] Briefer string format In-Reply-To: <55AE33D0.6020701@trueblade.com> References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AE33D0.6020701@trueblade.com> Message-ID: Thanks, Eric! You're addressing all my concerns and you're going exactly where I wanted this to go. I hope that you will find the time to write up a PEP; take your time. Regarding your [1], let's not consider unevaluated f-strings as a feature; that use case is sufficiently covered by the existing str.format(). On Tue, Jul 21, 2015 at 1:58 PM, Eric V. Smith wrote: > On 7/21/2015 2:05 AM, Guido van Rossum wrote: > > And now that I think about it, it's somewhat more complex than just > > expanding the expression. In .format(), this: > > '{a[0]}{b[c]}' > > is evaluated roughly as > > format(a[0]) + format(b['c']) > > > > > > Oooh, this is very unfortunate. I cannot support this. Treating b[c] as > > b['c'] in a "real" format string is one way, but treating it that way in > > an expression is just too weird. > > I think you're right here, and my other emails were trying too much to > simplify the implementation and keep the parallels with str.format(). > The difference between str.format() and f-strings is that in > str.format() you can have an arbitrarily complex expression as the > passed in argument to .format(). With f-strings, you'd be limited to > just what can be extracted from the string itself: there are no > arguments to be passed in. So maybe we do want to allow arbitrary > expressions inside the f-string. > > For example: > > '{a.foo}'.format(a=b[c]) > > If we limit f-strings to just what str.format() string expressions can > represent, it would be impossible to represent this with an f-string, > without an intermediate assignment. > > But if we allowed arbitrary expressions inside an f-string, then we'd have: > f'{b[c].foo}' > > and similarly: > '{a.foo}'.format(a=b['c']) > would become: > f'{b["c"].foo}' > > But now we'd be breaking compatibility with str.format(). Maybe it's > worth it, though. I can see 80% of the uses of str.format() being > replaced by f-strings. The remainder would be cases where format strings > are passed in to other functions. I do this a lot with custom logging [1]. > > The implementation complexity goes up by allowing arbitrary expressions. 
> Not that that is necessarily a reason to drive a design decision. > > For example: > f'{a[2:3]:20d}' > > We need to extract the expression "a[2:3]" and the format spec "20d". I > can't just scan for a colon any more, I've got to actually parse the > expression until I find a "}", ":", or "!" that's not part of the > expression so that I know where it ends. But since it's happening at > compile time, I surely have all of the tools at my disposal. I'll have > to look through the grammar to see what the complexities here are and > where this would fit in. > > > So given that, I think we should just support what .format() allows, > > since it's really not quite as simple as "evaluate the expression > inside > > the braces". > > > > Alas. And this is probably why we don't already have this feature. > > Agreed. So I think it's either "don't be compatible with str.format > expressions" or "abandon the proposed f-strings". > > > > Not sure what you mean by "implicit merging" -- if you mean literal > > > concatenation (e.g. 'foo' "bar" == 'foobar') then I think it > should be > > > allowed, just like we support mixing quotes and r''. > > > > If I understand it, I think the concern is: > > > > f'{a}{b}' 'foo{}' f'{c}{d}' > > > > would need to become: > > f'{a}{b}foo{{}}{c}{d}' > > > > So you have to escape the braces in non-f-strings when merging > strings > > and any of them are f-strings, and make the result an f-string. But I > > think that's the only complication. > > > > > > That's possible; another possibility would be to just have multiple > > .format() calls (one per f'...') and use the + operator to concatenate > > the pieces. > > Right. I think the application would actually use _PyUnicodeWriter to > build the string up, but it would logically be equivalent to: > > 'foo ' f'b:{b["c"].foo:20d} is {on_off}' ' bar' > > becoming: > > 'foo' + 'b:' + format(b["c"].foo, '20d') + ' is ' + > format(on_off) + ' bar' > > At this point, the implementation wouldn't call str.format() because > it's not being used to evaluate the expression. It would just call > format() directly. And since it's doing that without having to look up > .format on the string, we'd get some performance back that str.format() > currently suffers from. > > Nothing is really lost by not merging the adjacent strings, since the > f-strings by definition are replaced by function calls. Maybe the > optimizer could figure out that 'foo ' + 'b:' could be merged in to 'foo > b:'. Or maybe the user should refactor the strings if it's that important. > > I'm out of the office all day and won't be able to respond to any follow > ups until later. But that's good, since I'll be forced to think before > typing! > > Eric. > > [1] Which makes me think of the crazy idea of passing in unevaluated > f-strings in to another function to be evaluated in their context. But > the code injection opportunities with doing this with arbitrary > user-specified strings are just too scary to think about. At least with > str.format() you're limited in to what the expressions can do. Basically > indexing and attribute access. No function calls: '{.exit()}'.format(sys) ! > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Tue Jul 21 15:05:45 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 21 Jul 2015 23:05:45 +1000 Subject: [Python-ideas] Briefer string format In-Reply-To: <55AE33D0.6020701@trueblade.com> References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AE33D0.6020701@trueblade.com> Message-ID: On 21 July 2015 at 21:58, Eric V. Smith wrote: > [1] Which makes me think of the crazy idea of passing in unevaluated > f-strings in to another function to be evaluated in their context. But > the code injection opportunities with doing this with arbitrary > user-specified strings are just too scary to think about. At least with > str.format() you're limited in to what the expressions can do. Basically > indexing and attribute access. No function calls: '{.exit()}'.format(sys) ! Yeah, this is why I think anything involving implicit interpolation needs to be transparent to the compiler: the security implications with anything other than literal format strings or some other explicitly compile time operation are far too "exciting" otherwise. I wonder though, if we went with the f-strings idea, could we make them support a *subset* of the "str.format" call syntax, rather than a superset? What if they supported name and attribute lookup syntax, but not positional or subscript lookup? They'd still be a great for formatting output in scripts and debugging messages, but more complex formatting cases would still involve reaching for str.format, str.format_map or exec("print(f'{this} is an odd way to do a {format_map} call')", namespace). Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Tue Jul 21 15:50:51 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 21 Jul 2015 15:50:51 +0200 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AE33D0.6020701@trueblade.com> Message-ID: On Tue, Jul 21, 2015 at 3:05 PM, Nick Coghlan wrote: > I wonder though, if we went with the f-strings idea, could we make > them support a *subset* of the "str.format" call syntax, rather than a > superset? What if they supported name and attribute lookup syntax, but > not positional or subscript lookup? > I don't know. Either way there's going to be complaints about the inconsistencies. :-( I wish we hadn't done the {a[x]} part of PEP 3101, but it's too late now. :-( > They'd still be a great for formatting output in scripts and debugging > messages, but more complex formatting cases would still involve > reaching for str.format, str.format_map or exec("print(f'{this} is an > odd way to do a {format_map} call')", namespace). > You lost me there (probably by trying to be too terse). -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Tue Jul 21 16:24:18 2015 From: eric at trueblade.com (Eric V. 
From eric at trueblade.com Tue Jul 21 16:24:18 2015
From: eric at trueblade.com (Eric V. Smith)
Date: Tue, 21 Jul 2015 10:24:18 -0400
Subject: [Python-ideas] Briefer string format
In-Reply-To:
References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AE33D0.6020701@trueblade.com>
Message-ID: <55AE5612.5040702@trueblade.com>

On 7/21/2015 9:03 AM, Guido van Rossum wrote:
> Thanks, Eric! You're addressing all my concerns and you're going exactly
> where I wanted this to go. I hope that you will find the time to write
> up a PEP; take your time. Regarding your [1], let's not consider
> unevaluated f-strings as a feature; that use case is sufficiently
> covered by the existing str.format().

Thanks, Guido. I'd already given some thought to a PEP. I'll work on it.
I don't have a ton of free time, but I'd like to at least get the ideas
presented so far written down.

One thing I haven't completely thought through is nested expressions:

    f'{value:.{precision}f}'

I guess this would just become:

    format(value, '.' + format(precision) + 'f')

If I recall correctly, we only support recursive fields in the format
specifier portion, and only one level deep. I'll need to keep that in
mind.

If this gets accepted, I'll have to speed up my own efforts to port my
code to 3.x.

Eric.
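Eric's guessed expansion can be sanity-checked against today's
str.format, which already supports exactly this one level of nesting in
the specifier (plain Python 3, nothing new):

    value, precision = 3.14159, 3

    a = '{value:.{precision}f}'.format(value=value, precision=precision)
    b = format(value, '.' + format(precision) + 'f')
    assert a == b == '3.142'

so the proposed compile-time rewrite and the existing runtime behavior
agree for this case.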
From eric at trueblade.com Tue Jul 21 16:25:43 2015
From: eric at trueblade.com (Eric V. Smith)
Date: Tue, 21 Jul 2015 10:25:43 -0400
Subject: [Python-ideas] Briefer string format
In-Reply-To:
References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AE33D0.6020701@trueblade.com>
Message-ID: <55AE5667.8010201@trueblade.com>

On 7/21/2015 9:50 AM, Guido van Rossum wrote:
> On Tue, Jul 21, 2015 at 3:05 PM, Nick Coghlan wrote:
>
>     I wonder though, if we went with the f-strings idea, could we make
>     them support a *subset* of the "str.format" call syntax, rather than a
>     superset? What if they supported name and attribute lookup syntax, but
>     not positional or subscript lookup?
>
> I don't know. Either way there's going to be complaints about the
> inconsistencies. :-( I wish we hadn't done the {a[x]} part of PEP 3101,
> but it's too late now. :-(

Maybe we should deprecate it. I've never used it, and I don't think I've
ever seen it in the wild.

Eric.

From ericsnowcurrently at gmail.com Tue Jul 21 16:50:30 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Tue, 21 Jul 2015 08:50:30 -0600
Subject: [Python-ideas] Briefer string format
In-Reply-To: <55AE5667.8010201@trueblade.com>
References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AE33D0.6020701@trueblade.com> <55AE5667.8010201@trueblade.com>
Message-ID:

On Tue, Jul 21, 2015 at 8:25 AM, Eric V. Smith wrote:
> On 7/21/2015 9:50 AM, Guido van Rossum wrote:
>> I don't know. Either way there's going to be complaints about the
>> inconsistencies. :-( I wish we hadn't done the {a[x]} part of PEP 3101,
>> but it's too late now. :-(
>
> Maybe we should deprecate it. I've never used it, and I don't think I've
> ever seen it in the wild.

FWIW, I've used it a few times in production code. However, it wasn't
strictly necessary nor was the value added substantial.

-eric

From oscar.j.benjamin at gmail.com Tue Jul 21 18:50:56 2015
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Tue, 21 Jul 2015 16:50:56 +0000
Subject: [Python-ideas] Briefer string format
In-Reply-To:
References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AE33D0.6020701@trueblade.com>
Message-ID:

On Tue, 21 Jul 2015 at 14:14 Nick Coghlan wrote:

> I wonder though, if we went with the f-strings idea, could we make
> them support a *subset* of the "str.format" call syntax, rather than a
> superset? What if they supported name and attribute lookup syntax, but
> not positional or subscript lookup?

Please don't do either. Python already has a surplus of string
formatting mini-languages. Making a new one that is similar but not the
same as one of the others is a recipe for confusion as well as an
additional learning burden for new users of the language.

--
Oscar
From guido at python.org Tue Jul 21 19:58:36 2015
From: guido at python.org (Guido van Rossum)
Date: Tue, 21 Jul 2015 19:58:36 +0200
Subject: [Python-ideas] Briefer string format
In-Reply-To:
References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AE33D0.6020701@trueblade.com>
Message-ID:

On Tue, Jul 21, 2015 at 6:50 PM, Oscar Benjamin wrote:

> On Tue, 21 Jul 2015 at 14:14 Nick Coghlan wrote:
>
>> I wonder though, if we went with the f-strings idea, could we make
>> them support a *subset* of the "str.format" call syntax, rather than a
>> superset? What if they supported name and attribute lookup syntax, but
>> not positional or subscript lookup?
>
> Please don't do either. Python already has a surplus of string
> formatting mini-languages. Making a new one that is similar but not the
> same as one of the others is a recipe for confusion as well as an
> additional learning burden for new users of the language.

I'm not sure if you meant it this way, but if we really believed that,
the only way to avoid confusion would be not to introduce f'' strings at
all. (Which, BTW, is a valid outcome of this discussion -- even if a PEP
is written it may end up being rejected.)

Personally I think that the different languages are no big deal, since
realistically the vast majority of use cases will use simple variables
(e.g. foo) or single attributes (e.g. foo.bar).

Until this discussion I had totally forgotten several of the quirks of
PEP 3101, including: a[c] meaning a['c'] elsewhere; the ^ format
character and the related fill/align feature; nested substitutions; the
top-level format() function. Also, I can never remember how to use !r.

I actually find it quite unfortunate that the formatting mini-language
gives a[c] the meaning of a['c'] elsewhere, since it means that the
formatting mini-language used to reference variables is neither a subset
nor a superset of the standard expression syntax. We have a variety of
other places in the syntax where a slightly different syntax is
supported (e.g. it's quite subtle how commas are parsed, and decorators
allow a strict subset of expressions) but the formatting mini-language
is AFAIR the only one that gives a form that is allowed elsewhere a
different meaning.

-- 
--Guido van Rossum (python.org/~guido)
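The quirk Guido describes is easy to demonstrate with today's
str.format: inside a replacement field an unquoted key is treated as a
string unless it consists only of digits, which is the opposite of how
the same characters read as ordinary Python:

    d = {'c': 'string key', 10: 'int key'}
    assert '{0[c]}'.format(d) == 'string key'   # a[c] here means a['c']
    assert '{0[10]}'.format(d) == 'int key'     # all-digit keys mean an int
    # There is also no way to quote a key: writing [ 'c' ] in the field
    # looks up the literal three-character key (quotes included) and fails.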
From python-ideas at mgmiller.net Wed Jul 22 01:04:45 2015
From: python-ideas at mgmiller.net (Mike Miller)
Date: Tue, 21 Jul 2015 16:04:45 -0700
Subject: [Python-ideas] Briefer string format
In-Reply-To:
References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AE33D0.6020701@trueblade.com>
Message-ID: <55AED00D.7060904@mgmiller.net>

My apologies, as I've read through this thread again and haven't found
the reason the approach last mentioned by Eric S. was abandoned:

    f'{a[name]}'  ==>  '{[name]}'.format(a)

This seemed to solve things neatly, letting .format() handle the messy
details it already handles. Also it's easy to lift the identifier names
out using their rules. Simple implementation, easy to understand.

Then conversation switched to this alternative:

    f'{a[name]}'  ==>  '{}'.format(a['name'])

Which has the drawback that some of the complexity of the mini-language
will need to be reimplemented. Second, there is an inconsistency in
quoting of string dictionary keys. That's unfortunate, but that's the
way format currently works. Since f'' will be implemented on top, is not
the quoting issue orthogonal to it?

If the unquoted str dict key is indeed unacceptable, I submit it should
be deprecated (or not) separately, but not affect the decision on f''.
Again though, I feel like I'm missing an important nugget of
information.

-Mike

On 07/21/2015 10:58 AM, Guido van Rossum wrote:
> On Tue, Jul 21, 2015 at 6:50 PM, Oscar Benjamin wrote:
>
>> On Tue, 21 Jul 2015 at 14:14 Nick Coghlan wrote:
>>
>>> I wonder though, if we went with the f-strings idea, could we make
>>> them support a *subset* of the "str.format" call syntax, rather than a
>>> superset? What if they supported name and attribute lookup syntax, but
>>> not positional or subscript lookup?
>>
>> Please don't do either. Python already has a surplus of string
>> formatting mini-languages. Making a new one that is similar but not the
>> same as one of the others is a recipe for confusion as well as an
>> additional learning burden for new users of the language.
>
> I'm not sure if you meant it this way, but if we really believed that,
> the only way to avoid confusion would be not to introduce f'' strings at
> all. (Which, BTW, is a valid outcome of this discussion -- even if a PEP
> is written it may end up being rejected.)
>
> Personally I think that the different languages are no big deal, since
> realistically the vast majority of use cases will use simple variables
> (e.g. foo) or single attributes (e.g. foo.bar).
>
> Until this discussion I had totally forgotten several of the quirks of
> PEP 3101, including: a[c] meaning a['c'] elsewhere; the ^ format
> character and the related fill/align feature; nested substitutions; the
> top-level format() function. Also, I can never remember how to use !r.
>
> I actually find it quite unfortunate that the formatting mini-language
> gives a[c] the meaning of a['c'] elsewhere, since it means that the
> formatting mini-language used to reference variables is neither a subset
> nor a superset of the standard expression syntax. We have a variety of
> other places in the syntax where a slightly different syntax is
> supported (e.g. it's quite subtle how commas are parsed, and decorators
> allow a strict subset of expressions) but the formatting mini-language
> is AFAIR the only one that gives a form that is allowed elsewhere a
> different meaning.
>
> --
> --Guido van Rossum (python.org/~guido)

From stephen at xemacs.org Wed Jul 22 02:11:10 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 22 Jul 2015 09:11:10 +0900
Subject: [Python-ideas] Briefer string format
In-Reply-To: <55AED00D.7060904@mgmiller.net>
References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AE33D0.6020701@trueblade.com> <55AED00D.7060904@mgmiller.net>
Message-ID: <87si8hdpb5.fsf@uwakimon.sk.tsukuba.ac.jp>

Mike Miller writes:

 > My apologies, as I've read through this thread again and haven't found
 > the reason the approach last mentioned by Eric S. was abandoned:
 >
 >     f'{a[name]}' ==> '{[name]}'.format(a)
 >
 > This seemed to solve things neatly, letting .format() handle the
 > messy details it already handles.

That's what I thought, too, but according to
https://mail.python.org/pipermail/python-ideas/2015-July/034728.html,
that's not true. The problem is that .format accepts *arbitrary*
expressions as arguments, e.g. "{a.attr}".format(a=f()), which can't be
expressed as an f-string within the limits of current .format specs.
Finally Eric concludes that

    you end up with a situation where format would need to be called
    directly, and str.format isn't involved at all

I haven't studied the argument that leads there, but that's the context
you're looking for, I believe.

Python-Ideas meta: The simple implementation is surely still on the
table, although I get the feeling Guido is unhappy with the restrictions
implied. However, it is unlikely to be discussed again here precisely
because those who understand the implementation of str.format well
already understand the implications of this implementation very well --
further discussion is unnecessary.

In fact, Guido asking for a PEP may put a "paragraph break" into this
discussion at this point -- we have several proposed implementations
with various amounts of flexibility, and the proponents understand them
even if I, and perhaps you, don't. What's left is the grunt work of
thinking out the corner cases and creating one or more proof-of-concept
implementations, then writing the PEP.

Steve
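For reference, the first mapping Mike shows really does work with
today's str.format, which is what makes it look so attractive; a quick
check with a throwaway dict:

    a = {'name': 'Guido'}
    assert '{[name]}'.format(a) == 'Guido'      # the '{[name]}' spelling
    assert '{}'.format(a['name']) == 'Guido'    # the alternative expansion

As Stephen notes, the two only diverge once the text between the braces
is something the field-name mini-language cannot express, such as a call
like f() or a slice like a[2:3].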
From rosuav at gmail.com Wed Jul 22 02:15:21 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 22 Jul 2015 10:15:21 +1000
Subject: [Python-ideas] Briefer string format
In-Reply-To: <55AED00D.7060904@mgmiller.net>
References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AE33D0.6020701@trueblade.com> <55AED00D.7060904@mgmiller.net>
Message-ID:

On Wed, Jul 22, 2015 at 9:04 AM, Mike Miller wrote:
> Second, there is an inconsistency in quoting of string dictionary keys.
> That's unfortunate, but that's the way format currently works. Since f''
> will be implemented on top, is not the quoting issue orthogonal to it?

So I guess the question is: Does f"..." have to be implemented on top
of str.format, or should it be implemented separately on top of
object.__format__? The two could be virtually indistinguishable anyway.
Something like this:

    loc = "world"
    print(f"Hello, {loc}!")
    # becomes
    loc = "world"
    print("Hello, " + loc.__format__("") + "!")
    # maybe with the repeated concat optimized to a join

With that, there's no particular reason for the specifics of .format()
key lookup to be retained. Want full expression syntax? Should be easy -
it's just a matter of getting the nesting right (otherwise it's a
SyntaxError, same as (1,2,[3,4) would be). Yes, it'll be a bit harder
for simplistic parsers to work with, but basically, this is no longer a
string literal - it's a compact syntax for string formatting and
concatenation, which is something I can definitely get behind.

REXX allowed abuttal for concatenation, so you could write something
like this:

    msg = "Hello, "loc"!"

Replace those interior quotes with braces, and you have an f"..."
string. It's not a string, it's an expression, and it can look up names
in its enclosing scope. Describe it alongside list comprehensions,
lambda expressions, and so on, and it fits in fairly nicely.

No longer -0.5 on this.

ChrisA

From rosuav at gmail.com Wed Jul 22 02:30:02 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 22 Jul 2015 10:30:02 +1000
Subject: [Python-ideas] Briefer string format
In-Reply-To: <7D2EB083-7137-4CE0-BCA1-5DE07EDF8D64@gmail.com>
References: <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AE33D0.6020701@trueblade.com> <55AED00D.7060904@mgmiller.net> <7D2EB083-7137-4CE0-BCA1-5DE07EDF8D64@gmail.com>
Message-ID:

On Wed, Jul 22, 2015 at 10:28 AM, Ryan Gonzalez wrote:
> Pretty sure I'm going to be odd one out, here...
>
> I don't like most of Ruby, but, after using Crystal and CoffeeScript,
> I have fallen in love with #{}. It gives the appearance of a real
> expression, not just a format placeholder. Like:
>
>     f'a#{name}b'

That's what I'm talking about. It's not a placeholder. It's not a
duplicated name that references a keyword argument at the end of the
expression. It's an actual expression.

ChrisA
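A small aside on Chris's expansion (standard Python, nothing proposed
here): for str, formatting with an empty spec just returns the string,
so the pieces really are interchangeable:

    loc = "world"
    assert loc.__format__("") == format(loc, "") == "world"
    assert "Hello, " + format(loc, "") + "!" == "Hello, world!"

The builtin format(loc, "") is the conventional spelling; it goes
through the same __format__ protocol while following the usual
special-method lookup rules.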
From rymg19 at gmail.com Wed Jul 22 02:28:51 2015
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Tue, 21 Jul 2015 19:28:51 -0500
Subject: [Python-ideas] Briefer string format
In-Reply-To:
References: <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AE33D0.6020701@trueblade.com> <55AED00D.7060904@mgmiller.net>
Message-ID: <7D2EB083-7137-4CE0-BCA1-5DE07EDF8D64@gmail.com>

Pretty sure I'm going to be odd one out, here...

I don't like most of Ruby, but, after using Crystal and CoffeeScript, I
have fallen in love with #{}. It gives the appearance of a real
expression, not just a format placeholder. Like:

    f'a#{name}b'

On July 21, 2015 7:15:21 PM CDT, Chris Angelico wrote:
>On Wed, Jul 22, 2015 at 9:04 AM, Mike Miller wrote:
>> Second, there is an inconsistency in quoting of string dictionary keys.
>> That's unfortunate, but that's the way format currently works. Since f''
>> will be implemented on top, is not the quoting issue orthogonal to it?
>
>So I guess the question is: Does f"..." have to be implemented on top
>of str.format, or should it be implemented separately on top of
>object.__format__? The two could be virtually indistinguishable
>anyway. Something like this:
>
>    loc = "world"
>    print(f"Hello, {loc}!")
>    # becomes
>    loc = "world"
>    print("Hello, " + loc.__format__("") + "!")
>    # maybe with the repeated concat optimized to a join
>
>With that, there's no particular reason for the specifics of .format()
>key lookup to be retained. Want full expression syntax? Should be easy
>- it's just a matter of getting the nesting right (otherwise it's a
>SyntaxError, same as (1,2,[3,4) would be). Yes, it'll be a bit harder
>for simplistic parsers to work with, but basically, this is no longer
>a string literal - it's a compact syntax for string formatting and
>concatenation, which is something I can definitely get behind.
>
>REXX allowed abuttal for concatenation, so you could write something
>like this:
>
>    msg = "Hello, "loc"!"
>
>Replace those interior quotes with braces, and you have an f"..."
>string. It's not a string, it's an expression, and it can look up
>names in its enclosing scope. Describe it alongside list
>comprehensions, lambda expressions, and so on, and it fits in fairly
>nicely.
>
>No longer -0.5 on this.
>
>ChrisA

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

From rymg19 at gmail.com Wed Jul 22 02:44:32 2015
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Tue, 21 Jul 2015 19:44:32 -0500
Subject: [Python-ideas] Briefer string format
In-Reply-To:
References: <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AD516C.3050603@trueblade.com> <55AE33D0.6020701@trueblade.com> <55AED00D.7060904@mgmiller.net> <7D2EB083-7137-4CE0-BCA1-5DE07EDF8D64@gmail.com>
Message-ID:

On July 21, 2015 7:30:02 PM CDT, Chris Angelico wrote:
>On Wed, Jul 22, 2015 at 10:28 AM, Ryan Gonzalez wrote:
>> Pretty sure I'm going to be odd one out, here...
>>
>> I don't like most of Ruby, but, after using Crystal and CoffeeScript,
>> I have fallen in love with #{}. It gives the appearance of a real
>> expression, not just a format placeholder. Like:
>>
>>     f'a#{name}b'
>
>That's what I'm talking about. It's not a placeholder. It's not a
>duplicated name that references a keyword argument at the end of the
>expression. It's an actual expression.

I'm referring to the syntax, though. It makes it visually distinct from
normal format string placeholders.

Also: plain format strings are a *pain* to escape in code generators,
e.g.:

    print('int main() {{ return 0+{name}; }}'.format(name=name))

Stupid double brackets. That is why I still use % formatting. #{} isn't
a common expression to ever use. I have never actually printed a string
that contains #{} (except when using interpolation in
CoffeeScript/Crystal).

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Currently listening to: Deep Drive by Yoko Shimomura (KH 2.5)
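Ryan's escaping complaint is easy to reproduce (standard Python; name is
a stand-in variable):

    name = 'x'
    # str.format: every literal brace in the output must be doubled
    print('int main() {{ return 0+{name}; }}'.format(name=name))
    # %-formatting: braces pass through untouched
    print('int main() { return 0+%s; }' % name)

Both lines print int main() { return 0+x; }, which is why code
generators emitting C-family languages tend to fall back to
%-formatting; an f-string would inherit str.format's doubled-brace rule.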
From eric at trueblade.com Wed Jul 22 20:52:30 2015
From: eric at trueblade.com (Eric V. Smith)
Date: Wed, 22 Jul 2015 14:52:30 -0400
Subject: [Python-ideas] Briefer string format
In-Reply-To:
References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com>
Message-ID: <55AFE66E.9070007@trueblade.com>

On 07/20/2015 03:22 PM, Guido van Rossum wrote:
> Not sure what you mean by "implicit merging" -- if you mean literal
> concatenation (e.g. 'foo' "bar" == 'foobar') then I think it should be
> allowed, just like we support mixing quotes and r''.

Do we really want to support this? It complicates the implementation,
and I'm not sure of the value.

    f'{foo}' 'bar' f'{baz}'

becomes something like:

    format(foo) + 'bar' + format(baz)

You're not merging similar things, like you are with normal string
concatenation.

And merging f-strings:

    f'{foo}' f'{bar}'

similarly just becomes concatenating the results of some function calls.

I guess it depends if you think of an f-string as a string, or an
expression (like the function calls it will become). I don't have a real
strong preference, but I'd like to get it ironed out logically before
doing a trial implementation.

Eric.

From mistersheik at gmail.com Wed Jul 22 10:03:37 2015
From: mistersheik at gmail.com (Neil Girdhar)
Date: Wed, 22 Jul 2015 01:03:37 -0700 (PDT)
Subject: [Python-ideas] Secure unpickle
Message-ID:

I've heard it said that pickle is a security hole, and so it's better to
write your own serialization routine. That's unfortunate because pickle
has so many advantages, such as automatically tying into copy/deepcopy.
Would it be possible to make unpickle secure, e.g., by having the caller
create a context in which all calls to unpickle are limited to
unpickling a specific set of types? (When these types unpickle their
sub-objects, they could potentially limit the set of types further.)
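Neil's question is essentially what the pickle documentation calls
"restricting globals", which Eric points to a couple of messages below.
A minimal sketch of an allow-list unpickler (the allowed set here is
purely illustrative):

    import builtins
    import io
    import pickle

    SAFE_BUILTINS = {'list', 'set', 'dict', 'tuple', 'frozenset'}

    class RestrictedUnpickler(pickle.Unpickler):
        def find_class(self, module, name):
            # Only allow a small set of harmless builtins; any other
            # global (os.system, subprocess.Popen, ...) is refused, so
            # a malicious pickle cannot reach it.
            if module == 'builtins' and name in SAFE_BUILTINS:
                return getattr(builtins, name)
            raise pickle.UnpicklingError(
                'global %s.%s is forbidden' % (module, name))

    def restricted_loads(data):
        return RestrictedUnpickler(io.BytesIO(data)).load()

    restricted_loads(pickle.dumps([1, 2, {3: 'four'}]))  # loads fine
    # restricted_loads(b"cos\nsystem\n(S'echo pwned'\ntR.")  # raises

This blocks the classic __reduce__-based payload while still loading
plain containers; it does not, by itself, address the signing and trust
questions raised in the replies that follow.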
URL: From python at mrabarnett.plus.com Wed Jul 22 22:21:54 2015 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 22 Jul 2015 21:21:54 +0100 Subject: [Python-ideas] Briefer string format In-Reply-To: <55AFE66E.9070007@trueblade.com> References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> Message-ID: <55AFFB62.7040308@mrabarnett.plus.com> On 2015-07-22 19:52, Eric V. Smith wrote: > On 07/20/2015 03:22 PM, Guido van Rossum wrote: >> Not sure what you mean by "implicit merging" -- if you mean literal >> concatenation (e.g. 'foo' "bar" == 'foobar') then I think it should be >> allowed, just like we support mixing quotes and r''. > > Do we really want to support this? It complicates the implementation, > and I'm not sure of the value. > > f'{foo}' 'bar' f'{baz}' > becomes something like: > format(foo) + 'bar' + format(baz) > > You're not merging similar things, like you are with normal string > concatenation. > > And merging f-strings: > f'{foo}' f'{bar'} > similarly just becomes concatenating the results of some function calls. > > I guess it depends if you think of an f-string as a string, or an > expression (like the function calls it will become). I don't have a real > strong preference, but I'd like to get it ironed out logically before > doing a trial implementation. > As Guido said, we can merge raw string literals. It would be a gotcha if: r'{foo}' 'bar' worked but: f'{foo}' 'bar' didn't. You'd then have to how they're different even though they look a lot alike. From rymg19 at gmail.com Wed Jul 22 22:27:06 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Wed, 22 Jul 2015 15:27:06 -0500 Subject: [Python-ideas] Fwd: Re: Secure unpickle In-Reply-To: References: Message-ID: <53B91A7F-32F3-4267-8657-92CCD91042B4@gmail.com> A further idea: hashes. Each Pickle database (or whatever it's called) would contain a hash made up of: a) The types used to pickle the data. b) The hash of the data itself, prefixed with 2 bytes that have some sort of hard-to-get meaning (the length of the call stack?). c) The seconds since epoch, or another 64-bit value. The three values would likely be merged via bitwise or. This has the advantage that there are three different elements making up the hash, some of which are harder to locate. Unless two of the values are known, the third can't be. The types would be extracted from the hash via some kind of magic, and then it would validate the data in the database based on the types, like Neil said. If someone wanted to change the types, they would need to regenerate the whole hash. Further security could be obtained by prefixing the first value with another special byte sequence that, although easier to find, would be used for validation purposes. Point 2's prefixing bytes and point 3's value would be especially trickier to find, since a few seconds may pass before the data is written to disk. It's still a bit insecure, but much better than the current situation. I think. On Wed, Jul 22, 2015 at 3:03 AM, Neil Girdhar wrote: > I've heard it said that pickle is a security hole, and so it's better to > write your own serialization routine. That's unfortunate because pickle > has so many advantages such as automatically tying into copy/deepcopy. 
> Would it be possible to make unpickle secure, e.g., by having the caller > create a context in which all calls to unpickle are limited to unpickling a > specific set of types? (When these types unpickle their sub-objects, they > could potentially limit the set of types further.) > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something?s wrong. http://kirbyfan64.github.io/ Currently listening to: Death Egg Boss theme (Sonic Generations) -- Sent from my Android device with K-9 Mail. Please excuse my brevity. From abarnert at yahoo.com Wed Jul 22 22:54:31 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 22 Jul 2015 13:54:31 -0700 Subject: [Python-ideas] Fwd: Re: Secure unpickle In-Reply-To: <53B91A7F-32F3-4267-8657-92CCD91042B4@gmail.com> References: <53B91A7F-32F3-4267-8657-92CCD91042B4@gmail.com> Message-ID: <2731AF91-DC93-4798-AF03-D990B4C06367@yahoo.com> On Jul 22, 2015, at 13:27, Ryan Gonzalez wrote: > > A further idea: hashes. > > Each Pickle database (or whatever it's called) would contain a hash made up > of: > > a) The types used to pickle the data. > b) The hash of the data itself, prefixed with 2 bytes that have some sort > of hard-to-get meaning (the length of the call stack?). > c) The seconds since epoch, or another 64-bit value. A type pickled and unpickled in a different interpreter instance isn't necessarily going to have the same hash value. And if you don't mean a Python hash, how do you hash an arbitrary class object? Or, if you mean just the name, how does that secure anything? For that matter, it's often important for an updated version of the code to be able to load pickles created with yesterday's version. This is easy to do with the pickle protocol, but hashing would presumably break that (unless it didn't protect anything at all). > The three values would likely be merged via bitwise or. Why would you merge three hash values with bitwise or instead of one of the usual hash combining mechanisms? This just throws away most of your entropy. > This has the advantage that there are three different elements making up > the hash, some of which are harder to locate. Unless two of the values are > known, the third can't be. > > The types would be extracted from the hash via some kind of magic, That really _would_ be magic. The whole point of a hash is that it's one-way. If the hashed values can be recovered from it, it's not a hash. Also, "harder to locate" is useless, unless you plan to continually update your code as attackers locate the things you've hidden. (And, for something used in as many high-profile uses as Python's pickler, any security by obscurity would be attacked very frequently.) > and then > it would validate the data in the database based on the types, like Neil > said. > > If someone wanted to change the types, they would need to regenerate the > whole hash. And... So what? Unless the checker has some secure way of knowing which timestamp, etc. to use in checking the hash, all you have to do is give it the timestamp, etc. that go along with your regenerated hash, and it will pass. > Further security could be obtained by prefixing the first value > with another special byte sequence that, although easier to find, would be > used for validation purposes. 
> > Point 2's prefixing bytes and point 3's value would be especially trickier > to find, since a few seconds may pass before the data is written to disk. > > It's still a bit insecure, but much better than the current situation. I > think. I think it's much worse than the current situation, because it adds illusory security while still being effectively just as crackable. > > >> On Wed, Jul 22, 2015 at 3:03 AM, Neil Girdhar wrote: >> >> I've heard it said that pickle is a security hole, and so it's better to >> write your own serialization routine. That's unfortunate because pickle >> has so many advantages such as automatically tying into copy/deepcopy. >> Would it be possible to make unpickle secure, e.g., by having the caller >> create a context in which all calls to unpickle are limited to unpickling a >> specific set of types? (When these types unpickle their sub-objects, they >> could potentially limit the set of types further.) >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > > -- > Ryan > [ERROR]: Your autotools build scripts are 200 lines longer than your > program. Something?s wrong. > http://kirbyfan64.github.io/ > Currently listening to: Death Egg Boss theme (Sonic Generations) > -- > Sent from my Android device with K-9 Mail. Please excuse my brevity. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From rymg19 at gmail.com Wed Jul 22 22:58:47 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Wed, 22 Jul 2015 15:58:47 -0500 Subject: [Python-ideas] Fwd: Re: Secure unpickle In-Reply-To: <2731AF91-DC93-4798-AF03-D990B4C06367@yahoo.com> References: <53B91A7F-32F3-4267-8657-92CCD91042B4@gmail.com> <2731AF91-DC93-4798-AF03-D990B4C06367@yahoo.com> Message-ID: <9D5C749C-AE91-4CF5-A111-BFFA046E2B6A@gmail.com> Disclaimer: I know virtually *nothing* about cryptography, so this is probably worse than it seems. On July 22, 2015 3:54:31 PM CDT, Andrew Barnert wrote: >On Jul 22, 2015, at 13:27, Ryan Gonzalez wrote: >> >> A further idea: hashes. >> >> Each Pickle database (or whatever it's called) would contain a hash >made up >> of: >> >> a) The types used to pickle the data. >> b) The hash of the data itself, prefixed with 2 bytes that have some >sort >> of hard-to-get meaning (the length of the call stack?). >> c) The seconds since epoch, or another 64-bit value. > >A type pickled and unpickled in a different interpreter instance isn't >necessarily going to have the same hash value. And if you don't mean a >Python hash, how do you hash an arbitrary class object? Or, if you mean >just the name, how does that secure anything? > >For that matter, it's often important for an updated version of the >code to be able to load pickles created with yesterday's version. This >is easy to do with the pickle protocol, but hashing would presumably >break that (unless it didn't protect anything at all). > >> The three values would likely be merged via bitwise or. > >Why would you merge three hash values with bitwise or instead of one of >the usual hash combining mechanisms? This just throws away most of your >entropy. Uhhhh...I have no clue. It just came off the top of my head. 
> >> This has the advantage that there are three different elements making >up >> the hash, some of which are harder to locate. Unless two of the >values are >> known, the third can't be. >> >> The types would be extracted from the hash via some kind of magic, > >That really _would_ be magic. The whole point of a hash is that it's >one-way. If the hashed values can be recovered from it, it's not a >hash. Well, I again know nothing about cryptography, so I guess "key" is a better phrase. :O > >Also, "harder to locate" is useless, unless you plan to continually >update your code as attackers locate the things you've hidden. (And, >for something used in as many high-profile uses as Python's pickler, >any security by obscurity would be attacked very frequently.) > >> and then >> it would validate the data in the database based on the types, like >Neil >> said. >> >> If someone wanted to change the types, they would need to regenerate >the >> whole hash. > >And... So what? Unless the checker has some secure way of knowing which >timestamp, etc. to use in checking the hash, all you have to do is give >it the timestamp, etc. that go along with your regenerated hash, and it >will pass. > >> Further security could be obtained by prefixing the first value >> with another special byte sequence that, although easier to find, >would be >> used for validation purposes. >> >> Point 2's prefixing bytes and point 3's value would be especially >trickier >> to find, since a few seconds may pass before the data is written to >disk. >> >> It's still a bit insecure, but much better than the current >situation. I >> think. > >I think it's much worse than the current situation, because it adds >illusory security while still being effectively just as crackable. > >> >> >>> On Wed, Jul 22, 2015 at 3:03 AM, Neil Girdhar > wrote: >>> >>> I've heard it said that pickle is a security hole, and so it's >better to >>> write your own serialization routine. That's unfortunate because >pickle >>> has so many advantages such as automatically tying into >copy/deepcopy. >>> Would it be possible to make unpickle secure, e.g., by having the >caller >>> create a context in which all calls to unpickle are limited to >unpickling a >>> specific set of types? (When these types unpickle their >sub-objects, they >>> could potentially limit the set of types further.) >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> >> -- >> Ryan >> [ERROR]: Your autotools build scripts are 200 lines longer than your >> program. Something?s wrong. >> http://kirbyfan64.github.io/ >> Currently listening to: Death Egg Boss theme (Sonic Generations) >> -- >> Sent from my Android device with K-9 Mail. Please excuse my brevity. >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ -- Sent from my Android device with K-9 Mail. Please excuse my brevity. 
From emile at fenx.com Wed Jul 22 23:51:53 2015 From: emile at fenx.com (Emile van Sebille) Date: Wed, 22 Jul 2015 14:51:53 -0700 Subject: [Python-ideas] Secure unpickle In-Reply-To: References: Message-ID: On 7/22/2015 1:03 AM, Neil Girdhar wrote: > I've heard it said that pickle is a security hole, Yes -- from the security section of the pickle docs: However, for unpickling, it is never a good idea to unpickle an untrusted string whose origins are dubious > and so it's better to > write your own serialization routine. Or unpickle only trusted strings. > That's unfortunate because pickle > has so many advantages such as automatically tying into copy/deepcopy. > Would it be possible to make unpickle secure, e.g., by having the > caller create a context in which all calls to unpickle are limited to > unpickling a specific set of types? (When these types unpickle their > sub-objects, they could potentially limit the set of types further.) Do-you-know-where-your-pickles-been-lately-ly yr's, Emile From abarnert at yahoo.com Thu Jul 23 00:17:01 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 22 Jul 2015 15:17:01 -0700 Subject: [Python-ideas] Fwd: Re: Secure unpickle In-Reply-To: <9D5C749C-AE91-4CF5-A111-BFFA046E2B6A@gmail.com> References: <53B91A7F-32F3-4267-8657-92CCD91042B4@gmail.com> <2731AF91-DC93-4798-AF03-D990B4C06367@yahoo.com> <9D5C749C-AE91-4CF5-A111-BFFA046E2B6A@gmail.com> Message-ID: <8C8499DB-BC90-497A-9BDC-7135CE74ACA9@yahoo.com> > On Jul 22, 2015, at 13:58, Ryan Gonzalez wrote: > > Disclaimer: I know virtually *nothing* about cryptography, so this is probably worse than it seems. It's always better to look for an existing cryptosystem than to try to invent a new one. Briefly, I think what you're looking for here is a way to sign pickles and verify their signatures, which is a well-known problem. If you have some secure way to store keys (e.g., the only code that ever touches the pickles runs on your backend servers), everything is easy; just use, say, OpenSSL to sign and verify your pickles (e.g., using a key on some non-public-accessible server). If you need public-accessible code to create and use pickles, there is no solution. (That's a slight oversimplification; a better way to put it is that if there's no existing cert-management and key-exchange system that can get the keys to your software securely, that probably means what you need is impossible.) Tossing in a bunch of other stuff--a manifest listing the types, a timestamp, or other nonrandom salt--or tricks like obfuscating where the key is are ultimately irrelevant. If the signature is tamper-proof, adding more stuff to it doesn't make it any more so; if it's tamperable, adding more stuff doesn't make it less so. Of course you may want to add on extra features (e.g., timestamps can be useful for key revocation schemes to mitigate damage from a crack), or some of that information may be useful for its own sake (e.g., being able to extract the list of types without running the pickle could be very handy for debugging, logging, etc.), but it doesn't increase the security of the signature. Anyway, I think what Neil is trying to solve is something different: assuming the data is insecure and there's no way to secure it, how do we write code that doesn't use it in an unsafe way? They're really separate problems. 
I don't think Python should do anything to solve yours (anything Python could do, OpenSSL probably can already do for you, better); it might be useful for Python to solve his (although I think picking and stdlibifying or copying a good third-party solution may be a better idea than trying to design one). >>> On July 22, 2015 3:54:31 PM CDT, Andrew Barnert wrote: >>> On Jul 22, 2015, at 13:27, Ryan Gonzalez wrote: >>> >>> A further idea: hashes. >>> >>> Each Pickle database (or whatever it's called) would contain a hash >> made up >>> of: >>> >>> a) The types used to pickle the data. >>> b) The hash of the data itself, prefixed with 2 bytes that have some >> sort >>> of hard-to-get meaning (the length of the call stack?). >>> c) The seconds since epoch, or another 64-bit value. >> >> A type pickled and unpickled in a different interpreter instance isn't >> necessarily going to have the same hash value. And if you don't mean a >> Python hash, how do you hash an arbitrary class object? Or, if you mean >> just the name, how does that secure anything? >> >> For that matter, it's often important for an updated version of the >> code to be able to load pickles created with yesterday's version. This >> is easy to do with the pickle protocol, but hashing would presumably >> break that (unless it didn't protect anything at all). >> >>> The three values would likely be merged via bitwise or. >> >> Why would you merge three hash values with bitwise or instead of one of >> the usual hash combining mechanisms? This just throws away most of your >> entropy. > > Uhhhh...I have no clue. It just came off the top of my head. > >> >>> This has the advantage that there are three different elements making >> up >>> the hash, some of which are harder to locate. Unless two of the >> values are >>> known, the third can't be. >>> >>> The types would be extracted from the hash via some kind of magic, >> >> That really _would_ be magic. The whole point of a hash is that it's >> one-way. If the hashed values can be recovered from it, it's not a >> hash. > > Well, I again know nothing about cryptography, so I guess "key" is a better phrase. :O > >> >> Also, "harder to locate" is useless, unless you plan to continually >> update your code as attackers locate the things you've hidden. (And, >> for something used in as many high-profile uses as Python's pickler, >> any security by obscurity would be attacked very frequently.) >> >>> and then >>> it would validate the data in the database based on the types, like >> Neil >>> said. >>> >>> If someone wanted to change the types, they would need to regenerate >> the >>> whole hash. >> >> And... So what? Unless the checker has some secure way of knowing which >> timestamp, etc. to use in checking the hash, all you have to do is give >> it the timestamp, etc. that go along with your regenerated hash, and it >> will pass. >> >>> Further security could be obtained by prefixing the first value >>> with another special byte sequence that, although easier to find, >> would be >>> used for validation purposes. >>> >>> Point 2's prefixing bytes and point 3's value would be especially >> trickier >>> to find, since a few seconds may pass before the data is written to >> disk. >>> >>> It's still a bit insecure, but much better than the current >> situation. I >>> think. >> >> I think it's much worse than the current situation, because it adds >> illusory security while still being effectively just as crackable. 
>> >>> >>> >>>> On Wed, Jul 22, 2015 at 3:03 AM, Neil Girdhar >> wrote: >>>> >>>> I've heard it said that pickle is a security hole, and so it's >> better to >>>> write your own serialization routine. That's unfortunate because >> pickle >>>> has so many advantages such as automatically tying into >> copy/deepcopy. >>>> Would it be possible to make unpickle secure, e.g., by having the >> caller >>>> create a context in which all calls to unpickle are limited to >> unpickling a >>>> specific set of types? (When these types unpickle their >> sub-objects, they >>>> could potentially limit the set of types further.) >>>> >>>> _______________________________________________ >>>> Python-ideas mailing list >>>> Python-ideas at python.org >>>> https://mail.python.org/mailman/listinfo/python-ideas >>>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >>> >>> >>> -- >>> Ryan >>> [ERROR]: Your autotools build scripts are 200 lines longer than your >>> program. Something?s wrong. >>> http://kirbyfan64.github.io/ >>> Currently listening to: Death Egg Boss theme (Sonic Generations) >>> -- >>> Sent from my Android device with K-9 Mail. Please excuse my brevity. >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > Sent from my Android device with K-9 Mail. Please excuse my brevity. From eric at trueblade.com Thu Jul 23 00:29:37 2015 From: eric at trueblade.com (Eric V. Smith) Date: Wed, 22 Jul 2015 18:29:37 -0400 Subject: [Python-ideas] Briefer string format In-Reply-To: <55AFFB62.7040308@mrabarnett.plus.com> References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <55AFFB62.7040308@mrabarnett.plus.com> Message-ID: <55B01951.2000505@trueblade.com> On 7/22/2015 4:21 PM, MRAB wrote: > On 2015-07-22 19:52, Eric V. Smith wrote: >> On 07/20/2015 03:22 PM, Guido van Rossum wrote: >>> Not sure what you mean by "implicit merging" -- if you mean literal >>> concatenation (e.g. 'foo' "bar" == 'foobar') then I think it should be >>> allowed, just like we support mixing quotes and r''. >> >> Do we really want to support this? It complicates the implementation, >> and I'm not sure of the value. >> >> f'{foo}' 'bar' f'{baz}' >> becomes something like: >> format(foo) + 'bar' + format(baz) >> >> You're not merging similar things, like you are with normal string >> concatenation. >> >> And merging f-strings: >> f'{foo}' f'{bar'} >> similarly just becomes concatenating the results of some function calls. >> >> I guess it depends if you think of an f-string as a string, or an >> expression (like the function calls it will become). I don't have a real >> strong preference, but I'd like to get it ironed out logically before >> doing a trial implementation. >> > As Guido said, we can merge raw string literals. True, but f-strings aren't string literals. They're expressions disguised as string literals. > It would be a gotcha if: > > r'{foo}' 'bar' > > worked but: > > f'{foo}' 'bar' > > didn't. You'd then have to how they're different even though they look > a lot alike. While they look alike, they're not at all similar. 
Nothing is being merged, since the f-string is being evaluated at runtime, not compile time. I'm not sure if it would be best to hide this runtime string concatenation behind something that looks like it has less of a cost. At runtime, it's likely going to look something like: ''.join([f'foo', 'bar']) although using _PyUnicodeWriter, I guess. Eric. From eric at trueblade.com Thu Jul 23 00:30:37 2015 From: eric at trueblade.com (Eric V. Smith) Date: Wed, 22 Jul 2015 18:30:37 -0400 Subject: [Python-ideas] Secure unpickle In-Reply-To: References: Message-ID: <55B0198D.5040002@trueblade.com> Have you looked at https://docs.python.org/3/library/pickle.html#pickle-restrict ? -- Eric. > On Jul 22, 2015, at 4:03 AM, Neil Girdhar wrote: > > I've heard it said that pickle is a security hole, and so it's better to write your own serialization routine. That's unfortunate because pickle has so many advantages such as automatically tying into copy/deepcopy. Would it be possible to make unpickle secure, e.g., by having the caller create a context in which all calls to unpickle are limited to unpickling a specific set of types? (When these types unpickle their sub-objects, they could potentially limit the set of types further.) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From mistersheik at gmail.com Thu Jul 23 02:27:27 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 22 Jul 2015 20:27:27 -0400 Subject: [Python-ideas] Fwd: Re: Secure unpickle In-Reply-To: <8C8499DB-BC90-497A-9BDC-7135CE74ACA9@yahoo.com> References: <53B91A7F-32F3-4267-8657-92CCD91042B4@gmail.com> <2731AF91-DC93-4798-AF03-D990B4C06367@yahoo.com> <9D5C749C-AE91-4CF5-A111-BFFA046E2B6A@gmail.com> <8C8499DB-BC90-497A-9BDC-7135CE74ACA9@yahoo.com> Message-ID: On Wed, Jul 22, 2015 at 6:17 PM, Andrew Barnert wrote: > > On Jul 22, 2015, at 13:58, Ryan Gonzalez wrote: > > > > Disclaimer: I know virtually *nothing* about cryptography, so this is > probably worse than it seems. > > It's always better to look for an existing cryptosystem than to try to > invent a new one. > > Briefly, I think what you're looking for here is a way to sign pickles and > verify their signatures, which is a well-known problem. > > If you have some secure way to store keys (e.g., the only code that ever > touches the pickles runs on your backend servers), everything is easy; just > use, say, OpenSSL to sign and verify your pickles (e.g., using a key on > some non-public-accessible server). If you need public-accessible code to > create and use pickles, there is no solution. (That's a slight > oversimplification; a better way to put it is that if there's no existing > cert-management and key-exchange system that can get the keys to your > software securely, that probably means what you need is impossible.) > > Tossing in a bunch of other stuff--a manifest listing the types, a > timestamp, or other nonrandom salt--or tricks like obfuscating where the > key is are ultimately irrelevant. If the signature is tamper-proof, adding > more stuff to it doesn't make it any more so; if it's tamperable, adding > more stuff doesn't make it less so. 
Of course you may want to add on extra > features (e.g., timestamps can be useful for key revocation schemes to > mitigate damage from a crack), or some of that information may be useful > for its own sake (e.g., being able to extract the list of types without > running the pickle could be very handy for debugging, logging, etc.), but > it doesn't increase the security of the signature. > > Anyway, I think what Neil is trying to solve is something different: > assuming the data is insecure and there's no way to secure it, how do we > write code that doesn't use it in an unsafe way? > > They're really separate problems. I don't think Python should do anything > to solve yours (anything Python could do, OpenSSL probably can already do > for you, better); it might be useful for Python to solve his (although I > think picking and stdlibifying or copying a good third-party solution may > be a better idea than trying to design one). > Thanks Andrew, totally agree with what you said. For the record, I don't know exactly what the problem is. I just noticed on some projects people talking about writing their own unpickling code because of insecurities in pickle, and it made me think: "why should you have to?" E.g., https://github.com/matplotlib/matplotlib/issues/3424 https://github.com/matplotlib/matplotlib/issues/4756 People explicitly say: "get the ability to dump/return our figures to *any* serialization format other than pickle"! That is so unfortunate. Pickle is such a good solution except for the security. Why can't we have security too? It doesn't seem to me to be right for a project like matplotlib to be writing their own serialization library. It would be awesome if Python had secure serialization built-in. Best, Neil > >>> On July 22, 2015 3:54:31 PM CDT, Andrew Barnert > wrote: > >>> On Jul 22, 2015, at 13:27, Ryan Gonzalez wrote: > >>> > >>> A further idea: hashes. > >>> > >>> Each Pickle database (or whatever it's called) would contain a hash > >> made up > >>> of: > >>> > >>> a) The types used to pickle the data. > >>> b) The hash of the data itself, prefixed with 2 bytes that have some > >> sort > >>> of hard-to-get meaning (the length of the call stack?). > >>> c) The seconds since epoch, or another 64-bit value. > >> > >> A type pickled and unpickled in a different interpreter instance isn't > >> necessarily going to have the same hash value. And if you don't mean a > >> Python hash, how do you hash an arbitrary class object? Or, if you mean > >> just the name, how does that secure anything? > >> > >> For that matter, it's often important for an updated version of the > >> code to be able to load pickles created with yesterday's version. This > >> is easy to do with the pickle protocol, but hashing would presumably > >> break that (unless it didn't protect anything at all). > >> > >>> The three values would likely be merged via bitwise or. > >> > >> Why would you merge three hash values with bitwise or instead of one of > >> the usual hash combining mechanisms? This just throws away most of your > >> entropy. > > > > Uhhhh...I have no clue. It just came off the top of my head. > > > >> > >>> This has the advantage that there are three different elements making > >> up > >>> the hash, some of which are harder to locate. Unless two of the > >> values are > >>> known, the third can't be. > >>> > >>> The types would be extracted from the hash via some kind of magic, > >> > >> That really _would_ be magic. The whole point of a hash is that it's > >> one-way. 
If the hashed values can be recovered from it, it's not a > >> hash. > > > > Well, I again know nothing about cryptography, so I guess "key" is a > better phrase. :O > > > >> > >> Also, "harder to locate" is useless, unless you plan to continually > >> update your code as attackers locate the things you've hidden. (And, > >> for something used in as many high-profile uses as Python's pickler, > >> any security by obscurity would be attacked very frequently.) > >> > >>> and then > >>> it would validate the data in the database based on the types, like > >> Neil > >>> said. > >>> > >>> If someone wanted to change the types, they would need to regenerate > >> the > >>> whole hash. > >> > >> And... So what? Unless the checker has some secure way of knowing which > >> timestamp, etc. to use in checking the hash, all you have to do is give > >> it the timestamp, etc. that go along with your regenerated hash, and it > >> will pass. > >> > >>> Further security could be obtained by prefixing the first value > >>> with another special byte sequence that, although easier to find, > >> would be > >>> used for validation purposes. > >>> > >>> Point 2's prefixing bytes and point 3's value would be especially > >> trickier > >>> to find, since a few seconds may pass before the data is written to > >> disk. > >>> > >>> It's still a bit insecure, but much better than the current > >> situation. I > >>> think. > >> > >> I think it's much worse than the current situation, because it adds > >> illusory security while still being effectively just as crackable. > >> > >>> > >>> > >>>> On Wed, Jul 22, 2015 at 3:03 AM, Neil Girdhar > >> wrote: > >>>> > >>>> I've heard it said that pickle is a security hole, and so it's > >> better to > >>>> write your own serialization routine. That's unfortunate because > >> pickle > >>>> has so many advantages such as automatically tying into > >> copy/deepcopy. > >>>> Would it be possible to make unpickle secure, e.g., by having the > >> caller > >>>> create a context in which all calls to unpickle are limited to > >> unpickling a > >>>> specific set of types? (When these types unpickle their > >> sub-objects, they > >>>> could potentially limit the set of types further.) > >>>> > >>>> _______________________________________________ > >>>> Python-ideas mailing list > >>>> Python-ideas at python.org > >>>> https://mail.python.org/mailman/listinfo/python-ideas > >>>> Code of Conduct: http://python.org/psf/codeofconduct/ > >>> > >>> > >>> > >>> -- > >>> Ryan > >>> [ERROR]: Your autotools build scripts are 200 lines longer than your > >>> program. Something?s wrong. > >>> http://kirbyfan64.github.io/ > >>> Currently listening to: Death Egg Boss theme (Sonic Generations) > >>> -- > >>> Sent from my Android device with K-9 Mail. Please excuse my brevity. > >>> _______________________________________________ > >>> Python-ideas mailing list > >>> Python-ideas at python.org > >>> https://mail.python.org/mailman/listinfo/python-ideas > >>> Code of Conduct: http://python.org/psf/codeofconduct/ > > > > -- > > Sent from my Android device with K-9 Mail. Please excuse my brevity. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Thu Jul 23 02:29:20 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 22 Jul 2015 20:29:20 -0400 Subject: [Python-ideas] Secure unpickle In-Reply-To: <55B0198D.5040002@trueblade.com> References: <55B0198D.5040002@trueblade.com> Message-ID: That's amazing. I did not know about that. 
On Wed, Jul 22, 2015 at 6:30 PM, Eric V. Smith wrote: > Have you looked at > https://docs.python.org/3/library/pickle.html#pickle-restrict > ? > > -- > Eric. > > > On Jul 22, 2015, at 4:03 AM, Neil Girdhar wrote: > > > > I've heard it said that pickle is a security hole, and so it's better to > write your own serialization routine. That's unfortunate because pickle > has so many advantages such as automatically tying into copy/deepcopy. > Would it be possible to make unpickle secure, e.g., by having the caller > create a context in which all calls to unpickle are limited to unpickling a > specific set of types? (When these types unpickle their sub-objects, they > could potentially limit the set of types further.) > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/OhYb7RHNHyA/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Jul 23 03:46:09 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 22 Jul 2015 18:46:09 -0700 Subject: [Python-ideas] Fwd: Re: Secure unpickle In-Reply-To: References: <53B91A7F-32F3-4267-8657-92CCD91042B4@gmail.com> <2731AF91-DC93-4798-AF03-D990B4C06367@yahoo.com> <9D5C749C-AE91-4CF5-A111-BFFA046E2B6A@gmail.com> <8C8499DB-BC90-497A-9BDC-7135CE74ACA9@yahoo.com> Message-ID: On Wed, Jul 22, 2015 at 5:27 PM, Neil Girdhar wrote: > > That is so unfortunate. Pickle is such a good solution except for the > security. Why can't we have security too? It doesn't seem to me to be > right for a project like matplotlib to be writing their own serialization > library. It would be awesome if Python had secure serialization built-in. The reason you can pickle/unpickle arbitrary Python objects is that the pickle format is basically a structured, optimized way of generating and then evaluating arbitrary Python code. Which is great because it's totally general -- that's why we love pickle, you can pickle anything -- but that exact feature is what makes it insecure. If you want to make something secure, that means making some explicit decisions about what kinds of things can be put into your data format and which cannot, and write some explicit code to handle each of these things instead of just handing the file format direct access to your interpreter. But by the time you've done that you've done the hard part of implementing a new format anyway... -n -- Nathaniel J. 
Smith -- http://vorpus.org From abarnert at yahoo.com Thu Jul 23 03:44:56 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 22 Jul 2015 18:44:56 -0700 Subject: [Python-ideas] Fwd: Re: Secure unpickle In-Reply-To: References: <53B91A7F-32F3-4267-8657-92CCD91042B4@gmail.com> <2731AF91-DC93-4798-AF03-D990B4C06367@yahoo.com> <9D5C749C-AE91-4CF5-A111-BFFA046E2B6A@gmail.com> <8C8499DB-BC90-497A-9BDC-7135CE74ACA9@yahoo.com> Message-ID: <397CC577-B24A-41E0-A8B3-BB3AE47A848A@yahoo.com> > On Jul 22, 2015, at 17:27, Neil Girdhar wrote: > > Thanks Andrew, totally agree with what you said. For the record, I don't know exactly what the problem is. I just noticed on some projects people talking about writing their own unpickling code because of insecurities in pickle, and it made me think: "why should you have to?" The problem is inherent to the design of pickle: it's a virtual machine that can make Python import arbitrary modules and call arbitrary globals (with arbitrary literals and/or already-constructed objects as arguments). You can't fix that without replacing the whole design. And that's what they're asking for in your second link: they want explicit imperative code in matplotlib, rather than the data, to drive the process. Also, the reason pickle is so convenient is that classes can opt in just by adding the right methods, but that's the same reason that not anticipating everything your code might do can mean an invisible security hole instead of a "can't pickle that type" error, so you can't fix that either without giving up that convenience. Of course the other problem is FUD. Despite the fact that there are plenty of use cases for which pickle is safe, there are people who would rather teach you that it's never ever safe than teach you how to recognize and understand potential problems. And there are people who believe pickle is slow and space-wasteful and can't handle large data, either because they read a blog post from 15 years ago, or because they're still using 2.7 and haven't read far enough down the docs page to see that they don't have to use format 0. And people who dogmatically insist that all serialization formats should be interchange formats (a pickle can only be unpickled by the exact same program, or a carefully-updated newer version of the same program) even when interchange isn't relevant. And so on. Changing pickle wouldn't get rid of the FUD unless you completely replaced it. So, it might be useful to build a little PyPI module that offered a pickle loader that didn't allow new modules to be imported and didn't allow any globals to be called except the ones specified in an explicit tuple specified in the constructor. But you still have to understand the issues to know when that will and won't solve your problems. And it still wouldn't satisfy the people posting in those bug reports. From steve at pearwood.info Thu Jul 23 05:31:14 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 23 Jul 2015 13:31:14 +1000 Subject: [Python-ideas] Briefer string format In-Reply-To: <55AFE66E.9070007@trueblade.com> References: <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> Message-ID: <20150723033112.GJ25179@ando.pearwood.info> Executive summary for those in a hurry: * implicit concatenation of strings *of any type* always occurs at compile-time; * if the first (or any?) 
of the concat'ed fragments begin with an f prefix, then the resulting concatenated string is deemed to begin with an f prefix and is compiled to a call to format (or some other appropriate implementation), which is a run-time operation; * the peep-hole optimizer has to avoid concat'ing mixed f and non-f strings: f'{spam}' + '{eggs}' should evaluate to something like (format(spam) + '{eggs}').

Longer version with more detail below.

On Wed, Jul 22, 2015 at 02:52:30PM -0400, Eric V. Smith wrote: > On 07/20/2015 03:22 PM, Guido van Rossum wrote: > > Not sure what you mean by "implicit merging" -- if you mean literal > > concatenation (e.g. 'foo' "bar" == 'foobar') then I think it should be > > allowed, just like we support mixing quotes and r''. > > Do we really want to support this? It complicates the implementation, > and I'm not sure of the value. > > f'{foo}' 'bar' f'{baz}' > becomes something like: > format(foo) + 'bar' + format(baz) > > You're not merging similar things, like you are with normal string > concatenation.

I would not want or expect that behaviour. However, I would want and expect that behaviour with *explicit* concatenation using the + operator. I would want the peephole optimizer to avoid optimizing this case: f'{foo}' + 'bar' + f'{baz}' and allow it to be compiled to something like: format(foo) + 'bar' + format(baz) With explicit concatenation, the format() calls occur before the + operators are called. Constant-folding 'a' + 'b' to 'ab' is an optimization, it doesn't change the semantics of the concat. But constant-folding f'{a}' + '{b}' would change the semantics of the concatenation, because f strings aren't constants, they only look like them.

In the case of *implicit* concatenation, I think that the concatenations should occur first, at compile time. Yes, that deliberately introduces a difference between implicit and explicit concatenation, that's a feature, not a bug! Implicit concatenation will help in the same cases that implicit concatenation usually helps: long strings without newlines: msg = (f'a long message here blah blah {x}' f' and {y} and {z} and more {stuff} and {things}' f' and perhaps even more {poppycock}' ) That should be treated as syntactically equivalent to: msg = f'a long message here blah blah {x} and {y} and {z} and more {stuff} and {things} and perhaps even more {poppycock}' which is then compiled into the usual format(...) magic, as normal. So, a very strong +1 on allowing implicit concatenation.

I would go further and allow all the f prefixes apart from the first to be optional. To put it another way, the first f prefix "infects" all the other string fragments: msg = (f'a long message here blah blah {x}' ' and {y} and {z} and more {stuff} and {things}' ' and perhaps even more {poppycock}' ) should be exactly the same as the first version. My reasoning is that the implicit concatenation always occurs first, so by the time the format(...) magic occurs at run-time, the interpreter no longer knows which braces came from an f-string and which came from a regular string. (Implicit concatenation is a compile-time operation, the format(...) stuff is run-time, so there is a clear and logical order of operations.)

To avoid potential surprises, I would disallow the case where the f prefix doesn't occur in the first fragment, or at least raise a compile-time warning: 'spam' 'eggs' f'{cheese}' should raise or warn. (That restriction could be removed in the future, if it turns out not to be a problem.)
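(To make the intended semantics concrete, here is a rough, untested sketch of what both spellings of msg above would be equivalent to, written with today's str.format; the variable values are invented for the example:)

    x, y, z = 1, 2, 3
    stuff, things, poppycock = 'spam', 'ham', 'tosh'
    # Implicit concatenation happens first, then a single format pass:
    msg = ('a long message here blah blah {x}'
           ' and {y} and {z} and more {stuff} and {things}'
           ' and perhaps even more {poppycock}'
           ).format(x=x, y=y, z=z, stuff=stuff,
                    things=things, poppycock=poppycock)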
> And merging f-strings: > f'{foo}' f'{bar}' > similarly just becomes concatenating the results of some function calls. That's safe to do at compile-time: f'{foo}' f'{bar}' f'{foo}{bar}' will always be the same. There's no need to delay the concat until after the formats. -- Steve

From rosuav at gmail.com Thu Jul 23 05:41:15 2015 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 23 Jul 2015 13:41:15 +1000 Subject: [Python-ideas] Briefer string format In-Reply-To: <20150723033112.GJ25179@ando.pearwood.info> References: <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> Message-ID: On Thu, Jul 23, 2015 at 1:31 PM, Steven D'Aprano wrote: > I would go further and allow all the f prefixes apart from the first to > be optional. To put it another way, the first f prefix "infects" all the > other string fragments: > > msg = (f'a long message here blah blah {x}' > ' and {y} and {z} and more {stuff} and {things}' > ' and perhaps even more {poppycock}' > ) > > should be exactly the same as the first version. Editors/IDEs would have to be taught about this (and particularly, taught to show f"..." very differently from "..."), because otherwise a missed comma could be very surprising: lines = [ "simple string", "string {with} {braces}", f"formatted string {using} {interpolation}" # oops, missed a comma "another string {with} {braces}" ] Could be a pain to try to debug that one, partly because stuff is happening at compile time, so you can't even pretty-print lines immediately after assignment to notice that there are only three elements rather than four. That solved, though, I think you're probably right about the f prefix infecting the remaining fragments. It'd be more consistent to have it *not* infect, the same way that an r or u/b prefix doesn't infect subsequent snippets; but if anyone's bothered by it, they can always stick in a few plus signs. ChrisA

From anthony at xtfx.me Thu Jul 23 06:05:27 2015 From: anthony at xtfx.me (C Anthony Risinger) Date: Wed, 22 Jul 2015 23:05:27 -0500 Subject: [Python-ideas] Briefer string format In-Reply-To: <20150723033112.GJ25179@ando.pearwood.info> References: <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> Message-ID: On Jul 22, 2015 10:31 PM, "Steven D'Aprano" wrote: > > [...] > > I would go further and allow all the f prefixes apart from the first to > be optional. To put it another way, the first f prefix "infects" all the > other string fragments: > > msg = (f'a long message here blah blah {x}' > ' and {y} and {z} and more {stuff} and {things}' > ' and perhaps even more {poppycock}' > ) > > should be exactly the same as the first version. My reasoning is that > the implicit concatenation always occurs first, so by the time the > format(...) magic occurs at run-time, the interpreter no longer knows > which braces came from an f-string and which came from a regular string. > > (Implicit concatenation is a compile-time operation, the format(...) > stuff is run-time, so there is a clear and logical order of operations.)
> > To avoid potential surprises, I would disallow the case where the f > prefix doesn't occur in the first fragment, or at least raise a > compile-time warning: > > 'spam' 'eggs' f'{cheese}' > > should raise or warn. (That restriction could be removed in the future, > if it turns out not to be a problem.) > > > > And merging f-strings: > > f'{foo}' f'{bar'} > > similarly just becomes concatenating the results of some function calls. > > That's safe to do at compile-time: > > f'{foo}' f'{bar}' > f'{foo}{bar}' > > will always be the same. There's no need to delay the concat until after > the formats. This makes perfect sense to me. -- C Anthony -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Thu Jul 23 06:23:08 2015 From: wes.turner at gmail.com (Wes Turner) Date: Wed, 22 Jul 2015 23:23:08 -0500 Subject: [Python-ideas] Fwd: Re: Secure unpickle In-Reply-To: <397CC577-B24A-41E0-A8B3-BB3AE47A848A@yahoo.com> References: <53B91A7F-32F3-4267-8657-92CCD91042B4@gmail.com> <2731AF91-DC93-4798-AF03-D990B4C06367@yahoo.com> <9D5C749C-AE91-4CF5-A111-BFFA046E2B6A@gmail.com> <8C8499DB-BC90-497A-9BDC-7135CE74ACA9@yahoo.com> <397CC577-B24A-41E0-A8B3-BB3AE47A848A@yahoo.com> Message-ID: * https://github.com/jsonpickle/jsonpickle (keep code and data separate) * https://pypi.python.org/pypi/dill (IPython) * https://github.com/zopefoundation/zodbpickle/issues/2 (cwe links) ... Alternatives to unserializing code: https://wrdrd.com/docs/consulting/knowledge-engineering#distributed-computing-protocols #json-ld On Jul 22, 2015 8:48 PM, "Andrew Barnert via Python-ideas" < python-ideas at python.org> wrote: > > On Jul 22, 2015, at 17:27, Neil Girdhar wrote: > > > > Thanks Andrew, totally agree with what you said. For the record, I > don't know exactly what the problem is. I just noticed on some projects > people talking about writing their own unpickling code because of > insecurities in pickle, and it made me think: "why should you have to?" > > The problem is inherent to the design of pickle: it's a virtual machine > that can make Python import arbitrary modules and call arbitrary globals > (with arbitrary literals and/or already-constructed objects as arguments). > You can't fix that without replacing the whole design. And that's what > they're asking for in your second link: they want explicit imperative code > in matplotlib, rather than the data, to drive the process. > > Also, the reason pickle is so convenient is that classes can opt in just > by adding the right methods, but that's the same reason that not > anticipating everything your code might do can mean an invisible security > hole instead of a "can't pickle that type" error, so you can't fix that > either without giving up that convenience. > > Of course the other problem is FUD. Despite the fact that there are plenty > of use cases for which pickle is safe, there are people who would rather > teach you that it's never ever safe than teach you how to recognize and > understand potential problems. And there are people who believe pickle is > slow and space-wasteful and can't handle large data, either because they > read a blog post from 15 years ago, or because they're still using 2.7 and > haven't read far enough down the docs page to see that they don't have to > use format 0. 
And people who dogmatically insist > that all serialization > formats should be interchange formats (a pickle can only be unpickled by > the exact same program, or a carefully-updated newer version of the same > program) even when interchange isn't relevant. And so on. Changing pickle > wouldn't get rid of the FUD unless you completely replaced it. > > So, it might be useful to build a little PyPI module that offered a pickle > loader that didn't allow new modules to be imported and didn't allow any > globals to be called except the ones specified in an explicit tuple > specified in the constructor. But you still have to understand the issues > to know when that will and won't solve your problems. And it still wouldn't > satisfy the people posting in those bug reports. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL:

From bruce at leban.us Thu Jul 23 06:28:19 2015 From: bruce at leban.us (Bruce Leban) Date: Wed, 22 Jul 2015 21:28:19 -0700 Subject: [Python-ideas] Briefer string format In-Reply-To: <20150723033112.GJ25179@ando.pearwood.info> References: <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> Message-ID: On Wed, Jul 22, 2015 at 8:31 PM, Steven D'Aprano wrote: > > Constant-folding 'a' + 'b' to 'ab' is an optimization, it doesn't change > the semantics of the concat. But constant-folding f'{a}' + '{b}' would > change the semantics of the concatenation, because f strings aren't > constants, they only look like them. > It doesn't have to change semantics and it shouldn't. This is a strawman argument. While we could do it wrong, why would we? It's hardly difficult to quote the non-format string while still optimizing the concatenation. That is, f'{foo}' '{bar}' f'{oof}' could compile to the same thing as if you wrote: f'{foo}{{bar}}{oof}' the result of something like this: COMPILE_TIME_FORMAT_TRANSFORM('{foo}' + COMPILE_TIME_ESCAPE('{bar}') + '{oof}') This is analogous to what happens with mixing raw and non-raw strings: r'a\b' 'm\n' r'x\y' is the same as if you wrote: 'a\\bm\nx\\y' or r'''a\bm x\y''' In the case of *implicit* concatenation, I think that the concatenations > should occur first, at compile time. Yes, that deliberately introduces a > difference between implicit and explicit concatenation, that's a > feature, not a bug! > Doing the concatenation at compile time does NOT require the "infected" behavior you describe below as noted above. > I would go further and allow all the f prefixes apart from the first to > be optional. To put it another way, the first f prefix "infects" all the > other string fragments: > > I'd call that a bug. I suppose one person's bug is another person's feature. It violates the principle of least surprise. When I look at a line in isolation and it starts and ends with a quote, I would not expect that to not just be a plain string. > (Implicit concatenation is a compile-time operation, the format(...) > stuff is run-time, so there is a clear and logical order of operations.) > To you, maybe. To the average developer, I doubt it.
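(Coming back to the escaping step above: COMPILE_TIME_FORMAT_TRANSFORM and COMPILE_TIME_ESCAPE are made-up names, but the escape itself is trivial. A rough, untested sketch of what I mean:)

    def compile_time_escape(s):
        # Double the braces so a plain literal passes through a later
        # format step unchanged.
        return s.replace('{', '{{').replace('}', '}}')

    # compile_time_escape('{bar}') == '{{bar}}', so:
    # '{foo}' + compile_time_escape('{bar}') + '{oof}' == '{foo}{{bar}}{oof}'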
I view the compile time evaluation of implicit concatenation as a compiler implementation detail as it makes essentially no difference to the semantics of the program. (Yes, I know that runtime concatenation *might* produce a different string object each time through the code but it doesn't have to. I hope you don't write programs that depend on the presence or absence of string pooling.) > And merging f-strings: > > f'{foo}' f'{bar}' > > similarly just becomes concatenating the results of some function calls. > > That's safe to do at compile-time: > > f'{foo}' f'{bar}' > f'{foo}{bar}' > > will always be the same. There's no need to delay the concat until after > the formats. Just as it's safe to concat strings after escaping the non-format ones. There is one additional detail. I think it should be required that each format string stand on its own. That is: f'x{foo' f'bar}y' should be an error and not the equivalent of f'x{foobar}y' --- Bruce Check out my new puzzle book: http://J.mp/ingToConclusions Get it free here: http://J.mp/ingToConclusionsFree (available on iOS) -------------- next part -------------- An HTML attachment was scrubbed... URL:

From guido at python.org Thu Jul 23 09:48:11 2015 From: guido at python.org (Guido van Rossum) Date: Thu, 23 Jul 2015 09:48:11 +0200 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> Message-ID: I'm against f'' infecting subsequent "literals". After all, r'' or b'' don't infect their neighbors. -------------- next part -------------- An HTML attachment was scrubbed... URL:

From guido at python.org Thu Jul 23 09:49:30 2015 From: guido at python.org (Guido van Rossum) Date: Thu, 23 Jul 2015 09:49:30 +0200 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <55AC3425.5010509@mgmiller.net> <55ACFE26.4000303@trueblade.com> <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> Message-ID: But to be clear I'm in favor of the implicit concatenation syntax. On Jul 23, 2015 9:48 AM, "Guido van Rossum" wrote: > I'm against f'' infecting subsequent "literals". After all, r'' or b'' > don't infect their neighbors. > -------------- next part -------------- An HTML attachment was scrubbed... URL:

From mistersheik at gmail.com Thu Jul 23 15:54:58 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 23 Jul 2015 09:54:58 -0400 Subject: [Python-ideas] Fwd: Re: Secure unpickle In-Reply-To: References: <53B91A7F-32F3-4267-8657-92CCD91042B4@gmail.com> <2731AF91-DC93-4798-AF03-D990B4C06367@yahoo.com> <9D5C749C-AE91-4CF5-A111-BFFA046E2B6A@gmail.com> <8C8499DB-BC90-497A-9BDC-7135CE74ACA9@yahoo.com> Message-ID: On Wed, Jul 22, 2015 at 9:46 PM, Nathaniel Smith wrote: > On Wed, Jul 22, 2015 at 5:27 PM, Neil Girdhar > wrote: > > > > That is so unfortunate. Pickle is such a good solution except for the > > security. Why can't we have security too? It doesn't seem to me to be > > right for a project like matplotlib to be writing their own serialization > > library. It would be awesome if Python had secure serialization > built-in.
> > The reason you can pickle/unpickle arbitrary Python objects is that > the pickle format is basically a structured, optimized way of > generating and then evaluating arbitrary Python code. Which is great > because it's totally general -- that's why we love pickle, you can > pickle anything -- but that exact feature is what makes it insecure. > If you want to make something secure, that means making some explicit > decisions about what kinds of things can be put into your data format > and which cannot, and write some explicit code to handle each of these > things instead of just handing the file format direct access to your > interpreter. But by the time you've done that you've done the hard > part of implementing a new format anyway... > Wouldn't it be easier to just tell unpickle which code it's allowed to run (by passing a list of modules and classes)? Then your serializer can be reused by deepcopy and other Python routines that might tie into "reduce"? I think that's easier than "implementing (yet another) new format". > -n > > -- > Nathaniel J. Smith -- http://vorpus.org > -------------- next part -------------- An HTML attachment was scrubbed... URL:

From steve at pearwood.info Thu Jul 23 16:22:14 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 24 Jul 2015 00:22:14 +1000 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> Message-ID: <20150723142213.GK25179@ando.pearwood.info> On Wed, Jul 22, 2015 at 09:28:19PM -0700, Bruce Leban wrote: > On Wed, Jul 22, 2015 at 8:31 PM, Steven D'Aprano > wrote: > > > > Constant-folding 'a' + 'b' to 'ab' is an optimization, it doesn't change > > the semantics of the concat. But constant-folding f'{a}' + '{b}' would > > change the semantics of the concatenation, because f strings aren't > > constants, they only look like them. > > > > It doesn't have to change semantics and it shouldn't. This is a strawman > argument.

If I had a dollar for every time somebody on the Internet misused "strawman argument", I would be a rich man. Just because you disagree with me or think I'm wrong doesn't make my argument a strawman. It just makes me wrong-headed, or wrong :-)

I'm having trouble understanding what precisely you are disagreeing with. The example I give which you quote involves explicit concatenation with the + operator, but your examples below use implicit concatenation with no operator at all.

Putting aside the question of implementation, I think:

(1) Explicit concatenation with the + operator should be treated as occurring after the f strings are evaluated, *as if* the following occurs: f'{spam}' + '{eggs}' => compiles to format(spam) + '{eggs}' If you can come up with a clever optimization that avoids the need to *actually* build two temporary strings and then concatenate them, I don't have a problem with that. I'm only talking about the semantics. I don't want this: f'{spam}' + '{eggs}' => compiles to format(spam) + format(eggs) # not this! Do you agree with those semantics for explicit + concatenation? If not, what behaviour do you want?

(2) Implicit concatenation should occur as early as possible, before the format. Take the easy case first: both fragments are f-strings.
f'{spam}' f'{eggs}' => behaves as if you wrote f'{spam}{eggs}' => which compiles to format(spam) + format(eggs) Do you agree with those semantics for implicit concatenation?

(3) The hard case, when you mix f and non-f strings. f'{spam}' '{eggs}' Notwithstanding raw strings, the behaviour which makes sense to me is that the implicit string concatenation occurs first, followed by format. So, semantically, if the parser sees the above, it should concat the string: => f'{spam}{eggs}' then transform it to a call to format: => format(spam) + format(eggs) I described that as the f "infecting" the other string. Guido has said he doesn't like this, but I'm not sure what behaviour he wants instead.

I don't think I want this behaviour: f'{spam}' '{eggs}' => format(spam) + '{eggs}' for two reasons. Firstly, I already have (at least!) one way of getting that behaviour, such as explicit + concatenation as above. Secondly, it feels that this does the concatenation in the wrong order. Implicit concatenation occurs as early as possible in every other case. But here, we're delaying the concatenation until after the format. So this feels wrong to me. (Again, I'm talking semantics, not implementation. Clever tricks with escaping the brackets don't matter.)

If there's no consensus on the behaviour of mixed f and non-f strings with implicit concatenation, rather than pick one and frustrate and surprise half the users, we should make it an error: f'{spam}' '{eggs}' => raises SyntaxError and require people to be explicit about what they want, e.g.: f'{spam}' + '{eggs}' # concatenation occurs after the format() f'{spam}' f'{eggs}' # implicit concatenation before format() (for the avoidance of doubt, I don't care whether the concatenation *actually* occurs after the format, I'm only talking about semantics, not implementation, sorry to keep beating this dead horse).

> > I would go further and allow all the f prefixes apart from the first to > > be optional. To put it another way, the first f prefix "infects" all the > > other string fragments: > > > I'd call that a bug. I suppose one person's bug is another person's > feature. It violates the principle of least surprise. When I look at a line > in isolation and it starts and ends with a quote, I would not expect that > to not just be a plain string.

I don't think we can look at strings in isolation line-by-line. s = r'''This is a long \raw s\tring that goes over mul\tiple lines and contains "\backslashes" okay? '''

> > (Implicit concatenation is a compile-time operation, the format(...) > > stuff is run-time, so there is a clear and logical order of operations.) > > > > To you, maybe. To the average developer, I doubt it.

I'm not sure if you are complimenting me on being a genius, or putting the average developer down for being even more dimwitted than me :-)

> I view the compile > time evaluation of implicit concatenation as a compiler implementation > detail as it makes essentially no difference to the semantics of the > program.

But once you bring f strings into the picture, then it DOES make a very large semantic difference. f'{spam}' '{eggs}' is very different depending on whether that is semantically the same as: - concat '{spam}' and '{eggs}', then format - format spam alone, then concat '{eggs}' We can't just say that when the concatenation actually occurs is an optimization, as we can with raw and cooked string literals, because the f string is not a literal, it's actually a function call in disguise.
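(You can demonstrate the difference today with plain str.format and invented values:)

    spam, eggs = 'SPAM', 'EGGS'
    # Concatenate first, then format:
    '{spam}{eggs}'.format(spam=spam, eggs=eggs)    # -> 'SPAMEGGS'
    # Format first, then concatenate:
    '{spam}'.format(spam=spam) + '{eggs}'          # -> 'SPAM{eggs}'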
So we have to pick one or the other (or refuse to guess and raise a syntax error). You're right that it doesn't have to occur at compile time. (Although that has been the case all the way back to at least Python 1.5.) But it is a syntactic feature: "Note that this feature is defined at the syntactical level, but implemented at compile time. The '+' operator must be used to concatenate string expressions at run time." https://docs.python.org/3/reference/lexical_analysis.html#string-literal-concatenation which suggests to me that *semantically* it should occur as early as possible, before the format() operation. That is, it should be equivalent to: - concat '{spam}' and '{eggs}', then format and not format followed by concat.

You mentioned the principle of least surprise. I think it would be very surprising to have implicit concatenation behave *as if* it were occurring after the format, which is what you get if you escape the {{eggs}}. But YMMV. If we (the community) cannot reach consensus, perhaps the safest thing would be to just refuse to guess and raise an error on implicit concat of f and non-f strings. -- Steve

From eric at trueblade.com Thu Jul 23 16:26:00 2015 From: eric at trueblade.com (Eric V. Smith) Date: Thu, 23 Jul 2015 10:26:00 -0400 Subject: [Python-ideas] Fwd: Re: Secure unpickle In-Reply-To: References: <53B91A7F-32F3-4267-8657-92CCD91042B4@gmail.com> <2731AF91-DC93-4798-AF03-D990B4C06367@yahoo.com> <9D5C749C-AE91-4CF5-A111-BFFA046E2B6A@gmail.com> <8C8499DB-BC90-497A-9BDC-7135CE74ACA9@yahoo.com> Message-ID: <55B0F978.9060601@trueblade.com> On 07/23/2015 09:54 AM, Neil Girdhar wrote: > > > On Wed, Jul 22, 2015 at 9:46 PM, Nathaniel Smith > wrote: > > On Wed, Jul 22, 2015 at 5:27 PM, Neil Girdhar > > wrote: > > > > > > That is so unfortunate. Pickle is such a good solution except for > the > > > security. Why can't we have security too? It doesn't seem to me > to be > > > right for a project like matplotlib to be writing their own > serialization > > > library. It would be awesome if Python had secure serialization > built-in. > > > > The reason you can pickle/unpickle arbitrary Python objects is that > > the pickle format is basically a structured, optimized way of > > generating and then evaluating arbitrary Python code. Which is great > > because it's totally general -- that's why we love pickle, you can > > pickle anything -- but that exact feature is what makes it insecure. > > If you want to make something secure, that means making some explicit > > decisions about what kinds of things can be put into your data format > > and which cannot, and write some explicit code to handle each of > these > > things instead of just handing the file format direct access to your > > interpreter. But by the time you've done that you've done the hard > > part of implementing a new format anyway... > > > > > > Wouldn't it be easier to just tell unpickle which code it's allowed to > > run (by passing a list of modules and classes)? > > unpickle can already do that, via Unpickler.find_class. There's an > example in the docs. > > Eric.
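P.S. For anyone who doesn't want to click through: the example there boils down to roughly this sketch (paraphrased from that docs section, not tested here):

    import builtins
    import io
    import pickle

    safe_builtins = {'range', 'complex', 'set', 'frozenset', 'slice'}

    class RestrictedUnpickler(pickle.Unpickler):
        def find_class(self, module, name):
            # Only allow a small whitelist of safe classes from builtins.
            if module == 'builtins' and name in safe_builtins:
                return getattr(builtins, name)
            # Forbid everything else, including arbitrary imports.
            raise pickle.UnpicklingError(
                "global '%s.%s' is forbidden" % (module, name))

    def restricted_loads(s):
        # Helper analogous to pickle.loads().
        return RestrictedUnpickler(io.BytesIO(s)).load()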
From mistersheik at gmail.com Thu Jul 23 16:28:38 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 23 Jul 2015 10:28:38 -0400 Subject: [Python-ideas] Fwd: Re: Secure unpickle In-Reply-To: <55B0F978.9060601@trueblade.com> References: <53B91A7F-32F3-4267-8657-92CCD91042B4@gmail.com> <2731AF91-DC93-4798-AF03-D990B4C06367@yahoo.com> <9D5C749C-AE91-4CF5-A111-BFFA046E2B6A@gmail.com> <8C8499DB-BC90-497A-9BDC-7135CE74ACA9@yahoo.com> <55B0F978.9060601@trueblade.com> Message-ID: Right, I forgot that that was mentioned in this thread. Then, I don't see the problem with unpickle. Is it still not secure enough for matplotlib e.g.? On Thu, Jul 23, 2015 at 10:26 AM, Eric V. Smith wrote: > On 07/23/2015 09:54 AM, Neil Girdhar wrote: > > > > > > On Wed, Jul 22, 2015 at 9:46 PM, Nathaniel Smith > > wrote: > > > > On Wed, Jul 22, 2015 at 5:27 PM, Neil Girdhar > > wrote: > > > > > > That is so unfortunate. Pickle is such a good solution except for > the > > > security. Why can't we have security too? It doesn't seem to me > to be > > > right for a project like matplotlib to be writing their own > serialization > > > library. It would be awesome if Python had secure serialization > built-in. > > > > The reason you can pickle/unpickle arbitrary Python objects is that > > the pickle format is basically a structured, optimized way of > > generating and then evaluating arbitrary Python code. Which is great > > because it's totally general -- that's why we love pickle, you can > > pickle anything -- but that exact feature is what makes it insecure. > > If you want to make something secure, that means making some explicit > > decisions about what kinds of things can be put into your data format > > and which cannot, and write some explicit code to handle each of > these > > things instead of just handing the file format direct access to your > > interpreter. But by the time you've done that you've done the hard > > part of implementing a new format anyway... > > > > > > Wouldn't it be easier to just tell unpickle which code it's allowed to > > run (by passing a list of modules and classes)? > > unpickle can already do that, via Unpickler.find_class. There's an > example in the docs. > > Eric. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/OhYb7RHNHyA/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From python at mrabarnett.plus.com Thu Jul 23 16:44:14 2015 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 23 Jul 2015 15:44:14 +0100 Subject: [Python-ideas] Briefer string format In-Reply-To: <20150723142213.GK25179@ando.pearwood.info> References: <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> <20150723142213.GK25179@ando.pearwood.info> Message-ID: <55B0FDBE.4010909@mrabarnett.plus.com> On 2015-07-23 15:22, Steven D'Aprano wrote: > On Wed, Jul 22, 2015 at 09:28:19PM -0700, Bruce Leban wrote: >> On Wed, Jul 22, 2015 at 8:31 PM, Steven D'Aprano >> wrote: >> >> > >> > Constant-folding 'a' + 'b' to 'ab' is an optimization, it doesn't change >> > the semantics of the concat. But constant-folding f'{a}' + '{b}' would >> > change the semantics of the concatenation, because f strings aren't >> > constants, they only look like them. >> > >> >> It doesn't have to change semantics and it shouldn't. This is a strawman >> argument. > > If I had a dollar for everytime somebody on the Internet misused > "strawman argument", I would be a rich man. Just because you disagree > with me or think I'm wrong doesn't make my argument a strawman. It just > makes me wrong-headed, or wrong :-) > > I'm having trouble understand what precisely you are disagreeing with. > The example I give which you quote involves explicit concatenation with > the + operator, but your examples below use implicit concatenation with > no operator at all. > > Putting aside the question of implementation, I think: > > (1) Explicit concatenation with the + operator should be treated as > occuring after the f strings are evaluated, *as if* the following > occurs: > > f'{spam}' + '{eggs}' > => compiles to format(spam) + '{eggs}' > > If you can come up with a clever optimization that avoids the need to > *actually* build two temporary strings and then concatenate them, I > don't have a problem with that. I'm only talking about the semantics. I > don't want this: > > f'{spam}' + '{eggs}' > => compiles to format(spam) + format(eggs) # not this! > > Do you agree with those semantics for explicit + concatenation? If not, > what behaviour do you want? > > > (2) Implicit concatenation should occur as early as possible, before > the format. Take the easy case first: both fragments are f-strings. > > f'{spam}' f'{eggs}' > => behaves as if you wrote f'{spam}{eggs}' > => which compiles to format(spam) + format(eggs) > > Do you agree with those semantics for implicit concatenation? > To me, implicit concatenation is just concatenation that binds more tightly, so: 'a' 'b' is: ('a' + 'b') It can be optimised to 'ab' at compile-time. > > (3) The hard case, when you mix f and non-f strings. > > f'{spam}' '{eggs}' > > Notwithstanding raw strings, the behaviour which makes sense to me is > that the implicit string concatenation occurs first, followed by format. 
> So, semantically, if the parser sees the above, it should concat the > string: > > => f'{spam}{eggs}' > > then transform it to a call to format: > > => format(spam) + format(eggs) > To me: f'{spam}' '{eggs}' is: (f'{spam}' + '{eggs}') just as: r'\a' '\\' is: (r'\a' + '\\') [snip] From ron3200 at gmail.com Thu Jul 23 19:52:30 2015 From: ron3200 at gmail.com (Ron Adam) Date: Thu, 23 Jul 2015 13:52:30 -0400 Subject: [Python-ideas] Briefer string format In-Reply-To: <20150723142213.GK25179@ando.pearwood.info> References: <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> <20150723142213.GK25179@ando.pearwood.info> Message-ID: On 07/23/2015 10:22 AM, Steven D'Aprano wrote: > > (3) The hard case, when you mix f and non-f strings. > > f'{spam}' '{eggs}' > > Notwithstanding raw strings, the behaviour which makes sense to me is > that the implicit string concatenation occurs first, followed by format. > So, semantically, if the parser sees the above, it should concat the > string: > > => f'{spam}{eggs}' > > then transform it to a call to format: > > => format(spam) + format(eggs) I think this should be... => f'{spam}{{eggs}}' The advantage that has is you could call it's format method manually again to set the eggs name in a different context. It would also work as expected in the case the second stirng is a f-string. '{spam}' f'{eggs}' f'{{spam}}{eggs}' So if any part of an implicitly concatenated string is an f-string, then the whole becomes an f-string, and the parts that were not have their braces escaped. The part that bothers me is it seems like the "f" should be a unary operator rather than a string prefix. As a prefix: s = f'{spam}{{eggs}}' # spam s2 = s.format(eggs=eggs) # eggs As an unary operator: s = ? '{spam}{{eggs}}' # spam s2 = ? s # eggs (? == some to be determined symbol) They are just normal strings in the second case. Cheers, Ron From rosuav at gmail.com Thu Jul 23 19:57:58 2015 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 24 Jul 2015 03:57:58 +1000 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> <20150723142213.GK25179@ando.pearwood.info> Message-ID: On Fri, Jul 24, 2015 at 3:52 AM, Ron Adam wrote: > The part that bothers me is it seems like the "f" should be a unary operator > rather than a string prefix. > > As a prefix: > > s = f'{spam}{{eggs}}' # spam > s2 = s.format(eggs=eggs) # eggs > > > As an unary operator: > > s = ? '{spam}{{eggs}}' # spam > s2 = ? s # eggs > > (? == some to be determined symbol) > > They are just normal strings in the second case. Except that they can't be normal strings, because the compiler has to parse them. They're expressions. You can't take input from a user and f-string it (short of using exec/eval, of course); it has to be there in the source code. ChrisA From eric at trueblade.com Thu Jul 23 20:15:55 2015 From: eric at trueblade.com (Eric V. 
Smith) Date: Thu, 23 Jul 2015 14:15:55 -0400 Subject: [Python-ideas] Briefer string format In-Reply-To: References: <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> <20150723142213.GK25179@ando.pearwood.info> Message-ID: <55B12F5B.1080300@trueblade.com> On 7/23/2015 1:57 PM, Chris Angelico wrote: > On Fri, Jul 24, 2015 at 3:52 AM, Ron Adam wrote: >> The part that bothers me is it seems like the "f" should be a unary operator >> rather than a string prefix. >> >> As a prefix: >> >> s = f'{spam}{{eggs}}' # spam >> s2 = s.format(eggs=eggs) # eggs >> >> >> As an unary operator: >> >> s = ? '{spam}{{eggs}}' # spam >> s2 = ? s # eggs >> >> (? == some to be determined symbol) >> >> They are just normal strings in the second case. > > Except that they can't be normal strings, because the compiler has to > parse them. They're expressions. You can't take input from a user and > f-string it (short of using exec/eval, of course); it has to be there > in the source code. Right. This is the "unevaluated f-string" of which I spoke in https://mail.python.org/pipermail/python-ideas/2015-July/034728.html It would be a huge code injection opportunity, and I agree it's best we don't implement it. Eric. From abedillon at gmail.com Thu Jul 23 20:16:00 2015 From: abedillon at gmail.com (Abe Dillon) Date: Thu, 23 Jul 2015 13:16:00 -0500 Subject: [Python-ideas] A different format for PI? Message-ID: Is there a forum or something similar related to python-ideas? If there isn't, I think there should be. The mailing list format is restrictive. There's no good way to search past discussions and the digests I get are disorganized and difficult to follow. I'd like to contribute, but I don't know if my ideas are topics that have already been discussed in depth or if they're actually new. -Abe Dillon -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Thu Jul 23 20:26:21 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Thu, 23 Jul 2015 13:26:21 -0500 Subject: [Python-ideas] A different format for PI? In-Reply-To: References: Message-ID: Python-ideas archives: https://mail.python.org/pipermail/python-ideas/ Google Groups forum: https://groups.google.com/forum/m/#!forum/python-ideas On July 23, 2015 1:16:00 PM CDT, Abe Dillon wrote: >Is there a forum or something similar related to python-ideas? If there >isn't, I think there should be. > >The mailing list format is restrictive. There's no good way to search >past >discussions and the digests I get are disorganized and difficult to >follow. >I'd like to contribute, but I don't know if my ideas are topics that >have >already been discussed in depth or if they're actually new. > >-Abe Dillon > > >------------------------------------------------------------------------ > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >https://mail.python.org/mailman/listinfo/python-ideas >Code of Conduct: http://python.org/psf/codeofconduct/ -- Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ron3200 at gmail.com Thu Jul 23 20:32:28 2015 From: ron3200 at gmail.com (Ron Adam) Date: Thu, 23 Jul 2015 14:32:28 -0400 Subject: [Python-ideas] Briefer string format In-Reply-To: <55B12F5B.1080300@trueblade.com> References: <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> <20150723142213.GK25179@ando.pearwood.info> <55B12F5B.1080300@trueblade.com> Message-ID: On 07/23/2015 02:15 PM, Eric V. Smith wrote: > On 7/23/2015 1:57 PM, Chris Angelico wrote: >> >On Fri, Jul 24, 2015 at 3:52 AM, Ron Adam wrote: >>> >>The part that bothers me is it seems like the "f" should be a unary operator >>> >>rather than a string prefix. >>> >> >>> >>As a prefix: >>> >> >>> >> s = f'{spam}{{eggs}}' # spam >>> >> s2 = s.format(eggs=eggs) # eggs >>> >> >>> >> >>> >>As an unary operator: >>> >> >>> >> s = ? '{spam}{{eggs}}' # spam >>> >> s2 = ? s # eggs >>> >> >>> >>(? == some to be determined symbol) >>> >> >>> >>They are just normal strings in the second case. >> > >> >Except that they can't be normal strings, because the compiler has to >> >parse them. They're expressions. You can't take input from a user and >> >f-string it (short of using exec/eval, of course); it has to be there >> >in the source code. > Right. This is the "unevaluated f-string" of which I spoke in > https://mail.python.org/pipermail/python-ideas/2015-July/034728.html > > It would be a huge code injection opportunity, and I agree it's best we > don't implement it. I see, it would be like using eval on user strings, except it will be less noticeable. Cheers, Ron From steve at pearwood.info Thu Jul 23 20:51:44 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 24 Jul 2015 04:51:44 +1000 Subject: [Python-ideas] A different format for PI? In-Reply-To: References: Message-ID: <20150723185143.GM25179@ando.pearwood.info> On Thu, Jul 23, 2015 at 01:16:00PM -0500, Abe Dillon wrote: > Is there a forum or something similar related to python-ideas? If there > isn't, I think there should be. Do you mean a web forum? This is an email forum. No, there's no web forum yet, but at some time in the not-too-distant future, the Hyperkitty web interface to Mailman3 may be deployed. For now, it's email only. > The mailing list format is restrictive. In what way? > There's no good way to search past discussions https://duckduckgo.com/html/?q=site:mail.python.org+tail+recursion https://startpage.com/do/search?q=site:mail.python.org+matrix+multiplication https://www.google.com.au/search?q=site:mail.python.org+compose+function plus Bing, Yahoo, etc. > and the digests I get are disorganized and difficult to follow. Then don't use digests. P.S. since you are using digests, if you reply, please: (1) change the subject line to something meaningful; (2) trim the quoted text. We don't need or want the *entire* digest of potentially dozens of irrelevant emails included in your reply. Thank you. 
-- Steve From greg.ewing at canterbury.ac.nz Fri Jul 24 01:40:49 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 24 Jul 2015 11:40:49 +1200 Subject: [Python-ideas] Briefer string format In-Reply-To: <20150723142213.GK25179@ando.pearwood.info> References: <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> <20150723142213.GK25179@ando.pearwood.info> Message-ID: <55B17B81.3070000@canterbury.ac.nz> Steven D'Aprano wrote: > I don't think I want this behaviour: > > f'{spam}' '{eggs}' > => format(spam) + '{eggs}' What do you think should happen in this case: '{spam}' f'{eggs}' It would seem very strange to me if the f infected strings *before* it as well as after it. -- Greg From wes.turner at gmail.com Fri Jul 24 03:18:17 2015 From: wes.turner at gmail.com (Wes Turner) Date: Thu, 23 Jul 2015 20:18:17 -0500 Subject: [Python-ideas] A different format for PI? In-Reply-To: <20150723185143.GM25179@ando.pearwood.info> References: <20150723185143.GM25179@ando.pearwood.info> Message-ID: On Jul 23, 2015 1:52 PM, "Steven D'Aprano" wrote: > > On Thu, Jul 23, 2015 at 01:16:00PM -0500, Abe Dillon wrote: > > Is there a forum or something similar related to python-ideas? If there > > isn't, I think there should be. > > Do you mean a web forum? This is an email forum. > > No, there's no web forum yet, but at some time in the not-too-distant > future, the Hyperkitty web interface to Mailman3 may be deployed. For > now, it's email only. How do I link this to "[Python-Dev] Devguide - Add Communications Quick Start Section"? ... https://www.google.com/search?q=%22%5BPython-Dev%5D+Devguide+-+Add+Communications+Quick+Start+Section%22 https://bugs.python.org/issue24682 > > > > The mailing list format is restrictive. > > In what way? > > > > There's no good way to search past discussions > > https://duckduckgo.com/html/?q=site:mail.python.org+tail+recursion > > https://startpage.com/do/search?q=site:mail.python.org+matrix+multiplication > > https://www.google.com.au/search?q=site:mail.python.org+compose+function > > plus Bing, Yahoo, etc. > > > > and the digests I get are disorganized and difficult to follow. > > Then don't use digests. > > P.S. since you are using digests, if you reply, please: > > (1) change the subject line to something meaningful; > > (2) trim the quoted text. We don't need or want the *entire* digest of > potentially dozens of irrelevant emails included in your reply. In terms of q&a things to link to: Reddit, stack exchange, askbot/osqa, google groups tags/labels/types http://schema.org/Code http://schema.org/Question (Issue, tests, patch, docs, build, review) http://schema.org/Code > > Thank you. > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Fri Jul 24 03:39:14 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 24 Jul 2015 11:39:14 +1000 Subject: [Python-ideas] A different format for PI? 
In-Reply-To: References: <20150723185143.GM25179@ando.pearwood.info> Message-ID: <20150724013913.GN25179@ando.pearwood.info> On Thu, Jul 23, 2015 at 08:18:17PM -0500, Wes Turner wrote: > On Jul 23, 2015 1:52 PM, "Steven D'Aprano" wrote: > > > > On Thu, Jul 23, 2015 at 01:16:00PM -0500, Abe Dillon wrote: > > > Is there a forum or something similar related to python-ideas? If there > > > isn't, I think there should be. > > > > Do you mean a web forum? This is an email forum. > > > > No, there's no web forum yet, but at some time in the not-too-distant > > future, the Hyperkitty web interface to Mailman3 may be deployed. For > > now, it's email only. > > How do I link this to "[Python-Dev] Devguide - Add Communications Quick > Start Section"? I don't understand your question. Are you asking for a link to a web interface that doesn't exist yet? Come back in a year or three when Hyperkitty is actually deployed, and we can give a url to it. In the meantime, the old pipermail interface still works, although unfortunately the urls aren't guaranteed to be stable, they can occasionally change and break hyperlinks. Sad but true. https://mail.python.org/pipermail/python-dev/2015-July/140856.html Gmane mirrors the mailing list: http://article.gmane.org/gmane.comp.python.devel/154038 As does ActiveState: http://code.activestate.com/lists/python-dev/137242/ There's also a Google Groups mirror, although navigating it to individual messages and extracting stable urls is painful): https://groups.google.com/d/forum/dev-python With the possible exception of the Google Groups mirror, none of these are web forums -- you can't post from the web archive. In the future, Hyperkitty will change that. -- Steve From wes.turner at gmail.com Fri Jul 24 03:57:52 2015 From: wes.turner at gmail.com (Wes Turner) Date: Thu, 23 Jul 2015 20:57:52 -0500 Subject: [Python-ideas] A different format for PI? In-Reply-To: <20150724013913.GN25179@ando.pearwood.info> References: <20150723185143.GM25179@ando.pearwood.info> <20150724013913.GN25179@ando.pearwood.info> Message-ID: On Jul 23, 2015 8:39 PM, "Steven D'Aprano" wrote: > > On Thu, Jul 23, 2015 at 08:18:17PM -0500, Wes Turner wrote: > > On Jul 23, 2015 1:52 PM, "Steven D'Aprano" wrote: > > > > > > On Thu, Jul 23, 2015 at 01:16:00PM -0500, Abe Dillon wrote: > > > > Is there a forum or something similar related to python-ideas? If there > > > > isn't, I think there should be. > > > > > > Do you mean a web forum? This is an email forum. > > > > > > No, there's no web forum yet, but at some time in the not-too-distant > > > future, the Hyperkitty web interface to Mailman3 may be deployed. For > > > now, it's email only. > > > > How do I link this to "[Python-Dev] Devguide - Add Communications Quick > > Start Section"? > > I don't understand your question. Are you asking for a link to a web > interface that doesn't exist yet? Come back in a year or three when > Hyperkitty is actually deployed, and we can give a url to it. > > In the meantime, the old pipermail interface still works, although > unfortunately the urls aren't guaranteed to be stable, they can > occasionally change and break hyperlinks. Sad but true. 
> https://mail.python.org/pipermail/python-dev/2015-July/140856.html > > Gmane mirrors the mailing list: > > http://article.gmane.org/gmane.comp.python.devel/154038 > > As does ActiveState: > > http://code.activestate.com/lists/python-dev/137242/ > > > There's also a Google Groups mirror, although navigating it to > individual messages and extracting stable urls is painful): > > https://groups.google.com/d/forum/dev-python > > > With the possible exception of the Google Groups mirror, none of these > are web forums -- you can't post from the web archive. In the future, > Hyperkitty will change that. Thanks! > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL:

From bruce at leban.us Fri Jul 24 03:57:25 2015 From: bruce at leban.us (Bruce Leban) Date: Thu, 23 Jul 2015 18:57:25 -0700 Subject: [Python-ideas] Briefer string format In-Reply-To: <20150723142213.GK25179@ando.pearwood.info> References: <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> <20150723142213.GK25179@ando.pearwood.info> Message-ID: On Thu, Jul 23, 2015 at 7:22 AM, Steven D'Aprano wrote: > > If I had a dollar for every time somebody on the Internet misused > "strawman argument".

You wouldn't get a dollar here. If you want to be strict, a strawman argument is misrepresenting an opponent's viewpoint to make it easier to refute but it also applies to similar arguments. You stated that "constant folding ... *would* change the semantics" *[emphasis added]*. It's not a fact that constant folding must change the semantics as is easily shown. And in fact, by definition, constant folding should never change semantics. So the straw here is imagining that the implementer of this feature would ignore the accepted rules regarding constant folding and then criticizing the implementer for doing that.

> (1) Explicit concatenation with the + operator should be treated as > occurring after the f strings are evaluated, *as if* the following > occurs: > > f'{spam}' + '{eggs}' > => compiles to format(spam) + '{eggs}' > > > > Do you agree with those semantics for explicit + concatenation? If not, > what behaviour do you want?

I agree with that.

> > > (2) Implicit concatenation should occur as early as possible, before > the format. Take the easy case first: both fragments are f-strings. > > f'{spam}' f'{eggs}' > => behaves as if you wrote f'{spam}{eggs}' > => which compiles to format(spam) + format(eggs) > > Do you agree with those semantics for implicit concatenation? >

Yes

(3) The hard case, when you mix f and non-f strings. > > f'{spam}' '{eggs}' > > Notwithstanding raw strings, the behaviour which makes sense to me is > that the implicit string concatenation occurs first, followed by format. >

You talk about which happens "first" so let's recast this as an operator precedence question. Think of f as a unary operator. Does f bind tighter than implicit concatenation? Well, all other string operators like this bind more tightly than concatenation. f'{spam}' '{eggs}' > Secondly, it feels that this does the concatenation in the wrong order.
> Implicit concatenation occurs as early as possible in every other case.
> But here, we're delaying the concatenation until after the format. So
> this feels wrong to me.

Implicit concatenation does NOT happen as early as possible in every case.
When I write:

    r'a\n' 'b\n'  ==>  'a\\nb\n'

the r is applied to the first string *before* the concatenation with the
second string.

> If there's no consensus on the behaviour of mixed f and non-f strings
> with implicit concatenation, rather than pick one and frustrate and
> surprise half the users, we should make it an error:
>
>     f'{spam}' '{eggs}'
>     => raises SyntaxError

> We can't just say that when the concatenation actually occurs is an
> optimization, as we can with raw and cooked string literals, because the
> f string is not a literal, it's actually a function call in disguise. So
> we have to pick one or the other (or refuse to guess and raise a syntax
> error).

Imagine that we have another prefix that escapes strings for regex. That
is, e'a+b' ==> 'a\\+b'. This is another function call in disguise, just
calling re.escape. Applying your reasoning could have us conclude that e
is just like f and should infect all the other strings it is concatenated
with. But that would actually break the reason to have this in the first
place, writing strings like this:

    '(' e'1+2' '|' e'1*2' '){1,2}'

Perhaps you're thinking that e should be done at compile time. Well, when
I combine it with f, it clearly must be done at run-time:

    '(' ef'{foo}' '|' ef'{bar}' '){1,2}'

I'm not actually proposing an e prefix. I'm just speculating how it would
work if we had one. And combining e and f must mean do f then e because
the other order is useless, just as combining f and r must mean do r then
f.

Maybe you can't say that concatenation is an optimization but I can (new
text underlined):

    Multiple adjacent string or bytes literals (delimited by whitespace),
    possibly using different quoting conventions, are allowed, and their
    meaning is the same as their concatenation. ... Thus, "hello" 'world'
    is equivalent to "helloworld". This feature can be used to reduce the
    number of backslashes needed, to split long strings conveniently
    across long lines, *to mix formatted and unformatted strings,* or
    even to add comments to parts of strings, for example:

    re.compile("[A-Za-z_]"       # letter or underscore
               "[A-Za-z0-9_]*"   # letter, digit or underscore
              )

    Note that this feature is defined at the syntactical level, but
    implemented at compile time *as an optimization*. The '+' operator
    must be used to concatenate string expressions at run time. Also note
    that literal concatenation can use different quoting styles for each
    component (even mixing raw strings and triple quoted strings). *If
    formatted strings are mixed with unformatted strings, they are
    concatenated at compile time and the unformatted parts are escaped so
    they will not be subject to format substitutions.*

--- Bruce
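Bruce's hypothetical e prefix can be simulated today with a plain function,
which makes his pattern-building example concrete (a sketch; the helper
name e() is purely illustrative, no such prefix exists):

    import re

    def e(s):
        # Stand-in for the hypothetical e prefix: regex-escape the string.
        return re.escape(s)

    # The pattern built from escaped and unescaped pieces, via explicit '+':
    pattern = '(' + e('1+2') + '|' + e('1*2') + '){1,2}'
    print(pattern)                                        # (1\+2|1\*2){1,2}
    print(re.fullmatch(pattern, '1+21*2') is not None)    # True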
From alexander.belopolsky at gmail.com  Fri Jul 24 04:08:50 2015
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Thu, 23 Jul 2015 22:08:50 -0400
Subject: [Python-ideas] Briefer string format
In-Reply-To: <20150723142213.GK25179@ando.pearwood.info>
References: <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> <20150723142213.GK25179@ando.pearwood.info>
Message-ID: 

On Thu, Jul 23, 2015 at 10:22 AM, Steven D'Aprano wrote:

> I don't think I want this behaviour:
>
>     f'{spam}' '{eggs}'
>     => format(spam) + '{eggs}'
>
> for two reasons. Firstly, I already have (at least!) one way of getting
> that behaviour, such as explicit + concatenation as above.
>
> Secondly, it feels that this does the concatenation in the wrong order.
> Implicit concatenation occurs as early as possible in every other case.
> But here, we're delaying the concatenation until after the format. So
> this feels wrong to me.
>
> (Again, I'm talking semantics, not implementation. Clever tricks with
> escaping the brackets don't matter.)

I don't know what you would call "Clever tricks with escaping", but I
would expect

    f'{spam}' '{eggs}'
    => '{spam}{{eggs}}'.format(**ChainMap(locals(), globals()))

just as

    'foo'r'bar'
    => 'foo\\bar'

From alexander.belopolsky at gmail.com  Fri Jul 24 04:12:19 2015
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Thu, 23 Jul 2015 22:12:19 -0400
Subject: [Python-ideas] Briefer string format
In-Reply-To: 
References: <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> <20150723142213.GK25179@ando.pearwood.info>
Message-ID: 

Sorry, I meant to write 'foo'r'\bar' at the end of my previous message.

On Thu, Jul 23, 2015 at 10:08 PM, Alexander Belopolsky <
alexander.belopolsky at gmail.com> wrote:

> On Thu, Jul 23, 2015 at 10:22 AM, Steven D'Aprano wrote:
>
>> I don't think I want this behaviour:
>>
>>     f'{spam}' '{eggs}'
>>     => format(spam) + '{eggs}'
>>
>> for two reasons. Firstly, I already have (at least!) one way of getting
>> that behaviour, such as explicit + concatenation as above.
>>
>> Secondly, it feels that this does the concatenation in the wrong order.
>> Implicit concatenation occurs as early as possible in every other case.
>> But here, we're delaying the concatenation until after the format. So
>> this feels wrong to me.
>>
>> (Again, I'm talking semantics, not implementation. Clever tricks with
>> escaping the brackets don't matter.)
>
> I don't know what you would call "Clever tricks with escaping", but I
> would expect
>
>     f'{spam}' '{eggs}'
>     => '{spam}{{eggs}}'.format(**ChainMap(locals(), globals()))
>
> just as
>
>     'foo'r'bar'
>     => 'foo\\bar'
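Alexander's ChainMap expansion can be spelled out as a helper function
today, which is a rough preview of what the proposed prefix would have to
do at run time (a sketch; the name fmt and the frame introspection are
illustrative assumptions, not part of any proposal):

    import sys
    from collections import ChainMap

    def fmt(template):
        # Stand-in for the proposed f prefix: format `template` using the
        # caller's locals and globals, looked up via the calling frame.
        frame = sys._getframe(1)
        return template.format_map(ChainMap(frame.f_locals, frame.f_globals))

    spam = 'lobster'
    print(fmt('{spam}') + '{eggs}')   # format first, concatenate second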
From rosuav at gmail.com  Fri Jul 24 04:16:46 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 24 Jul 2015 12:16:46 +1000
Subject: [Python-ideas] Briefer string format
In-Reply-To: 
References: <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> <20150723142213.GK25179@ando.pearwood.info>
Message-ID: 

On Fri, Jul 24, 2015 at 11:57 AM, Bruce Leban wrote:
>
> You talk about which happens "first" so let's recast this as an operator
> precedence question. Think of f as a unary operator. Does f bind tighter
> than implicit concatenation? Well, all other string operators like this bind
> more tightly than concatenation.
>     f'{spam}' '{eggs}'

Thing is, though, it isn't an operator, any more than list display is an
operator. Operators take values and result in values. You can break out
some part of an expression and it'll have the same result (apart from
short-circuit evaluation). With f"...", it's a piece of special syntax,
not something you apply to a string. You can't do this:

    fmt = "Hello, {place}!"
    place = "world"
    print(f fmt)

If f were an operator, with precedence, then this would work. But it
doesn't, for the same reason that this doesn't work:

    path = "C:\users\nobody"
    fixed_path = r path

These are pieces of syntax, and syntax is at a level prior to all
considerations of operator precedence.

ChrisA

From jakedrummond at verizon.net  Fri Jul 24 04:49:26 2015
From: jakedrummond at verizon.net (jakedrummond at verizon.net)
Date: Thu, 23 Jul 2015 21:49:26 -0500 (CDT)
Subject: [Python-ideas] Add SMTPS support to logging.handlers.SMTPHandler
Message-ID: <0NRZ00IB516EUNM0@vms173025.mailsrvcs.net>

Example of modification:

old:
    smtp = smtplib.SMTP(self.mailhost, port, timeout=self._timeout)
new:
    if self.smtps:
        smtp = smtplib.SMTP_SSL(self.mailhost, port, *self.smtps,
                                timeout=self._timeout)
    else:
        smtp = smtplib.SMTP(self.mailhost, port, timeout=self._timeout)

old:
    if self.secure is not None:
        smtp.ehlo()
        smtp.starttls(*self.secure)
        smtp.ehlo()
new:
    smtp.ehlo()
    if self.tls:
        smtp.starttls(*self.tls)
        smtp.ehlo()
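Pending such a change, the same effect can be had by subclassing the
existing handler; a minimal sketch (the smtps flag is this sketch's own
addition mirroring the proposal, not stdlib API, and the temporary
class swap is a deliberately quick-and-dirty, non-thread-safe trick):

    import logging.handlers
    import smtplib

    class SMTPSHandler(logging.handlers.SMTPHandler):
        """SMTPHandler variant that connects over implicit TLS (SMTPS).

        Pass an explicit mailhost port such as 465, since the stock
        emit() falls back to smtplib.SMTP_PORT (25) otherwise.
        """
        def __init__(self, *args, smtps=False, **kwargs):
            super().__init__(*args, **kwargs)
            self.smtps = smtps

        def emit(self, record):
            if not self.smtps:
                return super().emit(record)
            # Temporarily swap the class that SMTPHandler.emit() instantiates.
            original = smtplib.SMTP
            smtplib.SMTP = smtplib.SMTP_SSL
            try:
                super().emit(record)
            finally:
                smtplib.SMTP = original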
From srkunze at mail.de  Fri Jul 24 16:22:48 2015
From: srkunze at mail.de (Sven R. Kunze)
Date: Fri, 24 Jul 2015 16:22:48 +0200
Subject: [Python-ideas] Mitigating 'self.' Method Pollution
In-Reply-To: <55A283AA.4070207@jmunch.dk>
References: <85zj33w0ts.fsf@benfinney.id.au> <20150711150756.GE27268@ando.pearwood.info> <20150711161819.GG27268@ando.pearwood.info> <877fq67boo.fsf@vostro.rath.org> <55A283AA.4070207@jmunch.dk>
Message-ID: <55B24A38.3070303@mail.de>

My stance on this: that's the responsibility of the Python interpreter,
not the one of the Python programmer.

Reminds me of the string concatenation issue of older Python versions:
- https://wiki.python.org/moin/PythonSpeed/PerformanceTips#String_Concatenation
- https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Avoiding_dots...

On 12.07.2015 17:11, Anders J. Munch wrote:
> Terry Reedy wrote:
> > Localization, which is not limited to self attributes, does two
> > things: it removes the need to continually type 'object.x'; it makes
> > the remaining code run faster. I would be interested to know how
> > many faster local references are needed to make up for the overhead
> > of localization.
>
> If you access the attribute just twice, that's enough for localisation
> to yield a speedup. Local variable access is just that much faster than
> attribute lookup.
>
> $ python3 -V
> Python 3.4.3
> $ python3 -m timeit -s "class C: pass" -s "c = C();c.x=1" "c.x;c.x"
> 10000000 loops, best of 3: 0.106 usec per loop
> $ python3 -m timeit -s "class C: pass" -s "c = C();c.x=1" "x=c.x;x;x"
> 10000000 loops, best of 3: 0.0702 usec per loop
>
> $ python -V
> Python 2.7.3
> $ python -m timeit -s "class C(object): pass" -s "c = C();c.x=1" "c.x;c.x"
> 10000000 loops, best of 3: 0.105 usec per loop
> $ python -m timeit -s "class C(object): pass" -s "c = C();c.x=1" "x=c.x;x;x"
> 10000000 loops, best of 3: 0.082 usec per loop
>
> regards, Anders
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
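Spelled out in method form, the localisation trick Anders measures looks
like this (a sketch; the class and loop counts are illustrative only):

    import timeit

    class Point:
        def __init__(self):
            self.x = 0

        def bump_slow(self, n):
            for _ in range(n):
                self.x += 1       # attribute lookup on every iteration

        def bump_fast(self, n):
            x = self.x            # localise once...
            for _ in range(n):
                x += 1            # ...then work on the cheap local
            self.x = x            # write back at the end

    p = Point()
    print(timeit.timeit(lambda: p.bump_slow(1000), number=1000))
    print(timeit.timeit(lambda: p.bump_fast(1000), number=1000))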
From ronaldoussoren at mac.com  Fri Jul 24 16:47:45 2015
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Fri, 24 Jul 2015 16:47:45 +0200
Subject: [Python-ideas] Mitigating 'self.' Method Pollution
In-Reply-To: <55B24A38.3070303@mail.de>
References: <85zj33w0ts.fsf@benfinney.id.au> <20150711150756.GE27268@ando.pearwood.info> <20150711161819.GG27268@ando.pearwood.info> <877fq67boo.fsf@vostro.rath.org> <55A283AA.4070207@jmunch.dk> <55B24A38.3070303@mail.de>
Message-ID: <8F7534B6-5EFF-4757-A01A-58D2B6547567@mac.com>

> On 24 Jul 2015, at 16:22, Sven R. Kunze wrote:
>
> My stance on this: that's the responsibility of the Python interpreter,
> not the one of the Python programmer.

It's just one of the tricks you can use to micro-optimise code when
needed. The important bit is *when needed*: this trick (and similar ones)
should IMHO only be used when benchmarking shows that you have a problem
that is solved by it, and not proactively, because these tricks tend to
make code less readable.

Ronald

From srkunze at mail.de  Fri Jul 24 19:02:49 2015
From: srkunze at mail.de (Sven R. Kunze)
Date: Fri, 24 Jul 2015 19:02:49 +0200
Subject: [Python-ideas] Briefer string format
In-Reply-To: 
References: <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> <20150723142213.GK25179@ando.pearwood.info>
Message-ID: <55B26FB9.1010601@mail.de>

On 24.07.2015 04:16, Chris Angelico wrote:
> syntax, not something you apply to a string. You can't do this:
>
>     fmt = "Hello, {place}!"
>     place = "world"
>     print(f fmt)
>
> If f were an operator, with precedence, then this would work. But it
> doesn't, for the same reason that this doesn't work:
>
>     path = "C:\users\nobody"
>     fixed_path = r path
>
> These are pieces of syntax, and syntax is at a level prior to all
> considerations of operator precedence.

You might be right about this. I think he just used operators as some
sort of analogy to figure out which comes first: concat or format.

My semantic opinion on this: first format, then concat. Why? Because
'...' is an atomic thing and shouldn't be modified by its peer elements
(i.e. strings).

About implementation: the idea of first concat with **implicit**
escaping braces illustrated another minor use case for me: no need to
escape braces.

    f'Let {var} = ''{x | x > 3}'

This way, the f syntax would really help readability when it comes to
situations where many braces are used.
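For comparison, the readability cost Sven alludes to is the brace
doubling that str.format requires today (a small, self-contained
demonstration; the variable names are arbitrary):

    var = 'S'
    # With str.format, every literal brace must be doubled:
    print('Let {var} = {{x | x > 3}}'.format(var=var))
    # -> Let S = {x | x > 3}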
From ben+python at benfinney.id.au  Fri Jul 24 22:27:34 2015
From: ben+python at benfinney.id.au (Ben Finney)
Date: Sat, 25 Jul 2015 06:27:34 +1000
Subject: [Python-ideas] Briefer string format
References: <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> <20150723142213.GK25179@ando.pearwood.info> <55B17B81.3070000@canterbury.ac.nz>
Message-ID: <85380dnvwp.fsf@benfinney.id.au>

Greg Ewing writes:

> Steven D'Aprano wrote:
>
> > I don't think I want this behaviour:
> >
> >     f'{spam}' '{eggs}'
> >     => format(spam) + '{eggs}'
>
> What do you think should happen in this case:
>
>     '{spam}' f'{eggs}'
>
> It would seem very strange to me if the f infected strings *before* it
> as well as after it.

The existing behaviour of implicit concatenation doesn't give much of a
guide here, unfortunately::

    >>> 'foo\abar' r'lorem\tipsum' 'wibble\bwobble'
    'foo\x07barlorem\\tipsumwibble\x08wobble'

    >>> type(b'abc' 'def' b'ghi')
      File "<stdin>", line 1
    SyntaxError: cannot mix bytes and nonbytes literals

So, the 'b' prefix expects to apply to all the implicitly-concatenated
parts (and fails if they're not all bytes strings); the 'r' prefix
expects to apply only to the one fragment, leaving others alone.

Is the proposed 'f' prefix, on a fragment in implicit concatenation,
meant to have behaviour analogous to the 'r' prefix or the 'b' prefix,
or something else? What's the argument in favour of that choice?

-- 
 \       'If we ruin the Earth, there is no place else to go. This is |
  `\      not a disposable world, and we are not yet able to re-engineer |
_o__)     other planets.' --Carl Sagan, _Cosmos_, 1980 |
Ben Finney

From srkunze at mail.de  Fri Jul 24 23:41:15 2015
From: srkunze at mail.de (Sven R. Kunze)
Date: Fri, 24 Jul 2015 23:41:15 +0200
Subject: [Python-ideas] Concurrency Modules
In-Reply-To: <9c139305-f583-46c1-b819-6a98dbd04acc@googlegroups.com>
References: <559EFB73.5050606@mail.de> <9c139305-f583-46c1-b819-6a98dbd04acc@googlegroups.com>
Message-ID: <55B2B0FB.1060409@mail.de>

Hi. I am back. First of all thanks for your eager participation. I would
like to pick up on Steve's and Mark's examples as they seem to be very
good illustrations of what issue I still have.

Steve explained why asyncio is great and Mark explained why
threading+multiprocessing is great. Each from his own perspective and
focusing on the internal implementation details. To me, all approaches
can now be fit into this sort of table. Please, correct me if it's wrong
(that is very important):

# | code lives in | managed by
--+---------------+-------------
1 | processes     | os scheduler
2 | threads       | os scheduler
3 | tasks         | event loop

But the original question still stands:

Which one to use?

Ignoring little details like 'shared state', 'custom prioritization',
etc., they all look the same to me, and what it all comes down to are
these nasty little details people try to explain so eagerly. Not saying
that is a bad thing, but it has some implications on production code I
do not like, and in the following I am going to explain that.

Say, we have decided for approach N because of some requirements
(examples from here and there, guidelines given by smart people,
customer needs etc.) and wrote a hundred thousand lines of code.
What if these requirements change 6 years in the future?
What if the maintainer of approach N decided to change it in a way that
is no longer compatible with our requirements?
From what I can see there is no easy way 'back' to use another
approach. They all have different APIs, basically for: 'executing a
function and returning its precious result (the cake)'.

asyncio gives us the flexibility to choose a prioritization mechanism.
Nice to have, because we are now independent of the os scheduler.
But do we really ever need that?
What is wrong with the os scheduler?
Would that not mean that Mark had better switch to asyncio?
We don't know if we ever would need that in project A and project B.
What now? Use asyncio just in case? Preemptively?

@Steve
Thanks for that great explanation of how asyncio works and its
relationship to threads/processes.

But I still have a question: why can't we use threads for the cakes? (1
cake = 1 thread). Not saying that asyncio would be a bad idea to use
here, but couldn't we accomplish the same functionality by using
threads?

I think, after we've settled the above questions, we should change the
focus from

    How do they work internally and what are the tiny differences?
    (answered greatly by Mark)

to

    When do I use which one?

The latter question actually is what counts for production code. It
actually is quite interesting to know and to ponder over all the
differences, dependencies, corner cases etc. However, when it actually
comes down to 'executing a piece of code and returning its result', you
end up deciding which approach to choose. You won't implement all 3
different ways just because it is great to see all the nasty little
details click in.

On Thursday, July 9, 2015 at 11:54:11 PM UTC+1, Sven R. Kunze wrote:
>
> In order to make a sound decision for the question: "Which one(s) do I
> use?", at least the following items should be somehow defined clearly
> for these modules:
>
> 1) relationship between the modules
> 2) NON-overlapping usage scenarios
> 3) future development intentions
> 4) ease of usage of the modules => future syntax
> 5) examples
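For reference, here is 'executing a function and returning its precious
result' in all three rows of the table (a minimal sketch; add(2, 3) is an
arbitrary stand-in workload, and the asyncio spelling uses the 3.4-era
coroutine decorator):

    import asyncio
    import multiprocessing
    import threading

    def add(a, b):
        return a + b

    def worker(q):
        q.put(add(2, 3))

    if __name__ == '__main__':
        # 1: a process -- the result must travel back through a queue
        q = multiprocessing.Queue()
        p = multiprocessing.Process(target=worker, args=(q,))
        p.start(); print(q.get()); p.join()

        # 2: a thread -- shared memory, so a mutable container will do
        out = []
        t = threading.Thread(target=lambda: out.append(add(2, 3)))
        t.start(); t.join(); print(out[0])

        # 3: a task -- an event loop drives a coroutine to completion
        loop = asyncio.get_event_loop()
        print(loop.run_until_complete(asyncio.coroutine(add)(2, 3)))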
From sturla.molden at gmail.com  Sat Jul 25 00:19:47 2015
From: sturla.molden at gmail.com (Sturla Molden)
Date: Fri, 24 Jul 2015 22:19:47 +0000 (UTC)
Subject: [Python-ideas] Concurrency Modules
References: <55B2B0FB.1060409@mail.de>
Message-ID: <570407725459468499.381183sturla.molden-gmail.com@news.gmane.org>

"Sven R. Kunze" wrote:

> To me, all approaches
> can now be fit into this sort of table. Please, correct me if it's wrong
> (that is very important):
>
> # | code lives in | managed by
> --+---------------+-------------
> 1 | processes     | os scheduler
> 2 | threads       | os scheduler
> 3 | tasks         | event loop

In CPython threads are actually managed by a combination of the OS
scheduler and the interpreter (which controls the GIL). Processes on the
other hand are only managed by the scheduler. Then there is the address
space, which is shared for threads and tasks and private for processes.

1 | processes | os scheduler
2 | threads   | os scheduler and python interpreter
3 | tasks     | event loop

> Say, we have decided for approach N because of some requirements
> (examples from here and there, guidelines given by smart people,
> customer needs etc.) and wrote a hundred thousand lines of code.
> What if these requirements change 6 years in the future?

Then you are screwed, which is a PITA for all concurrency code, not just
the one written in Python.

Sturla

From steve at pearwood.info  Sat Jul 25 04:37:29 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 25 Jul 2015 12:37:29 +1000
Subject: [Python-ideas] Briefer string format
In-Reply-To: <55B17B81.3070000@canterbury.ac.nz>
References: <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> <20150723142213.GK25179@ando.pearwood.info> <55B17B81.3070000@canterbury.ac.nz>
Message-ID: <20150725023729.GQ25179@ando.pearwood.info>

On Fri, Jul 24, 2015 at 11:40:49AM +1200, Greg Ewing wrote:
> Steven D'Aprano wrote:
>
> > I don't think I want this behaviour:
> >
> >     f'{spam}' '{eggs}'
> >     => format(spam) + '{eggs}'
>
> What do you think should happen in this case:
>
>     '{spam}' f'{eggs}'

As I stated before, I think that should at least raise a warning, if not
a syntax error. I think we're in uncharted territory here, because f
strings aren't really a literal string, they're actually a runtime
function call, and we haven't got much in the way of prior art for
implicit concatenation of a literal with a function call. So we ought to
be cautious when dealing with anything the least bit ambiguous, and
avoid baking in a mistake that we can't easily change.

There's no ambiguity with concat'ing f strings only, or non-f strings
only, but the more we discuss this, the more inclined I am to say that
implicit concatenation between f strings and non-f strings *in any
order* should be a syntax error.

> It would seem very strange to me if the f infected
> strings *before* it as well as after it.

It's consistent with Python 2:

py> "abc" u"ßŮƕΩж" "def"
u'abc\xdf\u016e\u0195\u03a9\u0436def'

The unicodeness of the middle term turns the entire concatenation into
unicode. I think it is informative that Python 3 no longer allows this
behaviour:

py> b"abc" u"ßŮƕΩж" b"def"
  File "<stdin>", line 1
SyntaxError: cannot mix bytes and nonbytes literals

-- 
Steve
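That implicit concatenation is resolved before the program runs can be
seen directly in the bytecode; a quick disassembly sketch:

    import dis

    def f():
        return 'aaa' 'bbb'.upper()

    # The disassembly shows a single constant 'aaabbb' being loaded and
    # then .upper() called on it -- the literals were joined at compile
    # time, before the method call:
    dis.dis(f)
    print(f())   # -> 'AAABBB'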
From steve at pearwood.info  Sat Jul 25 05:51:50 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 25 Jul 2015 13:51:50 +1000
Subject: [Python-ideas] Briefer string format
In-Reply-To: <55B26FB9.1010601@mail.de>
References: <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> <20150723142213.GK25179@ando.pearwood.info> <55B26FB9.1010601@mail.de>
Message-ID: <20150725035150.GR25179@ando.pearwood.info>

On Fri, Jul 24, 2015 at 07:02:49PM +0200, Sven R. Kunze wrote:

> My semantic opinion on this: first format, then concat. Why? Because
> '...' is an atomic thing and shouldn't be modified by its peer elements
> (i.e. strings).

Implicit concatenation is a lexical feature, not a runtime feature.
Every other case of implicit concatenation in Python occurs at compile-
time, and has since Python 1.5 if not earlier. This would be an
exception, and would occur after the function call. That's surprising
and inconsistent with all the other examples of implicit concatenation.

    'aaa' 'bbb'.upper()

returns 'AAABBB', not 'aaaBBB'.

> About implementation: the idea of first concat with **implicit**
> escaping braces illustrated another minor use case for me: no need to
> escape braces.
>
>     f'Let {var} = ''{x | x > 3}'

Sorry, I find that really hard to parse without a space between the two
fragments, so let me add a space:

    f'Let {var} = ' '{x | x > 3}'

That's better :-)

I completely understand the appeal of your point of view. But it feels
wrong to me, I think that it mixes up syntactical features and runtime
features inappropriately. If we write f'{spam}' that's syntactic sugar
for a call to the format method:

    '{spam}'.format(***)

where the *** stands in for some sort of ChainMap of locals, nonlocals,
globals and built-ins, purely for brevity. I'm not implying that should
be new syntax.

Since *in all other cases* implicit concatenation occurs before runtime
method or function calls:

    f'{spam}' '{eggs}'

should be seen as:

    # option (1)
    '{spam}' '{eggs}' . format(***)

not

    # option (2a)
    '{spam}' . format(***) + '{eggs}'

I'm not implying that the *implementation* must involve an explicit
concat after the format. It might, or it might optimize the format
string by escaping the braces and concat'ing first:

    # option (2b)
    '{spam}{{eggs}}' . format(***)

Apart from side-effects like time and memory, options (2a) and (2b) are
equivalent, so I'll just call it "option (2)" and leave the
implementation unspecified.

I think that option (2) is surprising and inconsistent with all other
examples of implicit concatenation in Python. I think that
*predictability* is a powerful virtue in programming languages, special
cases should be avoided if possible.

Option (1) follows from two facts:

- implicit concatenation occurs as early as possible (it is a lexical
  feature, so it can occur at compile-time, or as close to compile-time
  as possible);

- f strings are syntactic sugar for a call to format() which must be
  delayed to run-time, as late as possible.

These two facts alone allow the programmer to reason that

    f'{spam}' '{eggs}'

must be analogous to the case of 'aaa' 'bbb'.upper() above.

Option (2) requires at least one of the two special cases:

- implicit concatenation occurs as early as possible, unless one of the
  strings is a f string, in which case it occurs... when exactly?

- literal strings like '{eggs}' always stand for themselves, i.e. what
  you see is what you get, except when implicitly concatenated to f
  strings, where they are magically escaped.

We already have at least two other ways to get the same result that
option (2) gives:

    f'{spam}' + '{eggs}'   # unambiguously format() first, concat second
    f'{spam}{{eggs}}'      # unambiguously escaped braces

Giving implicit concatenation a special case just for convenience's sake
would, in my opinion, make Python just a little more surprising for
little real benefit.

-- 
Steve
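The difference between option (1) and option (2) is observable, not just
philosophical; a sketch with plain str.format standing in for the
proposed f prefix (variable names are illustrative, and note that no
'eggs' variable is defined):

    spam = 'X'

    # Option (1): concatenate first, then format -- needs 'eggs' to exist.
    try:
        one = '{spam}{eggs}'.format(spam=spam)
    except KeyError as err:
        one = 'KeyError: {}'.format(err)

    # Option (2a): format first, then concatenate -- '{eggs}' stays literal.
    two_a = '{spam}'.format(spam=spam) + '{eggs}'

    # Option (2b): escape the plain part, then format -- same result as (2a).
    two_b = '{spam}{{eggs}}'.format(spam=spam)

    print(one)             # KeyError: 'eggs'
    print(two_a, two_b)    # X{eggs} X{eggs}
    assert two_a == two_b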
From Steve.Dower at microsoft.com  Sat Jul 25 07:28:50 2015
From: Steve.Dower at microsoft.com (Steve Dower)
Date: Sat, 25 Jul 2015 05:28:50 +0000
Subject: [Python-ideas] Concurrency Modules
In-Reply-To: <55B2B0FB.1060409@mail.de>
References: <559EFB73.5050606@mail.de> <9c139305-f583-46c1-b819-6a98dbd04acc@googlegroups.com>, <55B2B0FB.1060409@mail.de>
Message-ID: 

"But I still have a question: why can't we use threads for the cakes? (1
cake = 1 thread)."

Because that is the wrong equality - it's really 1 baker = 1 thread.

Bakers aren't free: you have to pay for each one (memory, stack space),
it will take time for each one to learn how your bakery works (startup
time), and you will waste some of your own time coordinating them
(interthread communication).

You also only have one set of baking equipment (the GIL); buying another
bakery is expensive (another process) and fitting more equipment into the
current one is very complicated (subinterpreters).

So you either pay a high price for 2 bakers = 2 cakes, or you accept 2
bakers = 1.5 cakes (in the same amount of time). It turns out that often
1 baker can do 1.5 cakes in the same time as well, and it's much easier
to reason about and implement correctly.

Hope that makes sense and I'm not stretching things too far. Guess I
should make this into a talk for PyCon next year.

Cheers,
Steve

Top-posted from my Windows Phone

________________________________
From: Sven R. Kunze
Sent: 7/24/2015 14:41
To: Mark Summerfield; python-ideas at googlegroups.com; python-ideas at python.org; Steve Dower
Subject: Re: [Python-ideas] Concurrency Modules

Hi. I am back. First of all thanks for your eager participation. I would
like to pick up on Steve's and Mark's examples as they seem to be very
good illustrations of what issue I still have.

Steve explained why asyncio is great and Mark explained why
threading+multiprocessing is great. Each from his own perspective and
focusing on the internal implementation details. To me, all approaches
can now be fit into this sort of table. Please, correct me if it's wrong
(that is very important):

# | code lives in | managed by
--+---------------+-------------
1 | processes     | os scheduler
2 | threads       | os scheduler
3 | tasks         | event loop

But the original question still stands:

Which one to use?

Ignoring little details like 'shared state', 'custom prioritization',
etc., they all look the same to me, and what it all comes down to are
these nasty little details people try to explain so eagerly. Not saying
that is a bad thing, but it has some implications on production code I do
not like, and in the following I am going to explain that.

Say, we have decided for approach N because of some requirements
(examples from here and there, guidelines given by smart people, customer
needs etc.) and wrote a hundred thousand lines of code.
What if these requirements change 6 years in the future?
What if the maintainer of approach N decided to change it in a way that
is no longer compatible with our requirements?
From what I can see there is no easy way 'back' to use another approach.
They all have different APIs, basically for: 'executing a function and
returning its precious result (the cake)'.

asyncio gives us the flexibility to choose a prioritization mechanism.
Nice to have, because we are now independent of the os scheduler.
But do we really ever need that?
What is wrong with the os scheduler?
Would that not mean that Mark had better switch to asyncio?
We don't know if we ever would need that in project A and project B.
What now? Use asyncio just in case? Preemptively?

@Steve
Thanks for that great explanation of how asyncio works and its
relationship to threads/processes.

But I still have a question: why can't we use threads for the cakes? (1
cake = 1 thread). Not saying that asyncio would be a bad idea to use
here, but couldn't we accomplish the same functionality by using threads?

I think, after we've settled the above questions, we should change the
focus from

    How do they work internally and what are the tiny differences?
    (answered greatly by Mark)

to

    When do I use which one?

The latter question actually is what counts for production code. It
actually is quite interesting to know and to ponder over all the
differences, dependencies, corner cases etc. However, when it actually
comes down to 'executing a piece of code and returning its result', you
end up deciding which approach to choose. You won't implement all 3
different ways just because it is great to see all the nasty little
details click in.

On Thursday, July 9, 2015 at 11:54:11 PM UTC+1, Sven R. Kunze wrote:
>
> In order to make a sound decision for the question: "Which one(s) do I
> use?", at least the following items should be somehow defined clearly
> for these modules:
>
> 1) relationship between the modules
> 2) NON-overlapping usage scenarios
> 3) future development intentions
> 4) ease of usage of the modules => future syntax
> 5) examples
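Steve's baker arithmetic maps directly onto pool sizes; a sketch using
concurrent.futures (the cake() function and its sleep are illustrative
stand-ins for real work):

    import time
    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

    def cake(i):
        time.sleep(0.1)          # waiting on the oven: I/O-ish, releases the GIL
        return 'cake %d' % i

    if __name__ == '__main__':
        # 2 bakers, 2 kitchens: real parallelism, but each worker is a process.
        with ProcessPoolExecutor(max_workers=2) as pool:
            print(list(pool.map(cake, range(4))))

        # 2 bakers, 1 kitchen: cheap workers sharing the GIL; fine while
        # they spend their time waiting rather than computing.
        with ThreadPoolExecutor(max_workers=2) as pool:
            print(list(pool.map(cake, range(4))))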
From rosuav at gmail.com  Sat Jul 25 07:32:31 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 25 Jul 2015 15:32:31 +1000
Subject: [Python-ideas] Concurrency Modules
In-Reply-To: 
References: <559EFB73.5050606@mail.de> <9c139305-f583-46c1-b819-6a98dbd04acc@googlegroups.com> <55B2B0FB.1060409@mail.de>
Message-ID: 

On Sat, Jul 25, 2015 at 3:28 PM, Steve Dower wrote:
> Hope that makes sense and I'm not stretching things too far. Guess I should
> make this into a talk for PyCon next year.

Yes. And serve cake.

On a more serious note, I'd like to see some throughput tests for
process-pool, thread-pool, and asyncio on a single thread. That'd make a
great PyCon talk; make sure it's videoed, as I'd likely be linking to it
a lot.

ChrisA
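A minimal version of the throughput test Chris asks for might look like
the following (a sketch; the 10ms sleep is a placeholder workload, and
results will vary wildly with the real task mix):

    import asyncio
    import time
    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

    def io_task(i):
        time.sleep(0.01)                  # stand-in for a blocking I/O call
        return i

    @asyncio.coroutine
    def aio_task():
        yield from asyncio.sleep(0.01)    # the non-blocking equivalent

    def bench_pool(pool_cls, n=100):
        start = time.time()
        with pool_cls(max_workers=10) as pool:
            list(pool.map(io_task, range(n)))
        return time.time() - start

    def bench_asyncio(n=100):
        loop = asyncio.get_event_loop()
        start = time.time()
        loop.run_until_complete(asyncio.wait([aio_task() for _ in range(n)]))
        return time.time() - start

    if __name__ == '__main__':
        print('processes', bench_pool(ProcessPoolExecutor))
        print('threads  ', bench_pool(ThreadPoolExecutor))
        print('asyncio  ', bench_asyncio())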
From steve at pearwood.info  Sat Jul 25 08:05:58 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 25 Jul 2015 16:05:58 +1000
Subject: [Python-ideas] Briefer string format
In-Reply-To: 
References: <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> <20150723142213.GK25179@ando.pearwood.info>
Message-ID: <20150725060558.GS25179@ando.pearwood.info>

TL;DR: Please let's just ban implicit concatenation between f strings (a
runtime function call) and non-f strings. The user should be explicit in
what they want, using either explicitly escaped braces or the + operator.
Anything else is going to be surprising.

On Thu, Jul 23, 2015 at 06:57:25PM -0700, Bruce Leban wrote:
> On Thu, Jul 23, 2015 at 7:22 AM, Steven D'Aprano wrote:
>
> > If I had a dollar for every time somebody on the Internet misused
> > "strawman argument", I would be a rich man.
>
> You wouldn't get a dollar here. If you want to be strict, a strawman
> argument is misrepresenting an opponent's viewpoint to make it easier to
> refute, but the term also applies to similar arguments.

Are you saying that any good faith disagreement about people's position
is a strawman? If not, I don't understand what you mean by "similar
arguments".

A strawman argument is explicitly a bad-faith argument. Describing my
argument as a strawman implies bad faith on my part. I don't mind if you
think my argument is wrong, mistaken or even incoherent, but it is not
made in bad faith and you should not imply that it is without good
reason.

Moving on to the feature:

> You stated that "constant
> folding ... *would* change the semantics" *[emphasis added]*.

In context, I said that constant-folding the *explicit* + concatenation
of f'{a}' + '{b}' to f'{a}{b}' would change the semantics. I'm sorry if
it was not clear enough that I specifically meant that. I thought that
the context was enough to show what I meant.

By constant-folding, I mean when the parser/lexer/compiler/whatever (I
really don't care which) folds expressions like the following:

    'a' + 'b'

to this:

    'ab'

If the parser/whatever does that to mixed f and non-f strings, I think
that would be harmful, because it would change the semantics:

    f'{a}' + '{b}'

executed at runtime with no constant-folding is not equivalent to the
folded version:

    f'{a}{b}'

Hence, the peephole optimizer should not do that. I hoped that wouldn't
be controversial.

[...]
> So the straw here is imagining that the implementer of
> this feature would ignore the accepted rules regarding constant folding and
> then criticizing the implementer for doing that.

I'm taken aback that you seem to think my pointing out the above is a
criticism of an implementer who doesn't even exist yet! We're still
discussing what the semantics of f strings should be, and I don't think
anyone should be offended or threatened by me being explicit about what
the behaviour should be.

And for the record, it is not unheard of for constant-folding peephole
optimizers to accidentally, or deliberately, change the semantics of
code. For example, in D constant-folded 0.1 + 0.2 is not the same as
0.1 + 0.2 done at runtime (the constant folding is done at a different
floating-point precision from the runtime calculation):

http://stackoverflow.com/questions/6874357/why-0-1-0-2-0-3-in-d

This paper discusses the many pitfalls of optimizing floating point
code, and mentions that C may change the value of literal expressions
depending on whether they are done at runtime or not:

    Another effect of this pragma is to change how much the compiler can
    evaluate at compile time regarding constant initialisations. [...]
    If it is set to OFF, the compiler can evaluate floating-point
    constants at compile time, whereas if they had been evaluated at
    runtime, they would have resulted in different values (because of
    different rounding modes) or floating-point exception.

http://arxiv.org/pdf/cs/0701192.pdf

Constant-folding *shouldn't* change the semantics of code, but
programmers are only human. They make bad design decisions or write
buggy code the same as all of us.

> > (3) The hard case, when you mix f and non-f strings.
> >
> >     f'{spam}' '{eggs}'
> >
> > Notwithstanding raw strings, the behaviour which makes sense to me is
> > that the implicit string concatenation occurs first, followed by format.
>
> You talk about which happens "first" so let's recast this as an operator
> precedence question. Think of f as a unary operator. Does f bind tighter
> than implicit concatenation? Well, all other string operators like this
> bind more tightly than concatenation.
>     f'{spam}' '{eggs}'

I don't think this is correct. Can you give an example? All the examples
I can come up with show implicit concatenation binding more tightly
(i.e. it occurs first), e.g.:

py> 'a' 'ba'.replace('a', 'z')
'zbz'

not 'abz'. And of course, you can't implicitly concat to a method call:

py> 'a'.replace('a', 'z') 'ba'
  File "<stdin>", line 1
    'a'.replace('a', 'z') 'ba'
                             ^
SyntaxError: invalid syntax

So I think it would be completely unprecedented if the f pseudo-operator
bound more tightly than the implicit concatenation.

> > Secondly, it feels that this does the concatenation in the wrong order.
> > Implicit concatenation occurs as early as possible in every other case.
> > But here, we're delaying the concatenation until after the format. So
> > this feels wrong to me.
>
> Implicit concatenation does NOT happen as early as possible in every case.
> When I write:
>
>     r'a\n' 'b\n'  ==>  'a\\nb\n'
>
> the r is applied to the first string *before* the concatenation with the
> second string.

r isn't a function, it's syntax. There's nothing to apply. This is why I
don't think that the behaviour of mixed raw and cooked strings is a good
model for mixing f and non-f strings. Both raw and cooked strings are
lexical features and should be read from left to right, in the order
that they occur, not function calls which must be delayed until runtime.

[...]
> Imagine that we have another prefix that escapes strings for regex. That is,
> e'a+b' ==> 'a\\+b'. This is another function call in disguise, just calling
> re.escape.

Now you're the one confusing interface with implementation :-)

Such an e string need not be a function call, it could be a lexical
feature like raw strings. In fact, I would expect that they should be.
These hypothetical e strings could be a lexical feature, or a runtime
function, but f *must* be a runtime function since the variables being
interpolated don't have values to interpolate until runtime. We have no
choice in the matter, whereas we do have a choice with e strings.

In any case, I don't think it is a productive use of our time to discuss
a hypothetical e string that neither of us intend to propose.

> Maybe you can't say that concatenation is an optimization but I can (new
> text underlined):
>
>     Multiple adjacent string or bytes literals (delimited by whitespace),
>     possibly using different quoting conventions, are allowed, and their
>     meaning is the same as their concatenation. ... Thus, "hello" 'world'
>     is equivalent to "helloworld". This feature can be used to reduce the
>     number of backslashes needed, to split long strings conveniently
>     across long lines, *to mix formatted and unformatted strings,* or
>     even to add comments to parts of strings, for example:
>
>     re.compile("[A-Za-z_]"       # letter or underscore
>                "[A-Za-z0-9_]*"   # letter, digit or underscore
>               )
>
>     Note that this feature is defined at the syntactical level, but
>     implemented at compile time *as an optimization*.

I don't think that flies. It's *not just an optimization* when it comes
to f strings. It makes a difference to the semantics.

    f'{spam}' '{eggs}'

being turned into "format first, then concat" has a very different
meaning to "concat first, then format". To get the semantics you want,
you need a third option:

    escape first, then concat, then format

But there's nothing obvious in the syntax '{eggs}' that tells anyone
when it will be escaped and when it won't be. You need to be aware of
the special case "when implicitly concat'ed to f strings, BUT NO OTHER
TIME, braces in ordinary strings will be escaped". I dislike special
cases. They increase the number of things to memorise and lead to
surprises.

> *If formatted strings are
> mixed with unformatted strings, they are concatenated at compile time and
> the unformatted parts are escaped so they will not be subject to format
> substitutions.*

That's your opinion for the desirable behaviour. I don't like it, I
don't expect it. The fact that you have to explicitly document it shows
that it is a special case that doesn't follow from the existing
behaviour of Python's implicit concatenation rules.

I don't think we should have such a special case, when there are already
at least two other ways to get the same effect. But since my preferred
suggestion is unpopular, I'd much rather just ban implicit concat'ing of
f and non-f strings and avoid the whole argument. That's not an onerous
burden on the coder:

    result = (f'{this}' + '{that}')

is not that much more difficult to type than:

    result = (f'{this}' '{that}')

and it makes the behaviour clear.

-- 
Steve

From guido at python.org  Sat Jul 25 09:06:08 2015
From: guido at python.org (Guido van Rossum)
Date: Sat, 25 Jul 2015 09:06:08 +0200
Subject: [Python-ideas] Briefer string format
In-Reply-To: <85380dnvwp.fsf@benfinney.id.au>
References: <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> <20150723142213.GK25179@ando.pearwood.info> <55B17B81.3070000@canterbury.ac.nz> <85380dnvwp.fsf@benfinney.id.au>
Message-ID: 

On Fri, Jul 24, 2015 at 10:27 PM, Ben Finney wrote:

> Greg Ewing writes:
>
> > Steven D'Aprano wrote:
> >
> > > I don't think I want this behaviour:
> > >
> > >     f'{spam}' '{eggs}'
> > >     => format(spam) + '{eggs}'
> >
> > What do you think should happen in this case:
> >
> >     '{spam}' f'{eggs}'
> >
> > It would seem very strange to me if the f infected strings *before* it
> > as well as after it.
>
> The existing behaviour of implicit concatenation doesn't give much of a
> guide here, unfortunately::
>
>     >>> 'foo\abar' r'lorem\tipsum' 'wibble\bwobble'
>     'foo\x07barlorem\\tipsumwibble\x08wobble'
>
>     >>> type(b'abc' 'def' b'ghi')
>       File "<stdin>", line 1
>     SyntaxError: cannot mix bytes and nonbytes literals
>
> So, the 'b' prefix expects to apply to all the implicitly-concatenated
> parts (and fails if they're not all bytes strings); the 'r' prefix
> expects to apply only to the one fragment, leaving others alone.
>
> Is the proposed 'f' prefix, on a fragment in implicit concatenation,
> meant to have behaviour analogous to the 'r' prefix or the 'b' prefix,
> or something else? What's the argument in favour of that choice?

It *must* work like r'' does. Implicit concatenation must be thought of
as letting each string do its thing and then concatenating using '+',
just optimized if possible. The error for b'' comes out because the '+'
refuses b'' + ''.

I find it a sign of the times that even this simple argument goes on and
on forever. Please stop the thread until Eric has had the time to write
up a PEP.

-- 
--Guido van Rossum (python.org/~guido)
From ben+python at benfinney.id.au  Sat Jul 25 09:55:18 2015
From: ben+python at benfinney.id.au (Ben Finney)
Date: Sat, 25 Jul 2015 17:55:18 +1000
Subject: [Python-ideas] Briefer string format
References: <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> <20150723142213.GK25179@ando.pearwood.info> <55B17B81.3070000@canterbury.ac.nz> <85380dnvwp.fsf@benfinney.id.au>
Message-ID: <85y4i4n02h.fsf@benfinney.id.au>

Guido van Rossum writes:

> On Fri, Jul 24, 2015 at 10:27 PM, Ben Finney wrote:
>
> > > Steven D'Aprano wrote:
> > > > I don't think I want this behaviour:
> > > >
> > > >     f'{spam}' '{eggs}'
> > > >     => format(spam) + '{eggs}'
> [...]
> >
> >     >>> 'foo\abar' r'lorem\tipsum' 'wibble\bwobble'
> >     'foo\x07barlorem\\tipsumwibble\x08wobble'
> >
> >     >>> type(b'abc' 'def' b'ghi')
> >       File "<stdin>", line 1
> >     SyntaxError: cannot mix bytes and nonbytes literals
> >
> [...]
> > Is the proposed 'f' prefix, on a fragment in implicit concatenation,
> > meant to have behaviour analogous to the 'r' prefix or the 'b'
> > prefix, or something else? What's the argument in favour of that
> > choice?
>
> It *must* work like r'' does. Implicit concatenation must be thought
> of as letting each string do its thing and then concatenating using
> '+', just optimized if possible. The error for b'' comes out because
> the '+' refuses b'' + ''.

That makes sense, and is nicely consistent ('f', 'r', and 'b' all apply
only to the one fragment, and then concatenation rules apply). Thanks.

> I find it a sign of the times that even this simple argument goes on
> and on forever. Please stop the thread until Eric has had the time to
> write up a PEP.

I found this discussion helpful in knowing the intent, and what people's
existing expectations are.

Hopefully you found it helpful too, Eric! In either case, I look forward
to your PEP.

-- 
 \       '... one of the main causes of the fall of the Roman Empire was |
  `\      that, lacking zero, they had no way to indicate successful |
_o__)     termination of their C programs.' --Robert Firth |
Ben Finney

From ronaldoussoren at mac.com  Sat Jul 25 13:47:06 2015
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Sat, 25 Jul 2015 13:47:06 +0200
Subject: [Python-ideas] Class introspection by pydoc vs. attributes on meta classes
Message-ID: <04992DDB-026B-4F7C-8FBB-118D497836FC@mac.com>

Hi,

Pydoc (and AFAIK loads of other introspection tools as well) currently
ignores attributes on metaclasses. I wonder if it would be better to
teach pydoc (and possibly inspect as well) about those, and thought it
would be better to ask here before starting on a patch.

That is, given the following definitions:

    class Meta (type):
        def hidden_class(cls):
            print('H', cls.__name__)

    class MyObject (metaclass=Meta):
        @classmethod
        def public_class(cls):
            print('P', cls.__name__)

Pydoc will show 'public_class' as a class method of 'MyObject' when you
use 'help(MyObject)', but it will not show that there is a method named
'hidden_class' that can be called on 'MyObject' (but not on instances of
that class). That is a problem when 'hidden_class' is part of the public
interface of the class itself.

The issue was found in the context of PyObjC (obviously...): Objective-C
classes can and do have instance and class methods of the same name.
Those cannot both be a descriptor in the Python proxy class, and that's
why PyObjC uses a metaclass to expose Objective-C class methods to
Python code. This works very well, except for some problems with
introspection such as the pydoc issue I refer to above.

Ronald
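The asymmetry Ronald describes is easy to reproduce (a sketch; exact
pydoc output may vary by version, so the checks below only probe for the
method names):

    import pydoc

    class Meta(type):
        def hidden_class(cls):
            print('H', cls.__name__)

    class MyObject(metaclass=Meta):
        @classmethod
        def public_class(cls):
            print('P', cls.__name__)

    MyObject.public_class()        # P MyObject
    MyObject.hidden_class()        # H MyObject -- callable on the class...
    try:
        MyObject().hidden_class    # ...but not reachable via instances
    except AttributeError as err:
        print(err)

    text = pydoc.render_doc(MyObject)
    print('public_class' in text)  # True
    print('hidden_class' in text)  # False -- pydoc only walks the class MRO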
So, let's summarize what we have so far: | 1 | 2 | 3 ---------------+-------------------------+----------------------------+------------------------ code lives in | processes | threads | coroutines managed by | os scheduler | os scheduler + interpreter | customizable event loop | | | parallelism | yes | depends (cf. GIL) | no shared state | no | yes | yes | | | startup impact | biggest | medium | smallest cpu impact | biggest | medium | smallest memory impact | biggest | medium | smallest | | | purpose | cpu-bound tasks | i/o-bound tasks | ??? | | | module pool | multiprocessing.Pool | multiprocessing.dummy.Pool | ??? module solo | multiprocessing.Process | threading.Thread | ??? Please, feel free to amend/correct the table and fill in the ??? parts if you know better. On 25.07.2015 07:28, Steve Dower wrote: > "But I still have a question: why can't we use threads for the cakes? (1 > cake = 1 thread)." > > Because that is the wrong equality - it's really 1 baker = 1 thread. > > Bakers aren't free, you have to pay for each one (memory, stack > space), it will take time for each one to learn how your bakery works > (startup time), and you will waste some of your own time coordinating > them (interthread communication). > > You also only have one set of baking equipment (the GIL), buying > another bakery is expensive (another process) and fitting more > equipment into the current one is very complicated (subinterpreters). > > So you either pay a high price for 2 bakers = 2 cakes, or you accept 2 > bakers = 1.5 cakes (in the same amount of time). It turns out that > often 1 baker can do 1.5 cakes in the same time as well, and it's much > easier to reason about and implement correctly. > > Hope that makes sense and I'm not stretching things too far. Guess I > should make this into a talk for PyCon next year. > > Cheers, > Steve > > Top-posted from my Windows Phone > ------------------------------------------------------------------------ > From: Sven R. Kunze > Sent: ?7/?24/?2015 14:41 > To: Mark Summerfield ; > python-ideas at googlegroups.com ; > python-ideas at python.org ; Steve Dower > > Subject: Re: [Python-ideas] Concurrency Modules > > Hi. I am back. First of all thanks for your eager participation. I would > like to catch on on Steve's and Mark's examples as they seem to be very > good illustrations of what issue I still have. > > Steve explained why asyncio is great and Mark explained why > threading+multiprocessing is great. Each from his own perspective and > focusing on the internal implementation details. To me, all approaches > can now be fit into this sort of table. Please, correct me if it's wrong > (that is very important): > > # | code lives in | managed by > --+---------------+------------- > 1 | processes | os scheduler > 2 | threads | os scheduler > 3 | tasks | event loop > > > > But the original question still stands: > > Which one to use? > > > Ignoring little details like 'shared state', 'custom prioritization', > etc., they all look the same to me and to what it all comes down are > these little nasty details people try to explain so eagerly. Not saying > that is a bad thing but it has some implications on production code I do > not like and in the following I am going to explain that. > > Say, we have decided for approach N because of some requirements > (examples from here and there, guidelines given by smart people, > customer needs etc.) and wrote hundred thousand lines of code. > What if these requirements change 6 years in the future? 
> What if the maintainer of approach N decided to change it in such a way > that is not compatible with our requirements anymore? > From what I can see there is no easy way 'back' to use another > approach. They all have different APIs, basically for: 'executing a > function and returning its precious result (the cake)'. > > > asyncio gives us the flexibility to choose a prioritization mechanism. > Nice to have, because we are now independent on the os scheduler. > But do we really ever need that? > What is wrong with the os scheduler? > Would that not mean that Mark better switches to asyncio? > We don't know if we ever would need that in project A and project B. > What now? Use asyncio just in case? Preemptively? > > > @Steve > Thanks for that great explanation of how asyncio works and its > relationship to threads/processes. > > But I still have a question: why can't we use threads for the cakes? (1 > cake = 1 thread). Not saying that asyncio would be a bad idea to use > here, but couldn't we accomplish the same functionality by using threads? > > > > I think, after we've settled the above questions, we should change the > focus from > > How do they work internally and what are the tiny differences? > (answered greatly by Mark) > > to > > When do I use which one? > > > The latter question actually is what counts for production code. It > actually is quite interesting to know and to ponder over all the > differences, dependencies, corner cases etc. However, when it actually > comes down to 'executing a piece of code and returning its result', you > end up deciding which approach to choose. You won't implement all 3 > different ways just because it is great to see all the nasty little > details to click in. > > > On Thursday, July 9, 2015 at 11:54:11 PM UTC+1, Sven R. Kunze wrote: > > > > In order to make a sound decision for the question: "Which one(s) do I > > use?", at least the following items should be somehow defined clearly > > for these modules: > > > > 1) relationship between the modules > > 2) NON-overlapping usage scenarios > > 3) future development intentions > > 4) ease of usage of the modules => future syntax > > 5) examples > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Sat Jul 25 21:55:30 2015 From: eric at trueblade.com (Eric V. Smith) Date: Sat, 25 Jul 2015 15:55:30 -0400 Subject: [Python-ideas] Briefer string format In-Reply-To: <85y4i4n02h.fsf@benfinney.id.au> References: <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> <20150723142213.GK25179@ando.pearwood.info> <55B17B81.3070000@canterbury.ac.nz> <85380dnvwp.fsf@benfinney.id.au> <85y4i4n02h.fsf@benfinney.id.au> Message-ID: <55B3E9B2.50709@trueblade.com> On 7/25/2015 3:55 AM, Ben Finney wrote: > Guido van Rossum writes: > >> On Fri, Jul 24, 2015 at 10:27 PM, Ben Finney wrote: >> >>>> Steven D'Aprano wrote: >>>>> I don't think I want this behaviour: >>>>> >>>>> f'{spam}' '{eggs}' >>>>> => format(spam) + '{eggs}' >> [?] >>> >>> >>> 'foo\abar' r'lorem\tipsum' 'wibble\bwobble' >>> 'foo\x07barlorem\\tipsumwibble\x08wobble' >>> >>> >>> type(b'abc' 'def' b'ghi') >>> File "", line 1 >>> SyntaxError: cannot mix bytes and nonbytes literals >>> >>> [?] >>> Is the proposed ?f? prefix, on a fragment in implicit concatenation, >>> meant to have behaviour analogous to the ?r? prefix or the ?b? 
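One hedged suggestion for the ??? column, which nothing in this thread
settles definitively: the closest asyncio analogue to a pool is
scheduling many tasks and gathering their results, with
loop.run_in_executor as the escape hatch for blocking calls. A sketch
(3.4-era spelling; job() is an illustrative workload):

    import asyncio

    @asyncio.coroutine
    def job(i):
        yield from asyncio.sleep(0.01)   # cooperative i/o stand-in
        return i * 2

    loop = asyncio.get_event_loop()

    # 'module pool' analogue: many tasks, one gather
    results = loop.run_until_complete(
        asyncio.gather(*[job(i) for i in range(5)]))
    print(results)                        # [0, 2, 4, 6, 8]

    # 'module solo' analogue: a single task (loop.create_task: 3.4.2+)
    task = loop.create_task(job(21))
    print(loop.run_until_complete(task))  # 42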
From eric at trueblade.com  Sat Jul 25 21:55:30 2015
From: eric at trueblade.com (Eric V. Smith)
Date: Sat, 25 Jul 2015 15:55:30 -0400
Subject: [Python-ideas] Briefer string format
In-Reply-To: <85y4i4n02h.fsf@benfinney.id.au>
References: <76E93441-D589-4E1C-9A03-8448F0EF0B73@gmail.com> <55AD0376.7020000@trueblade.com> <55AD2B15.1080909@trueblade.com> <55AD368D.7020108@trueblade.com> <55AFE66E.9070007@trueblade.com> <20150723033112.GJ25179@ando.pearwood.info> <20150723142213.GK25179@ando.pearwood.info> <55B17B81.3070000@canterbury.ac.nz> <85380dnvwp.fsf@benfinney.id.au> <85y4i4n02h.fsf@benfinney.id.au>
Message-ID: <55B3E9B2.50709@trueblade.com>

On 7/25/2015 3:55 AM, Ben Finney wrote:
> Guido van Rossum writes:
>
>> On Fri, Jul 24, 2015 at 10:27 PM, Ben Finney wrote:
>>
>>>> Steven D'Aprano wrote:
>>>>> I don't think I want this behaviour:
>>>>>
>>>>>     f'{spam}' '{eggs}'
>>>>>     => format(spam) + '{eggs}'
>> [...]
>>>
>>>     >>> 'foo\abar' r'lorem\tipsum' 'wibble\bwobble'
>>>     'foo\x07barlorem\\tipsumwibble\x08wobble'
>>>
>>>     >>> type(b'abc' 'def' b'ghi')
>>>       File "<stdin>", line 1
>>>     SyntaxError: cannot mix bytes and nonbytes literals
>>>
>>> [...]
>>> Is the proposed 'f' prefix, on a fragment in implicit concatenation,
>>> meant to have behaviour analogous to the 'r' prefix or the 'b'
>>> prefix, or something else? What's the argument in favour of that
>>> choice?
>>
>> It *must* work like r'' does. Implicit concatenation must be thought
>> of as letting each string do its thing and then concatenating using
>> '+', just optimized if possible. The error for b'' comes out because
>> the '+' refuses b'' + ''.
>
> That makes sense, and is nicely consistent ('f', 'r', and 'b' all apply
> only to the one fragment, and then concatenation rules apply). Thanks.

Yes, I think that's the only interpretation that makes sense.

>> I find it a sign of the times that even this simple argument goes on
>> and on forever. Please stop the thread until Eric has had the time to
>> write up a PEP.
>
> I found this discussion helpful in knowing the intent, and what people's
> existing expectations are.
>
> Hopefully you found it helpful too, Eric! In either case, I look forward
> to your PEP.

In trying to understand the issues for a PEP, I'm working on a sample
implementation. There, I've just disallowed concatenation entirely.
Compared to all of the other issues, it's really insignificant. I'll put
it back at some point.

Eric.

From Nikolaus at rath.org  Sun Jul 26 02:58:09 2015
From: Nikolaus at rath.org (Nikolaus Rath)
Date: Sat, 25 Jul 2015 17:58:09 -0700
Subject: [Python-ideas] Concurrency Modules
In-Reply-To: <55B3C93D.9090601@mail.de> (Sven R. Kunze's message of "Sat, 25 Jul 2015 19:37:01 +0200")
References: <559EFB73.5050606@mail.de> <9c139305-f583-46c1-b819-6a98dbd04acc@googlegroups.com> <55B2B0FB.1060409@mail.de> <55B3C93D.9090601@mail.de>
Message-ID: <87egjvn3a6.fsf@vostro.rath.org>

On Jul 25 2015, "Sven R. Kunze" wrote:
> startup impact | biggest         | medium          | smallest
> cpu impact     | biggest         | medium          | smallest
> memory impact  | biggest         | medium          | smallest
> purpose        | cpu-bound tasks | i/o-bound tasks | ???

I don't think any of these is correct. Unfortunately, I also don't think
there even is a correct version; the differences are simply not so
clear-cut.

On Unix, process startup-cost can be high if you do fork() + exec(), but
if you just fork, it's as cheap as a thread. With asyncio, it's not
clear to me what exactly you'd define as the "startup impact" (the
creation of a future maybe? Or setting up the event loop?).

"CPU impact" as a category doesn't make any sense to me. If you execute
the same code it's going to take the same amount of (cumulative) CPU
time, no matter if this code runs in a separate thread, separate
process, or asynchronously.

"Memory impact" is probably highest for separate processes, but I don't
see an obvious difference when using threads vs asyncio. Where did you
get this from?

As far as purpose is concerned, pretty much the only limitation is that
asyncio is not suitable for cpu-bound tasks. Any other combination is
possible and also most appropriate in specific circumstances.

Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

'Time flies like an arrow, fruit flies like a Banana.'
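Nikolaus's fork vs. fork+exec point shows up directly in the
multiprocessing start methods (a sketch; 'fork' is Unix-only, and the
timings are illustrative, not a benchmark):

    import multiprocessing as mp
    import time

    def child():
        pass

    def time_start(method, n=20):
        ctx = mp.get_context(method)      # 'fork', 'spawn' or 'forkserver'
        start = time.time()
        for _ in range(n):
            p = ctx.Process(target=child)
            p.start()
            p.join()
        return (time.time() - start) / n

    if __name__ == '__main__':
        for method in ('fork', 'spawn'):
            # fork is typically far cheaper, since spawn re-executes
            # the interpreter and re-imports the parent module.
            print(method, time_start(method))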
Mostly I refer to things Steve brought up in his analogies (two recent posts). So, I might interpreted them the wrong way. On 26.07.2015 02:58, Nikolaus Rath wrote: > On Jul 25 2015, "Sven R. Kunze" wrote: >> startup impact | biggest | medium | smallest >> cpu impact | biggest | medium | smallest >> memory impact | biggest | medium | smallest >> purpose | cpu-bound tasks | i/o-bound tasks | ??? > I don't think any of these is correct. Unfortunately, I also don't think > there even is a correct version, the differences are simply not so > clear-cut. I think that has already been discussed. We just try to boil it down to assist people making the decision of which module might be the best for them. > On Unix, Process startup-cost can be high if you do fork() + exec(), but > if you just fork, it's as cheap as a thread. Didn't know that. Thanks for clarifying. How do multiprocessing.Pool and multiprocessing.Process work in this regard? > With asyncio, it's not > clear to me what exactly you'd define as the "startup impact" (the > creation of a future maybe? Or setting up the event loop?). The purpose of survey is to give developers an easy way to decide which approach might be suitable for them. So, the definition of 'startup time' should be roughly equivalent across the approaches. >> What's necessary to get a process up and running a piece of code compared to what's necessary to get asyncio up and running the same piece of code. Steve: "Bakers aren't free, you have to pay for each one (memory, stack space), it will take time for each one to learn how your bakery works (startup time)" > "CPU impact" as a category doesn't make any sense to me. If you execute > the same code it's going to take the same amount of (cumulative) CPU > time, no matter if this code runs in a separate thread, separate > process, or asynchronously. From what I understand, switching contexts impacts cpu whereas the event loop does not so much. > "memory impact" is probably highest for separate processes, but I don't > see an obvious difference when using threads vs asyncio. Where did you > get this from? I can imagine that when the os needs to manage threads it creates more overhead for each thread than what it takes for the Python interpreter when suspending coroutines. That could be wrong? Do you have any material on this? > As far as purpose is concerned, pretty much the only limitation is that > asyncio is not suitable for cpu-bound tasks. Any other combination is > possible and also most appropriate in specific circumstances. What exactly do you mean by any other combination? I take from this that asyncio is suitable for heavy i/o-bound, threads are for cpu/io-bound and processes for mainly cpu-bound. Best, Sven From abarnert at yahoo.com Sun Jul 26 12:29:01 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 26 Jul 2015 12:29:01 +0200 Subject: [Python-ideas] Concurrency Modules In-Reply-To: <55B4B149.1090408@mail.de> References: <559EFB73.5050606@mail.de> <9c139305-f583-46c1-b819-6a98dbd04acc@googlegroups.com> <55B2B0FB.1060409@mail.de> <55B3C93D.9090601@mail.de> <87egjvn3a6.fsf@vostro.rath.org> <55B4B149.1090408@mail.de> Message-ID: <59F2450C-FF41-4C62-9924-432993BC8A55@yahoo.com> On Jul 26, 2015, at 12:07, Sven R. Kunze wrote: > > Thanks, Nikolaus. Mostly I refer to things Steve brought up in his analogies (two recent posts). So, I might interpreted them the wrong way. > >> On 26.07.2015 02:58, Nikolaus Rath wrote: >>> On Jul 25 2015, "Sven R. 
Kunze" wrote: >>> startup impact | biggest | medium | smallest >>> cpu impact | biggest | medium | smallest >>> memory impact | biggest | medium | smallest >>> purpose | cpu-bound tasks | i/o-bound tasks | ??? >> I don't think any of these is correct. Unfortunately, I also don't think >> there even is a correct version, the differences are simply not so >> clear-cut. > I think that has already been discussed. We just try to boil it down to assist people making the decision of which module might be the best for them. One huge thing you're missing is cooperative vs. preemptive switching. In asyncio, you know that no other task is going to run until you reach the next explicit yield point; with threads, it can happen after any bytecode; with processes, it can happen anywhere at all. This means that if you're using shared state, your locking strategy can be simpler, more efficient, and easier to prove correct with asyncio. And likewise, if you need to sequence things, it can be easier with asyncio (although often the simplest way to do that in any mechanism is to make each of those things into a task and just chain futures together). >> On Unix, Process startup-cost can be high if you do fork() + exec(), but >> if you just fork, it's as cheap as a thread. > Didn't know that. Thanks for clarifying. How do multiprocessing.Pool and multiprocessing.Process work in this regard? It's your choice: just fork, spawn (fork+exec), or spawn a special "server" process to fork copies off. (Except on Windows, where spawn is the only possibility.) How do you know which one to choose? Well, you have to learn the differences to make a decision. Forking is fastest, and it means some kinds of globals are automatically shared, but it can lead to a variety of problems, especially if you're also using threads (and some libraries may use threads without you knowing about it--especially on OS X, where a variety of Cocoa APIs sometimes use threads and sometimes don't). >> With asyncio, it's not >> clear to me what exactly you'd define as the "startup impact" (the >> creation of a future maybe? Or setting up the event loop?). > The purpose of survey is to give developers an easy way to decide which approach might be suitable for them. > So, the definition of 'startup time' should be roughly equivalent across the approaches. >> What's necessary to get a process up and running a piece of code compared to what's necessary to get asyncio up and running the same piece of code. > > Steve: "Bakers aren't free, you have to pay for each one (memory, stack space), it will take time for each one to learn how your bakery works (startup time)" >> "CPU impact" as a category doesn't make any sense to me. If you execute >> the same code it's going to take the same amount of (cumulative) CPU >> time, no matter if this code runs in a separate thread, separate >> process, or asynchronously. > From what I understand, switching contexts impacts cpu whereas the event loop does not so much. Yes. There's always a context switch going on, but a cooperative context switch can swap a lot less, and can do it without having to cross the user-kernel boundary. >> "memory impact" is probably highest for separate processes, but I don't >> see an obvious difference when using threads vs asyncio. Where did you >> get this from? > I can imagine that when the os needs to manage threads it creates more overhead for each thread than what it takes for the Python interpreter when suspending coroutines. That could be wrong? Do you have any material on this? 
The overhead for the contexts themselves is tiny--but one of the things each thread context points at is the stack, and that may be 1MB or even more. So, a program with 500 threads may be using half a GB just for stacks. That may not be as bad as it sounds, because if you never use most of the stack, most of it may never actually get paged to physical memory. (But on 32-bit OS's, you're still using up a quarter of your page table space.) >> As far as purpose is concerned, pretty much the only limitation is that >> asyncio is not suitable for cpu-bound tasks. Any other combination is >> possible and also most appropriate in specific circumstances. > What exactly do you mean by any other combination? > > I take from this that asyncio is suitable for heavy i/o-bound, threads are for cpu/io-bound and processes for mainly cpu-bound. Asyncio is best for massively concurrent i/o bound code that does pretty much the same thing for each one, like a web server that has to handle thousands of users. Threads are also used for i/o bound code; it's more a matter of how you want to write the code than of what it does. Processes, on the other hand, are the only way (other than a C extension that releases the GIL--or, of course, using a different Python interpreter) to get CPU parallelism. So, that part is right. But there are other advantages of using processes sometimes--it guarantees no accidental shared state; it gives you a way to "recycle" your workers if you might call some C library that can crash or leak memory or corrupt things; it gives you another VM space (which can be a big deal in 32-bit platforms). Also, you can write multiprocessing code as if you were writing distributed code, which makes it easier to turn into real distributed code if you later need to do that. From srkunze at mail.de Sun Jul 26 13:44:59 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Sun, 26 Jul 2015 13:44:59 +0200 Subject: [Python-ideas] Concurrency Modules In-Reply-To: <59F2450C-FF41-4C62-9924-432993BC8A55@yahoo.com> References: <559EFB73.5050606@mail.de> <9c139305-f583-46c1-b819-6a98dbd04acc@googlegroups.com> <55B2B0FB.1060409@mail.de> <55B3C93D.9090601@mail.de> <87egjvn3a6.fsf@vostro.rath.org> <55B4B149.1090408@mail.de> <59F2450C-FF41-4C62-9924-432993BC8A55@yahoo.com> Message-ID: <55B4C83B.2030904@mail.de> Wow. Thanks, Andrew for this very informative response. I am going to integrate your thoughts in to the table later and re-post it again. Just one question: On 26.07.2015 12:29, Andrew Barnert wrote: > It's your choice: just fork, spawn (fork+exec), or spawn a special "server" process to fork copies off. (Except on Windows, where spawn is the only possibility.) > > How do you know which one to choose? Well, you have to learn the differences to make a decision. Forking is fastest, and it means some kinds of globals are automatically shared, but it can lead to a variety of problems, especially if you're also using threads (and some libraries may use threads without you knowing about it--especially on OS X, where a variety of Cocoa APIs sometimes use threads and sometimes don't). If I read the documentation of https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.pool for instance, I do not see a way to specify my choice. There, I pass a function and this function is executed in another process/thread. Is that just forking? 
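To make my question concrete, this is roughly what I mean (an untested
sketch; the worker function is just an example):

    from multiprocessing import Pool

    def work(x):
        return x * x

    if __name__ == '__main__':
        # I just pass a function; I see no parameter here that would let
        # me choose between fork and fork+exec.
        pool = Pool(processes=4)
        print(pool.map(work, range(10)))
        pool.close()
        pool.join()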
From p.f.moore at gmail.com Sun Jul 26 14:18:08 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 26 Jul 2015 13:18:08 +0100 Subject: [Python-ideas] Concurrency Modules In-Reply-To: <55B3C93D.9090601@mail.de> References: <559EFB73.5050606@mail.de> <9c139305-f583-46c1-b819-6a98dbd04acc@googlegroups.com> <55B2B0FB.1060409@mail.de> <55B3C93D.9090601@mail.de> Message-ID: On 25 July 2015 at 18:37, Sven R. Kunze wrote: > Nice, that really clears it up for me. So, let's summarize what we have so > far: Just as a note - even given the various provisos and "it's not that simple" comments that have been made, I found this table extremely useful. Like any such high-level summary, I expect to have to take it with a pinch of salt, but I don't see that as an issue - anyone who doesn't fully appreciate that there are subtleties, probably wouldn't read a longer explanation anyway. So many thanks for taking the time to put this together (and for continuing to improve it). +1 on something like this ending up in the Python docs somewhere. Paul From ncoghlan at gmail.com Sun Jul 26 16:09:09 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 27 Jul 2015 00:09:09 +1000 Subject: [Python-ideas] Class introspection by pydoc vs. attributes on meta classes In-Reply-To: <04992DDB-026B-4F7C-8FBB-118D497836FC@mac.com> References: <04992DDB-026B-4F7C-8FBB-118D497836FC@mac.com> Message-ID: On 25 July 2015 at 21:47, Ronald Oussoren wrote: > The issue was found in the context of PyObjC (obviously?): Objective-C > classes can and do have instance and class methods of the same name. Those > cannot both be a descriptor in the Python proxy class, and that?s why PyObjC > uses an metaclass to expose Objective-C class methods to Python code. This > works very well, except for some problems with introspection such as the > pydoc issue I refer to above. The main problem with doing that by default is the number of methods and attributes that exist on type: >>> dir(type) ['__abstractmethods__', '__base__', '__bases__', '__basicsize__', '__call__', '__class__', '__delattr__', '__dict__', '__dictoffset__', '__dir__', '__doc__', '__eq__', '__flags__', '__format__', '__ge__ ', '__getattribute__', '__gt__', '__hash__', '__init__', '__instancecheck__', '__itemsize__', '__le__', '__lt__', '__module__', '__mro__', '__name__', '__ne__', '__new__', '__prepare__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasscheck__', '__subclasses__', '__subclasshook__', '__text_signature__', '__weakrefoffset__', 'mro'] We *don't* want all of those turning up in the API reported for every class object, so we don't go to the metaclass by default in dir(), and hence not in help() output either. However, if you override __dir__ on the metaclass, you also affect pydoc output on instances of that metaclass: >>> class M(type): ... def __dir__(cls): ... return ["meta_method"] ... def meta_method(cls): ... """Metaclass method""" ... return 42 ... >>> class C(metaclass=M): pass ... >>> C.meta_method() 42 >>> dir(C) ['meta_method'] >>> print(pydoc.render_doc(C)) Python Library Documentation: class C in module __main__ class C(builtins.object) | Methods inherited from M: | | meta_method() from __main__.M | Metaclass method It actually slightly confuses pydoc (it's assuming any method not defined locally is "inherited", which isn't really the right word for metaclass methods), but it's good enough to get people going in the right direction. 
And dynamically reverting C back to the default help output to demonstrate that it really is the __dir__ that makes the difference: >>> del M.__dir__ >>> print(pydoc.render_doc(C)) Python Library Documentation: class C in module __main__ class C(builtins.object) | Data descriptors defined here: | | __dict__ | dictionary for instance variables (if defined) | | __weakref__ | list of weak references to the object (if defined) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jul 26 16:19:40 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 27 Jul 2015 00:19:40 +1000 Subject: [Python-ideas] Concurrency Modules In-Reply-To: References: <559EFB73.5050606@mail.de> <9c139305-f583-46c1-b819-6a98dbd04acc@googlegroups.com> <55B2B0FB.1060409@mail.de> Message-ID: On 25 July 2015 at 15:32, Chris Angelico wrote: > On Sat, Jul 25, 2015 at 3:28 PM, Steve Dower wrote: >> Hope that makes sense and I'm not stretching things too far. Guess I should >> make this into a talk for PyCon next year. > > Yes. And serve cake. > > On a more serious note, I'd like to see some throughput tests for > process-pool, thread-pool, and asyncio on a single thread. That'd make > a great PyCon talk; make sure it's videoed, as I'd likely be linking > to it a lot. Dave Beazley's "Python Concurrency from the Ground Up" talk at PyCon US this year was almost exactly that: https://us.pycon.org/2015/schedule/presentation/374/ Video: https://www.youtube.com/watch?v=MCs5OvhV9S4 Demo code: https://github.com/dabeaz/concurrencylive There's a direct causal link between that talk and our renewed interest in getting subinterpreters up to a point where they can offer most of the low overhead of interpreter threads with most of the memory safety of operating system level processes :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jul 26 16:28:35 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 27 Jul 2015 00:28:35 +1000 Subject: [Python-ideas] Concurrency Modules In-Reply-To: <55B4C83B.2030904@mail.de> References: <559EFB73.5050606@mail.de> <9c139305-f583-46c1-b819-6a98dbd04acc@googlegroups.com> <55B2B0FB.1060409@mail.de> <55B3C93D.9090601@mail.de> <87egjvn3a6.fsf@vostro.rath.org> <55B4B149.1090408@mail.de> <59F2450C-FF41-4C62-9924-432993BC8A55@yahoo.com> <55B4C83B.2030904@mail.de> Message-ID: On 26 July 2015 at 21:44, Sven R. Kunze wrote: > If I read the documentation of > https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.pool > for instance, I do not see a way to specify my choice. The Python 2.7 multiprocessing module API is ~5 years old at this point, Andrew's referring to the API in Python 3.4+: https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.pool As far as the other benefits of asyncio go, one of the perks is that you can stop all processing smoothly just by stopping the event loop, and then they'll all resume together later. This gives you a *lot* more predictability than using threads or processes, which genuinely execute in parallel. After the previous discussion, I wrote http://www.curiousefficiency.org/posts/2015/07/asyncio-tcp-echo-server.html to attempt to convey some of the *practical* benefits of using asyncio to manage interleaved network operations within a single thread. 
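To give a flavour of the approach, here's a deliberately minimal (and
untested) echo server sketch along those lines, using the new PEP 492
syntax:

    import asyncio

    async def handle_echo(reader, writer):
        # Each client connection is handled as a coroutine; the awaits
        # mark the points where other connections may be serviced.
        data = await reader.read(100)
        writer.write(data)
        await writer.drain()
        writer.close()

    loop = asyncio.get_event_loop()
    coro = asyncio.start_server(handle_echo, '127.0.0.1', 8888)
    server = loop.run_until_complete(coro)
    try:
        loop.run_forever()
    finally:
        server.close()
        loop.close()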
While in the blog post I'm just playing with TCP clients and echo servers at the interactive prompt, it wouldn't be too hard to adapt those techniques to running network client and server testing code as part of a synchronous test suite. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jul 26 16:30:38 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 27 Jul 2015 00:30:38 +1000 Subject: [Python-ideas] Concurrency Modules In-Reply-To: References: <559EFB73.5050606@mail.de> <9c139305-f583-46c1-b819-6a98dbd04acc@googlegroups.com> <55B2B0FB.1060409@mail.de> <55B3C93D.9090601@mail.de> <87egjvn3a6.fsf@vostro.rath.org> <55B4B149.1090408@mail.de> <59F2450C-FF41-4C62-9924-432993BC8A55@yahoo.com> <55B4C83B.2030904@mail.de> Message-ID: On 27 July 2015 at 00:28, Nick Coghlan wrote: > On 26 July 2015 at 21:44, Sven R. Kunze wrote: >> If I read the documentation of >> https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.pool >> for instance, I do not see a way to specify my choice. > > The Python 2.7 multiprocessing module API is ~5 years old at this > point, Andrew's referring to the API in Python 3.4+: > https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.pool It would help if I actually replaced the link with the one I intended to provide...: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From abarnert at yahoo.com Sun Jul 26 17:00:06 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 26 Jul 2015 17:00:06 +0200 Subject: [Python-ideas] Concurrency Modules In-Reply-To: <55B4C83B.2030904@mail.de> References: <559EFB73.5050606@mail.de> <9c139305-f583-46c1-b819-6a98dbd04acc@googlegroups.com> <55B2B0FB.1060409@mail.de> <55B3C93D.9090601@mail.de> <87egjvn3a6.fsf@vostro.rath.org> <55B4B149.1090408@mail.de> <59F2450C-FF41-4C62-9924-432993BC8A55@yahoo.com> <55B4C83B.2030904@mail.de> Message-ID: <556A4DCB-1E86-4FFF-B546-913BF3434E18@yahoo.com> On Jul 26, 2015, at 13:44, Sven R. Kunze wrote: > > Wow. Thanks, Andrew for this very informative response. I am going to integrate your thoughts in to the table later and re-post it again. > > Just one question: > >> On 26.07.2015 12:29, Andrew Barnert wrote: >> It's your choice: just fork, spawn (fork+exec), or spawn a special "server" process to fork copies off. (Except on Windows, where spawn is the only possibility.) >> >> How do you know which one to choose? Well, you have to learn the differences to make a decision. Forking is fastest, and it means some kinds of globals are automatically shared, but it can lead to a variety of problems, especially if you're also using threads (and some libraries may use threads without you knowing about it--especially on OS X, where a variety of Cocoa APIs sometimes use threads and sometimes don't). > > If I read the documentation of https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.pool for instance, I do not see a way to specify my choice. That's because you're reading the documentation for Python 2.7. In 2.7, you always get fork on Unix and spawn on Windows; the choice of start methods was added in 3.3 or 3.4. > > There, I pass a function and this function is executed in another process/thread. Is that just forking? 
If you pass a function to a Process in 2.7, on Unix, that's just forking;
the parent process returns while the child process calls your function
and exits. If you pass it to a Pool, all the pool processes are forked,
but they keep running and pick new tasks off a queue.

On Windows, on the other hand, a new Process calls CreateProcess (the
equivalent of fork then exec, or posix_spawn, on Unix) to launch an
entirely new Python interpreter, which then imports your module and calls
your function. With a Pool, all the new processes get started the same
way, then keep running and pick new tasks off a queue.

From encukou at gmail.com  Sun Jul 26 18:09:17 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Sun, 26 Jul 2015 18:09:17 +0200
Subject: [Python-ideas] Syntax for key-value iteration over mappings
Message-ID:

Hello,
Currently, the way to iterate over keys and values of a mapping
is to call items() and iterate over the resulting view::

    for key, value in a_dict.items():
        print(key, value)

I believe that looping over all the data in a dict is a very important
operation, and I find myself writing this quite often. Every time I do,
it seems it's boilerplate; it looks like a workaround rather than a
preferred way of doing things.

In dict comprehensions and literals, key-value pairs are separated by
colons. How about allowing that in for loops as well?

    for key: value in a_dict:
        print(key, value)

I argue that to anyone familiar with dict literals, let alone dict
comprehensions, the semantics of this loop should be pretty obvious.
In dict comprehensions, similarity to existing syntax becomes even
more clear:

    a_mapping = {1: 'one', 2: 'two'}
    inverse = {val: key for key: val in a_mapping}

I've bounced this idea off a few EuroPython sprinters, and got some
questions/concerns I can answer here:

* But, the colon is supposed to start a block!

Well, it's already used in dict comprehensions/literals (though it's
true that there it's always inside brackets). And in lambdas -- here's
code that is legal today (though not very practical):

    while lambda: True:
        break

* There's supposed to be only one obvious way to do it! We already have
.items()!

I don't think this stops us from adding a new way of doing things which
is more obvious than the old, and which should become the one way.
After all, you don't say "for key in mapping.keys():", even though
the keys() method exists.

* What exactly would it do?

There are multiple options --
- loop over .keys() and use __getitem__ each time, like the
  dict() constructor?
- loop over .items(), like most of the code used today?
- become a well-specified "key/value iteration protocol" with
  __iteritems__() and its own bytecode operation?

-- but here I'm asking if building this bikeshed sounds useful, rather
than what paint to buy.

That said, I do have a proof of concept implementation of the second
option, in case you'd like to play around with this:
Github: https://github.com/encukou/cpython/tree/keyval-iteration
patch: https://github.com/encukou/cpython/commit/b9b0d973342280f0ef52e26a4b67f326ece82a54.patch

From srkunze at mail.de  Sun Jul 26 22:05:02 2015
From: srkunze at mail.de (Sven R. Kunze)
Date: Sun, 26 Jul 2015 22:05:02 +0200
Subject: [Python-ideas] Syntax for key-value iteration over mappings
In-Reply-To:
References:
Message-ID: <55B53D6E.3040209@mail.de>

I'd love that because I find .items() quite cumbersome as well if I have
to use it. I'd like to know if there were some reason not to introduce
this in the first place.
Best,
Sven

On 26.07.2015 18:09, Petr Viktorin wrote:
> Hello,
> Currently, the way to iterate over keys and values of a mapping
> is to call items() and iterate over the resulting view::
>
>     for key, value in a_dict.items():
>         print(key, value)
>
> I believe that looping over all the data in a dict is a very important
> operation, and I find myself writing this quite often. Every time I do,
> it seems it's boilerplate; it looks like a workaround rather than a
> preferred way of doing things.
>
> In dict comprehensions and literals, key-value pairs are separated by
> colons. How about allowing that in for loops as well?
>
>     for key: value in a_dict:
>         print(key, value)
>
> I argue that to anyone familiar with dict literals, let alone dict
> comprehensions, the semantics of this loop should be pretty obvious.
> In dict comprehensions, similarity to existing syntax becomes even
> more clear:
>
>     a_mapping = {1: 'one', 2: 'two'}
>     inverse = {val: key for key: val in a_mapping}
>
> I've bounced this idea off a few EuroPython sprinters, and got some
> questions/concerns I can answer here:
>
> * But, the colon is supposed to start a block!
>
> Well, it's already used in dict comprehensions/literals (though it's
> true that there it's always inside brackets). And in lambdas -- here's
> code that is legal today (though not very practical):
>
>     while lambda: True:
>         break
>
> * There's supposed to be only one obvious way to do it! We already have
> .items()!
>
> I don't think this stops us from adding a new way of doing things which
> is more obvious than the old, and which should become the one way.
> After all, you don't say "for key in mapping.keys():", even though
> the keys() method exists.
>
> * What exactly would it do?
>
> There are multiple options --
> - loop over .keys() and use __getitem__ each time, like the
>   dict() constructor?
> - loop over .items(), like most of the code used today?
> - become a well-specified "key/value iteration protocol" with
>   __iteritems__() and its own bytecode operation?
>
> -- but here I'm asking if building this bikeshed sounds useful, rather
> than what paint to buy.
>
> That said, I do have a proof of concept implementation of the second
> option, in case you'd like to play around with this:
> Github: https://github.com/encukou/cpython/tree/keyval-iteration
> patch: https://github.com/encukou/cpython/commit/b9b0d973342280f0ef52e26a4b67f326ece82a54.patch
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From srkunze at mail.de  Sun Jul 26 23:26:38 2015
From: srkunze at mail.de (Sven R. Kunze)
Date: Sun, 26 Jul 2015 23:26:38 +0200
Subject: [Python-ideas] Concurrency Modules
In-Reply-To:
References: <559EFB73.5050606@mail.de>
 <9c139305-f583-46c1-b819-6a98dbd04acc@googlegroups.com>
 <55B2B0FB.1060409@mail.de> <55B3C93D.9090601@mail.de>
Message-ID: <55B5508E.1000201@mail.de>

Next update:

Improving Performance by Running Independent Tasks Concurrently - A Survey

                | processes               | threads                    | coroutines
 ---------------+-------------------------+----------------------------+-------------------------
 purpose        | cpu-bound tasks         | cpu- & i/o-bound tasks     | i/o-bound tasks
                |                         |                            |
 managed by     | os scheduler            | os scheduler + interpreter | customizable event loop
 controllable   | no                      | no                         | yes
                |                         |                            |
 parallelism    | yes                     | depends (cf. GIL)          | no
 switching      | at any time             | after any bytecode         | at user-defined points
 shared state   | no                      | yes                        | yes
                |                         |                            |
 startup impact | biggest/medium*         | medium                     | smallest
 cpu impact**   | biggest                 | medium                     | smallest
 memory impact  | biggest                 | medium                     | smallest
                |                         |                            |
 pool module    | multiprocessing.Pool    | multiprocessing.dummy.Pool | asyncio.BaseEventLoop
 solo module    | multiprocessing.Process | threading.Thread           | ---

 *  biggest - if spawn (fork+exec) and always on Windows
    medium - if fork alone
 ** due to context switching

On 26.07.2015 14:18, Paul Moore wrote:
> Just as a note - even given the various provisos and "it's not that
> simple" comments that have been made, I found this table extremely
> useful. Like any such high-level summary, I expect to have to take it
> with a pinch of salt, but I don't see that as an issue - anyone who
> doesn't fully appreciate that there are subtleties, probably wouldn't
> read a longer explanation anyway.
>
> So many thanks for taking the time to put this together (and for
> continuing to improve it).

You are welcome. :)

> +1 on something like this ending up in the Python docs somewhere.

I'm not sure what the process for this is, but I think the Python gurus
will find a way.

From srkunze at mail.de  Sun Jul 26 23:54:14 2015
From: srkunze at mail.de (Sven R. Kunze)
Date: Sun, 26 Jul 2015 23:54:14 +0200
Subject: [Python-ideas] Concurrency Modules
In-Reply-To: <556A4DCB-1E86-4FFF-B546-913BF3434E18@yahoo.com>
References: <559EFB73.5050606@mail.de>
 <9c139305-f583-46c1-b819-6a98dbd04acc@googlegroups.com>
 <55B2B0FB.1060409@mail.de> <55B3C93D.9090601@mail.de>
 <87egjvn3a6.fsf@vostro.rath.org> <55B4B149.1090408@mail.de>
 <59F2450C-FF41-4C62-9924-432993BC8A55@yahoo.com> <55B4C83B.2030904@mail.de>
 <556A4DCB-1E86-4FFF-B546-913BF3434E18@yahoo.com>
Message-ID: <55B55706.2080907@mail.de>

Big thanks to you, Andrew, Nick and Nikolaus for the latest comments and
ideas.

I think the table is in very good shape now and the questions I started
this thread with are now answered (at least) to my satisfaction. The
relationships are clear (they are all different modules for the same
overall purpose), they have different fields of application (cpu vs. io)
and they have slightly different properties.

How do we proceed from here?

Btw. the number of different approaches (currently 3, but I assume this
will go up in the future) is quite unfortunate. What's even more
unfortunate is the missing exchangeability, due to API differences and
the lack of a common syntax for executing functions concurrently.

Something that struck me as odd was that asyncio got syntactic sugar
although the module itself is actually quite young compared to the
support of processes and of threads. These two alternatives actually
have not a single bit of syntax support so far.

On 26.07.2015 17:00, Andrew Barnert wrote:
> On Jul 26, 2015, at 13:44, Sven R. Kunze wrote:
>> Wow. Thanks, Andrew, for this very informative response. I am going to
>> integrate your thoughts into the table later and re-post it again.
>>
>> Just one question:
>>
>>> On 26.07.2015 12:29, Andrew Barnert wrote:
>>> It's your choice: just fork, spawn (fork+exec), or spawn a special
>>> "server" process to fork copies off. (Except on Windows, where spawn
>>> is the only possibility.)
>>>
>>> How do you know which one to choose? Well, you have to learn the
>>> differences to make a decision.
Forking is fastest, and it means some kinds of globals are automatically shared, but it can lead to a variety of problems, especially if you're also using threads (and some libraries may use threads without you knowing about it--especially on OS X, where a variety of Cocoa APIs sometimes use threads and sometimes don't). >> If I read the documentation of https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.pool for instance, I do not see a way to specify my choice. > That's because you're reading the documentation for Python 2.7. In 2.7, you always get fork on Unix and spawn on Windows; the choice of start methods was added in 3.3 or 3.4. >> There, I pass a function and this function is executed in another process/thread. Is that just forking? > If you pass a function to a Process in 2.7, on Unix, that's just forking; the parent process returns while the child process calls your function and exits. If you pass it to a Pool, all the pool processes are forked, but they keep running and pick new tasks off a queue. > > On Windows, on the other hand, a new Process calls CreateNewProcess (the equivalent of fork then exec, or posix_spawn, on Unix) to launch an entirely new Python interpreter, which then imports your module and calls your function. With a Pool, all the new processes get started the same way, then keep running and pick new tasks off a queue. From rob.cliffe at btinternet.com Mon Jul 27 03:52:29 2015 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Mon, 27 Jul 2015 02:52:29 +0100 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: References: Message-ID: <55B58EDD.4020209@btinternet.com> On 26/07/2015 17:09, Petr Viktorin wrote: > Hello, > Currently, the way to iterate over keys and values of a mapping > is to call items() and iterate over the resulting view:: > > for key, value in a_dict.items(): > print(key, value) > > I believe that looping over all the data in a dict is a very imporant > operation, and I find myself writing this quite often. Every time I do, > it seems it's boilerplate; it looks a like a workaround rather than a > preferred way of doing things. > > In dict comprehensions and literals, key-value pairs are separated by > colons. How about allowing that in for loops as well? > > for key: value in a_dict: > print(key, value) > > I argue that to anyone familiar with dict literals, let alone dict > comprehensions, the semantics of this loop should be pretty obvious. > In dict comprehensions, similarity to existing syntax becomes even > more clear: > > a_mapping = {1: 'one', 2: 'two'} > inverse = {val: key for key: val in a_mapping} > > > I've bounced this idea off a few EuroPython sprinters, and got some > questions/concerns I can answer here: > > * But, the colon is supposed to start a block! > > Well, it's already used in dict comprehensions/literals (though it's > true that there it's always inside brackets). And in lambdas ? > Here's code that is legal today (though not very practical): > > while lambda: True: > break > > * There's supposed to be only one obvious way to do it! We alredy have .items()! > > I don't think this stops us from adding a new way of doing things which > is more obvious than the old, and which should become the one way. > After all, you don't say "for key in mapping.keys():", even though > the keys() method exists. 
You just might, if you modified the dictionary in the loop body, and you
wanted to process the original list of keys but didn't need to remember
the original values and wanted to avoid the overhead of copying the
values.

> * What exactly would it do?
>
> There are multiple options --
> - loop over .keys() and use __getitem__ each time, like the
>   dict() constructor?
> - loop over .items(), like most of the code used today?
> - become a well-specified "key/value iteration protocol" with
>   __iteritems__() and its own bytecode operation?
>
> -- but here I'm asking if building this bikeshed sounds useful, rather
> than what paint to buy.

I like it! It seems so intuitive that, like Sven, I wonder why it's not
already in the language.

As far as I can see it doesn't introduce any ambiguities. I am thinking
of code such as

    for k,j : x,(y,z), in complicated_expression:

I would guess (from a position of complete ignorance) that there would
be no *insuperable* difficulty in parsing this.

I suggest (without feeling strongly about it) that optional parentheses
should be allowed for stylistic reasons, i.e.

    for ( k : v ) in a_dict:

[I thought about

    for { k : v } in a_dict:

before I realised that this is currently legal, albeit (probably)
nonsensical, syntax. [Python 2.7.3]]

One downside: Whatever implementation is chosen, it will not be "the one
obvious way to do it".
E.g. an .iteritems()-like implementation will fail if the dictionary is
modified during the loop.
an .items()-like implementation will be expensive on a huge dictionary.

As there are already several ways of iterating over a dictionary, I
think the new construct should be semantically equivalent to one of the
existing ways, so that we don't have yet another behaviour to learn. My
bikeshed colour is that it be equivalent to using .items(), as I think
this is least likely to trip up newbies (it won't raise an error if the
dictionary is modified); YMMV.

Rob Cliffe

> That said, I do have a proof of concept implementation of the second
> option, in case you'd like to play around with this:
> Github: https://github.com/encukou/cpython/tree/keyval-iteration
> patch: https://github.com/encukou/cpython/commit/b9b0d973342280f0ef52e26a4b67f326ece82a54.patch

From steve at pearwood.info  Mon Jul 27 04:12:09 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 27 Jul 2015 12:12:09 +1000
Subject: [Python-ideas] Syntax for key-value iteration over mappings
In-Reply-To:
References:
Message-ID: <20150727021209.GW25179@ando.pearwood.info>

On Sun, Jul 26, 2015 at 06:09:17PM +0200, Petr Viktorin wrote:
> Hello,
> Currently, the way to iterate over keys and values of a mapping
> is to call items() and iterate over the resulting view::
>
>     for key, value in a_dict.items():
>         print(key, value)
>
> I believe that looping over all the data in a dict is a very important
> operation, and I find myself writing this quite often. Every time I do,
> it seems it's boilerplate;

What part looks like boilerplate? The "for"? The "key,value"? The "in"?
The "a_dict"? If none of them are boilerplate, why would ".items()" be
boilerplate?
> it looks like a workaround rather than a
> preferred way of doing things.

A work-around for what? It can't be "work-around for lack of a way to
get the (key,value) pairs from a dict", because the items() method *is*
the preferred way to get the (key,value) pairs from a dict, and has been
since Python 1.5 or even older. I don't think that describing an
explicit call to the items() method as "boilerplate" or "a work-around"
can be justified. If it is either, then the terms are so meaningless
that they could be applied to anything at all.

> In dict comprehensions and literals, key-value pairs are separated by
> colons. How about allowing that in for loops as well?
>
>     for key: value in a_dict:
>         print(key, value)

A very strong -1 to this. It's ugly and unattractive. "for x:" looks
like the end of a statement, not the beginning of one. Yes, as you point
out, we can already write a similarly ugly statement "while lambda:
None:" but nobody does, and just because existing syntax accidentally
allows one ugly construct doesn't give an excuse to deliberately add an
ugly construct.

It's one more special case syntax for beginners to learn. And it really
is a special case: there's nothing about "for k:v in iterable" that
tells you that iterable must have an items() method. You have to
memorise that fact.

Being a special case, you can only use this for iterables that have an
items() method. You can't do:

    for k:v in [(1, 'a'), (2, 'b')]: ...

because the list doesn't have an items() method.

In dict literals and dict comprehensions, the k:v syntax is only used to
construct the dict, not to extract items from it. We have a standard way
of doing sequence bindings:

    a, b = ...  # right-hand side must be a sequence of two items

and the standard way of extracting (key, value) pairs from a mapping is
the items() method. If you know that a mapping has only one item, we can
even write:

    [[key, value]] = mapping.items()

and sequence unpacking will do the work for us. Do you expect this to
work too?

    [key:value] = mapping

This proposed syntactic sugar doesn't add any new functionality or make
anything simpler. It just saves you eight keystrokes in one special case.

--
Steve

From mistersheik at gmail.com  Mon Jul 27 02:56:27 2015
From: mistersheik at gmail.com (Neil Girdhar)
Date: Sun, 26 Jul 2015 17:56:27 -0700 (PDT)
Subject: [Python-ideas] Syntax for key-value iteration over mappings
In-Reply-To:
References:
Message-ID:

Cool suggestion, but I prefer how things are. (As an aside, calling
getitem each time is not efficient.)

On Sunday, July 26, 2015 at 12:10:00 PM UTC-4, Petr Viktorin wrote:
> Hello,
> Currently, the way to iterate over keys and values of a mapping
> is to call items() and iterate over the resulting view::
>
>     for key, value in a_dict.items():
>         print(key, value)
>
> I believe that looping over all the data in a dict is a very important
> operation, and I find myself writing this quite often. Every time I do,
> it seems it's boilerplate; it looks like a workaround rather than a
> preferred way of doing things.
>
> In dict comprehensions and literals, key-value pairs are separated by
> colons. How about allowing that in for loops as well?
>
>     for key: value in a_dict:
>         print(key, value)
>
> I argue that to anyone familiar with dict literals, let alone dict
> comprehensions, the semantics of this loop should be pretty obvious.
> In dict comprehensions, similarity to existing syntax becomes even
> more clear:
>
>     a_mapping = {1: 'one', 2: 'two'}
>     inverse = {val: key for key: val in a_mapping}
>
> I've bounced this idea off a few EuroPython sprinters, and got some
> questions/concerns I can answer here:
>
> * But, the colon is supposed to start a block!
>
> Well, it's already used in dict comprehensions/literals (though it's
> true that there it's always inside brackets). And in lambdas -- here's
> code that is legal today (though not very practical):
>
>     while lambda: True:
>         break
>
> * There's supposed to be only one obvious way to do it! We already have
> .items()!
>
> I don't think this stops us from adding a new way of doing things which
> is more obvious than the old, and which should become the one way.
> After all, you don't say "for key in mapping.keys():", even though
> the keys() method exists.
>
> * What exactly would it do?
>
> There are multiple options --
> - loop over .keys() and use __getitem__ each time, like the
>   dict() constructor?
> - loop over .items(), like most of the code used today?
> - become a well-specified "key/value iteration protocol" with
>   __iteritems__() and its own bytecode operation?
>
> -- but here I'm asking if building this bikeshed sounds useful, rather
> than what paint to buy.
>
> That said, I do have a proof of concept implementation of the second
> option, in case you'd like to play around with this:
> Github: https://github.com/encukou/cpython/tree/keyval-iteration
> patch: https://github.com/encukou/cpython/commit/b9b0d973342280f0ef52e26a4b67f326ece82a54.patch

From steve at pearwood.info  Mon Jul 27 04:36:48 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 27 Jul 2015 12:36:48 +1000
Subject: [Python-ideas] Syntax for key-value iteration over mappings
In-Reply-To: <20150727021209.GW25179@ando.pearwood.info>
References: <20150727021209.GW25179@ando.pearwood.info>
Message-ID: <20150727023647.GX25179@ando.pearwood.info>

On Mon, Jul 27, 2015 at 12:12:09PM +1000, Steven D'Aprano wrote:
[...]
> It's one more special case syntax for beginners to learn. And it really
> is a special case: there's nothing about "for k:v in iterable" that
> tells you that iterable must have an items() method. You have to
> memorise that fact.

I forgot to say, "or whatever implementation you choose for this
syntax". It doesn't necessarily have to be calling the items() method,
although the proof of concept given does that. The principle applies
either way.

--
Steve

From ben+python at benfinney.id.au  Mon Jul 27 06:23:46 2015
From: ben+python at benfinney.id.au (Ben Finney)
Date: Mon, 27 Jul 2015 14:23:46 +1000
Subject: [Python-ideas] Syntax for key-value iteration over mappings
References:
Message-ID: <85d1zemdnx.fsf@benfinney.id.au>

Petr Viktorin writes:

> Currently, the way to iterate over keys and values of a mapping is to
> call items() and iterate over the resulting view::
>
>     for key, value in a_dict.items():
>         print(key, value)
>
> I believe that looping over all the data in a dict is a very important
> operation, and I find myself writing this quite often. Every time I
> do, it seems it's boilerplate; it looks like a workaround rather
> than a preferred way of doing things.

I am sympathetic to this complaint. It does seem that mappings, for all
their 'obvious first choice' status as a data structure, are more
cumbersome to iterate through than other sequences.
I tend to write the above as::

    for (key, value) in a_dict.items():
        # ...

because it's easier to see that the items that come from the view are
themselves two-item tuples which are then unpacked.

> In dict comprehensions and literals, key-value pairs are separated by
> colons. How about allowing that in for loops as well?
>
>     for key: value in a_dict:
>         print(key, value)

Hmm, that's a bit too easy to misread for my liking. A colon in the
middle of a line, without clear parenthesis syntax nearby, looks too
much like a single-line compound statement::

    if foo: bar

    while True: flonk

    for key: value in a_dict:

I would be only +0 on the above 'for' syntax, and would prefer that it
remains a SyntaxError.

Analogous to what I described above for the tuple unpacking, how about
this::

    for {key: value} in a_dict:
        # ...

That makes the correspondence with a mapping much less ambiguous, and it
clearly marks the whole item which will be emitted by the iteration.

--
 \      "There's a certain part of the contented majority who love |
  `\     anybody who is worth a billion dollars."  --John Kenneth |
_o__)    Galbraith, 1992-05-23 |
Ben Finney

From abarnert at yahoo.com  Mon Jul 27 08:10:54 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 27 Jul 2015 08:10:54 +0200
Subject: [Python-ideas] Concurrency Modules
In-Reply-To: <55B55706.2080907@mail.de>
References: <559EFB73.5050606@mail.de>
 <9c139305-f583-46c1-b819-6a98dbd04acc@googlegroups.com>
 <55B2B0FB.1060409@mail.de> <55B3C93D.9090601@mail.de>
 <87egjvn3a6.fsf@vostro.rath.org> <55B4B149.1090408@mail.de>
 <59F2450C-FF41-4C62-9924-432993BC8A55@yahoo.com> <55B4C83B.2030904@mail.de>
 <556A4DCB-1E86-4FFF-B546-913BF3434E18@yahoo.com> <55B55706.2080907@mail.de>
Message-ID: <28DB0EFC-7782-4A95-B712-A609623C95CC@yahoo.com>

On Jul 26, 2015, at 23:54, Sven R. Kunze wrote:
>
> Big thanks to you, Andrew, Nick and Nikolaus for the latest comments
> and ideas.
>
> I think the table is in very good shape now and the questions I started
> this thread with are now answered (at least) to my satisfaction. The
> relationships are clear (they are all different modules for the same
> overall purpose), they have different fields of application (cpu vs.
> io) and they have slightly different properties.
>
> How do we proceed from here?
>
> Btw. the number of different approaches (currently 3, but I assume
> this will go up in the future) is quite unfortunate.

It may go up to four with subinterpreters or something like PyParallel,
but I can't see much reason for it to go beyond that in the foreseeable
future. In theory, there are two possible things missing here:
preemptive, non-GIL-restricted, CPU-parallel switching, with implicit
shared data (like threads in, say, Java), and the same without implicit
shared data but still with efficient explicit shared data (like Erlang
processes). But I don't think the former will ever happen in CPython,
and in other interpreters it will just use the same API that threads do
today (as is already true for Jython).

> What's even more unfortunate is the missing exchangeability, due to
> API differences and the lack of a common syntax for executing functions
> concurrently.

But you don't really need any special syntax. Submitting a function to
an executor and getting back a future is only tricky in languages like
Java because they don't have first-class functions. In Python, it isn't.

> Something that struck me as odd was that asyncio got syntactic sugar
> although the module itself is actually quite young compared to the
> support of processes and of threads. These two alternatives actually
> have not a single bit of syntax support so far.

The other two don't need that syntactic support. The point of the await
keyword is to mark explicit switch points (yield from also does that,
but it's also used in traditional generators, which can be confusing),
while async is to mark functions that need to be awaited (yield or yield
from also does that, but again, that can be confusing--plus, sometimes
you need to make a function awaitable even though it doesn't await
anything, which in 3.4 required either a meaningless yield or a special
decorator).

The fact that coroutines and generators are the same thing under the
covers is a very nifty feature for interpreter implementors and maybe
library implementors, but end users who just want to write coroutines
shouldn't have to understand that. (This was obvious to Greg Ewing when
he proposed cofunctions a few years ago, but it looks like nobody else
really got it until people had experience using asyncio.)

Since threads and processes both do implicit switching, they have no use
for anything similar. Every expression may switch, not just await
expressions, and every function may get switched out, not just async
functions.

One way to look at it is that the syntactic support makes asyncio look
almost as nice as threads--as nice as it can given that switches have to
be explicit. (You can always use a third-party greenlet-based library
like gevent to give you implicit but still cooperative switching, which
looks just like threads--although that can be misleading because it
doesn't act just like threads.)

From abarnert at yahoo.com  Mon Jul 27 08:16:55 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 27 Jul 2015 08:16:55 +0200
Subject: [Python-ideas] Syntax for key-value iteration over mappings
In-Reply-To: <55B58EDD.4020209@btinternet.com>
References: <55B58EDD.4020209@btinternet.com>
Message-ID: <85676CEA-A39B-46A7-9C59-91265687D2AF@yahoo.com>

On Jul 27, 2015, at 03:52, Rob Cliffe wrote:
>
> One downside: Whatever implementation is chosen, it will not be "the
> one obvious way to do it".
> E.g. an .iteritems()-like implementation will fail if the dictionary
> is modified during the loop.
> an .items()-like implementation will be expensive on a huge dictionary.

Why? The items method returns a view, an object that's backed by the
dict itself. There is a bit of overhead, but it's constant, not linear
in the dict size.

You may be thinking of 2.7, where items creates a list of pairs. In 3.x,
it's equivalent to the 2.7 viewitems, not the 2.7 items.

From stephen at xemacs.org  Mon Jul 27 12:21:25 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 27 Jul 2015 19:21:25 +0900
Subject: [Python-ideas] Syntax for key-value iteration over mappings
In-Reply-To:
References:
Message-ID: <878ua1ew9m.fsf@uwakimon.sk.tsukuba.ac.jp>

Petr Viktorin writes:

> Currently, the way to iterate over keys and values of a mapping
> is to call items() and iterate over the resulting view::
>
>     for key, value in a_dict.items():
>         print(key, value)
>
> I believe that looping over all the data in a dict is a very important
> operation, and I find myself writing this quite often.

Sure, but the obvious syntax:

    for key, value in a_dict:

is already taken: it unpacks the key if it happens to be a tuple.
I've always idly wondered why iteration over a mapping was taken to be
an iteration over keys rather than over items.
These two alternatives have actually no a single bit of syntax support until now. The other two don't need that syntactic support. The point of the await keyword is to mark explicit switch points (yield from also does that, but it's also used in traditional generators, which can be confusing), while async is to mark functions that need to be awaited (yield or yield from also does that, but again, that can be confusing--plus, sometimes you need to make a function awaitable even though it doesn't await anything, which in 3.4 required either a meaningless yield or a special decorator). The fact that coroutines and generators are the same thing under the covers is a very nifty feature for interpreter implementors and maybe library implementors, but end users who just want to write coroutines shouldn't have to understand that. (This was obvious to Greg Ewing when he proposed cofunctions a few years ago, but it looks like nobody else really got it until people had experience using asyncio.) Since threads and processes both do implicit switching, they have no use for anything similar. Every expression may switch, not just await expressions, and every function may get switched out, not just async functions. One way to look at it is that the syntactic supports makes asyncio look almost as nice as threads--as nice as it can given that switches have to be explicit. (You can always use a third-party greenlet based library like gevent to give you implicit but still cooperative switching, which looks just like threads--although that can be misleading because it doesn't act just like threads.) From abarnert at yahoo.com Mon Jul 27 08:16:55 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 27 Jul 2015 08:16:55 +0200 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: <55B58EDD.4020209@btinternet.com> References: <55B58EDD.4020209@btinternet.com> Message-ID: <85676CEA-A39B-46A7-9C59-91265687D2AF@yahoo.com> On Jul 27, 2015, at 03:52, Rob Cliffe wrote: > > One downside: Whatever implementation is chosen, it will not be "the one obvious way to do it". > E.g. an .iteritems()-like implementation will fail if the dictionary is modified during the loop. > an .items()-like implementation will be expensive on a huge dictionary. Why? The items method returns a view, an object that's backed by the dict itself. There is a bit of overhead, but it's constant, not linear on the dict size. You may be thinking of 2.7, where items creates a list of pairs. In 3.x, it's equivalent to the 2.7 viewitems, not the 2.7 items. From stephen at xemacs.org Mon Jul 27 12:21:25 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 27 Jul 2015 19:21:25 +0900 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: References: Message-ID: <878ua1ew9m.fsf@uwakimon.sk.tsukuba.ac.jp> Petr Viktorin writes: > Currently, the way to iterate over keys and values of a mapping > is to call items() and iterate over the resulting view:: > > for key, value in a_dict.items(): > print(key, value) > > I believe that looping over all the data in a dict is a very imporant > operation, and I find myself writing this quite often. Sure, but the obvious syntax: for key, value in a_dict: is already taken: it unpacks the key if it happens to be a tuple. I've always idly wondered why iteration over a mapping was taken to be an iteration over keys rather than over items. 
Idling just a little bit faster, I wonder if this isn't a throwback to the days when sets were emulated by dictionaries with constant value (eg, None). I'm hard put to think of the last time I wanted to actually iterate over keys, doing something *other* than extracting the value. However, given that the choice was made to iterate over keys rather than items, it doesn't bother me to put in explicit calls to .items or .values where needed. > In dict comprehensions and literals, key-value pairs are separated by > colons. How about allowing that in for loops as well? > > for key: value in a_dict: > print(key, value) This screams SyntaxError to me. Sure, I can figure out what's meant, but the cognitive burden would be large every time I saw it. More generally, YMMV but I don't see any real point in adding syntax for this. Steve From regebro at gmail.com Mon Jul 27 13:42:16 2015 From: regebro at gmail.com (Lennart Regebro) Date: Mon, 27 Jul 2015 13:42:16 +0200 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: <20150727021209.GW25179@ando.pearwood.info> References: <20150727021209.GW25179@ando.pearwood.info> Message-ID: On Mon, Jul 27, 2015 at 4:12 AM, Steven D'Aprano wrote: > It's one more special case syntax for beginners to learn. And it really > is a special case: there's nothing about "for k:v in iterable" that > tells you that iterable must have an items() method. You have to > memorise that fact. This I think is a strong argument. What error would you get when it's the wrong type? An attribute error on .items(), or a special SyntaxError "This syntax can only be used on mappings". Both are quite incomprehensible unless you know exactly what is going on and that this is a shortcut for "fox x,y in foo.items():" From ncoghlan at gmail.com Mon Jul 27 16:35:14 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 28 Jul 2015 00:35:14 +1000 Subject: [Python-ideas] Concurrency Modules In-Reply-To: <55B55706.2080907@mail.de> References: <559EFB73.5050606@mail.de> <9c139305-f583-46c1-b819-6a98dbd04acc@googlegroups.com> <55B2B0FB.1060409@mail.de> <55B3C93D.9090601@mail.de> <87egjvn3a6.fsf@vostro.rath.org> <55B4B149.1090408@mail.de> <59F2450C-FF41-4C62-9924-432993BC8A55@yahoo.com> <55B4C83B.2030904@mail.de> <556A4DCB-1E86-4FFF-B546-913BF3434E18@yahoo.com> <55B55706.2080907@mail.de> Message-ID: On 27 July 2015 at 07:54, Sven R. Kunze wrote: > Something that struck me as odd was that asyncio got syntactic sugar > although the module itself is actually quite young compared to the support > of processes and of threads. These two alternatives have actually no a > single bit of syntax support until now. Their shared abstraction layer is the concurrent.futures module: https://docs.python.org/3/library/concurrent.futures.html (available for Python 2 as the "futures" module on PyPI) For "call and response" use cases involving pools of worker threads or processes, concurrent.futures is a better option than hand rolling our own pool management and request dispatch and response processing code. That model is integrated into the asyncio event loop to support dispatching blocking tasks to a background thread or process. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Jul 27 17:04:24 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 28 Jul 2015 01:04:24 +1000 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: <878ua1ew9m.fsf@uwakimon.sk.tsukuba.ac.jp> References: <878ua1ew9m.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 27 July 2015 at 20:21, Stephen J. Turnbull wrote: > Petr Viktorin writes: > > > Currently, the way to iterate over keys and values of a mapping > > is to call items() and iterate over the resulting view:: > > > > for key, value in a_dict.items(): > > print(key, value) > > > > I believe that looping over all the data in a dict is a very imporant > > operation, and I find myself writing this quite often. > > Sure, but the obvious syntax: > > for key, value in a_dict: > > is already taken: it unpacks the key if it happens to be a tuple. > I've always idly wondered why iteration over a mapping was taken to be > an iteration over keys rather than over items. Idling just a little > bit faster, I wonder if this isn't a throwback to the days when sets > were emulated by dictionaries with constant value (eg, None). I'm > hard put to think of the last time I wanted to actually iterate over > keys, doing something *other* than extracting the value. Looking up the original iterator PEP shows it was done to enforce the container invariant "for x in y: assert x in y". So that has_key() -> __contains__() change came first, and drove the subsequent selection of iterkeys() as the meaning of mapping iteration. At least, that's my reading of the dictionary iterator section in https://www.python.org/dev/peps/pep-0234/ > > In dict comprehensions and literals, key-value pairs are separated by > > colons. How about allowing that in for loops as well? > > > > for key: value in a_dict: > > print(key, value) > > This screams SyntaxError to me. Sure, I can figure out what's meant, > but the cognitive burden would be large every time I saw it. > > More generally, YMMV but I don't see any real point in adding syntax > for this. One point in favour is that many, many, years after ABCs were introduced at least in part to disambiguate the Sequence and Mapping APIs, we'd finally have a separate ducktyping protocol that was unique to mappings :) However, overall, I have to come down in the "-1" camp as well. With dict comprehensions, the dict syntax changes the type of the object produced, and matches the syntax of normal dict displays. In this case, the colon is present without its surrounding curly braces, so the prompts to think "dictionary" aren't as strong as they are in the comprehension case. Embedding this novel iteration syntax in comprehensions would make that confusion even worse. Since there'd still be a method call under the hood, the new syntax also wouldn't offer a performance benefit over calling the items() method explicitly. I actually quite liked the idea on a first impression, but it doesn't appear to hold up to closer scrutiny. Regards, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rosuav at gmail.com Mon Jul 27 17:19:48 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 28 Jul 2015 01:19:48 +1000 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: <20150727021209.GW25179@ando.pearwood.info> References: <20150727021209.GW25179@ando.pearwood.info> Message-ID: On Mon, Jul 27, 2015 at 12:12 PM, Steven D'Aprano wrote: > Being a special case, you can only use this for iterables that have an > items() method. You can't do: > > for k:v in [(1, 'a'), (2, 'b')]: ... > > because the list doesn't have an items() method. > Here's a crazy alternative: Generalize it to subsume the common use of enumerate(). Iterate over a dict thus: for name:obj in globals(): # do something with the key and/or value And iterate over a list, generator, or any other simple linear iterable thus: for idx:val in sys.argv: # do something with the arg and its position In other words, the two-part iteration mode gives you values *and their indices*. If an object declares its own way of doing this, it provides the keys and values itself; otherwise, the default is equivalent to passing it through enumerate, so you'll get sequential numbers from zero. I don't know that this is a *good* idea (for one thing, simple iteration is equivalent to the first part for a dict, but the second part for everything else), but it does give a plausible meaning to two-part iteration that isn't over a dictionary. ChrisA From liik.joonas at gmail.com Mon Jul 27 17:25:19 2015 From: liik.joonas at gmail.com (Joonas Liik) Date: Mon, 27 Jul 2015 18:25:19 +0300 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: References: <20150727021209.GW25179@ando.pearwood.info> Message-ID: > Here's a crazy alternative: Generalize it to subsume the common use of > enumerate(). Iterate over a dict thus: > > for name:obj in globals(): > # do something with the key and/or value > > And iterate over a list, generator, or any other simple linear iterable thus: > > for idx:val in sys.argv: > # do something with the arg and its position > > In other words, the two-part iteration mode gives you values *and > their indices*. If an object declares its own way of doing this, it > provides the keys and values itself; otherwise, the default is > equivalent to passing it through enumerate, so you'll get sequential > numbers from zero. > Well it may well be crazy but somewhere deep inside i actually quite like it.. Certainly more than a special syntax that only works on dicts.., and its quite a common use case imo. From srkunze at mail.de Mon Jul 27 17:30:38 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 27 Jul 2015 17:30:38 +0200 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: References: <20150727021209.GW25179@ando.pearwood.info> Message-ID: <55B64E9E.8000900@mail.de> On 27.07.2015 13:42, Lennart Regebro wrote: > On Mon, Jul 27, 2015 at 4:12 AM, Steven D'Aprano wrote: >> It's one more special case syntax for beginners to learn. And it really >> is a special case: there's nothing about "for k:v in iterable" that >> tells you that iterable must have an items() method. You have to >> memorise that fact. > This I think is a strong argument. I cannot follow. There is nothing about 'await' that tells me it can only be used with coroutines. I need to memorize that fact, too. > What error would you get when it's the wrong type? 
An attribute error > on .items(), or a special SyntaxError "This syntax can only be used on > mappings". I would like such an error. Because it tells me that it is not what I wanted. The current methods silently works and I get an error later. I value the fact of seeing an error as soon as possible. Btw. if the proposed syntax is appropriate is another issue. But I would love to see an improvement on this field. > Both are quite incomprehensible unless you know exactly what is going > on and that this is a shortcut for "fox x,y in foo.items():" Same goes for .items(). It took some time to internalize this special case (at least from my perspective). From steve at pearwood.info Mon Jul 27 17:39:11 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 28 Jul 2015 01:39:11 +1000 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: References: <20150727021209.GW25179@ando.pearwood.info> Message-ID: <20150727153911.GB25179@ando.pearwood.info> On Tue, Jul 28, 2015 at 01:19:48AM +1000, Chris Angelico wrote: > Here's a crazy alternative: Generalize it to subsume the common use of > enumerate(). Iterate over a dict thus: > > for name:obj in globals(): > # do something with the key and/or value > > And iterate over a list, generator, or any other simple linear iterable thus: > > for idx:val in sys.argv: > # do something with the arg and its position Yep, that's a crazy alternative alright :-) Okay, so we start with this: mapping = {'key': 'value', ...} for name:obj in mapping: log(name) process_some_object(obj) Then, one day, somebody passes this as mapping: mapping = [('key', 'value'), ...] and the only hint that something has gone wrong is that your logs contain 0 1 2 3 ... instead of the expected names. That will be some fun debugging, I'm sure. -- Steve From regebro at gmail.com Mon Jul 27 17:45:25 2015 From: regebro at gmail.com (Lennart Regebro) Date: Mon, 27 Jul 2015 17:45:25 +0200 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: <55B64E9E.8000900@mail.de> References: <20150727021209.GW25179@ando.pearwood.info> <55B64E9E.8000900@mail.de> Message-ID: On Mon, Jul 27, 2015 at 5:30 PM, Sven R. Kunze wrote: > I cannot follow. There is nothing about 'await' that tells me it can only be > used with coroutines. I need to memorize that fact, too. No, because you get a syntax error when you use it incorrectly, so you don't need to memorize that. But here it works only with specific types. > I would like such an error. Because it tells me that it is not what I > wanted. The current methods silently works and I get an error later. Well, that is going to be the case now as well, you can't get away from that. >> Both are quite incomprehensible unless you know exactly what is going >> on and that this is a shortcut for "fox x,y in foo.items():" > > Same goes for .items(). It took some time to internalize this special case > (at least from my perspective). Sure, but now you have to learn what it is a special case of. All you did was hide that it calls .items(), so the error message "foo does not have an attribute 'items'" becomes harder to understand. You would need to change that error to something else. And it really should be, as you say, a SyntaxError, but it's a SyntaxError that can only be raise in runtime. Which I think breaks most peoples understandning of what a SyntaxError is... 
//Lennart From rosuav at gmail.com Mon Jul 27 17:54:09 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 28 Jul 2015 01:54:09 +1000 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: <20150727153911.GB25179@ando.pearwood.info> References: <20150727021209.GW25179@ando.pearwood.info> <20150727153911.GB25179@ando.pearwood.info> Message-ID: On Tue, Jul 28, 2015 at 1:39 AM, Steven D'Aprano wrote: > Okay, so we start with this: > > mapping = {'key': 'value', ...} > for name:obj in mapping: > log(name) > process_some_object(obj) > > > Then, one day, somebody passes this as mapping: > > mapping = [('key', 'value'), ...] > > and the only hint that something has gone wrong is that your logs > contain 0 1 2 3 ... instead of the expected names. That will be some > fun debugging, I'm sure. Except that that transformation already wouldn't work. How do you currently do the iteration over a dictionary? # Boom! AttributeError. for name,obj in mapping.items(): # Completely different semantics for name in mapping: obj = mapping[name] I don't know of any iteration method that's oblivious to the difference between a dict and a list of pairs; can you offer (toy) usage examples? ChrisA From steve at pearwood.info Mon Jul 27 17:55:19 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 28 Jul 2015 01:55:19 +1000 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: <55B64E9E.8000900@mail.de> References: <20150727021209.GW25179@ando.pearwood.info> <55B64E9E.8000900@mail.de> Message-ID: <20150727155519.GC25179@ando.pearwood.info> On Mon, Jul 27, 2015 at 05:30:38PM +0200, Sven R. Kunze wrote: > On 27.07.2015 13:42, Lennart Regebro wrote: > >On Mon, Jul 27, 2015 at 4:12 AM, Steven D'Aprano > >wrote: > >>It's one more special case syntax for beginners to learn. And it really > >>is a special case: there's nothing about "for k:v in iterable" that > >>tells you that iterable must have an items() method. You have to > >>memorise that fact. > >This I think is a strong argument. > > I cannot follow. There is nothing about 'await' that tells me it can > only be used with coroutines. I need to memorize that fact, too. Yes, and you need to memorise what "for" loops do, and "len()", etc. But if you know English, the name is an aid to memory. There's no aid to memory with a:b syntax, and googling for it will be a pain. Not everything is important enough to be given its own syntax. That way leads past Perl and into APL. (At least APL tries to follow standard mathematical notation, rather than being a collection of arbitrary symbols.) `await` gives us a whole lot of new functionality that was hard or impossible to do before. What does this give us that we couldn't do before? What's so special about spam.items() that it needs dedicated syntax for it? These are not rhetorical questions. If you can answer those positively, then I'll reconsider my opposition to this. But if the only thing this syntax gains us is to avoid an explicit call to .items(), then it just adds unnecessary cruft to the language. > >What error would you get when it's the wrong type? An attribute error > >on .items(), or a special SyntaxError "This syntax can only be used on > >mappings". > > I would like such an error. Because it tells me that it is not what I > wanted. The current methods silently works and I get an error later. I don't understand you. If you write `for k,v in spam.items()` and spam has no items method, you get an AttributeError immediately. 
How does it silently work? -- Steve From srkunze at mail.de Mon Jul 27 18:02:18 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 27 Jul 2015 18:02:18 +0200 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: References: <20150727021209.GW25179@ando.pearwood.info> <55B64E9E.8000900@mail.de> Message-ID: <55B6560A.4000207@mail.de> On 27.07.2015 17:45, Lennart Regebro wrote: > On Mon, Jul 27, 2015 at 5:30 PM, Sven R. Kunze wrote: >> I cannot follow. There is nothing about 'await' that tells me it can only be >> used with coroutines. I need to memorize that fact, too. > No, because you get a syntax error when you use it incorrectly, so you > don't need to memorize that. > But here it works only with specific types. What's the difference? > Well, that is going to be the case now as well, you can't get away from that. Is it? I don't think so. There are many case where this is not the case. >>> Both are quite incomprehensible unless you know exactly what is going >>> on and that this is a shortcut for "fox x,y in foo.items():" >> Same goes for .items(). It took some time to internalize this special case >> (at least from my perspective). > Sure, but now you have to learn what it is a special case of. All you > did was hide that it calls .items(), so the error message "foo does > not have an attribute 'items'" becomes harder to understand. You would > need to change that error to something else. And it really should be, > as you say, a SyntaxError, but it's a SyntaxError that can only be > raise in runtime. Which I think breaks most peoples understandning of > what a SyntaxError is... Nobody said it should be either. That is tiny detail and of course it should be a comprehensible error message. Btw. no newbie really knows what happens if they execute the default 'for' loop. You could say as well: "don't implement 'for' loops because they hide the fact of calling 'next'". From steve at pearwood.info Mon Jul 27 18:12:13 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 28 Jul 2015 02:12:13 +1000 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: References: <20150727021209.GW25179@ando.pearwood.info> <20150727153911.GB25179@ando.pearwood.info> Message-ID: <20150727161213.GD25179@ando.pearwood.info> On Tue, Jul 28, 2015 at 01:54:09AM +1000, Chris Angelico wrote: > On Tue, Jul 28, 2015 at 1:39 AM, Steven D'Aprano wrote: > > Okay, so we start with this: > > > > mapping = {'key': 'value', ...} > > for name:obj in mapping: > > log(name) > > process_some_object(obj) > > > > > > Then, one day, somebody passes this as mapping: > > > > mapping = [('key', 'value'), ...] > > > > and the only hint that something has gone wrong is that your logs > > contain 0 1 2 3 ... instead of the expected names. That will be some > > fun debugging, I'm sure. > > Except that that transformation already wouldn't work. Exactly! The fact that it *doesn't work* with an explicit call to .items() is a good thing. You get an immediate error, the code doesn't silently do the wrong thing. Your suggestion silently does the wrong thing. If you want to support iteration over both mappings and sequences of (key,value) tuples, you need to make a deliberate decision to do so. You might use a helper function: for name, obj in pairwise_mapping(items): ... where pairwise_mapping contains the smarts to handle mappings and (key,value) tuples. And that's fine, because it is deliberate and explicit, not an accident of the syntax. 
In effect, your suggestion makes the a:b syntax a "Do What I Mean" operation. It tries to gues whether you want to call expr.items() or enumerate(expr). Building DWIM into the language is probably not a good idea. > I don't know of any iteration method that's oblivious to the > difference between a dict and a list of pairs; can you offer (toy) > usage examples? You have to handle it yourself. The dict constructor, and update method, do that. dict.update's docstring says: | update(...) | D.update(E, **F) -> None. Update D from dict/iterable E and F. | If E has a .keys() method, does: for k in E: D[k] = E[k] | If E lacks .keys() method, does: for (k, v) in E: D[k] = v | In either case, this is followed by: for k in F: D[k] = F[k] But notice that in your case, passing (key,value) doesn't give you the name=key, obj=value results you wanted. You get name=index, obj=(key,value) instead! DWIM is fine and dandy when it guesses what you want correctly, but when it doesn't, it silently does the wrong thing instead of giving you an immediate exception. -- Steve From regebro at gmail.com Mon Jul 27 18:23:04 2015 From: regebro at gmail.com (Lennart Regebro) Date: Mon, 27 Jul 2015 18:23:04 +0200 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: <55B6560A.4000207@mail.de> References: <20150727021209.GW25179@ando.pearwood.info> <55B64E9E.8000900@mail.de> <55B6560A.4000207@mail.de> Message-ID: On Mon, Jul 27, 2015 at 6:02 PM, Sven R. Kunze wrote: > On 27.07.2015 17:45, Lennart Regebro wrote: >> >> On Mon, Jul 27, 2015 at 5:30 PM, Sven R. Kunze wrote: >>> >>> I cannot follow. There is nothing about 'await' that tells me it can only >>> be >>> used with coroutines. I need to memorize that fact, too. >> >> No, because you get a syntax error when you use it incorrectly, so you >> don't need to memorize that. >> But here it works only with specific types. > > What's the difference? Well, for one, one is a runtime error and the other is not. >> Well, that is going to be the case now as well, you can't get away from >> that. > > Is it? I don't think so. There are many case where this is not the case. No, there isn't. The proposed syntax will work if the variable is a mapping, but fail if it is any other type. The type will *only* be known once it's time to execute that statement. But sure, the same goes for "for x in y:" really. That only works with iterables. So maybe this isn't a problem. From rosuav at gmail.com Mon Jul 27 18:25:00 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 28 Jul 2015 02:25:00 +1000 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: <20150727161213.GD25179@ando.pearwood.info> References: <20150727021209.GW25179@ando.pearwood.info> <20150727153911.GB25179@ando.pearwood.info> <20150727161213.GD25179@ando.pearwood.info> Message-ID: On Tue, Jul 28, 2015 at 2:12 AM, Steven D'Aprano wrote: > In effect, your suggestion makes the a:b syntax a "Do What I Mean" > operation. It tries to gues whether you want to call expr.items() or > enumerate(expr). Building DWIM into the language is probably not a good > idea. I see what you mean. Yes, there's no easy way to iterate over either type, but that isn't the point. What my suggestion was positing was not so much DWIM as "iterate over the keys and values of anything". A mapping type has a concept of keys and values; an indexable sequence (list, tuple, etc) uses sequential numbers as indices and its members as values. 
(A set might choose to iterate this way by calling its members the indices, and using a fixed True as the value every time.) This would create a new iteration invariant. We currently have: for x in y: assert x in y With this, we would have: for k:v in x: assert x[k] is v And it should ideally raise an exception if this can't be done. (Which currently would be the case for sets, so my suggestion above would have to be accompanied by a set indexing definition that returns the same fixed value for anything that's in it - something like "def __getitem__(self, item): return item in self".) Now, I'm still not saying this is a *good* idea. But I do think it's internally consistent. Note that a simple definition using enumerate() would violate the assertion, as you could use this to iterate over a non-sequence and get indices and values. I'm not sure whether it's better to promise a simple invariant (ie non-sequences should raise TypeError if used in this way), or to adopt the stance of practicality and permit this. Both make sense. ChrisA From rosuav at gmail.com Mon Jul 27 20:38:13 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 28 Jul 2015 04:38:13 +1000 Subject: [Python-ideas] Concurrency Modules In-Reply-To: References: <559EFB73.5050606@mail.de> <9c139305-f583-46c1-b819-6a98dbd04acc@googlegroups.com> <55B2B0FB.1060409@mail.de> Message-ID: On Mon, Jul 27, 2015 at 12:19 AM, Nick Coghlan wrote: > On 25 July 2015 at 15:32, Chris Angelico wrote: >> On a more serious note, I'd like to see some throughput tests for >> process-pool, thread-pool, and asyncio on a single thread. That'd make >> a great PyCon talk; make sure it's videoed, as I'd likely be linking >> to it a lot. > > Dave Beazley's "Python Concurrency from the Ground Up" talk at PyCon > US this year was almost exactly that: > https://us.pycon.org/2015/schedule/presentation/374/ > > Video: https://www.youtube.com/watch?v=MCs5OvhV9S4 > Demo code: https://github.com/dabeaz/concurrencylive Thanks for posting that, Nick! To everyone else who's trying to get their heads around where 'yield from' compares to threads, this is a great run-down. ChrisA From steve at pearwood.info Mon Jul 27 21:17:10 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 28 Jul 2015 05:17:10 +1000 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: References: <20150727021209.GW25179@ando.pearwood.info> <20150727153911.GB25179@ando.pearwood.info> <20150727161213.GD25179@ando.pearwood.info> Message-ID: <20150727191710.GF25179@ando.pearwood.info> On Tue, Jul 28, 2015 at 02:25:00AM +1000, Chris Angelico wrote: > What my suggestion was positing was not so much DWIM as "iterate over > the keys and values of anything". A mapping type has a concept of keys > and values; an indexable sequence (list, tuple, etc) uses sequential > numbers as indices and its members as values. .............^^^^^^^ Keys and indices are not the same thing, and Python is not Lua. While there are certain similarities between the indices of a sequence and the keys of a mapping, they really don't play the same role in any meaningful sense. Consider: x in mapping # looks for a key x x in sequence # looks for a value x, not an index ("key") x Consider this too: mapping = {i:i**2 for i in range(1000)} sequence = [i**2 for i in range(1000)] for obj in (mapping, sequence): while 0 in obj.keys(): del obj[0] assert 0 not in obj.keys() The first problem is that lists don't have a keys() method. But that's okay, pretend that we've added one. 
Now your problems have only begun: len(mapping) # returns 1000-1 len(sequence) # returns 0 Well that sucks. Deleting (or inserting) a item into a sequence potentially changes the "keys" of all the other items. What sort of a mapping does that? Despite the apparent analogy of key <=> index, it's remarkable hard to think of any practical use for such a thing. I cannot think of any time I have wanted to, or might want to in the future, ducktype lists as dicts, with the indices treated as keys. The closest I can come up to is to support Lua-like arrays implemented as tables (mappings). When you create an array in Lua, it's not actually an array like in Python or a linked-list like in Lisp, but a (hash) table where the keys are automatically set to 1, 2, ... n by the interpreter. But that's just sugar for convenience. Lua arrays are still tables, they merely emulate arrays. And besides, that's the opposite: treating keys as indices, not indices as keys. Apart from practical problems such as the above, there's also a conceptual problem. Keys of a mapping are *intrinsic* properties of the mapping. But indices of a sequence are *extrinsic* properties. They aren't actually part of the sequence. Given the list [2,4,6] the "key" (actually index) 0 is not part of the list in any way. Some languages, like C and Python, treat those indices as starting from 0. Others treat them as starting from 1. Fortran and Pascal, if I remember correctly, let you index arrays over any contiguous range of integers, including negatives: foo = array[-20...20] of integer; or something like that. Conveniently the way we access keys and indices reflects this. Keys, being intrinsic to the mapping, is a method: mapping.keys() while indices, being extrinsic, is a function which can be applied to any iterable, with any starting value: enumerate(sequence, 1) enumerate(mapping, -5) [... snip proposal to treat sets {element} as {element:True} ...] > This would create a new iteration invariant. We currently have: Why do we need this invariant? What does it gain us to be able to say myset[element] and get True back, regardless of the value of element? Why not just say: True We can invent any invariants we like, but if they're not useful, why add cruft to the language to support something that we aren't going to use? -- Steve From rosuav at gmail.com Mon Jul 27 21:44:13 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 28 Jul 2015 05:44:13 +1000 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: <20150727191710.GF25179@ando.pearwood.info> References: <20150727021209.GW25179@ando.pearwood.info> <20150727153911.GB25179@ando.pearwood.info> <20150727161213.GD25179@ando.pearwood.info> <20150727191710.GF25179@ando.pearwood.info> Message-ID: On Tue, Jul 28, 2015 at 5:17 AM, Steven D'Aprano wrote: > On Tue, Jul 28, 2015 at 02:25:00AM +1000, Chris Angelico wrote: > >> What my suggestion was positing was not so much DWIM as "iterate over >> the keys and values of anything". A mapping type has a concept of keys >> and values; an indexable sequence (list, tuple, etc) uses sequential >> numbers as indices and its members as values. > .............^^^^^^^ > > > Keys and indices are not the same thing, and Python is not Lua. > ... > Despite the apparent analogy of key <=> index, it's remarkable hard to > think of any practical use for such a thing. I cannot think of any time > I have wanted to, or might want to in the future, ducktype lists as > dicts, with the indices treated as keys. 
A namedtuple is completely different from a list, too. But you can iterate over both. A generator is utterly different again, and you can iterate over that the exact same way. Are you ducktyping namedtuples as lists, with their attributes in definition order? Or are all of the above simply special cases of "thing you can iterate over to get a series of values"? Many types have a concept of keys/indices and their associated values. Yes, you're right, removing an element from a list changes the indices of all those after it; but the same goes for any sort of mutation of an iterable. With a dict, adding a new key/value pair can change the order of all the others. Does the fact that a list would never dare do such a thing mean that you shouldn't iterate over dicts and lists using the same syntax? Clearly not, because we can already do precisely that. You already have to be careful of mutating the thing you're iterating over: >>> l=[1,2,3] >>> for x in l: ... l.remove(x) ... print(x) ... 1 3 >>> l [2] > Apart from practical problems such as the above, there's also a > conceptual problem. Keys of a mapping are *intrinsic* properties of the > mapping. But indices of a sequence are *extrinsic* properties. They > aren't actually part of the sequence. Given the list [2,4,6] the "key" > (actually index) 0 is not part of the list in any way. Not sure the significance of this; whatever the indices are, they do exist. There is a canonical index for the value 2, and it can be determined by the aptly-named index() method: >>> [2,4,6].index(2) 0 If you were to iterate over that list in some way which pairs indices and values, it would give index 0 with value 2, index 1 with value 4, index 2 with value 6, and StopIteration. This is the behaviour of enumerate(), and nobody has ever complained that this is a bad way to work with list indices. > Conveniently the way we access keys and indices reflects this. Keys, > being intrinsic to the mapping, is a method: > > mapping.keys() > > while indices, being extrinsic, is a function which can be applied to > any iterable, with any starting value: > > enumerate(sequence, 1) > enumerate(mapping, -5) Not sure the point of this distinction, especially given that the starting value has to be 0 if the indexing into the original sequence is to work. > [... snip proposal to treat sets {element} as {element:True} ...] > > >> This would create a new iteration invariant. We currently have: > > Why do we need this invariant? What does it gain us to be able to say > > myset[element] > > and get True back, regardless of the value of element? Why not just say: > > True > > We can invent any invariants we like, but if they're not useful, why add > cruft to the language to support something that we aren't going to use? The invariant is nothing to do with treating {element} as {element:True}, that was just an example of how different types could viably respond to this kind of protocol. The invariant comes from the definition of index-value iteration, which is that iterable[index] is value. 
ChrisA From tjreedy at udel.edu Mon Jul 27 23:48:21 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 27 Jul 2015 17:48:21 -0400 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: <20150727191710.GF25179@ando.pearwood.info> References: <20150727021209.GW25179@ando.pearwood.info> <20150727153911.GB25179@ando.pearwood.info> <20150727161213.GD25179@ando.pearwood.info> <20150727191710.GF25179@ando.pearwood.info> Message-ID: On 7/27/2015 3:17 PM, Steven D'Aprano wrote: > On Tue, Jul 28, 2015 at 02:25:00AM +1000, Chris Angelico wrote: > >> What my suggestion was positing was not so much DWIM as "iterate over >> the keys and values of anything". A mapping type has a concept of keys >> and values; an indexable sequence (list, tuple, etc) uses sequential >> numbers as indices and its members as values. > .............^^^^^^^ > > > Keys and indices are not the same thing, and Python is not Lua. Both sequences and dicts can be viewed and used as functions over a finite domain. This is pretty common. If one does, *then* the keys and indices serve the same role as inputs. > While there are certain similarities between the indices of a sequence > and the keys of a mapping, they really don't play the same role in any > meaningful sense. If one uses lists or dicts to implement functions as sets with efficient access, then indecex/keys both play the same role as inputs. In general, keys/indexes are both efficient means to retrieve objects. The access issue is precisely why we use dicts more that sets. > Consider: > x in mapping # looks for a key x > x in sequence # looks for a value x, not an index ("key") x For a function, 'in' looks for an input/output pair, so both the above are wrong for this usage. > Consider this too: > > mapping = {i:i**2 for i in range(1000)} > sequence = [i**2 for i in range(1000)] Construct a function as 'set' by adding one pair at a time. > for obj in (mapping, sequence): > while 0 in obj.keys(): > del obj[0] Deleting a pair from a function is a dubious operation. > assert 0 not in obj.keys() > The first problem is that lists don't have a keys() method. But that's > okay, pretend that we've added one. Now your problems have only begun: > > len(mapping) # returns 1000-1 > len(sequence) # returns 0 > > Well that sucks. Right. If deletion *is* allowed, then it must be limited to .pop() for list implementations of functions. > Despite the apparent analogy of key <=> index, it's remarkable hard to > think of any practical use for such a thing. I cannot think of any time > I have wanted to, or might want to in the future, ducktype lists as > dicts, with the indices treated as keys. Python already ducktypes list/dict and index/key by using the same subscript notation and corresponding special methods for both. Because of this, it is possible to write algorithm that work with both a list (or list of lists) and a dict or DefaultDict or subset of either. > Apart from practical problems such as the above, there's also a > conceptual problem. We seem to have different ideas of 'practical' versus 'conceptual'. > Keys of a mapping are *intrinsic* properties of the mapping. To me, this is as least in part a practical implementation issue. In a function, keys are intrinsically part of pairs. For a dict in general, they are not necessarily intrinsic properties of the values. In a namespace, the names are arbitrary access tools and not necessarily unique. There is no sense in which the objects are functionally derived from the names. 
> But indices of a sequence are *extrinsic* properties. Once the first item is given an index, the rest of the indexes follow. If the items are functionally derived from the indexes, then even the first index is not arbitrary. In spite of everything above, I am pretty dubious about adding x:y as an iteration target: I have no problem with mapping.items() and enumerate(iterable). -- Terry Jan Reedy From abarnert at yahoo.com Tue Jul 28 03:29:21 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 28 Jul 2015 03:29:21 +0200 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: <20150727191710.GF25179@ando.pearwood.info> References: <20150727021209.GW25179@ando.pearwood.info> <20150727153911.GB25179@ando.pearwood.info> <20150727161213.GD25179@ando.pearwood.info> <20150727191710.GF25179@ando.pearwood.info> Message-ID: On Jul 27, 2015, at 21:17, Steven D'Aprano wrote: > > Keys and indices are not the same thing, and Python is not Lua. It may be worth doing a survey of other languages to see how they handle this. I think the most common thing is to have dict iteration yield either (key, value) pairs, especially in languages with pattern matching or at least basic Python-style tuple decomposition. For example, in Swift, you write `for (key, val) in d {...}`. Another common thing in more Java-ish languages is to yield special item objects, like C# KeyValuePair, which you'd use as `foreach(var item in myDictionary { spam(item.Key, item.Value); }`. Some languages treat dictionaries as iterables of keys, like Python. PHP does have something like this proposal: `foreach ($d as $k=>$v) {...}` vs. `foreach ($d as $k) {...}`. So does Go, although its syntax is `k, v` for a key-value pair vs. `k` for just the key (which obviously wouldn't work with Python-style tuple decomposition). I vaguely remember Tcl having something relevant here but I can't remember what it was. The only other language I can think of that does anything like allowing you treat a list as a mapping from indices is JS (and its various offshoots), but their for loop is really treating everything as an object, iterating both keys and methods (since they're both the same thing), and in the case of an array you get the indices in arbitrary order, which is why documentation tells you that you probably don't want a for loop over an array. (They did add a foreach method to solve that, but it gives you key, index pairs for objects and value, index pairs for arrays.) Anyway, if someone can think of a language that does what's being proposed here, it should be easier to find out whether users' experience with that feature is positive or negative. From rosuav at gmail.com Tue Jul 28 03:43:11 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 28 Jul 2015 11:43:11 +1000 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: References: <20150727021209.GW25179@ando.pearwood.info> <20150727153911.GB25179@ando.pearwood.info> <20150727161213.GD25179@ando.pearwood.info> <20150727191710.GF25179@ando.pearwood.info> Message-ID: On Tue, Jul 28, 2015 at 11:29 AM, Andrew Barnert via Python-ideas wrote: > On Jul 27, 2015, at 21:17, Steven D'Aprano wrote: >> >> Keys and indices are not the same thing, and Python is not Lua. > > It may be worth doing a survey of other languages to see how they handle this. Pike has two different forms of iteration: foreach (some_object, value) foreach (some_object; index; value) The first form works on arrays and such - sequences. 
It's fundamentally the same thing as Python's existing iteration. The second form behaves the way I'm describing, and was the inspiration for it :) But Pike's iterables are a lot more restricted than Python's, so it's easier there. ChrisA From steve at pearwood.info Tue Jul 28 05:42:25 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 28 Jul 2015 13:42:25 +1000 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: References: <20150727021209.GW25179@ando.pearwood.info> <20150727153911.GB25179@ando.pearwood.info> <20150727161213.GD25179@ando.pearwood.info> <20150727191710.GF25179@ando.pearwood.info> Message-ID: <20150728034224.GG25179@ando.pearwood.info> On Mon, Jul 27, 2015 at 05:48:21PM -0400, Terry Reedy wrote: > On 7/27/2015 3:17 PM, Steven D'Aprano wrote: > >On Tue, Jul 28, 2015 at 02:25:00AM +1000, Chris Angelico wrote: > > > >>What my suggestion was positing was not so much DWIM as "iterate over > >>the keys and values of anything". A mapping type has a concept of keys > >>and values; an indexable sequence (list, tuple, etc) uses sequential > >>numbers as indices and its members as values. > >.............^^^^^^^ > > > > > >Keys and indices are not the same thing, and Python is not Lua. > > Both sequences and dicts can be viewed and used as functions over a > finite domain. This is pretty common. If one does, *then* the keys and > indices serve the same role as inputs. If you are talking about the fact that both dict and list subscript notation spam[x] is, in some sense, equivalent to the mathematical concept of a function that maps a single argument to some value, x -> f(x), then I understand *what* you are saying, but not *why* it is relevant. In Python, neither sequences nor mappings have the same API as functions, or are considered to be the same type of object. [...] > > Consider: > > >x in mapping # looks for a key x > >x in sequence # looks for a value x, not an index ("key") x > > For a function, 'in' looks for an input/output pair, so both the above > are wrong for this usage. For a function, `in` fails with TypeError: py> 42 in chr Traceback (most recent call last): File "", line 1, in TypeError: argument of type 'builtin_function_or_method' is not iterable I'm afraid the gist of your post and the connection between functions and mappings is to abstract for me to understand. -- Steve From steve at pearwood.info Tue Jul 28 06:05:10 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 28 Jul 2015 14:05:10 +1000 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: References: <20150727021209.GW25179@ando.pearwood.info> <20150727153911.GB25179@ando.pearwood.info> <20150727161213.GD25179@ando.pearwood.info> <20150727191710.GF25179@ando.pearwood.info> Message-ID: <20150728040510.GH25179@ando.pearwood.info> On Tue, Jul 28, 2015 at 05:44:13AM +1000, Chris Angelico wrote: > On Tue, Jul 28, 2015 at 5:17 AM, Steven D'Aprano wrote: > > On Tue, Jul 28, 2015 at 02:25:00AM +1000, Chris Angelico wrote: > > > >> What my suggestion was positing was not so much DWIM as "iterate over > >> the keys and values of anything". A mapping type has a concept of keys > >> and values; an indexable sequence (list, tuple, etc) uses sequential > >> numbers as indices and its members as values. > > .............^^^^^^^ > > > > > > Keys and indices are not the same thing, and Python is not Lua. > > ... > > Despite the apparent analogy of key <=> index, it's remarkable hard to > > think of any practical use for such a thing. 
I cannot think of any time > > I have wanted to, or might want to in the future, ducktype lists as > > dicts, with the indices treated as keys. > > A namedtuple is completely different from a list, too. But you can > iterate over both. [...] Yes? What's your point? I fail to see how any of this is relevant to the analogy "indices of a sequence are mapping keys". Bottom line: Can you give a non-contrived, non-toy, practical example of where someone might want to seemlessly interchange (key,value) pairs from a mapping and (index,item) pairs from a sequence and expect to do something useful? Toy programming exercises like "print a table of key/index and value/item" aside: 0 1.0 1 2.0 2 4.0 3 16.0 That's a nice exercise for beginners, but doesn't justify new syntax. As I point out with the example of Lua, you can get quite far with the analogy "consecutive integer mapping keys are like indices". It's the other way which is dubious: indices aren't like keys in general, and I don't think there are many, if any, use-cases for treating sequences (let alone sets, let alone arbitrary iterables) as if they were a special case of mapping. But, you're proposing this. It shouldn't be up to me to prove that it's not useful. It should be up to you to prove that it is. > Many types have a concept of keys/indices and their associated values. > Yes, you're right, removing an element from a list changes the indices > of all those after it; but the same goes for any sort of mutation of > an iterable. With a dict, adding a new key/value pair can change the > order of all the others. The order of a mapping is generally not part of it's API. Your observation that adding an item to a dict may change the order of other items is not relevant. The point I am making is that deleting a key from a mapping doesn't change the *keys* of all the other items: del mapping[0] does not change the key 1 into 0, or key 2 into 1. But del sequence[0] does change the index of the items. Indices don't behave like keys! If you want to unify (key,value) and (index,item) as special cases of the same kind of thing, then you need to - justify how they can be the same when they behave so differently; and - explain how this makes Python a better language. -- Steve From rosuav at gmail.com Tue Jul 28 06:25:17 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 28 Jul 2015 14:25:17 +1000 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: <20150728040510.GH25179@ando.pearwood.info> References: <20150727021209.GW25179@ando.pearwood.info> <20150727153911.GB25179@ando.pearwood.info> <20150727161213.GD25179@ando.pearwood.info> <20150727191710.GF25179@ando.pearwood.info> <20150728040510.GH25179@ando.pearwood.info> Message-ID: On Tue, Jul 28, 2015 at 2:05 PM, Steven D'Aprano wrote: > Bottom line: > > Can you give a non-contrived, non-toy, practical example of where > someone might want to seemlessly interchange (key,value) pairs from a > mapping and (index,item) pairs from a sequence and expect to do > something useful? Toy programming exercises like "print a table of > key/index and value/item" aside: > > 0 1.0 > 1 2.0 > 2 4.0 > 3 16.0 > > That's a nice exercise for beginners, but doesn't justify new syntax. The most common case where you need the keys as well as the values is when you're working with parallel structures. 
Here's one with lists: tags = ["p", "li", "div", "body"] weights = [50, 30, 60, 10] counts = [0]*4 for idx, tag in enumerate(tags): if blob.find(tag) > weights[idx]: counts[idx] += 1 Yes, there are other ways you can structure this, but sometimes other considerations mean it's better to keep them separate and then iterate together. For read-only iteration you can of course zip() them together, but if you need to update something, that's a bit harder. In fact, that's probably a use-case as well, although I've never personally used it in real-world code: for idx, val in some_list: if condition: some_list[idx] *= whatever Now here's a dictionary-based equivalent: # Parallel iteration/mutation questions = { "color": "What color would you like your bikeshed to be?", "size": "How many bikes do you need to house?", "material": "Should the shed be made of metal, wood, or paper?", "location": "Whose backyard should we not build this in?", } defaults = {"color": "red", "size": "2", "material": "wood", "location": "City Hall"} answers = {} for kwd, msg in questions.items(): response = input("%s [%s] " % (msg, defaults[kwd])) if response == "q": break # see, can't use a list comp here answers[kwd] = response or defaults[kwd] You could think of this as a sequence (tuple or list), or as a keyword mapping. Both ways make reasonable sense, and either way, you need to know what the key/index is that you're working on. > But, you're proposing this. It shouldn't be up to me to prove that it's > not useful. It should be up to you to prove that it is. Well, I'm not pushing for this to be added to the language. I'm aiming much lower than that: merely that the idea is internally consistent, and satisfies the OP's need. I fully expect it to still be YAGNI rejected, but I believe it makes sense to ask the question, at least. ChrisA From stephen at xemacs.org Tue Jul 28 11:46:17 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 28 Jul 2015 18:46:17 +0900 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: References: <878ua1ew9m.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <871tfsehsm.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > Looking up the original iterator PEP shows it was done to enforce the > container invariant "for x in y: assert x in y". I would argue that in the context of a given mapping, and [, ] are equivalent when is [], so that we shou ld have the signature __contains__(self, key, value=self[key]), which is a NameError, but the intent should be obvious. The evident suggestion that we distinguish between tuples (which can be used as keys) and items by implementing the latter as lists (which can't, at least in the case of dictionaries) seems fragile and icky, though, and unacceptable at this point since it's incompatible with the .items() iterator. From toddrjen at gmail.com Tue Jul 28 15:28:17 2015 From: toddrjen at gmail.com (Todd) Date: Tue, 28 Jul 2015 15:28:17 +0200 Subject: [Python-ideas] Loop manager syntax Message-ID: Following the discussion of the new "async" keyword, I think it would be useful to provide a generic way to alter the behavior of loops. My idea is to allow a user to take control over the operation of a "for" or "while" loop. The basic idea is similar to context managers, where an object implementing certain magic methods, probably "__for__" and "__while__", could be placed in front of a "for" or "while" statement, respectively. This class would then be put in charge of carrying out the loop. 
Due to the similarity to context managers, I am tentatively calling this a "loop manager". What originally prompted this idea was parallelization. For example the "multiprocessing.Pool" class could act as a "for" loop manager, allowing you to do something like this: >>> from multiprocessing import Pool >>> >>> Pool() for x in range(20): ... do_something ... >>> The body of the "for" loop would then be run in parallel. However, there are other uses as well. For example, Python has no "do...while" structure, because nobody has come up with a clean way to do it (and probably nobody ever will). However, under this proposal it would be possible for a third-party package to implement a "while" loop manager that can provide this functionality: >>> from blah import do >>> >>> x = 10 >>> do while x < 20: ... x += 1 ... >>> The "do" class would just defer running the conditional until after executing the body of the "while" loop once. Another possible use-case would be to alter how the loop interacts with the surrounding namespace. It would be possible to limit the loop so only particular variables become part of the local namespace after the loop is finished, or just prevent the index from being preserved after a "for" loop is finished. I think, like context managers, this would provide a great deal of flexibility to the language and allow a lot of useful behaviors. Of course the syntax and details are just strawmen examples at this point, there may be much better syntaxes. But I think the basic idea of being able to control a loop in a manner like this is important. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ron3200 at gmail.com Tue Jul 28 17:26:12 2015 From: ron3200 at gmail.com (Ron Adam) Date: Tue, 28 Jul 2015 11:26:12 -0400 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: <871tfsehsm.fsf@uwakimon.sk.tsukuba.ac.jp> References: <878ua1ew9m.fsf@uwakimon.sk.tsukuba.ac.jp> <871tfsehsm.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 07/28/2015 05:46 AM, Stephen J. Turnbull wrote: > Nick Coghlan writes: > > > Looking up the original iterator PEP shows it was done to enforce the > > container invariant "for x in y: assert x in y". > > I would argue that in the context of a given mapping, and > [, ] are equivalent when is [], so that > we shou ld have the signature __contains__(self, key, value=self[key]), > which is a NameError, but the intent should be obvious. If keys were an actual objects, with a binding to a value, then I think they would be equivalent. But I don't think we can change dictionaries to use them. Having a Key objects might not be a bad addition on it's own. It's a fundamental data object (like a lisp pair) that may be useful for creating other data objects. It would be like a named tuple except the value is mutable. And direct comparisons are done on the immutable key, not the mutable value. It may be possible to do... for key in a_set: assert key in a_set print(key.name, key.value) And... for key in a_set: key.value = next(data) vs for key in a_dict: a_dict[key] = next(data) I think the set version is easier to read and understand, but the dict version is probably faster and more efficient. If there was a syntax for defining a key... {name:value, ...} Oops dict, not set. ;-) {name:=value, ...} set of keys? 
Cheers, Ron From abarnert at yahoo.com Tue Jul 28 19:02:09 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 28 Jul 2015 19:02:09 +0200 Subject: [Python-ideas] Loop manager syntax In-Reply-To: References: Message-ID: <7E18DE93-FA18-4120-A09E-7E979E83211C@yahoo.com> On Jul 28, 2015, at 15:28, Todd wrote: > > Following the discussion of the new "async" keyword, I think it would be useful to provide a generic way to alter the behavior of loops. My idea is to allow a user to take control over the operation of a "for" or "while" loop. > > The basic idea is similar to context managers, where an object implementing certain magic methods, probably "__for__" and "__while__", could be placed in front of a "for" or "while" statement, respectively. This class would then be put in charge of carrying out the loop. Due to the similarity to context managers, I am tentatively calling this a "loop manager". > > What originally prompted this idea was parallelization. For example the "multiprocessing.Pool" class could act as a "for" loop manager, allowing you to do something like this: > > >>> from multiprocessing import Pool > >>> > >>> Pool() for x in range(20): > ... do_something > ... > >>> > > The body of the "for" loop would then be run in parallel. First, this code would create a Pool, use it, and leak it. And yes, sure, you could wrap this all in a with statement, but then the apparent niceness that seems to motivate the idea disappears. Second, what does the Pool.__for__ method get called with here? There's an iterable, a variable name that it has to somehow assign to in the calling function's scope, and some code (in what form exactly?) that it has to execute in that calling function's scope. You could do something like this for the most trivial __for__ method: def __for__(self, scope: ScopeType, iterable: Iterable, name: str, code: CodeType): for x in iterable: scope.assign(name, x) try: exec(code, scope) except LoopBreak: break except LoopContinue: continue except LoopYield as y make calling function yield?! except LoopReturn as r: make calling function return?! It would take a nontrivial change to the compiler to compile the body of the loop into a separate code object, but with assignments still counted in the outer scope's locals list, yield expressions still making the outer function into a generator function, etc. You'd need to invent this new scope object type (just passing local, nonlocal, global dicts won't work because you can have assignments inside a loop body). Making yield, yield from, and return act on the calling function is bad enough, but for the first two, you need some way to also resume into the loop code later. If you designed a full "degenerate function" that solved all of these problems, I think that would be more useful than this proposal; different people have tried to come up with ways of doing that for making continuations for various custom-control-flow-without-macros purposes, and it doesn't seem like an easy problem. But that still doesn't get you anywhere near what you need for this proposal, because your motivating example is trying to run the code in parallel. What exactly happens when one iteration does a break and 7 others are running at the same time? Do you change the semantics of break so it breaks "within a few iterations", or add a way to cancel existing iterations and roll back any changes they'd made to the scope, or...? And return and yield seem even more problematic here. 
And, beyond the problems with concurrency, you have cross-process problems. For example, how do you pickle a live scope from one interpreter, pass it to another interpreter, and make it work on the first interpreter's scope? And that may not be all the problems you'd need to solve to turn this into a real proposal. > However, there are other uses as well. For example, Python has no "do...while" structure, because nobody has come up with a clean way to do it (and probably nobody ever will). However, under this proposal it would be possible for a third-party package to implement a "while" loop manager that can provide this functionality: If someone can come up with a clean way to write this do object (even ignoring the fact that it appears to be a weird singleton global object--unless, contrary to other protocols, this one allows you do define the magic methods as @classmethods and then knows how to call them appropriately), why hasn't anyone come up with a clean way of writing a do...while structure? How would it be easier this way? > >>> from blah import do > >>> > >>> x = 10 > >>> do while x < 20: > ... x += 1 > ... > >>> > > The "do" class would just defer running the conditional until after executing the body of the "while" loop once. > > Another possible use-case would be to alter how the loop interacts with the surrounding namespace. It would be possible to limit the loop so only particular variables become part of the local namespace after the loop is finished, or just prevent the index from being preserved after a "for" loop is finished. Just designing the scope object that would give you a way to do this sounds like a big enough proposal on its own. Maybe you could do this in CPython by exposing the LocalsToFast and FastToLocals methods on frame objects, adding a frame constructor, and then wrapping that up in something (in pure Python) that has a nicer API for the purpose and disguises the fact that you're actually passing around interpreter frames. You might even be able to pull off a test implementation without hacking the interpreter by using ctypes.pythonapi? > I think, like context managers, this would provide a great deal of flexibility to the language and allow a lot of useful behaviors. Of course the syntax and details are just strawmen examples at this point, there may be much better syntaxes. But I think the basic idea of being able to control a loop in a manner like this is important. The major difference between this proposal and context managers is that you want to be able to have the loop manager drive the execution of its suite, while a context manager can't do that; it just has __enter__ and __exit__ methods that get called before and after the suite is executed normally. That's how it avoids all of the problems here. Of course it still uses a change to the interpreter to allow the __exit__ method to get called as part of exception handling, but it's easy to see how you could have implemented it as a source transformation into a try/finally, in which case it wouldn't have needed any new interpreter functionality at all. Maybe there's some way to rework your proposal into something that gets called to set up the loop, before and after the __next__ or expression test (with the after being passed the value and returning an optionally different value), and before and after each execution of the suite (the last two being very similar to what a context manager does). 
I don't see how any such thing could cause the suite to get executed in a process pool or in an isolated scope or any of your other motivating examples except the do...while simulator, but just because I'm not clever enough to see it doesn't mean you might not be.

From rymg19 at gmail.com  Tue Jul 28 19:17:44 2015
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Tue, 28 Jul 2015 12:17:44 -0500
Subject: [Python-ideas] Loop manager syntax
In-Reply-To: <7E18DE93-FA18-4120-A09E-7E979E83211C@yahoo.com>
References: <7E18DE93-FA18-4120-A09E-7E979E83211C@yahoo.com>
Message-ID: <291E8A33-8501-455C-8EEA-94989EB57C83@gmail.com>

I feel like you're overcomplicating the internals of this. My personal implementation idea: functions.

Basically, the pool example would be equivalent to:

temp = Pool()
def func(x):
    do_something
temp.__for__(range(20), func)

so that __for__ could be implemented like:

def __for__(self, iter, func):
    # Do something with the pool

Yields and returns could be implicitly propagated. However, this *does* start treading into the "everything is magically implicit" territory of Ruby and Perl, which completely contradicts Python's zen.

On July 28, 2015 12:02:09 PM CDT, Andrew Barnert via Python-ideas wrote:
> [snip]

--
Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity.

From abarnert at yahoo.com  Tue Jul 28 19:29:23 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 28 Jul 2015 19:29:23 +0200
Subject: [Python-ideas] Loop manager syntax
In-Reply-To:
References:
Message-ID:

On Jul 28, 2015, at 15:28, Todd wrote:
>
> Following the discussion of the new "async" keyword, I think it would be useful to provide a generic way to alter the behavior of loops. My idea is to allow a user to take control over the operation of a "for" or "while" loop.

It strikes me that allowing control of comprehensions rather than loop statements might get you some of the desired benefits, while avoiding most of the problems I described in my previous email.

The major difference is that comprehensions can only contain expressions, not statements, so all the issues of scopes, break and friends, etc. go away (or are already handled by the way comprehension functions are compiled).

While yield is allowed inside a comprehension, it's very weird to do so. (You're essentially turning the list-building function into a generator function that returns the built list as the argument to its StopIteration; I suspect this is only legal because nobody felt the need to write the code to make it illegal, not because anyone found it useful?)

You could come up with a syntax like this:

values = [with pool spam(x) for x in iterable]

The semantics are still a bit complicated (and still need to be defined, because what method(s) this should call with what arguments and what they should do is still not obvious), but they might not require any new kinds of objects or any new compilation modes or anything like that.

I'm not sure how useful this would be, because you can always just wrap the expression in a lambda (or, in this case, use spam as-is) and call pool.map instead.
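That is, something like this (a rough sketch of what the proposed comprehension above would presumably compute; spam is just a stand-in function):

from multiprocessing import Pool

def spam(x):
    return x * x

if __name__ == '__main__':
    iterable = range(10)
    with Pool(4) as pool:
        # the explicit-call spelling of [with pool spam(x) for x in iterable]
        values = pool.map(spam, iterable)
    print(values)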
From encukou at gmail.com Tue Jul 28 19:39:46 2015 From: encukou at gmail.com (Petr Viktorin) Date: Tue, 28 Jul 2015 19:39:46 +0200 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: References: <20150727021209.GW25179@ando.pearwood.info> Message-ID: On Mon, Jul 27, 2015 at 5:19 PM, Chris Angelico wrote: > On Mon, Jul 27, 2015 at 12:12 PM, Steven D'Aprano wrote: >> Being a special case, you can only use this for iterables that have an >> items() method. You can't do: >> >> for k:v in [(1, 'a'), (2, 'b')]: ... >> >> because the list doesn't have an items() method. >> > > Here's a crazy alternative: Generalize it to subsume the common use of > enumerate(). Iterate over a dict thus: > > for name:obj in globals(): > # do something with the key and/or value > > And iterate over a list, generator, or any other simple linear iterable thus: > > for idx:val in sys.argv: > # do something with the arg and its position Keys and values are very different things than indices and items. Using the same syntax for retrieval from mappings and sequences is OK, but I don't see why other operations on them, and especially this one, would need to be similar. "Two-part iteration" is not the default/obvious way to loop over a list, so I don't think it should use special syntax. A method works just fine here. From encukou at gmail.com Tue Jul 28 19:39:51 2015 From: encukou at gmail.com (Petr Viktorin) Date: Tue, 28 Jul 2015 19:39:51 +0200 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: <85d1zemdnx.fsf@benfinney.id.au> References: <85d1zemdnx.fsf@benfinney.id.au> Message-ID: On Mon, Jul 27, 2015 at 6:23 AM, Ben Finney wrote: > Petr Viktorin writes: [...] >> In dict comprehensions and literals, key-value pairs are separated by >> colons. How about allowing that in for loops as well? >> >> for key: value in a_dict: >> print(key, value) > > Hmm, that's a bit too easy to misread for my liking. > > A colon in the middle of a line, without clear parenthesis syntax > nearby, looks too much like a single-line compound statement:: > > if foo: bar > while True: flonk > for key: value in a_dict: > > I would be only +0 on the above ?for? syntax, and would prefer that it > remains a SyntaxError. > > > Analogous to what I described above for the tuple unpacking, how about > this:: > > for {key: value} in a_dict: > # ... > > That makes the correspondence with a mapping much less ambiguous, and it > clearly marks the whole item which will be emitted by the iteration. On the other hand, parenthesizing it makes it look like an expression, that is, something that can be part of a larger expression. Key/value unpacking only works as a target of a "for". From encukou at gmail.com Tue Jul 28 19:39:59 2015 From: encukou at gmail.com (Petr Viktorin) Date: Tue, 28 Jul 2015 19:39:59 +0200 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: References: Message-ID: On Mon, Jul 27, 2015 at 2:56 AM, Neil Girdhar wrote: > Cool suggestion, but I prefer how things are. > > (As an aside, calling getitem each time is not efficient.) It is. But, that's what dict.update() does for mappings. 
With a dedicated key-value iteration protocol, that could be sped up :)

From encukou at gmail.com  Tue Jul 28 19:39:49 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Tue, 28 Jul 2015 19:39:49 +0200
Subject: [Python-ideas] Syntax for key-value iteration over mappings
In-Reply-To:
References: <20150727021209.GW25179@ando.pearwood.info>
Message-ID:

On Mon, Jul 27, 2015 at 1:42 PM, Lennart Regebro wrote:
> On Mon, Jul 27, 2015 at 4:12 AM, Steven D'Aprano wrote:
>> It's one more special case syntax for beginners to learn. And it really
>> is a special case: there's nothing about "for k:v in iterable" that
>> tells you that iterable must have an items() method. You have to
>> memorise that fact.
>
> This I think is a strong argument.
>
> What error would you get when it's the wrong type? An attribute error
> on .items(), or a special SyntaxError "This syntax can only be used on
> mappings".
> Both are quite incomprehensible unless you know exactly what is going
> on and that this is a shortcut for "for x,y in foo.items():"

I think that should be "TypeError: 'foo' object is not a mapping", similarly to:

>>> for x in 123:
...     pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable

From encukou at gmail.com  Tue Jul 28 19:39:55 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Tue, 28 Jul 2015 19:39:55 +0200
Subject: [Python-ideas] Syntax for key-value iteration over mappings
In-Reply-To: <20150727021209.GW25179@ando.pearwood.info>
References: <20150727021209.GW25179@ando.pearwood.info>
Message-ID:

On Mon, Jul 27, 2015 at 4:12 AM, Steven D'Aprano wrote:
> On Sun, Jul 26, 2015 at 06:09:17PM +0200, Petr Viktorin wrote:
>> Hello,
>> Currently, the way to iterate over keys and values of a mapping
>> is to call items() and iterate over the resulting view::
>>
>>     for key, value in a_dict.items():
>>         print(key, value)
>>
>> I believe that looping over all the data in a dict is a very important
>> operation, and I find myself writing this quite often. Every time I do,
>> it seems it's boilerplate;
>
> What part looks like boilerplate? The "for"? The "key,value"? The "in"?
> The "a_dict"? If none of them are boilerplate, why would ".items()" be
> boilerplate?

Yes, the .items().

I got the courage to post here after a EuroPython talk, "Through the lens of Haskell", where we discussed that unlike other languages, where libraries can define new operators or even syntax for common operations, Python tends to standardize syntax for common operations, and ends up with a few pieces of syntax and a few common interfaces that similar objects then implement.

And so, Python ends up using punctuation for common cases, like:

    value = mapping[key]

and methods for

    value = mapping.get(key, default)

The first is the "obvious way" to do it; I can grok its meaning quickly just from the "shape" of the line. At a glance I can tell that "mapping" needs to be some container. In the second case something extra is going on, and parsing the word "get" needs a bit of extra cognitive overhead to alert me to this. I read that "mapping" needs to be an object with the "get" method.

Similarly, when I read:

    for key, value in mapping.items()

it looks like something "extra" is going on: it's a loop over tuples that contain the key and value. On the other hand, the proposed

    for key: value in mapping:

would read, to me, as looping over all data in a dict.

Of course there is a cost: new punctuation does need to be learned.
Expressions like "{x: y for x in ...}" or "head, *tail = seq" or even "p = []" aren't obvious until you go through the Python 101. My assertion was that key-value looping is common enough (i.e. used in almost every nontrivial program), and the proposed syntax is close enough to similar uses of the colon (as a key-value separator), to justify every Python developer learning it. Now I know several core devs disagree with that, which means Python will probably be better without it. Thanks for the discussion, python-ideas! From abarnert at yahoo.com Tue Jul 28 19:48:55 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 28 Jul 2015 19:48:55 +0200 Subject: [Python-ideas] Loop manager syntax In-Reply-To: <291E8A33-8501-455C-8EEA-94989EB57C83@gmail.com> References: <7E18DE93-FA18-4120-A09E-7E979E83211C@yahoo.com> <291E8A33-8501-455C-8EEA-94989EB57C83@gmail.com> Message-ID: On Jul 28, 2015, at 19:17, Ryan Gonzalez wrote: > > I feel like you're overcomplicating the internals if this. Only because the internals _have_ to be overcomplicated if this is going to work. Unless you can come up with something simpler that actually works. Your solution doesn't solve any of the problems, so being simpler doesn't really matter. > My personal implementation idea: functions. > > Basically, the pool example would be equivalent to: > > temp = Pool() > def func(x): > do_something > temp.__for__(range(20), func) > > so that __for__ could be implemented like: > > def __for__(self, iter, func): > # Do something with the pool But now func has a different scope from the caller, so all of its assignments don't work. Also, it's illegal to put break or continue statements directly inside a function. So the inner function still has to be compiled in some special way. But what should outer break and continue get compiled to? It can't be the jump opcodes they normally become. And whatever you do, surely the break and continue have to be communicated to the controlling function, or it's not controlling the loop. Hence the LoopBreak and LoopContinue exceptions from my version. How can you simplify those away? (And this still doesn't answer the question of what break is supposed to do to a parallel loop.) And, while you say "yields and returns could be implicitly propagated", I'm not sure what that actually means semantically. For returns, how does the caller or the interpreter loop or anyone else even know whether the inner function did an explicit return (which has to get implicitly propagated) vs. just falling off the end (which can't be)? That's why I included the LoopReturn exception; I don't see how you can do without that either. If you want to make it more implicit, you can just make it so that if the controlling function doesn't handle LoopReturn, it's swallowed and the value of the LoopReturn is used as the value of the controlling function (not too different from StopIteration today), but that's adding more complexity onto my solution, not removing it. And yield and yield from have the same problems as return, plus the much more serious question of what it means to implicitly propagate them, and to implicitly propagate the next or send that continues the generator. Also the question of how the __for__ method gets the generator flag set at runtime depending on whether the function it's going to call has that flag set--or, if not that, how it can yield (implicitly or otherwise) without being a generator. And so on. 
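To make the LoopReturn part concrete, here's a minimal, untested sketch--the exception class and the expansion function are inventions of this discussion, not anything that exists today:

class LoopReturn(Exception):
    """Raised by a compiled loop body in place of a real return."""
    def __init__(self, value):
        self.value = value

class SerialManager:
    """Trivial controlling object: just runs each iteration in order."""
    def __for__(self, iterable, body):
        for value in iterable:
            body(value)

def managed_for(mgr, iterable, body):
    # What the interpreter would do around the __for__ call: if the
    # controlling function doesn't handle LoopReturn itself, swallow it
    # and use its payload as the enclosing function's return value.
    try:
        mgr.__for__(iterable, body)
    except LoopReturn as ret:
        return ret.value

def body(x):
    if x == 3:
        raise LoopReturn(x)  # stands in for a "return x" in the suite

print(managed_for(SerialManager(), range(10), body))  # prints 3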
Go through each of the problems I raised; your solution doesn't solve any of them, doesn't make any of them easier to solve, and makes some of them harder to solve.

As I mentioned, this might be easier if you were trying to control comprehensions instead of for statements, but it also seems less useful there, because it really adds nothing you can't do with an explicit function call like pool.map.

From python-ideas at mgmiller.net  Tue Jul 28 22:10:37 2015
From: python-ideas at mgmiller.net (Mike Miller)
Date: Tue, 28 Jul 2015 13:10:37 -0700
Subject: [Python-ideas] Additional datetime-like module
Message-ID: <55B7E1BD.8030509@mgmiller.net>

(Apologies, I've gotten a bit lost in the recent PEP-431 discussion on -dev. To recap, it concerns changing datetime to use UTC internally.)

When doing datetime calculations, I read about two major use cases, one I'll call "simple" (aka naive), and the other "robust", which comes into play when multiple users, zones, dst, leaps, and/or calendar requirements surface.

Instead of ignoring one case, or trying to shoehorn several into one module, I submit the cases could be broken into at least two modules, to make it very easy for the end-developer to understand which approach to use and when.

As a non-expert-in-the-area developer, I can say that it is difficult to look at the docs of datetime (and time) and know which approach to choose. Then when you do choose, it isn't clear which functions and constants to use with each other.

Could we leave datetime supporting the simple/naive approach, deprecating robust features, and have another module built on top of it that expected/enforced robust UTC-based calculations? Could we also focus the documentation of each module to that end?

-Mike

From gmludo at gmail.com  Tue Jul 28 22:15:05 2015
From: gmludo at gmail.com (Ludovic Gasc)
Date: Tue, 28 Jul 2015 22:15:05 +0200
Subject: [Python-ideas] Concurrency Modules
In-Reply-To: <55B5508E.1000201@mail.de>
References: <559EFB73.5050606@mail.de> <9c139305-f583-46c1-b819-6a98dbd04acc@googlegroups.com> <55B2B0FB.1060409@mail.de> <55B3C93D.9090601@mail.de> <55B5508E.1000201@mail.de>
Message-ID:

Hello,

This discussion is pretty interesting as an attempt to list when each architecture is the most efficient, based on the need.

However, one small clarification: multiprocess/multiworker isn't at odds with AsyncIO: you can have an event loop in each process to try to combine the "best" of both "worlds". As usual in IT, it isn't a silver bullet that will cure cancer; however, at least to my understanding, it should be useful for some business needs like server daemons.

It isn't a crazy new idea; this design pattern has been implemented for a long time, at least in Nginx: http://www.aosabook.org/en/nginx.html

If you are interested in using this design pattern to build an HTTP server only, you can easily use aiohttp.web+gunicorn: http://aiohttp.readthedocs.org/en/stable/gunicorn.html
If you want to use any AsyncIO server protocol (aiohttp.web, panoramisk, asyncssh, irc3d), you can use API-Hour: http://www.api-hour.io

And if you want to implement this design pattern by yourself, be my guest: if a Python peon like me has implemented API-Hour, everybody on this mailing-list can do that.

For communication between workers, I use Redis; however, you have plenty of solutions to do that. As usual, before selecting a communication mechanism you should benchmark based on your use cases: some results may surprise you.

Have a nice week.

PS: Thank you everybody for EuroPython, it was amazing ;-)

--
Ludovic Gasc (GMLudo)
http://www.gmludo.eu/

2015-07-26 23:26 GMT+02:00 Sven R. Kunze:

> Next update:
>
> Improving Performance by Running Independent Tasks Concurrently - A Survey
>
>                | processes               | threads                    | coroutines
> ---------------+-------------------------+----------------------------+-------------------------
> purpose        | cpu-bound tasks         | cpu- & i/o-bound tasks     | i/o-bound tasks
> managed by     | os scheduler            | os scheduler + interpreter | customizable event loop
> controllable   | no                      | no                         | yes
> parallelism    | yes                     | depends (cf. GIL)          | no
> switching      | at any time             | after any bytecode         | at user-defined points
> shared state   | no                      | yes                        | yes
> startup impact | biggest/medium*         | medium                     | smallest
> cpu impact**   | biggest                 | medium                     | smallest
> memory impact  | biggest                 | medium                     | smallest
> pool module    | multiprocessing.Pool    | multiprocessing.dummy.Pool | asyncio.BaseEventLoop
> solo module    | multiprocessing.Process | threading.Thread           | ---
>
> *
> biggest - if spawn (fork+exec) and always on Windows
> medium - if fork alone
>
> **
> due to context switching
>
> On 26.07.2015 14:18, Paul Moore wrote:
>> Just as a note - even given the various provisos and "it's not that
>> simple" comments that have been made, I found this table extremely
>> useful. Like any such high-level summary, I expect to have to take it
>> with a pinch of salt, but I don't see that as an issue - anyone who
>> doesn't fully appreciate that there are subtleties, probably wouldn't
>> read a longer explanation anyway.
>>
>> So many thanks for taking the time to put this together (and for
>> continuing to improve it).
>
> You are welcome. :)
>
>> +1 on something like this ending up in the Python docs somewhere.
>
> Not sure how the process for this is, but I think the Python gurus will
> find a way.

From toddrjen at gmail.com  Tue Jul 28 22:39:37 2015
From: toddrjen at gmail.com (Todd)
Date: Tue, 28 Jul 2015 22:39:37 +0200
Subject: [Python-ideas] Loop manager syntax
In-Reply-To: <7E18DE93-FA18-4120-A09E-7E979E83211C@yahoo.com>
References: <7E18DE93-FA18-4120-A09E-7E979E83211C@yahoo.com>
Message-ID:

On Jul 28, 2015 7:02 PM, "Andrew Barnert" wrote:
>
> On Jul 28, 2015, at 15:28, Todd wrote:
> >
> > Following the discussion of the new "async" keyword, I think it would be useful to provide a generic way to alter the behavior of loops. My idea is to allow a user to take control over the operation of a "for" or "while" loop.
> >
> > The basic idea is similar to context managers, where an object implementing certain magic methods, probably "__for__" and "__while__", could be placed in front of a "for" or "while" statement, respectively. This class would then be put in charge of carrying out the loop. Due to the similarity to context managers, I am tentatively calling this a "loop manager".
> >
> > What originally prompted this idea was parallelization. For example the "multiprocessing.Pool" class could act as a "for" loop manager, allowing you to do something like this:
> >
> > >>> from multiprocessing import Pool
> > >>>
> > >>> Pool() for x in range(20):
> > ... do_something
> > ...
> > >>> > > > > The body of the "for" loop would then be run in parallel. > > First, this code would create a Pool, use it, and leak it. And yes, sure, you could wrap this all in a with statement, but then the apparent niceness that seems to motivate the idea disappears. > > Second, what does the Pool.__for__ method get called with here? There's an iterable, a variable name that it has to somehow assign to in the calling function's scope, and some code (in what form exactly?) that it has to execute in that calling function's scope. > I wanted to avoid too much bikeshedding, but my thinking for the "__for__" method is that it would be passed an iterator (not an iterable), a variable name, four dicts containing the local, nonlocal, higher enclosing, and global namespaces, and a function-like object. Mutating the dicts would NOT alter the corresponding namespaces. In cases where one or more of the namespaces doesn't make sense the corresponding dict would be empty. The method would return three dicts containing the local, nonlocal, and global namespaces, any or all of which could be empty. Returning a non-empty dict in a case where the corresponding namespace doesn't make sense would raise an exception. The interpreter would merge these dicts back into the corresponding namespaces. The function-like object would be passed four dicts corresponding to the same namespaces, and would return a tuple of three dicts corresponding to the same namespaces. The interpreter would again be responsible for initializing the function-like object's namespaces with the contents of the dicts and pulling out those namespaces at the end. In the case of yield, the returned tuple will have one additional element for the yielded value. The interpreter would be in charge of remembering which yield it is at, but the function-like object would still be initialized with the namespaces provided by the method. So any loop handler that allows yielding will need to be able to get the correct values in the namespace, failing to do so will raise an exception. The function-like object always has an optional argument for injecting values into the yield, but passing anything to it when the function-like object is not at a yield that accepts a value would raise an exception. Returns and breaks will be exceptions, which contain the namespaces as extra data. Continues will work similar to returns in normal functions, causing the function to terminate normally and return the namespaces at the point the continue was encountered. The "__for__" class is in charge of putting the iterator values into the local namespace dict passed to the function-like object (or not), for determining what should be in the namespace dicts passed to the function-like object, and for figuring out what, if anything, should be in the namespace dicts returned at the end. How to deal with yields, breaks, and returns is up to the class designer. There is no reason all loop handlers would need to handle all possible loop behaviour. It would be possible to catch and re-raise break or return exceptions, or simply not handle them at all, in cases where they shouldn't be used. Similarly, a class could simply raise an exception if the function-like object tries to yield anything if yielding didn't make sense. While loop managers would be similar, except instead of a variable name and iterator it would be passed a second function-like object for the conditional and a tuple of variable names used in the conditional. 
This function-like object would return a namespace dict for the local namespace and a boolean for the result of the conditional. Ideally this namespace dict would be empty or None if it is identical to the input namespace. It would also be possible to have an alternative context manager implementation that works in the same way. It would just be passed namespace dicts and a function-like object and return namespace dicts. > It would take a nontrivial change to the compiler to compile the body of the loop into a separate code object, but with assignments still counted in the outer scope's locals list, yield expressions still making the outer function into a generator function, etc. You'd need to invent this new scope object type (just passing local, nonlocal, global dicts won't work because you can have assignments inside a loop body). Right, this is why the loop handler is passed namespace dicts and returns namespace dicts. Changes to any namespace will remain isolated until everything is done and the handler can determine what to do with them. > Making yield, yield from, and return act on the calling function is bad enough, but for the first two, you need some way to also resume into the loop code later. I think I addressed this. > But that still doesn't get you anywhere near what you need for this proposal, because your motivating example is trying to run the code in parallel. What exactly happens when one iteration does a break and 7 others are running at the same time? Do you change the semantics of break so it breaks "within a few iterations", or add a way to cancel existing iterations and roll back any changes they'd made to the scope, or...? And return and yield seem even more problematic here. In these cases it would probably just raise an exception telling you you can't use breaks or yields. > And, beyond the problems with concurrency, you have cross-process problems. For example, how do you pickle a live scope from one interpreter, pass it to another interpreter, and make it work on the first interpreter's scope? That is the whole point of passing namespace dicts around. > > However, there are other uses as well. For example, Python has no "do...while" structure, because nobody has come up with a clean way to do it (and probably nobody ever will). However, under this proposal it would be possible for a third-party package to implement a "while" loop manager that can provide this functionality: > > If someone can come up with a clean way to write this do object (even ignoring the fact that it appears to be a weird singleton global object--unless, contrary to other protocols, this one allows you do define the magic methods as @classmethods and then knows how to call them appropriately), why hasn't anyone come up with a clean way of writing a do...while structure? How would it be easier this way? It would be easier because it can be uglier. The bar for new statements is necessarily much, much, much higher than for third-party packages. I certainly wouldn't propose loop handlers solely or even primarily to allow do...while loops, this is more of a side benefit and an example of the sorts of variations on existing loop behaviour that would be possible. > > I think, like context managers, this would provide a great deal of flexibility to the language and allow a lot of useful behaviors. Of course the syntax and details are just strawmen examples at this point, there may be much better syntaxes. 
But I think the basic idea of being able to control a loop in a manner like this is important.

> The major difference between this proposal and context managers is that you want to be able to have the loop manager drive the execution of its suite, while a context manager can't do that; it just has __enter__ and __exit__ methods that get called before and after the suite is executed normally. That's how it avoids all of the problems here. Of course it still uses a change to the interpreter to allow the __exit__ method to get called as part of exception handling, but it's easy to see how you could have implemented it as a source transformation into a try/finally, in which case it wouldn't have needed any new interpreter functionality at all.

Yes, that is why I said it was similar "in principle". The implementation is different, but I think the concepts have a lot in common.

From regebro at gmail.com  Wed Jul 29 05:40:40 2015
From: regebro at gmail.com (Lennart Regebro)
Date: Wed, 29 Jul 2015 05:40:40 +0200
Subject: [Python-ideas] Additional datetime-like module
In-Reply-To: <55B7E1BD.8030509@mgmiller.net>
References: <55B7E1BD.8030509@mgmiller.net>
Message-ID:

On Tue, Jul 28, 2015 at 10:10 PM, Mike Miller wrote:
> (Apologies, I've gotten a bit lost in the recent PEP-431 discussion on -dev.
> To recap, it concerns changing datetime to use UTC internally.)
>
> When doing datetime calculations, I read two major use cases, one I'll call
> "simple" (aka naive), and the other "robust", that comes into play when
> multiple users, zones, dst, leaps, and/or calendar requirements surface.

I agree with that, but I don't see a problem with having them in the same module. In my opinion you should get the simple case when you don't have time zone information added to your datetime objects, and the robust calculations when you do. The only problem from where I stand is that today's robust calculations are incorrect.

> As a non-expert-in-the-area developer, I can say that it is difficult to
> look at the docs of datetime (and time) and know which approach to choose.

Documentation could probably be better...

/Lennart

From bussonniermatthias at gmail.com  Wed Jul 29 06:42:48 2015
From: bussonniermatthias at gmail.com (Matthias Bussonnier)
Date: Tue, 28 Jul 2015 21:42:48 -0700
Subject: [Python-ideas] Additional datetime-like module
In-Reply-To:
References: <55B7E1BD.8030509@mgmiller.net>
Message-ID:

Hi,

For whatever reason that makes me think of bytes vs unicode strings? Where you just s/simple case/str/g ; s/robust case/unicode/g and s/timezone/encoding/g

That being said, a new module might allow more flexibility in playing with the API, like in Delorean[1] for example.

-- M

[1] : http://delorean.readthedocs.org/en/latest/

On Tue, Jul 28, 2015 at 8:40 PM, Lennart Regebro wrote:
> [snip]

From regebro at gmail.com  Wed Jul 29 06:48:37 2015
From: regebro at gmail.com (Lennart Regebro)
Date: Wed, 29 Jul 2015 06:48:37 +0200
Subject: [Python-ideas] Additional datetime-like module
In-Reply-To:
References: <55B7E1BD.8030509@mgmiller.net>
Message-ID:

On Wed, Jul 29, 2015 at 6:42 AM, Matthias Bussonnier wrote:
> Hi,
>
> For whatever reason that makes me think of bytes vs unicode strings?
> Where you just s/simple case/str/g ; s/robust case/unicode/g and
> s/timezone/encoding/g
>
> That being said, a new module might allow more flexibility in playing
> with the API, like in Delorean[1] for example.

Two datetime modules with different APIs is definitely not a good idea. I do think a new module is necessary, but it should completely replace not just datetime, but also time. Maybe even calendar. There is overlap (and some cruft) between these modules, and it can be hard to understand exactly what function to use when, etc.

//Lennart

From ncoghlan at gmail.com  Wed Jul 29 07:00:00 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 29 Jul 2015 15:00:00 +1000
Subject: [Python-ideas] Loop manager syntax
In-Reply-To:
References:
Message-ID:

On 28 July 2015 at 23:28, Todd wrote:
> Following the discussion of the new "async" keyword, I think it would be
> useful to provide a generic way to alter the behavior of loops. My idea is
> to allow a user to take control over the operation of a "for" or "while"
> loop.
>
> The basic idea is similar to context managers, where an object implementing
> certain magic methods, probably "__for__" and "__while__", could be placed
> in front of a "for" or "while" statement, respectively. This class would
> then be put in charge of carrying out the loop. Due to the similarity to
> context managers, I am tentatively calling this a "loop manager".
Guido's original PEP 340 (what eventually became PEP 343's context managers) is worth reading for background here: https://www.python.org/dev/peps/pep-0340/

And then the motivation section in PEP 343 covers why he changed his mind away from introducing a new general purpose looping construct and proposed the simpler context management protocol instead: https://www.python.org/dev/peps/pep-0343/

As such, rather than starting from the notion of a general purpose loop manager, we're likely better off focusing specifically on the parallelisation problem as Andrew suggests, and figuring out how we might go about enabling parallel execution of the components of a generator expression or container comprehension for at least the following cases:

* native coroutine (async/await)
* concurrent.futures.ThreadPoolExecutor.map
* concurrent.futures.ProcessPoolExecutor.map

Consider the following serial operation:

    result = sum(process(x, y, z) for x, y, z in seq)

If "process" is a time consuming function, we may want to dispatch it to different processes in order to exploit all cores. Currently that looks like:

    with concurrent.futures.ProcessPoolExecutor() as pool:
        result = sum(pool.map(process, seq))

If "process" is a blocking IO operation rather than a CPU bound one, we may decide to save some IPC overhead, and use local threads instead (there's no default pool size for a thread executor):

    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
        result = sum(pool.map(process, seq))

And if we're working with natively asynchronous algorithms:

    result = sum(await asyncio.gather(*(process_async(x, y, z) for x, y, z in seq)))

That's what parallel dispatch of a loop with independent iterations already looks like today, with the key requirement being that you name the operation performed on each iteration (or use a lambda expression in the case of concurrent.futures).

PEP 492 deliberately postponed the question of "What does an asynchronous comprehension look like?", because it wasn't clear what either the syntax *or* semantics should be, and as the above example shows, it's already fairly tidy if you're working with an already defined coroutine.

Given the current suite level spelling for the concurrent.futures case, one could easily imagine a syntax like:

    result = sum(process(x, y, z) with pool for x, y, z in seq)

That translated to:

    def _parallel_genexp(pool, seq):
        futures = []
        with pool:
            for x, y, z in seq:
                futures.append(pool.__submit__(lambda x=x, y=y, z=z: process(x, y, z)))
        for future in futures:
            yield future.result()

    result = sum(_parallel_genexp(pool, seq))

Container comprehensions would replace the "yield future.result()" with "expr_result.append(item)", "expr_result.add(item)" or "expr_result[key] = value" as usual.

To avoid destroying the executor with each use, a "persistent pool" wrapper could be added that delegated __submit__, but changed __enter__ and __exit__ into no-ops.

Native coroutine syntax could then potentially be added using the async keyword already introduced in PEP 492, where:

    result = sum(process(x, y, z) with async for x, y, z in seq)

may mean something like:

    async def _async_genexp(seq):
        futures = []
        async for x, y, z in seq:
            async def _iteration(x=x, y=y, z=z):
                return process(x, y, z)
            futures.append(asyncio.ensure_future(_iteration()))
        return await asyncio.gather(*futures)

    result = sum(await _async_genexp(seq))

Regards, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From pierre.quentel at gmail.com Wed Jul 29 08:22:57 2015 From: pierre.quentel at gmail.com (Pierre Quentel) Date: Wed, 29 Jul 2015 08:22:57 +0200 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: References: <85d1zemdnx.fsf@benfinney.id.au> Message-ID: 2015-07-28 19:39 GMT+02:00 Petr Viktorin : > On Mon, Jul 27, 2015 at 6:23 AM, Ben Finney > wrote: > > On the other hand, parenthesizing it makes it look like an expression, > that is, something that can be part of a larger expression. > Key/value unpacking only works as a target of a "for". > If the proposal was accepted for "for k:v in iterable" then I suppose that "if k:v in iterable" would also be valid, meaning that for a dict, there is a pair (k, v) such that _dict[k] = v, and for a list that there is an index k such that _list[k] = v. for k:v in iterable: assert k:v in iterable > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Wed Jul 29 08:29:15 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Wed, 29 Jul 2015 08:29:15 +0200 Subject: [Python-ideas] Concurrency Modules In-Reply-To: References: <559EFB73.5050606@mail.de> <9c139305-f583-46c1-b819-6a98dbd04acc@googlegroups.com> <55B2B0FB.1060409@mail.de> <55B3C93D.9090601@mail.de> <55B5508E.1000201@mail.de> Message-ID: <55B872BB.5080603@mail.de> Thanks Ludovic. On 28.07.2015 22:15, Ludovic Gasc wrote: > Hello, > > This discussion is pretty interesting to try to list when each > architecture is the most efficient, based on the need. > > However, just a small precision: multiprocess/multiworker isn't > antinomic with AsyncIO: You can have an event loop in each process to > try to combine the "best" of two "worlds". > As usual in IT, it isn't a silver bullet that will care the cancer, > however, at least to my understanding, it should be useful for some > business needs like server daemons. I think that should be clear for everybody using any of these modules. But you are right to point it out explicitly. > > It isn't a crazy new idea, this design pattern is implemented since a > long time ago at least in Nginx: http://www.aosabook.org/en/nginx.html > > If you are interested in to use this design pattern to build a HTTP > server only, you can use easily aiohttp.web+gunicorn: > http://aiohttp.readthedocs.org/en/stable/gunicorn.html > If you want to use any AsyncIO server protocol (aiohttp.web, > panoramisk, asyncssh, irc3d), you can use API-Hour: http://www.api-hour.io > > And if you want to implement by yourself this design pattern, be my > guest, if a Python peon like me has implemented API-Hour, everybody on > this mailing-list can do that. > > For communication between workers, I use Redis, however, you have > plenty of solutions to do that. > As usual, before to select a communication mechanism you should > benchmark based on your use cases: some results should surprise you. > I hope not to disappoint you. I actually strive not to do that manually for each tiny bit of program (assuming there are many place in the code base where a project could benefit from concurrency). Personally, I use benchmarks for optimizing problematic code. 
But if Python were able to do that without me choosing the right and correctly configured approach (to be determined by benchmarks), that would be awesome. As usual, that needs time to evolve.

I found that benchmark-driven improvements do not last forever, unfortunately, and that most of the time nobody is able to keep track of everything. So, as soon as something changes, you need to start anew. That is not acceptable for me.

Btw. that is also a reason why I said recently (another topic on this list), 'if Python could optimize that without my attention that would be great'.

The simplest solution, and therefore the easiest to comprehend for all team members, is the way to go. If that is not efficient enough, that is actually a Python issue. Readability counts most. And fortunately, in most cases that attitude works perfectly with Python. :)

> Have a nice week.
>
> PS: Thank you everybody for EuroPython, it was amazing ;-)
>
> --
> Ludovic Gasc (GMLudo)
> http://www.gmludo.eu/
>
> 2015-07-26 23:26 GMT+02:00 Sven R. Kunze:
> [snip - quoted copy of the concurrency survey table, unchanged from the previous message]

From srkunze at mail.de  Wed Jul 29 08:33:31 2015
From: srkunze at mail.de (Sven R. Kunze)
Date: Wed, 29 Jul 2015 08:33:31 +0200
Subject: [Python-ideas] Loop manager syntax
In-Reply-To:
References:
Message-ID: <55B873BB.1060508@mail.de>

Guys, that is awesome. Nice spirit.

I actually had another idea in mind regarding the 'concurrency syntax issue'. Not sure if you want to discuss the Pool manager syntax first.
Or if I should start a new thread on this list. Best, Sven From python-ideas at mgmiller.net Wed Jul 29 10:38:43 2015 From: python-ideas at mgmiller.net (Mike Miller) Date: Wed, 29 Jul 2015 01:38:43 -0700 Subject: [Python-ideas] Additional datetime-like module In-Reply-To: References: <55B7E1BD.8030509@mgmiller.net> Message-ID: <55B89113.10900@mgmiller.net> Right, I was thinking largely the same API for both, with subclasses of the datetime objects with new behavior in the new module. I suppose, that the opposite could be done as well, write a new module with the new behavior, then replace datetime with a compatible api built from the new one if it made sense. -Mike On 07/28/2015 09:48 PM, Lennart Regebro wrote: > > Two datetime modules with different API is definitely not a good idea. > I do think a new module is necessary, but it should completely replace > not just datetime, but also time. Maybe even calendar. There is > overlap (and some cruft) between these modules, and it can be hard to > understand exactly what function to use when, etc. > > //Lennart > From ncoghlan at gmail.com Wed Jul 29 11:10:34 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 29 Jul 2015 19:10:34 +1000 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: References: <85d1zemdnx.fsf@benfinney.id.au> Message-ID: On 29 July 2015 at 16:22, Pierre Quentel wrote: > If the proposal was accepted for "for k:v in iterable" then I suppose that > "if k:v in iterable" would also be valid, meaning that for a dict, there is > a pair (k, v) such that _dict[k] = v, and for a list that there is an index > k such that _list[k] = v. > > for k:v in iterable: > assert k:v in iterable This actually made me think of the quirky signatures of the dict constructor and dict.update, where it's possible to pass in either a mapping *or* an iterable of two-tuples: https://docs.python.org/3/library/stdtypes.html#dict.update If we went with the assumption that this syntax, if added, used those semantics, then you could reliably build (assuming hashable values) a reverse lookup table as: reverse_lookup = {v:k for k:v in data_source} At the moment, you have to restrict your input to mappings specifically: reverse_lookup = {v:k for k,v in data_source.items()} Or an iterable of 2-tuples: reverse_lookup = {v:k for k,v in data_source} Or use duck-typing: if hasattr(data_source, "items"): data_source = data_source.items() reverse_lookup = {v:k for k,v in data_source} I'd still be -0 on such a proposal with dict.update iteration semantics (as much as I think it's neat, I don't think the practical benefit is there to justify it), but it does have the virtue of extracting a particular iteration pattern from an existing builtin type. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From abarnert at yahoo.com Wed Jul 29 11:47:18 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 29 Jul 2015 11:47:18 +0200 Subject: [Python-ideas] Loop manager syntax In-Reply-To: References: <7E18DE93-FA18-4120-A09E-7E979E83211C@yahoo.com> Message-ID: On Jul 28, 2015, at 22:39, Todd wrote: > > On Jul 28, 2015 7:02 PM, "Andrew Barnert" wrote: > > > > On Jul 28, 2015, at 15:28, Todd wrote: > > > > > > Following the discussion of the new "async" keyword, I think it would be useful to provide a generic way to alter the behavior of loops. My idea is to allow a user to take control over the operation of a "for" or "while" loop. 
> > >
> > > The basic idea is similar to context managers, where an object implementing certain magic methods, probably "__for__" and "__while__", could be placed in front of a "for" or "while" statement, respectively. This class would then be put in charge of carrying out the loop. Due to the similarity to context managers, I am tentatively calling this a "loop manager".
> > >
> > > What originally prompted this idea was parallelization. For example the "multiprocessing.Pool" class could act as a "for" loop manager, allowing you to do something like this:
> > >
> > > >>> from multiprocessing import Pool
> > > >>>
> > > >>> Pool() for x in range(20):
> > > ... do_something
> > > ...
> > > >>>
> > >
> > > The body of the "for" loop would then be run in parallel.
> >
> > First, this code would create a Pool, use it, and leak it. And yes, sure, you could wrap this all in a with statement, but then the apparent niceness that seems to motivate the idea disappears.
> >
> > Second, what does the Pool.__for__ method get called with here? There's an iterable, a variable name that it has to somehow assign to in the calling function's scope, and some code (in what form exactly?) that it has to execute in that calling function's scope.
>
> I wanted to avoid too much bikeshedding,

But when you propose something this complicated, at least a sketch of the implementation is pretty much necessary, or nobody has any idea of the scope of what you're proposing. (Also, that sketch highlights all the pieces that Python is currently missing that would make this suggestion trivial, and I think some of those pieces are themselves interesting. And it would especially be a shame to do 90% of the work of building those new pieces just to implement this, but then not expose any of it.)

> but my thinking for the "__for__" method is that it would be passed an iterator (not an iterable), a variable name, four dicts containing the local, nonlocal, higher enclosing, and global namespaces, and a function-like object. Mutating the dicts would NOT alter the corresponding namespaces.

That's a lot more complicated than it at first sounds, partly because Python currently doesn't have any notion of the "higher enclosing namespace", and partly because the implicit thing that represents that namespace is made of cells, which are only constructed when needed. In particular, Python has to be able to see, at compile time, that your function is accessing a variable from an outer function so that it can mark the variable as a cell variable in the outer function and as a free variable in the inner function, so the two can be connected when the inner function is created at runtime. Comprehensions and lambdas can ignore most of this because they can't contain statements, but your implicit loop functions can, so the compiler has to look inside them. And then, at runtime, you have to do something different--because the cells aren't actually being passed in to the implicit function, only their values, you have to add free variables to the outer function to make the post-call "namespace merging" work. It's not like you couldn't come up with the right algorithm to do all of this the way you want, it's just that it's very different from anything Python currently does, so you have to design it in detail before you can be sure it makes sense, not just hand-wave it.

Meanwhile, if mutating the dicts (that is, the inner function assigning to variables) doesn't affect the outer namespace, what about mutating the values in the dict? Do you deep-copy everything into the dicts, then "deep-update" back out later? If not, how does a line like "d[k] += 1" inside the loop end up working at all (where d is a global or nonlocal)?

Meanwhile, this scope semantic is very weird if it interacts with the rest of Python. For example, if you call a closure that was built outside, but you've modified one of the variables that closure uses, it'll be called with the old value, right? That would be more than a little confusing.

And finally, how does this copy-and-merge work in parallel? Let's say you do x=0 outside the loop, then inside the loop you do x+=i. So, the first iteration starts with x=0, changes it to x=1, and you merge back x=1. The second iteration starts with x=0, changes it to x=2, and you merge back x=2. This seems like a guaranteed race, with no way either the interpreter or the Pool.__for__ mechanism could even have a possibility of ending up with 3, much less guarantee it. I think what you actually need here is not copied and merged scopes, but something more like STM, which is a lot more complicated.

> In cases where one or more of the namespaces doesn't make sense the corresponding dict would be empty. The method would return three dicts containing the local, nonlocal, and global namespaces, any or all of which could be empty. Returning a non-empty dict in a case where the corresponding namespace doesn't make sense would raise an exception. The interpreter would merge these dicts back into the corresponding namespaces.
>
> The function-like object would be passed four dicts corresponding to the same namespaces, and would return a tuple of three dicts corresponding to the same namespaces. The interpreter would again be responsible for initializing the function-like object's namespaces with the contents of the dicts and pulling out those namespaces at the end.

What does this "function-like object" look like? How do you call it with these dicts in such a way that their locals, nonlocals, and globals get set up as the variables the function was compiled to expect? (For just locals and globals, you can get the code object out, and exec that instead of calling the function, as I suggested--but you clearly don't want that, so what do you want instead?) Again, this is a new feature that I think would be more complicated, and more broadly useful, than your suggested feature, so it ought to be specified.

> In the case of yield, the returned tuple will have one additional element for the yielded value.

How do you distinguish between "exited normally" and "yielded None"?

> The interpreter would be in charge of remembering which yield it is at, but the function-like object would still be initialized with the namespaces provided by the method. So any loop handler that allows yielding will need to be able to get the correct values in the namespace; failing to do so will raise an exception. The function-like object always has an optional argument for injecting values into the yield, but passing anything to it when the function-like object is not at a yield that accepts a value would raise an exception.

Well, you also need to handle throwing exceptions into the function-like object.

More importantly, if you think about this, you're proposing to have something like explicit generator states and/or continuations that can be manually passed around and called (and even constructed?).
Together with your manual scopes (which can at least be merged into real scopes, if not allowing real scopes to be constructed), this gives you all kinds of cool things--I think you could write Stackless or greenlets in pure Python if you had this. But again, this is something we definitely don't have today, that has to be designed before you can just assume it for a new feature. > Returns and breaks will be exceptions, which contain the namespaces as extra data. > Which, as I said, obviously means that you've created a new compiler mode that, among other things, compiled returns and breaks into raises. > Continues will work similar to returns in normal functions, causing the function to terminate normally and return the namespaces at the point the continue was encountered. > > The "__for__" class is in charge of putting the iterator values into the local namespace dict passed to the function-like object (or not), for determining what should be in the namespace dicts passed to the function-like object, and for figuring out what, if anything, should be in the namespace dicts returned at the end. > Fine, but it's just filtering the four dicts the interpreter is magicking up, right? (And the dicts the function-like object is returning.) So the part the interpreter does is the interesting bit; the __for__ method will usually just be passing them along unchanged. > How to deal with yields, breaks, and returns is up to the class designer. There is no reason all loop handlers would need to handle all possible loop behaviour. It would be possible to catch and re-raise break or return exceptions, or simply not handle them at all, in cases where they shouldn't be used. Similarly, a class could simply raise an exception if the function-like object tries to yield anything if yielding didn't make sense. > But how does the class know whether, e.g., yielding from the calling function makes sense? It doesn't know what function(s) it's going to be called from. And, again, _how_ can it handle yielding if it decides to do so? You can't write a method that's sometimes a generator function and sometimes not (depending on whether some function-like object returns a special value or not). > While loop managers would be similar, except instead of a variable name and iterator it would be passed a second function-like object for the conditional and a tuple of variable names used in the conditional. > And it has to manually do the LEGB rule to dynamically figure out where to look up those names? Also, you realize that, even if it uses exactly the same rules as the normal interpreter, the effect will be different, because the interpreter applies the rule partly at compile time, not completely dynamically, right? > This function-like object would return a namespace dict for the local namespace and a boolean for the result of the conditional. Ideally this namespace dict would be empty or None if it is identical to the input namespace. > Why is it returning a namespace dict? An expression can't assign to a variable, so the namespace will always be identical. An expression can, of course, call a mutating method, but unless you're suggesting another deep-copy/deep-update (or STM transaction) here, the value is already going to be mutated, and the dict won't help anyway. > It would also be possible to have an alternative context manager implementation that works in the same way. It would just be passed namespace dicts and a function-like object and return namespace dicts. 
> Sure, but it would be a much, much larger change than what we currently have, or even than what was initially proposed, and it would have to answer most of the same questions raised here. Which may be why nobody suggested that as an implementation for context managers in the first place. > > It would take a nontrivial change to the compiler to compile the body of the loop into a separate code object, but with assignments still counted in the outer scope's locals list, yield expressions still making the outer function into a generator function, etc. You'd need to invent this new scope object type (just passing local, nonlocal, global dicts won't work because you can have assignments inside a loop body). > > Right, this is why the loop handler is passed namespace dicts and returns namespace dicts. Changes to any namespace will remain isolated until everything is done and the handler can determine what to do with them. > > > Making yield, yield from, and return act on the calling function is bad enough, but for the first two, you need some way to also resume into the loop code later. > > I think I addressed this. > > > But that still doesn't get you anywhere near what you need for this proposal, because your motivating example is trying to run the code in parallel. What exactly happens when one iteration does a break and 7 others are running at the same time? Do you change the semantics of break so it breaks "within a few iterations", or add a way to cancel existing iterations and roll back any changes they'd made to the scope, or...? And return and yield seem even more problematic here. > > In these cases it would probably just raise an exception telling you that you can't use breaks or yields. > > > And, beyond the problems with concurrency, you have cross-process problems. For example, how do you pickle a live scope from one interpreter, pass it to another interpreter, and make it work on the first interpreter's scope? > > That is the whole point of passing namespace dicts around. > OK, think about this: I have a list a=[0]*8. Now, for i in range(8), I set a[i] = i**2. If I do that with a pool for, each instance gets its own copy of the scope with its own copy of a, which it modifies. How does the interpreter or the Pool.__for__ merge those 8 separate copies of a in those 8 separate namespaces back into the original namespace? > > > However, there are other uses as well. For example, Python has no "do...while" structure, because nobody has come up with a clean way to do it (and probably nobody ever will). However, under this proposal it would be possible for a third-party package to implement a "while" loop manager that can provide this functionality: > > > > If someone can come up with a clean way to write this do object (even ignoring the fact that it appears to be a weird singleton global object--unless, contrary to other protocols, this one allows you to define the magic methods as @classmethods and then knows how to call them appropriately), why hasn't anyone come up with a clean way of writing a do...while structure? How would it be easier this way? > > It would be easier because it can be uglier. The bar for new statements is necessarily much, much, much higher than for third-party packages. I certainly wouldn't propose loop handlers solely or even primarily to allow do...while loops, this is more of a side benefit and an example of the sorts of variations on existing loop behaviour that would be possible.
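For reference, the pattern such a "while" loop manager would be packaging up is the idiom we already write by hand today -- a minimal, runnable sketch (nothing in it is part of the proposal):

    # do...while emulation in current Python: the body always runs at least once
    n = 1
    while True:
        n *= 2
        if n >= 100:
            break
    print(n)  # prints 128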
> > > > I think, like context managers, this would provide a great deal of flexibility to the language and allow a lot of useful behaviors. Of course the syntax and details are just strawmen examples at this point, there may be much better syntaxes. But I think the basic idea of being able to control a loop in a manner like this is important. > > > > The major difference between this proposal and context managers is that you want to be able to have the loop manager drive the execution of its suite, while a context manager can't do that; it just has __enter__ and __exit__ methods that get called before and after the suite is executed normally. That's how it avoids all of the problems here. Of course it still uses a change to the interpreter to allow the __exit__ method to get called as part of exception handling, but it's easy to see how you could have implemented it as a source transformation into a try/finally, in which case it wouldn't have needed any new interpreter functionality at all. > > Yes, that is why I said it was similar "in principle". The implementation is different, but I think the concepts have a lot in common. > But my point is that they're very different even in principle. One just supplies functions to get called around the suite, the other tries to control the way the suite is executed, using some kind of novel mechanism that still hasn't been designed.

From abarnert at yahoo.com Wed Jul 29 11:52:23 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 29 Jul 2015 11:52:23 +0200 Subject: [Python-ideas] Additional datetime-like module In-Reply-To: References: <55B7E1BD.8030509@mgmiller.net> Message-ID: <915F863B-21CC-4682-B179-43EB861720B1@yahoo.com> On Jul 29, 2015, at 06:48, Lennart Regebro wrote: > > On Wed, Jul 29, 2015 at 6:42 AM, Matthias Bussonnier > wrote: >> Hi, >> >> For whatever reason that makes me strangely think of bytes vs unicode strings? >> Where you just s/simple case/str/s ; s/robust case/unicode/g and >> s/timezone/encoding/g >> >> That being said, a new module might allow more flexibility in playing >> with the API, like in Delorean[1] for example. > > Two datetime modules with different API is definitely not a good idea. > I do think a new module is necessary, but it should completely replace > not just datetime, but also time. Maybe even calendar. There is > overlap (and some cruft) between these modules, and it can be hard to > understand exactly what function to use when, etc. That's not a bad idea. (Assuming you mean a very long-term deprecation or just "you probably shouldn't use this in new code, and it may be deprecated one of these decades" a la asyncore.) That would mean we get a sleep function, timing functions, etc. that take and return timedeltas instead of second or millis floats or ints, etc. You might want to go even farther and use datetime and timedelta objects in things like file times. It's a pretty big change, with a lot of backward compat risk, but the end result could be a much nicer stdlib.
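The shims themselves are trivial to sketch (untested, and every name below is hypothetical rather than a concrete proposal):

    import time
    from datetime import timedelta

    def sleep(delta):
        # hypothetical timedelta-accepting replacement for time.sleep()
        time.sleep(delta.total_seconds())

    def monotonic():
        # hypothetical timedelta-returning wrapper around time.monotonic()
        return timedelta(seconds=time.monotonic())

    sleep(timedelta(milliseconds=250))

The hard part isn't writing these, it's deciding where they live and how they interoperate with the existing modules.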
From regebro at gmail.com Wed Jul 29 12:07:14 2015 From: regebro at gmail.com (Lennart Regebro) Date: Wed, 29 Jul 2015 12:07:14 +0200 Subject: [Python-ideas] Additional datetime-like module In-Reply-To: <915F863B-21CC-4682-B179-43EB861720B1@yahoo.com> References: <55B7E1BD.8030509@mgmiller.net> <915F863B-21CC-4682-B179-43EB861720B1@yahoo.com> Message-ID: On Wed, Jul 29, 2015 at 11:52 AM, Andrew Barnert wrote: > On Jul 29, 2015, at 06:48, Lennart Regebro wrote: >> I do think a new module is necessary, but it should completely replace >> not just datetime, but also time. Maybe even calendar. There is >> overlap (and some cruft) between these modules, and it can be hard to >> understand exactly what function to use when, etc. > > That's not a bad idea. (Assuming you mean a very long-term deprecation or just "you probably shouldn't use this in new code, and it may be deprecated one of these decades" a la asyncore.) Exactly. The time and datetime modules would then hang around until some sort of big "we are gonna remove the stdlib cruft" in a future Python X.0. But I only have vague ideas so far. //Lennart From guido at python.org Wed Jul 29 12:20:48 2015 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Jul 2015 12:20:48 +0200 Subject: [Python-ideas] Additional datetime-like module In-Reply-To: <915F863B-21CC-4682-B179-43EB861720B1@yahoo.com> References: <55B7E1BD.8030509@mgmiller.net> <915F863B-21CC-4682-B179-43EB861720B1@yahoo.com> Message-ID: Honestly, I think some of these proposals are way too ambitious and will drown in a sea of bike-shedding. Here's what I think is reasonable: - Keep the time module; POSIX timestamps represented as floats are still very useful (and ubiquitous in other APIs), and it would take decades to deprecate. - Keep the datetime module; it has its uses (and even fans), and again, deprecating would take decades. - Add a few convenience APIs to one or both of the above for better conversion between various forms of datetime and POSIX timestamps (I think there are a few missing APIs in this area). - Design a new module with a separate API for "human time manipulation". This is where operations belong like "add 1 month" or "add 1 day" with heuristics for edge cases (e.g. months of different lengths due to calendar conventions, and the occasional short or long day due to DST). Possibly this can borrow some API and/or implementation from pytz. Possibly it could be made available as a 3rd party module on PyPI first, to get it user-tested in quick iterations (quicker than core Python releases, anyway). But that doesn't preclude also writing a PEP. This module needs to interface somehow with the datetime module (and possibly also with the time module) in ways that need some serious thinking -- that's where I expect the hard part of writing the PEP will be. (Maybe a "human timedelta" type would be useful here? Eventually this might even be added to the datetime module.) There are a few additional issues which I'm not sure are included in this discussion or not: - A unified API for accessing an up-to-date timezone database. This has been discussed in the past, and we couldn't find a solution that satisfied all requirements (the main problem being different platform expectations IIRC), but it's probably worth it trying again. - Nanosecond precision? I'm not against adding this to the datetime module, as long as backward compatibility is somehow preserved, especially for pickles (maybe nanoseconds default to zero unless explicitly requested or specified). 
- An "is_dst" flag for datetime objects. Again, this should default to "backwards compatible" (== "don't care", the -1 value in struct tm) so that pickles are two-way compatible with Python 3.5 and before. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rob.cliffe at btinternet.com Wed Jul 29 13:10:16 2015 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Wed, 29 Jul 2015 12:10:16 +0100 Subject: [Python-ideas] Additional datetime-like module In-Reply-To: References: <55B7E1BD.8030509@mgmiller.net> Message-ID: <55B8B498.30301@btinternet.com> On 29/07/2015 05:48, Lennart Regebro wrote: > Two datetime modules with different API is definitely not a good idea. > I do think a new module is necessary, but it should completely replace > not just datetime, but also time. Maybe even calendar. There is > overlap (and some cruft) between these modules, and it can be hard to > understand exactly what function to use when, etc. > > +1. From guido at python.org Wed Jul 29 12:42:09 2015 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Jul 2015 12:42:09 +0200 Subject: [Python-ideas] Additional datetime-like module In-Reply-To: References: <55B7E1BD.8030509@mgmiller.net> <915F863B-21CC-4682-B179-43EB861720B1@yahoo.com> Message-ID: Oops, let's move this to datetime-sig. I've reposted this there under the subject "The BDFL's take". On Wed, Jul 29, 2015 at 12:20 PM, Guido van Rossum wrote: > Honestly, I think some of these proposals are way too ambitious and will > drown in a sea of bike-shedding. > [...] > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Wed Jul 29 13:16:22 2015 From: toddrjen at gmail.com (Todd) Date: Wed, 29 Jul 2015 13:16:22 +0200 Subject: [Python-ideas] Loop manager syntax In-Reply-To: References: <7E18DE93-FA18-4120-A09E-7E979E83211C@yahoo.com> Message-ID: On Wed, Jul 29, 2015 at 11:47 AM, Andrew Barnert wrote: > On Jul 28, 2015, at 22:39, Todd wrote: > > On Jul 28, 2015 7:02 PM, "Andrew Barnert" wrote: > > > > On Jul 28, 2015, at 15:28, Todd wrote: > > > > > > Following the discussion of the new "async" keyword, I think it would > be useful to provide a generic way to alter the behavior of loops. My idea > is to allow a user to take control over the operation of a "for" or "while" > loop. > > > > > > The basic idea is similar to context managers, where an object > implementing certain magic methods, probably "__for__" and "__while__", > could be placed in front of a "for" or "while" statement, respectively. > This class would then be put in charge of carrying out the loop. Due to > the similarity to context managers, I am tentatively calling this a "loop > manager". > > > > > > What originally prompted this idea was parallelization. For example > the "multiprocessing.Pool" class could act as a "for" loop manager, > allowing you to do something like this: > > > > > > >>> from multiprocessing import Pool > > > >>> > > > >>> Pool() for x in range(20): > > > ... do_something > > > ... > > > >>> > > > > > > The body of the "for" loop would then be run in parallel. > > > > First, this code would create a Pool, use it, and leak it. And yes, > sure, you could wrap this all in a with statement, but then the apparent > niceness that seems to motivate the idea disappears. > > > > Second, what does the Pool.__for__ method get called with here? 
There's > an iterable, a variable name that it has to somehow assign to in the > calling function's scope, and some code (in what form exactly?) that it has > to execute in that calling function's scope. > > > > I wanted to avoid too much bikeshedding, > > But when you propose something this complicated, at least a sketch of the > implementation is pretty much necessary, or nobody has any idea of the > scope of what you're proposing. (Also, that sketch highlights all the > pieces that Python is currently missing that would make this suggestion > trivial, and I think some of those pieces are themselves interesting. And > it would especially be a shame to do 90% of the work of building those new > pieces just to implement this, but then not expose any of it.) > I thought it was better to first determine whether the idea had any merit at all before getting into details. If the idea turned out to be stupid then there isn't much point getting into details. > but my thinking for the "__for__" method is that it would be passed an > iterator (not an iterable), a variable name, four dicts containing the > local, nonlocal, higher enclosing, and global namespaces, and a > function-like object. Mutating the dicts would NOT alter the corresponding > namespaces. > > That's a lot more complicated than it at first sounds, partly because > Python currently doesn't have any notion of the "higher enclosing > namespace", partly because the implicit thing that represents that > namespace is made of cells, which are only constructed when needed. In > particular, Python has to be able to see, at compile time, that your > function is accessing a variable from an outer function so that it can mark > the outer function as a cell variable and the inner function as a free > variable so they can be connected when the inner function is created at > runtime. Comprehensions and lambdas can ignore most of this because they > can't contain statements, but your implicit loop functions can, so the > compiler has to look inside them. And then, at runtime, you have to do > something different--because the cells aren't actually being passed in to > the implicit function, only their values, you have to add free variables to > the outer function to make the post-call "namespace merging" work. It's not > like you couldn't come up with the right algorithm to do all of this the > way you want, it's just that it's very different from anything Python > currently does, so you have to design it in detail before you can be sure > it makes sense, not just hand wave it. > Fair enough. > Meanwhile, if mutating the dicts (that is, the inner function assigning to > variables) doesn't affect the outer namespace, what about mutating the > values in the dict? Do you deep-copy everything into the dicts, then > "deep-update" back out later? If not, how does a line like "d[k] += 1" > inside the loop end up working at all (where d is a global or nonlocal)? > > The contents of the dict are deep-copied when passed to the loop manager. Whether the loop manager deep copies those or not is up to it. It may be possible to make the dicts directly represent the namespaces, and thus have changes immediately reflected in the namespace. I thought keeping things well-isolated was more important, but it may not be worth the trouble. > Meanwhile, this scope semantic is very weird if it interacts with the rest > of Python. 
For example, if you call a closure that was built outside, but > you've modified one of the variables that closure uses, it'll be called with the > old value, right? That would be more than a little confusing. > > Where and when did you modify it? > And finally, how does this copy-and-merge work in parallel? Let's say you > do x=0 outside the loop, then inside the loop you do x+=i. So, the first > iteration starts with x=0, changes it to x=1, and you merge back x=1. The > second iteration starts with x=0, changes it to x=2, and you merge back > x=2. This seems like a guaranteed race, with no way either the interpreter > or the Pool.__for__ mechanism could even have a possibility of ending up > with 3, much less guarantee it. I think what you actually need here is not > copied and merged scopes, but something more like STM, which is a lot more > complicated. > Again, this would be up to the loop handler. In the case of multiprocessing, each repetition will likely get x=0, and the last value (whatever that is) will be returned at the end. The documentation would need to say that the behavior in such cases is non-deterministic and shouldn't be counted on. > In cases where one or more of the namespaces doesn't make sense the > corresponding dict would be empty. The method would return three dicts > containing the local, nonlocal, and global namespaces, any or all of which > could be empty. Returning a non-empty dict in a case where the > corresponding namespace doesn't make sense would raise an exception. The > interpreter would merge these dicts back into the corresponding namespaces. > > The function-like object would be passed four dicts corresponding to the > same namespaces, and would return a tuple of three dicts corresponding to > the same namespaces. The interpreter would again be responsible for > initializing the function-like object's namespaces with the contents of the > dicts and pulling out those namespaces at the end. > > What does this "function-like object" look like? How do you call it with > these dicts in such a way that their locals, nonlocals, and globals get set > up as the variables the function was compiled to expect? (For just locals > and globals, you can get the code object out, and exec that instead of > calling the function, as I suggested--but you clearly don't want that, so > what do you want instead?) Again, this is a new feature that I think would > be more complicated, and more broadly useful, than your suggested feature, > so it ought to be specified. > I am not sure I am understanding the question. Can you please explain in more detail? > In the case of yield, the returned tuple will have one additional element > for the yielded value. > > How do you distinguish between "exited normally" and "yielded None"? > The first would return a length-3 tuple and the second would return a length-4 tuple with the last element being "None". > The interpreter would be in charge of remembering which yield it is at, > but the function-like object would still be initialized with the namespaces > provided by the method. So any loop handler that allows yielding will need > to be able to get the correct values in the namespace; failing to do so > will raise an exception. The function-like object always has an optional > argument for injecting values into the yield, but passing anything to it > when the function-like object is not at a yield that accepts a value would > raise an exception. > > Well, you also need to handle throwing exceptions into the function-like > object.
> > I am not sure what you mean by this. > More importantly, if you think about this, you're proposing to have > something like explicit generator states and/or continuations that can be > manually passed around and called (and even constructed?). Together with > your manual scopes (which can at least be merged into real scopes, if not > allowing real scopes to be constructed), this gives you all kinds of cool > things--I think you could write Stackless or greenlets in pure Python if > you had this. But again, this is something we definitely don't have today, > that has to be designed before you can just assume it for a new feature. > I considered that possibility, but it seemed overly complicated. So at least in my proposal right now there is no way to get at the generator state. > Returns and breaks will be exceptions, which contain the namespaces as > extra data. > > Which, as I said, obviously means that you've created a new compiler mode > that, among other things, compiled returns and breaks into raises. > Correct, which is why I call this a "function-like object" rather than a "function". > Continues will work similar to returns in normal functions, causing the > function to terminate normally and return the namespaces at the point the > continue was encountered. > > The "__for__" class is in charge of putting the iterator values into the > local namespace dict passed to the function-like object (or not), for > determining what should be in the namespace dicts passed to the > function-like object, and for figuring out what, if anything, should be in > the namespace dicts returned at the end. > > Fine, but it's just filtering the four dicts the interpreter is magicking > up, right? (And the dicts the function-like object is returning.) So the > part the interpreter does is the interesting bit; the __for__ method will > usually just be passing them along unchanged. > Maybe, maybe not. In the case of parallel code, then no. In the case of serial code, well that depends on exactly what you want to do. I think that in many cases, messing with the namespace would be one of the big advantages. > How to deal with yields, breaks, and returns is up to the class designer. > There is no reason all loop handlers would need to handle all possible loop > behaviour. It would be possible to catch and re-raise break or return > exceptions, or simply not handle them at all, in cases where they shouldn't > be used. Similarly, a class could simply raise an exception if the > function-like object tries to yield anything if yielding didn't make > sense. > > But how does the class know whether, e.g., yielding from the calling > function makes sense? It doesn't know what function(s) it's going to be > called from. > > I am not understanding the question. Either there is a sane way to deal with yields, in which case it would do so, or there isn't, in which case it wouldn't. Can you name a situation where allowing a yield would be sane in some cases but not in others? > And, again, _how_ can it handle yielding if it decides to do so? You can't > write a method that's sometimes a generator function and sometimes not > (depending on whether some function-like object returns a special value or > not). > There are a couple of ways I thought of to deal with this. One is that there are two versions of each loop handler, one for generators and one for regular loops. The other is that the loop handler can be passed an object that it would push values to yield.
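To make that second idea concrete, here is a rough, runnable sketch of the push-based variant (every name in it is hypothetical; no such machinery exists anywhere today):

    import collections

    class YieldBuffer:
        """Hypothetical: the loop handler pushes values to be yielded."""
        def __init__(self):
            self._items = collections.deque()

        def push(self, value):
            self._items.append(value)

        def drain(self):
            # an ordinary generator drains whatever the handler pushed
            while self._items:
                yield self._items.popleft()

    buf = YieldBuffer()
    for i in range(3):
        buf.push(i * i)        # stands in for the handler hitting a yield
    print(list(buf.drain()))   # prints [0, 1, 4]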
> While loop managers would be similar, except instead of a variable name > and iterator it would be passed a second function-like object for the > conditional and a tuple of variable names used in the conditional. > > And it has to manually do the LEGB rule to dynamically figure out where to > look up those names? > If it doesn't want to alter them, then no it just passes along the namespaces to the function-like object. If it wants to alter them, then yes. That is, what, two or three lines of code? > > Also, you realize that, even if it uses exactly the same rules as the > normal interpreter, the effect will be different, because the interpreter > applies the rule partly at compile time, not completely dynamically, right? > Yes, if someone is using this feature they would need to be aware of the effects it has. > This function-like object would return a namespace dict for the local > namespace and a boolean for the result of the conditional. Ideally this > namespace dict would be empty or None if it is identical to the input > namespace. > > Why is it returning a namespace dict? An expression can't assign to a > variable, so the namespace will always be identical. > > An expression can, of course, call a mutating method, but unless you're > suggesting another deep-copy/deep-update (or STM transaction) here, the > value is already going to be mutated, and the dict won't help anyway. > It is up to the loop manager whether to deep copy or not. It would, of course, be possible for the loop manager to determine whether anything has mutated, but I felt this approach would be more consistent. > It would also be possible to have an alternative context manager > implementation that works in the same way. It would just be passed > namespace dicts and a function-like object and return namespace dicts. > > Sure, but it would be a much, much larger change than what we currently > have, or even than what was initially proposed, and it would have to answer > most of the same questions raised here. Which may be why nobody suggested > that as an implementation for context managers in the first place. > Yes, I know. Again, this would be a side benefit, rather than the primary purpose of the proposal. > > It would take a nontrivial change to the compiler to compile the body of > the loop into a separate code object, but with assignments still counted in > the outer scope's locals list, yield expressions still making the outer > function into a generator function, etc. You'd need to invent this new > scope object type (just passing local, nonlocal, global dicts won't work > because you can have assignments inside a loop body). > > Right, this is why the loop handler is passed namespace dicts and returns > namespace dicts. Changes to any namespace will remain isolated until > everything is done and the handler can determine what to do with them. > > > Making yield, yield from, and return act on the calling function is bad > enough, but for the first two, you need some way to also resume into the > loop code later. > > I think I addressed this. > > > But that still doesn't get you anywhere near what you need for this > proposal, because your motivating example is trying to run the code in > parallel. What exactly happens when one iteration does a break and 7 others > are running at the same time? Do you change the semantics of break so it > breaks "within a few iterations", or add a way to cancel existing > iterations and roll back any changes they'd made to the scope, or...? 
And > return and yield seem even more problematic here. > > In these cases it would probably just raise an exception telling you that you > can't use breaks or yields. > > > And, beyond the problems with concurrency, you have cross-process > problems. For example, how do you pickle a live scope from one interpreter, > pass it to another interpreter, and make it work on the first interpreter's > scope? > > That is the whole point of passing namespace dicts around. > > OK, think about this: > > I have a list a=[0]*8. > > Now, for i in range(8), I set a[i] = i**2. > > If I do that with a pool for, each instance gets its own copy of the scope > with its own copy of a, which it modifies. How does the interpreter or the > Pool.__for__ merge those 8 separate copies of a in those 8 separate > namespaces back into the original namespace? > It would need to check if any of the elements are not the same as before (using "is"). > > > However, there are other uses as well. For example, Python has no > "do...while" structure, because nobody has come up with a clean way to do > it (and probably nobody ever will). However, under this proposal it would > be possible for a third-party package to implement a "while" loop manager > that can provide this functionality: > > > > If someone can come up with a clean way to write this do object (even > ignoring the fact that it appears to be a weird singleton global > object--unless, contrary to other protocols, this one allows you to define > the magic methods as @classmethods and then knows how to call them > appropriately), why hasn't anyone come up with a clean way of writing a > do...while structure? How would it be easier this way? > > It would be easier because it can be uglier. The bar for new statements > is necessarily much, much, much higher than for third-party packages. I > certainly wouldn't propose loop handlers solely or even primarily to allow > do...while loops, this is more of a side benefit and an example of the > sorts of variations on existing loop behaviour that would be possible. > > > > I think, like context managers, this would provide a great deal of > flexibility to the language and allow a lot of useful behaviors. Of course > the syntax and details are just strawmen examples at this point, there may > be much better syntaxes. But I think the basic idea of being able to > control a loop in a manner like this is important. > > > > The major difference between this proposal and context managers is that > you want to be able to have the loop manager drive the execution of its > suite, while a context manager can't do that; it just has __enter__ and > __exit__ methods that get called before and after the suite is executed > normally. That's how it avoids all of the problems here. Of course it still > uses a change to the interpreter to allow the __exit__ method to get called > as part of exception handling, but it's easy to see how you could have > implemented it as a source transformation into a try/finally, in which case > it wouldn't have needed any new interpreter functionality at all. > > Yes, that is why I said it was similar "in principle". The implementation > is different, but I think the concepts have a lot in common. > > But my point is that they're very different even in principle. One just > supplies functions to get called around the suite, the other tries to > control the way the suite is executed, using some kind of novel mechanism > that still hasn't been designed.
> > Perhaps it is better to say this is more similar to decorators, then.

From python at mrabarnett.plus.com Wed Jul 29 16:53:38 2015 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 29 Jul 2015 15:53:38 +0100 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: References: <85d1zemdnx.fsf@benfinney.id.au> Message-ID: <55B8E8F2.9030309@mrabarnett.plus.com> On 2015-07-29 07:22, Pierre Quentel wrote: > > > 2015-07-28 19:39 GMT+02:00 Petr Viktorin >: > > On Mon, Jul 27, 2015 at 6:23 AM, Ben Finney > > > wrote: > > On the other hand, parenthesizing it makes it look like an expression, > that is, something that can be part of a larger expression. > Key/value unpacking only works as a target of a "for". > > > If the proposal was accepted for "for k:v in iterable" then I suppose > that "if k:v in iterable" would also be valid, meaning that for a dict, > there is a pair (k, v) such that _dict[k] = v, and for a list that there > is an index k such that _list[k] = v. > "if k:v in iterable" is already valid syntax. It's "if k:", which is an if statement, followed by "v in iterable". > for k:v in iterable: > assert k:v in iterable >

From srkunze at mail.de Wed Jul 29 18:46:22 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Wed, 29 Jul 2015 18:46:22 +0200 Subject: [Python-ideas] fork Message-ID: <55B9035E.9070504@mail.de> Hi everybody, well during the discussion of the concurrency capabilities of Python, I found this article worth reading: http://chriskiehl.com/article/parallelism-in-one-line/ His statement "Mmm.. Smell those Java roots." basically sums the whole topic up for me. That is sequential code (almost plain English):

    for image in images:
        create_thumbnail(image)

In order to have a start with parallelism and concurrency, we need to do the following:

    pool = Pool()
    pool.map(create_thumbnail, images)
    pool.close()
    pool.join()

Not bad (considering the other approaches), but why couldn't it look just like the sequential one, maybe like this:

    for image in images:
        fork create_thumbnail(image)

What I like about the Pool concept is that it frees me of thinking about the interprocess/-thread communication and processes/threads management (not sure how this works with coroutines, but the experts of you do know). What I would like to be freed of as well is: pool management. It actually reminds me of languages without garbage-collection. Regards, Sven

From abarnert at yahoo.com Wed Jul 29 19:00:42 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 29 Jul 2015 19:00:42 +0200 Subject: [Python-ideas] fork In-Reply-To: <55B9035E.9070504@mail.de> References: <55B9035E.9070504@mail.de> Message-ID: <9D20C464-4BE5-4EB5-AC7D-A9900A0578C6@yahoo.com> On Jul 29, 2015, at 18:46, Sven R. Kunze wrote: > > Hi everybody, > > well during the discussion of the concurrency capabilities of Python, I found this article worth reading: http://chriskiehl.com/article/parallelism-in-one-line/ His statement "Mmm.. Smell those Java roots." basically sums the whole topic up for me.
> > > That is sequential code (almost plain English):
> >
>     for image in images:
>         create_thumbnail(image)
> >
> > > In order to have a start with parallelism and concurrency, we need to do the following:
> >
>     pool = Pool()
>     pool.map(create_thumbnail, images)
>     pool.close()
>     pool.join()

No you don't, because this isn't Java:

    with Pool() as pool:
        pool.map(create_thumbnail, images)

Also note that if create_thumbnail returns a value, to assemble all the values into a list is just as simple:

    with Pool() as pool:
        thumbnails = pool.map(create_thumbnail, images)

And if you want to iterate over the thumbnails as created and don't care about the order, you can just replace the map with imap_unordered. (Or, of course, you can use an executor and just iterate as_completed on the list of futures.)

> Not bad (considering the other approaches), but why couldn't it look just like the sequential one, maybe like this:
> >
>     for image in images:
>         fork create_thumbnail(image)

To me, this strongly implies that you're actually forking a new child process (or at least a new thread) for every thumbnail. Which is probably a really bad idea if you have, say, 1000 of them. It definitely doesn't say "find/create some implicit pool somewhere, wrap this in a task, and submit it to the pool". And I'm not sure I'd want it to. What if I want to use a process pool instead of a thread pool, or to use a pool of 12 threads instead of NCPU because I know I'm mostly waiting on a particular HTTP server farm and 12x concurrency is ideal? Also, how would you extend this to return results? A statement can't have a result. And, even if this were an expression, it would look pretty ugly to do this:

    thumbnails = []
    for image in images:
        thumbnails.append(fork create_thumbnail(image))

Or:

    thumbnails = [fork create_thumbnail(image) for image in images]

And, even if you liked the look of that, what exactly could thumbnails be? Obviously not a list of thumbnails. At best, a list of futures that you'd then still have to loop over with as_completed or similar. Of course you could design a new language with implicit futures built into the core (or, even better, a two-level variable store with dataflow variables and implicit blocking) to solve this, but it would be very different from Python semantics. > What I like about the Pool concept is that it frees me of thinking about the interprocess/-thread communication and processes/threads management (not sure how this works with coroutines, but the experts of you do know). > > What I would like to be freed of as well is: pool management. It actually reminds me of languages without garbage-collection. That's a good parallel--but that's exactly what's so nice about "with Pool() as pool:". When you need a pool to be deterministically managed, this is the nicest syntax in any language to do it (except maybe C++ with its RAII, which lets you hide deterministic destruction inside wrapper objects). It's hard to see how it could be any more minimal. After all, if you don't wait on the pool to finish, and you don't collect a bunch of futures to wait on, how do you know when all the thumbnails are created?
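To make that last option concrete, here's a rough sketch using only the stdlib (create_thumbnail and images are assumed from your post, not defined here):

    import concurrent.futures

    def thumbnails_unordered(images):
        with concurrent.futures.ProcessPoolExecutor() as executor:
            futures = [executor.submit(create_thumbnail, img) for img in images]
            for future in concurrent.futures.as_completed(futures):
                yield future.result()  # results arrive in completion order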
From ron3200 at gmail.com Wed Jul 29 19:09:10 2015 From: ron3200 at gmail.com (Ron Adam) Date: Wed, 29 Jul 2015 13:09:10 -0400 Subject: [Python-ideas] Syntax for key-value iteration over mappings In-Reply-To: References: <85d1zemdnx.fsf@benfinney.id.au> Message-ID: On 07/29/2015 05:10 AM, Nick Coghlan wrote: > On 29 July 2015 at 16:22, Pierre Quentel wrote: >> >If the proposal was accepted for "for k:v in iterable" then I suppose that >> >"if k:v in iterable" would also be valid, meaning that for a dict, there is >> >a pair (k, v) such that _dict[k] = v, and for a list that there is an index >> >k such that _list[k] = v. >> > >> >for k:v in iterable: >> > assert k:v in iterable > This actually made me think of the quirky signatures of the dict > constructor and dict.update, where it's possible to pass in either a > mapping *or* an iterable of two-tuples: > https://docs.python.org/3/library/stdtypes.html#dict.update > > If we went with the assumption that this syntax, if added, used those > semantics, then you could reliably build (assuming hashable values) a > reverse lookup table as: > > reverse_lookup = {v:k for k:v in data_source} > > At the moment, you have to restrict your input to mappings specifically: > > reverse_lookup = {v:k for k,v in data_source.items()} > > Or an iterable of 2-tuples: > > reverse_lookup = {v:k for k,v in data_source} I'm still wondering how it would work underneath and what other things it implies.

    kv = "red":6        # What would this do?
    k:v = ("red", 6)    # Would this work too?
    k, v = "red":6      # Or this?

Currently __contains__ on dictionaries only checks the keys. So can this be made to work?

    k:v in D     # Test for key:value pair.  Currently... D[k] == v

I wonder if k:v was to create a key_value object. This isn't that different than an operator/object that consumes other objects.

    bool = key:value(dict)    # KeyValue(key, value)(dict)

Or it could work like slice objects.

    def __contains__(self, other):
        if isinstance(other, KeyValue):
            ...
        ...

Or maybe have it as a special case on for loops only, but is that special case special enough? > Or use duck-typing: > > if hasattr(data_source, "items"): > data_source = data_source.items() > reverse_lookup = {v:k for k,v in data_source} > > I'd still be -0 on such a proposal with dict.update iteration > semantics (as much as I think it's neat, I don't think the practical > benefit is there to justify it), but it does have the virtue of > extracting a particular iteration pattern from an existing builtin > type. I'm -0.1, but only because I think it could create confusing cases like ... if this works, then why can't this... kind of things. Cheers, Ron

From abarnert at yahoo.com Wed Jul 29 19:16:33 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 29 Jul 2015 19:16:33 +0200 Subject: [Python-ideas] Fwd: Loop manager syntax References: <84F6B93A-7B9E-4838-9F19-227FBBEC03E9@yahoo.com> Message-ID: <8FD22460-87D1-46F7-81C3-779EACE45CF3@yahoo.com> On Jul 29, 2015, at 13:16, Todd wrote: > I thought it was better to first determine whether the idea had any merit at all before getting into details. If the idea turned out to be stupid then there isn't much point getting into details. I can't speak for anyone else, but I don't think the idea itself is stupid, or I wouldn't have responded at all. After all, it's not far from the way you do parallel loops via OMP in C, or in Cython.
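For comparison, the closest you can get today without any new syntax is to hoist the loop body into a function and let Pool.map drive it -- a minimal sketch (body here is just a stand-in for whatever the suite would do):

    from multiprocessing import Pool

    def body(x):            # the would-be loop suite
        return x * x

    if __name__ == '__main__':
        with Pool() as pool:
            print(pool.map(body, range(20)))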
>> Meanwhile, if mutating the dicts (that is, the inner function assigning to variables) doesn't affect the outer namespace, what about mutating the values in the dict? Do you deep-copy everything into the dicts, then "deep-update" back out later? If not, how does a line like "d[k] += 1" inside the loop end up working at all (where d is a global or nonlocal)? > > The contents of the dict are deep-copied when passed to the loop manager. Whether the loop manager deep copies those or not is up to it. Well, if it's already deep-copied once, then it doesn't really matter if you deep-copy it again; it's still separate objects. That means you can't use any variables in a loop whose values aren't deep-copyable (=picklable), or you'll get some kind of exception. It also means, as I pointed out above, that you need some kind of deep-update mechanism to get all the values back into the parent scope, not just the ones directly in variables. Since Python doesn't have any such mechanism, you have to invent one. And I don't know if there is any sensible algorithm for such a thing in general. (Maybe something where you version or timestamp all the values, but even that fails the i+=1 case, which is why it's not sufficient for STM without also adding rollback and retryable commits.) And meanwhile, it means you don't have the option of delayed vs. live updates as you suggested, it means you have the option of delayed vs. no updates at all. > It may be possible to make the dicts directly represent the namespaces, and thus have changes immediately reflected in the namespace. I thought keeping things well-isolated was more important, but it may not be worth the trouble. > >> Meanwhile, this scope semantic is very weird if it interacts with the rest of Python. For example, if you call a closure that was built outside, but you've modified one of the variables that closure uses, it'll be called with the old value, right? That would be more than a little confusing. > > Where and when did you modify it? Inside the loop, for example. That presumably means you've modified the deep copy. But, assuming closures can be deep-copied at all, they'll presumably have independent copies, not copies still linked to the variables you have (otherwise they're not deep copies). Which would be very different from normal semantics. For example:

    def spam(x):
        def adder(y):
            return x+y
        for i in range(2):
            x=10; print(adder(i))

    spam(0)

This is obviously silly code, but it's the shortest example I could think of on the spur of the moment that demonstrates most of the problems. This prints 10 then 11. But if you change it to "eggs for i in range(2):", there is no way that any eggs could be implemented (with your design) that gives you the same 10 and 11, because adder is going to be deep-copied with x=0, not x as a closure cell. Or, even if x _is_ a closure cell somehow, it's not the same thing as the x in the enclosing dict passed into the function, so assigning x=10 still doesn't affect it. Again, this is a silly toy example, but it demonstrates real problems that treating scopes as just dictionaries, and deep-copying them, create. >> And finally, how does this copy-and-merge work in parallel? Let's say you do x=0 outside the loop, then inside the loop you do x+=i. So, the first iteration starts with x=0, changes it to x=1, and you merge back x=1. The second iteration starts with x=0, changes it to x=2, and you merge back x=2.
This seems like a guaranteed race, with no way either the interpreter or the Pool.__for__ mechanism could even have a possibility of ending up with 3, much less guarantee it. I think what you actually need here is not copied and merged scopes, but something more like STM, which is a lot more complicated. > > Again, this would be up to the loop handler. In the case of multiprocessing, each repetition will likely get x=0, and the last value (whatever that is) will be returned at the end. The documentation would need to say that the behavior in such cases is non-deterministic and shouldn't be counted on. But that makes parallel for basically useless unless you're doing purely-immutable code. If that's fine, you might as well not try to make mutability work in the first place, in which case you might as well just do comprehensions, in which case you might as well just call the map method. A design that's much more complicated, but still doesn't add any functionality in a usable way for the paradigm use case, doesn't seem worth it. I think you _could_ come up with something that actually works by exposing a bunch of new features for dealing with cells and frames from Python, which could lead to other cool functionality (again, like pure-Python greenlets), which I think may be worth exploring. >>> In cases where one or more of the namespaces doesn't make sense the corresponding dict would be empty. The method would return three dicts containing the local, nonlocal, and global namespaces, any or all of which could be empty. Returning a non-empty dict in a case where the corresponding namespace doesn't make sense would raise an exception. The interpreter would merge these dicts back into the corresponding namespaces. >>> >>> The function-like object would be passed four dicts corresponding to the same namespaces, and would return a tuple of three dicts corresponding to the same namespaces. The interpreter would again be responsible for initializing the function-like object's namespaces with the contents of the dicts and pulling out those namespaces at the end. >>> >> >> What does this "function-like object" look like? How do you call it with these dicts in such a way that their locals, nonlocals, and globals get set up as the variables the function was compiled to expect? (For just locals and globals, you can get the code object out, and exec that instead of calling the function, as I suggested--but you clearly don't want that, so what do you want instead?) Again, this is a new feature that I think would be more complicated, and more broadly useful, than your suggested feature, so it ought to be specified. > > I am not sure I am understanding the question. Can you please explain in more detail? Can you sketch out the API of what creating and calling these function-like objects looks like? It can't be just like defining and calling a function. Extracting the code object from a function (or compiling one) and calling exec with it are closer, but still not sufficient, since function objects, and exec, only have locals and globals from the Python side (to deal with nonlocals you have to drop down to C), and since the mechanism to update the outer scope doesn't exist (without dropping down to C). But if you can look at what's already there and show what you'd need to add, even if you can't work out how to implement it as a CPython patch or anything, that could still be a very useful idea. >>> In the case of yield, the returned tuple will have one additional element for the yielded value. 
>>> >> How do you distinguish between "exited normally" and "yielded None"? > The first would return a length-3 tuple and the second would return a length-4 tuple with the last element being "None". OK, that makes sense. > >>> The interpreter would be in charge of remembering which yield it is at, but the function-like object would still be initialized with the namespaces provided by the method. So any loop handler that allows yielding will need to be able to get the correct values in the namespace; failing to do so will raise an exception. The function-like object always has an optional argument for injecting values into the yield, but passing anything to it when the function-like object is not at a yield that accepts a value would raise an exception. >>> >> Well, you also need to handle throwing exceptions into the function-like object. > I am not sure what you mean by this. Generator objects have not only __next__ and send, but also throw. >> More importantly, if you think about this, you're proposing to have something like explicit generator states and/or continuations that can be manually passed around and called (and even constructed?). Together with your manual scopes (which can at least be merged into real scopes, if not allowing real scopes to be constructed), this gives you all kinds of cool things--I think you could write Stackless or greenlets in pure Python if you had this. But again, this is something we definitely don't have today, that has to be designed before you can just assume it for a new feature. > I considered that possibility, but it seemed overly complicated. So at least in my proposal right now there is no way to get at the generator state. But if there's no way to get at the generator state, I don't see how you can implement what you want. What is the method holding? How does it know whether it called a function or a generator, and whether it needs to act as a function or a generator? >>> Returns and breaks will be exceptions, which contain the namespaces as extra data. >>> >> Which, as I said, obviously means that you've created a new compiler mode that, among other things, compiled returns and breaks into raises. > Correct, which is why I call this a "function-like object" rather than a "function". Sure, but the point is that compiling any function that has a custom loop has to switch to a new mode that has to do various things differently than just compiling a suite, or an explicit function, or a comprehension. >>> How to deal with yields, breaks, and returns is up to the class designer. There is no reason all loop handlers would need to handle all possible loop behaviour. It would be possible to catch and re-raise break or return exceptions, or simply not handle them at all, in cases where they shouldn't be used. Similarly, a class could simply raise an exception if the function-like object tries to yield anything if yielding didn't make sense. >>> >> But how does the class know whether, e.g., yielding from the calling function makes sense? It doesn't know what function(s) it's going to be called from. And, again, _how_ can it handle yielding if it decides to do so? You can't write a method that's sometimes a generator function and sometimes not (depending on whether some function-like object returns a special value or not). > I am not understanding the question. Either there is a sane way to deal with yields, in which case it would do so, or there isn't, in which case it wouldn't. Can you name a situation where allowing a yield would be sane in some cases but not in others? Yielding from inside a loop is a very common thing to do, so I think any proposal had better be able to handle it.
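Just to spell out the everyday pattern at stake -- a generator that yields from inside a for loop:

    def squares(ns):
        for n in ns:
            yield n * n   # a yield in the loop body makes this a generator function

    print(list(squares(range(4))))   # prints [0, 1, 4, 9]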
The problem is that whether a function is a generator function or not is determined at compile time, not run time. A __for__ method can't know at compile time whether the loop suites (function-like objects) that will be passed to it will yield or not, or whether the functions that will call it will yield or not. (Especially since the same method may get called once with a suite that yields, and once with a suite that doesn't.) So, how can you write something that works? Either your method isn't a generator function, and therefore it can't do anything useful if the function-like object gives it a yield value, or it is, in which case you have the opposite problem. Or you need some new way to implement the generator protocol in a way that's explicit (i.e., doesn't actually use generators) in the middle layers but transparent at the compiler level and then again at the level of calling the function with the loop in it. Providing explicit generator state objects isn't the only way to solve the problem, but it is _a_ way to solve it, and it could be useful elsewhere. If you have something simpler that works, it may be worth doing that something simpler instead, but I can't think of anything. >> And, again, _how_ can it handle yielding if it decides to do so? You can't write a method that's sometimes a generator function and sometimes not (depending on whether some function-like object returns a special value or not). > There are a couple of ways I thought of to deal with this. One is that there are two versions of each loop handler, one for generators and one for regular loops. The other is that the loop handler can be passed an object that it would push values to yield. The latter would be very weird, because generators are driven from the outside, pull-based, and the inner function-like object is also going to be pull-based; putting a push-based interface in between will be misleading or confusing at best. >>> While loop managers would be similar, except instead of a variable name and iterator it would be passed a second function-like object for the conditional and a tuple of variable names used in the conditional. >>> >> And it has to manually do the LEGB rule to dynamically figure out where to look up those names? > If it doesn't want to alter them, then no, it just passes along the namespaces to the function-like object. If it wants to alter them, then yes. That is, what, two or three lines of code? > But it's not two or three lines of code. I'm not even sure it's doable at all, e.g., because, again, assignments are handled mostly at compile time, not run time. >> Also, you realize that, even if it uses exactly the same rules as the normal interpreter, the effect will be different, because the interpreter applies the rule partly at compile time, not completely dynamically, right? > Yes, if someone is using this feature they would need to be aware of the effects it has. Having loop modifiers that break all kinds of standard loop features and change the basic semantics of variable lookup doesn't seem like a good addition to the language. If it were _possible_ to do things right, but also possible to do them weirdly, you could just file that under "consenting adults", but if any loop manager that can be written is guaranteed to break a large chunk of Python and there's no way around that, I don't think it's a feature anyone would want. >>> This function-like object would return a namespace dict for the local namespace and a boolean for the result of the conditional.
>>> This function-like object would return a namespace dict for the local namespace and a boolean for the result of the conditional. Ideally this namespace dict would be empty or None if it is identical to the input namespace.

>> Why is it returning a namespace dict? An expression can't assign to a variable, so the namespace will always be identical.

>> An expression can, of course, call a mutating method, but unless you're suggesting another deep-copy/deep-update (or STM transaction) here, the value is already going to be mutated, and the dict won't help anyway.

> It is up to the loop manager whether to deep copy or not. It would, of course, be possible for the loop manager to determine whether anything has mutated, but I felt this approach would be more consistent.

What use are you envisioning for taking a dict and returning a different dict, other than a deep copy and a subsequent deep update (and who does that deep update, if not the loop manager)?

>>>> And, beyond the problems with concurrency, you have cross-process problems. For example, how do you pickle a live scope from one interpreter, pass it to another interpreter, and make it work on the first interpreter's scope?

>>> That is the whole point of passing namespace dicts around.

>> OK, think about this:

>> I have a list a=[0]*8.

>> Now, for i in range(8), I set a[i] = i**2.

>> If I do that with a pool for, each instance gets its own copy of the scope with its own copy of a, which it modifies. How does the interpreter or the Pool.__for__ method merge those 8 separate copies of a in those 8 separate namespaces back into the original namespace?

> It would need to check if any of the elements are not the same as before (using "is").

It's going to walk the whole chain of everything reachable from the dict? I'm not sure how you even do that in general (you could make assumptions about how the copy protocol is implemented for each type, but if those assumptions were good enough, we wouldn't have the whole copy protocol in the first place...). And what is it comparing them to? If it's got deep copies--which it will--everything is guaranteed to not be the same (except maybe for ints and other simple immutable objects that the interpreter is allowed to collapse, but in that case it's just a coincidence).
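That last point is easy to demonstrate with the copy module (again, ordinary Python):

import copy

a = [[0, 1], [2, 3]]
b = copy.deepcopy(a)

print(b == a)        # True  -- the values are equal...
print(b is a)        # False -- ...but it's a brand-new outer list,
print(b[0] is a[0])  # False -- with new elements all the way down

So an "is" check against a deep copy reports every mutable element as "changed", whether the loop body touched it or not.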
>>>>> However, there are other uses as well. For example, Python has no "do...while" structure, because nobody has come up with a clean way to do it (and probably nobody ever will). However, under this proposal it would be possible for a third-party package to implement a "while" loop manager that can provide this functionality:

>>>> If someone can come up with a clean way to write this do object (even ignoring the fact that it appears to be a weird singleton global object--unless, contrary to other protocols, this one allows you to define the magic methods as @classmethods and then knows how to call them appropriately), why hasn't anyone come up with a clean way of writing a do...while structure? How would it be easier this way?

>>> It would be easier because it can be uglier. The bar for new statements is necessarily much, much, much higher than for third-party packages. I certainly wouldn't propose loop handlers solely or even primarily to allow do...while loops; this is more of a side benefit, and an example of the sorts of variations on existing loop behaviour that would be possible.

>>>>> I think, like context managers, this would provide a great deal of flexibility to the language and allow a lot of useful behaviors. Of course the syntax and details are just strawman examples at this point; there may be much better syntaxes. But I think the basic idea of being able to control a loop in a manner like this is important.

>>>> The major difference between this proposal and context managers is that you want to be able to have the loop manager drive the execution of its suite, while a context manager can't do that; it just has __enter__ and __exit__ methods that get called before and after the suite is executed normally. That's how it avoids all of the problems here. Of course it still uses a change to the interpreter to allow the __exit__ method to get called as part of exception handling, but it's easy to see how you could have implemented it as a source transformation into a try/finally, in which case it wouldn't have needed any new interpreter functionality at all.

>>> Yes, that is why I said it was similar "in principle". The implementation is different, but I think the concepts have a lot in common.

>> But my point is that they're very different even in principle. One just supplies functions to get called around the suite; the other tries to control the way the suite is executed, using some kind of novel mechanism that still hasn't been designed.

> Perhaps it is better to say this is more similar to decorators, then.

But it's not really similar to decorators either. A decorator is just like any normal higher-order function: it takes a function, it returns a function, and that's it. No monkeying with scopes, or capturing and resuming yields, or anything of the sort you're proposing. And that's my point: you seem to think your proposal is a simple thing that will fit on top of Python 3.5, but it actually relies on a whole slew of new things that still need to be invented before it makes sense.

Or, to put it another way: if you intentionally designed a proposal to demonstrate why Scheme continuations and environments are cooler than anything Python has, I think it would look a lot like this... (And the usual counter is that Python doesn't let you implement custom control flow at the language level because allowing that is actually a _bad_ thing for readability, not a feature.)

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tjreedy at udel.edu  Wed Jul 29 22:07:03 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 29 Jul 2015 16:07:03 -0400
Subject: [Python-ideas] fork
In-Reply-To: <55B9035E.9070504@mail.de>
References: <55B9035E.9070504@mail.de>
Message-ID: 

On 7/29/2015 12:46 PM, Sven R. Kunze wrote:
> Hi everybody,
>
> well, during the discussion of the concurrency capabilities of Python, I found this article worth reading:
> http://chriskiehl.com/article/parallelism-in-one-line/

I found this very helpful.

> That is sequential code (almost plain English):
>
> for image in images:
>     create_thumbnail(image)

Write this more succinctly as

map(create_thumbnail, images)

> In order to have a start with parallelism and concurrency, we need to do the following:
>
> pool = Pool()
> pool.map(create_thumbnail, images)
> pool.close()
> pool.join()

and define

from multiprocessing import Pool

def pmap(func, iterable, *args, **kwargs):
    pool = Pool(*args, **kwargs)
    result = pool.map(func, iterable)
    pool.close()
    pool.join()
    return result

then the replacement requires only 1 char:

pmap(create_thumbnail, images)

This is, of course, limited to making exactly one .map call and closing, but if this is the common case, it might be sensible to request that this be added to multiprocessing (and m.dummy) as a utility function.
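Presumably the m.dummy version would be spelled the same way, since multiprocessing.dummy mirrors the Pool API with threads. An untested sketch (pmap_threads is just an illustrative name), using the context manager support Pool grew in 3.3:

from multiprocessing.dummy import Pool as ThreadPool  # same API, backed by threads

def pmap_threads(func, iterable, *args, **kwargs):
    # Illustrative thread-based counterpart to pmap above. Pool's
    # __exit__ calls terminate(), which is safe here because map()
    # is synchronous and has collected every result before we return.
    with ThreadPool(*args, **kwargs) as pool:
        return pool.map(func, iterable)

# pmap_threads(create_thumbnail, images)  # threads instead of processes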
> Not bad (considering the other approaches), but why couldn't it look just like the sequential one, maybe like this:
>
> for image in images:
>     fork create_thumbnail(image)

A new keyword, which is a pain in itself, cannot take arguments. Blogger Chris Kiehl explains why they are needed.

> What I like about the Pool concept is that it frees me of thinking about the interprocess/-thread communication and processes/threads management.

A keyword would not offer the choice of threads versus processes.

> What I would like to be freed of as well is: pool management.

Then use the wrapper function above.

-- 
Terry Jan Reedy

From mistersheik at gmail.com  Wed Jul 29 20:08:52 2015
From: mistersheik at gmail.com (Neil Girdhar)
Date: Wed, 29 Jul 2015 11:08:52 -0700 (PDT)
Subject: [Python-ideas] Additional datetime-like module
In-Reply-To: 
References: <55B7E1BD.8030509@mgmiller.net>
Message-ID: 

+1 for one unified module.

On Wednesday, July 29, 2015 at 12:49:26 AM UTC-4, Lennart Regebro wrote:
>
> On Wed, Jul 29, 2015 at 6:42 AM, Matthias Bussonnier <bussonnie...@gmail.com> wrote:
> > Hi,
> >
> > For whatever reason, that strangely makes me think of bytes vs unicode strings... where you just s/simple case/str/g; s/robust case/unicode/g and s/timezone/encoding/g.
> >
> > That being said, a new module might allow more flexibility in playing with the API, like Delorean in [1] for example.
>
> Two datetime modules with different APIs is definitely not a good idea.
> I do think a new module is necessary, and it should completely replace not just datetime, but also time. Maybe even calendar. There is overlap (and some cruft) between these modules, and it can be hard to understand exactly which function to use when, etc.
>
> // Lennart
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ncoghlan at gmail.com  Thu Jul 30 09:24:50 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 30 Jul 2015 17:24:50 +1000
Subject: [Python-ideas] fork
In-Reply-To: <9D20C464-4BE5-4EB5-AC7D-A9900A0578C6@yahoo.com>
References: <55B9035E.9070504@mail.de> <9D20C464-4BE5-4EB5-AC7D-A9900A0578C6@yahoo.com>
Message-ID: 

On 30 July 2015 at 03:00, Andrew Barnert via Python-ideas wrote:
> On Jul 29, 2015, at 18:46, Sven R. Kunze wrote:
>> What I would like to be freed of as well is: pool management. It actually reminds me of languages without garbage collection.
>
> That's a good parallel--but that's exactly what's so nice about "with Pool() as pool:". When you need a pool to be deterministically managed, this is the nicest syntax in any language to do it (except maybe C++ with its RAII, which lets you hide deterministic destruction inside wrapper objects). It's hard to see how it could be any more minimal. After all, if you don't wait on the pool to finish, and you don't collect a bunch of futures to wait on, how do you know when all the thumbnails are created?
asyncio offers a persistent thread-or-process pool as part of the event loop (defaulting to a thread pool). Using the call_in_background() helper from http://www.curiousefficiency.org/posts/2015/07/asyncio-tcp-echo-server.html, you can write:

    for image in images:
        call_in_background(create_thumbnail, image)

And if you actually want to do something with the thumbnails:

    futures = [call_in_background(create_thumbnail, image)
               for image in images]
    for thumbnail in run_in_foreground(asyncio.gather(*futures)):
        # Do something with the thumbnail
        ...

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From barry at python.org  Fri Jul 31 23:37:28 2015
From: barry at python.org (Barry Warsaw)
Date: Fri, 31 Jul 2015 17:37:28 -0400
Subject: [Python-ideas] Briefer string format
References: <55AC2EDF.7040205@mgmiller.net> <55AC3425.5010509@mgmiller.net>
Message-ID: <20150731173728.4b346a02@anarchist.wooz.org>

On Jul 19, 2015, at 04:35 PM, Mike Miller wrote:

>I've long wished python could format strings easily like bash or perl do, ...
>and then it hit me:
>
>    csstext += f'{nl}{key}{space}{{{nl}'
>
>An "f-formatted" string could automatically format with the locals dict.

You might take a look at a feature of flufl.i18n, which supports automatic substitutions from locals and globals:

http://flufli18n.readthedocs.org/en/latest/docs/using.html#substitutions-and-placeholders

In general, flufl.i18n builds on PEP 292 $-strings and gettext to support more i18n use cases, especially in multi-language contexts. Still, the substitution features can be more or less used independently, and they do a lot of what you're looking for. This feature is only supported on implementations with sys._getframe(), though (e.g. CPython).

Cheers,
-Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: 

From barry at python.org  Fri Jul 31 23:55:34 2015
From: barry at python.org (Barry Warsaw)
Date: Fri, 31 Jul 2015 17:55:34 -0400
Subject: [Python-ideas] A different format for PI?
References: 
Message-ID: <20150731175534.66f0aeb8@anarchist.wooz.org>

On Jul 23, 2015, at 01:26 PM, Ryan Gonzalez wrote:

>Python-ideas archives: https://mail.python.org/pipermail/python-ideas/
>
>Google Groups forum: https://groups.google.com/forum/m/#!forum/python-ideas

Gmane is also invaluable, if you have an NNTP reader[1].

Cheers,
-Barry

[1] Like my favorite, claws-mail

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: 